A Web Service for Evaluating the Level of Speech in Korean

Abstract: Speaking is a way for humans to communicate with others using language. Speaking ability varies widely from speaker to speaker. In general, language skills improve as intelligence improves. Therefore, the analysis of a speaker's utterances is known to be a good tool for evaluating the speaker's intellectual maturity. Until recently, these evaluations have been done manually based on the experience of a handful of experts, an approach that is not only time consuming and costly but also highly subjective. In this paper, we propose a Korean automatic speech analysis system based on Natural Language Processing (NLP) and a web service to solve this problem. For this study, we constructed a web service based on Django to respond to the requests of various users. When a user delivered a transcription file of utterances to the server via the web, the server analyzed the speech ability of the speaker based on various indicators. The server compared the results for the transcription file with the language ability indicators of persons of the same age as the speaker and displayed the result immediately to the user. In this study, we used KoNLPy, a Korean language-processing tool. The automatic speech analysis service analyzed not only the overall language ability of the speaker but also individual domains such as sentence completion ability and vocabulary ability. In addition, the service was faster than human analysis and delivered results immediately without sacrificing accuracy.


Introduction
Humans use language to communicate with others. There are documentary, behavioral, and linguistic aspects of communicating [1,2]. One of the linguistic aspects is how to communicate with others through dialogue. Speaking is the most natural way to engage in social interaction or social participation. In addition, utterances vary widely depending on the speaker's ability. In speech pathology, we also use test tools to classify speech disorders by analyzing utterances [3]. Utterances are preferred as an evaluation measure because they reveal how language is actually used in a natural situation [4]. Language development does not stop even after physical growth stops. In adulthood, language skills may increase or decrease. "Whole-life language development", according to which language development continues not only in infancy and childhood but also into adolescence, adulthood, and old age, is emphasized in the field of speech pathology [5][6][7]. However, research on language development analysis is mostly carried out with subjects in infancy and childhood. Recently, research on the language development of adults and seniors has grown. However, there is no research covering the entire age range.
Two methods are used for analyzing utterances: the first involves experts who analyze the results by hand, whereas the second uses an automatic analysis system equipped with software technology such as Natural Language Processing (NLP). The first method has been used by a few experts [8], who analyze the utterances based on their experience and assess verbal skills. This type of analysis is not only time consuming and costly, but it is also difficult to keep objective. This is because the experts' results reflect personal experience and varied opinions as well as established rules. To overcome these drawbacks, the second method has recently been adopted more frequently [9]. An automated analysis system equipped with software technology analyzes human language based on natural language processing. In English-speaking countries, automatic analysis systems are being actively researched, but few studies address the Korean language. At first, Korean Computerized Language Analysis (KCLA) was developed as a Korean automatic analysis system, followed by the development of Korean Language Analysis (KLA). However, these systems differ little from manual analysis, except that they run on a computer [10,11].
To overcome these drawbacks, KSTARS was recently developed [12]. KSTARS provides automatic analysis of the word frequency and the types of morphemes and words. Compared to KLA, KSTARS differs in that it performs morphological analysis automatically, but its analysis is still limited to young children between 2 and 5 years old. Considering that language development continues through all ages, the limitations of KSTARS are clear. In speech pathology, the language of a speaker is generally compared to the language of persons from the same age group to determine a language disorder, based on whether it is within the normal range [13]. However, since KSTARS provides only the number of words or the frequency of morphemes, it is difficult to use its output to identify language disorders or to gauge the degree of language development.
In this paper, we propose a Korean automatic speech analysis system that overcomes the problems mentioned above. Instead of manual morphological analysis of transcription data, we automatically analyzed morphemes using natural language processing techniques. Thus, we significantly reduced the time and cost spent by professionals. Since automated morphological analysis determines the type of morpheme in accordance with established rules, the subjective opinions and experience of the expert are excluded, enabling an objective evaluation. In this study, the accuracy of morphological analysis was relatively low because the targets were dialogue sentences. However, we showed that the errors introduced by morphological analysis did not affect the overall results of the evaluation.
The system also showed the assessment results of the speaker's language ability alongside the evaluation results of the same age group, so that a relative evaluation was possible. We deployed the system on the web so that users could access it without time and space restrictions. This allowed users to receive results in real time whenever the Internet was accessible. The server received the file that the user wanted to analyze through the web and performed a morphological analysis. Upon completion of the morphological analysis, a speech analysis was applied and the result was immediately provided to the user along with the average data. In addition, the proposed system can be used for language analysis for all ages, not just for specific age groups. The immediate results and age independence are the most significant differences between our system and the previously developed system [14].
Section 2 describes the whole system including the web client, web server, and databases. Section 3 shows how to use the NLP system and utterance analysis system to evaluate language age. The experiments and their results are shown in Section 4 and the conclusions are presented in Section 5.

Materials and Methods
We developed a system to automate the assessment of Korean language ability for speakers of all ages. For this purpose, we built a database that contained the analysis results for each age group. The automated analysis system was implemented on the web for immediate use by users. Therefore, the analysis results could be obtained in real time through a web browser, without spatial or temporal restrictions, wherever the Internet was available.
Figure 1 shows the overall configuration of the system proposed in this paper. This system is largely divided into a client part and a server part. In the client part, the user inputs the file through the web browser and views the analysis results. The server part carries out natural language processing and language analysis with the file received from the user. In addition, the average value of the language ability of persons of the same age as the speaker is retrieved from the database. Thus, the user can easily recognize whether the speaker's ability is within the normal range of the corresponding age group.

Client
The client part is the interface that the user manipulates when using the system. The system used for this paper has a web-based interface. The web page viewed by the user is composed of a function for uploading a file, a selection of the age of the speaker, and a button for starting the analysis. To evaluate the speaker's ability to speak, a file containing the speaker's transcribed utterances is required. These files are referred to as transcription files. Transcription files stored on the user's computer can be analyzed by selecting them through the file selection button. After selecting the transcription file to analyze, the user selects the age group of the speaker. A screen shot of the web page is shown in Figure 2.
The distinguishing feature of this system is that it makes it possible to evaluate the speaker's linguistic ability by showing the evaluation results of the same age group at the same time. To compare and analyze the speaker's ability, the user should add information about the speaker's age. The speaker's age group is one of six categories: 5-7 years old, 8-13 years old, 14-19 years old, 20-39 years old, 40-59 years old, and over 60 years old. After selecting the age group corresponding to the speaker and clicking on the analysis button, the user obtains the age group's analysis result and the speaker's analysis result at the same time.
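The mapping from a speaker's age to one of the six groups can be sketched as follows. This is a minimal illustration rather than the system's actual code: the function name, the group labels, and the choice to place age 60 in the last group (the text says "over 60", while the previous band ends at 59) are assumptions.

```python
from typing import Optional

# Age bands as listed in the text; the string labels are illustrative names.
AGE_GROUPS = [
    (5, 7, "5-7"),
    (8, 13, "8-13"),
    (14, 19, "14-19"),
    (20, 39, "20-39"),
    (40, 59, "40-59"),
]


def age_group(age: int) -> Optional[str]:
    """Return the age-group label for an age in years, or None below 5."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    if age >= 60:  # assumption: 60 itself belongs to the oldest group
        return "60+"
    return None
```

In the system described here the user picks the group directly on the web page, so a helper like this would only be needed if the interface asked for an exact age instead.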
The results of the analysis are given in numerical form according to each evaluation item. The mean value of the same age group as the speaker is also retrieved from the database according to the evaluation item. Thus, the ability of the speaker can be easily compared with the ability of persons from the same age group. The evaluation items provided to the user are the same as those for evaluating language ability in speech pathology. The items, called dependent measurements, are as follows: Total number of utterances, Mean Length of Utterances in morpheme (MLUm), Mean Length of Utterances in words (MLUw), Total number of words (TNW), Number of different words (NDW), and Type-Token Ratio (TTR) [15].

Web Server
In this paper, we constructed a web server using Django, which is a free open-source web application framework built on the Python language [16]. Django works as shown in Figure 3. When a request is received over HTTP, it is routed by its URL to the corresponding method. The methods are defined in views.py. The output of the method is returned to the user as HTML, and the processed data is read from and written to the database. The basic screen design that the user sees is stored in the Template. The Template and its URL are linked so that the screen is displayed differently according to the URL. Django is efficient and systematic in that both the client and the server can be developed together. Django provides SQLite3 as an embedded database. This internal repository is easy to use, offers fast access to the data, and can provide fast results to users.

Database
The database stores the results of the language ability evaluation for the six age groups. In order to build this database, many individual transcription files were required. All transcription files were converted into evaluation values through natural language processing and language analysis, and the average evaluation values of each age group were stored in the database. Each transcription file records an interview with an interviewee. This is because dialogue is mainly used when assessing language ability in speech pathology. The most natural language is obtained when communicating through dialogue.
Table 1 shows the results of analyzing one person's transcription file. The present study also provides analysis results by part of speech, but these results are not presented because of the page limitation. In order to construct the database, we analyzed 120 transcription files, collected from 20 persons in each age group. Each indicator is described in detail in Section 3.2. This system can provide faster results by constructing a database of analysis results for all ages in the internal repository of the web server. Using the analysis result data stored in the database, the system provides an average value of each index for the same age group as the interviewee. Therefore, the user can make a relative evaluation by instantly comparing the interviewee's values with those of the same age group. In speech pathology, we compare the index of the speaker's analysis with the average index of the same age group when discriminating speech disorders [17,18]. Although there is no absolute criterion for language impairment, existing Korean automatic analysis systems provide only the results of the speaker's analysis. This study has solved this problem through database construction.
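The idea of storing analysis results and retrieving the average for one age group can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the system's implementation: the table name, column names, and helper functions are hypothetical, the sketch stores per-speaker rows and averages them at query time (the text states that averages are stored), it bypasses the Django database layer, and the inserted values are made-up placeholders rather than data from the study.

```python
import sqlite3

# Hypothetical column names for the six dependent measures.
MEASURES = ("utterances", "mlum", "mluw", "tnw", "ndw", "ttr")


def build_db(conn, rows):
    """Create the result table and insert rows of (age_group, *measures)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result ("
        "age_group TEXT, utterances REAL, mlum REAL, mluw REAL, "
        "tnw REAL, ndw REAL, ttr REAL)"
    )
    conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def group_means(conn, age_group):
    """Return {measure: mean} for one age group, or None if it has no rows."""
    cols = ", ".join(f"AVG({m})" for m in MEASURES)
    row = conn.execute(
        f"SELECT {cols} FROM result WHERE age_group = ?", (age_group,)
    ).fetchone()
    if row is None or row[0] is None:  # AVG over zero rows yields NULL
        return None
    return dict(zip(MEASURES, row))


conn = sqlite3.connect(":memory:")
# Made-up placeholder values for two speakers, not results from the paper.
build_db(conn, [
    ("20-39", 40, 8.0, 4.0, 160, 80, 0.50),
    ("20-39", 60, 10.0, 8.0, 480, 120, 0.25),
])
means = group_means(conn, "20-39")
```

Averaging at query time keeps the sketch short; precomputing and storing one row of means per age group, as the text describes, would make each lookup a single-row read.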

Analysis Modules
The analysis module consists of two main parts, namely, a natural language processing module and a dialogue analysis module. The server reads the sentences one by one from the transcription file and performs the preprocessing and morphological analysis using the natural language processing module. The dialogue analysis module then uses the dependent measures to provide the evaluation results in the same way as speech pathology does.

Natural Language Processing Module
The role of the natural language processing module is data preprocessing and morphological analysis. The sentences read from the transcription file contain various meaningless symbols and characters. These symbols and characters are removed because they are not used for the evaluation. For example, symbols such as '.', ',', '"', '?', '/', '*', and '!', which mark an exclamation or the end of a sentence, are removed.
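The preprocessing step can be sketched with a regular expression. The symbol set below is taken from the examples in the text and is an assumption about the full list; the function name is hypothetical.

```python
import re

# Symbols named in the text: . , " ? / * ! (the complete set is an assumption).
_SYMBOLS = re.compile(r'[.,"?/*!]')


def preprocess(sentence: str) -> str:
    """Replace the listed symbols with spaces and collapse extra whitespace."""
    cleaned = _SYMBOLS.sub(" ", sentence)
    return " ".join(cleaned.split())
```

The symbols are replaced with a space rather than deleted outright so that two words separated only by a symbol, such as a slash, are not fused into one token before morphological analysis.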
When preprocessing is completed, morphological analysis is performed on the sentence. A morpheme is the smallest unit of a word that has a meaning; if it is broken down further, it loses its meaning [19]. The process of segmenting a word into morphemes and finding the part of speech of each individual morpheme is called morphological analysis [20]. A system that automatically performs morphological analysis using a computer is called a morphological analyzer [21]. For this paper, KoNLPy was used for Korean speech analysis. KoNLPy is an open-source software package for Korean language processing in the Python programming language [22]. The morphological analysis classes provided by KoNLPy include Hannanum, Kkma, Komoran, Mecab, and Twitter. Among them, this study used the Kkma class, which provides a more detailed analysis. Table 2 shows an example of analyzing sentences using a morphological analyzer. When a sentence is analyzed, it is divided into morpheme units. Each morpheme acquires a corresponding tag, which is used to evaluate language ability. Kkma, as used in this study, provides nine parts of speech. Korean parts of speech include nouns, verbs, adjectives, adverbs, determiners, exclamations, josas, and eomies.
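A minimal sketch of tagging a sentence with KoNLPy's Kkma class follows. The helper names are hypothetical. Kkma is imported lazily inside `make_kkma` because it needs the konlpy package and a Java runtime; `tag_morphemes` accepts any object with a `pos()` method, so the rest of a pipeline can be exercised without them.

```python
from typing import List, Tuple


def tag_morphemes(sentence: str, tagger) -> List[Tuple[str, str]]:
    """Return (morpheme, tag) pairs from any tagger that offers .pos()."""
    return [(morph, tag) for morph, tag in tagger.pos(sentence)]


def make_kkma():
    """Create a KoNLPy Kkma tagger (requires konlpy and a Java runtime)."""
    from konlpy.tag import Kkma  # lazy import so the sketch loads without it
    return Kkma()
```

With KoNLPy installed, `tag_morphemes(sentence, make_kkma())` would return the morphemes of `sentence` paired with their Kkma part-of-speech tags.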
Table 2. Results of analyzing sentences using a morphological analyzer.

Input
Output ('My grandfather and I are at my daughter's house.')

Conversation Analysis Module
The conversation analysis module calculates the individual evaluation index based on the morphological analysis results. For this study, the following dependent measures were used to calculate the evaluation index as used in speech pathology. These indices, extracted from [23][24][25][26] with the help of language pathology experts, are as follows: Total number of utterances, Mean Length of Utterances in morpheme (MLUm), Mean Length of Utterances in words (MLUw), Total number of words (TNW), Number of different words (NDW), and Type-Token Ratio (TTR). Among them, TNW, NDW, and TTR correspond to semantics, and MLUm and MLUw correspond to syntax and morphology. The evaluation of each item was made using the following calculation method.
$$\text{Type-Token Ratio (TTR)} = \frac{\text{The total number of different words in utterances}}{\text{The total number of words in utterances}} \tag{6}$$
In speech pathology, a speaker who talks more is regarded as having higher language ability. Likewise, the more words that make up a sentence, the higher the language ability. Equations (1)-(3) reflect this.
The ability to use a varied vocabulary also helps in judging language ability. Equations (4)-(6) reflect this.
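The six dependent measures can be computed from per-utterance word and morpheme lists as in the sketch below. The function name and dictionary keys are hypothetical, and the inputs are assumed to be already segmented (how Korean words are delimited is a clinical decision this sketch does not make). TTR is computed here as different words over total words, the conventional type-over-token form; that choice is an assumption of the sketch.

```python
from typing import Dict, Sequence


def dependent_measures(
    words: Sequence[Sequence[str]],
    morphemes: Sequence[Sequence[str]],
) -> Dict[str, float]:
    """Compute the six measures from per-utterance word and morpheme lists."""
    if len(words) != len(morphemes):
        raise ValueError("words and morphemes must cover the same utterances")
    n = len(words)
    if n == 0:
        raise ValueError("at least one utterance is required")
    all_words = [w for utterance in words for w in utterance]
    tnw = len(all_words)       # total number of words
    ndw = len(set(all_words))  # number of different words
    total_morphemes = sum(len(utterance) for utterance in morphemes)
    return {
        "utterances": n,
        "MLUm": total_morphemes / n,
        "MLUw": tnw / n,
        "TNW": tnw,
        "NDW": ndw,
        "TTR": ndw / tnw if tnw else 0.0,
    }


# Toy English tokens used only to show the arithmetic.
example = dependent_measures(
    words=[["I", "like", "tea"], ["I", "like", "coffee", "too"]],
    morphemes=[["I", "like", "tea", "s"], ["I", "like", "coffee", "s", "too", "o"]],
)
```

For the toy input, two utterances contain 7 words in total, 5 of them different, and 10 morphemes, so MLUw is 3.5, MLUm is 5.0, and TTR is 5/7.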

Transcription Utterances
The user inputs the transcription file in the same format as Table 3 to evaluate the speech ability of the speaker. Transcription files consist of turn, utterance, and utterance contents. The turn is the running count of questions asked by the interviewer, and the utterance is the running count of sentences spoken by the interviewee. Therefore, if an interviewee answers a question with two sentences such as utterances 5 and 6, the turn does not change but the utterance is increased by one. The utterance contents refer to the contents of the interviewee's answer.
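Reading such a file can be sketched as follows. The on-disk layout is not specified in the text, so this sketch assumes one utterance per line with three tab-separated fields (turn, utterance number, contents) and an optional header row; the class and function names are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class Utterance(NamedTuple):
    turn: int
    number: int
    text: str


def parse_transcription(raw: str) -> List[Utterance]:
    """Parse tab-separated lines of turn, utterance number, and contents."""
    rows = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
    result = []
    for row in rows:
        # Skip blank lines and any row whose first two fields are not numbers
        # (for example, a header row).
        if len(row) < 3:
            continue
        if not (row[0].strip().isdigit() and row[1].strip().isdigit()):
            continue
        result.append(Utterance(int(row[0]), int(row[1]), row[2].strip()))
    return result


# Placeholder contents; utterances 5 and 6 share turn 4, as in the text.
sample = (
    "turn\tutterance\tcontents\n"
    "4\t5\tfirst answer\n"
    "4\t6\tsecond answer\n"
    "5\t7\tthird answer\n"
)
parsed = parse_transcription(sample)
```

The total number of utterances is then simply `len(parsed)`, and each `text` field is what would be passed on to preprocessing and morphological analysis.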

Execution Results
Figure 4 shows the results of the analysis. The six indicators obtained from the results are as described in Section 3.2. This screen shows the individual value of the speaker and the average value of the same age group side by side, so that a relative evaluation is easy to make. Figure 4a shows the evaluation results of a speaker with a slightly higher language ability than speakers of the same age. For all indicators, including the total number of utterances, the speaker has a slightly higher language level than the average for persons of the same age. On the other hand, the speaker in Figure 4b shows a slightly lower language level than speakers from the same age group. However, this difference does not necessarily mean that the speaker needs language therapy.

Comparison with Manual Analysis
The main difference between manual analysis and automatic analysis is whether the morphological analysis is done by a person or a machine. One of the main reasons why clinical researchers are reluctant to adopt automatic analysis is that automatic morphological analysis is less accurate. The accuracy of the morphological analysis of spoken sentences is known to be only about 70%. However, in this study, the level of language is analyzed through the relative comparison of individual indicators. Therefore, the consistency of the relative position of indicator values is more important than the accuracy of the morphological analysis itself. Table 4 shows the correlation coefficients between human-derived and machine-derived results for individual indicators. The correlation coefficients range from 0.87 to 1. In other words, morphological analysis by a computer does not degrade the quality of the whole analysis. In addition, since the morphological analysis is performed by one algorithm, the results do not vary from analyst to analyst. Therefore, a more objective and consistent analysis is possible. Finally, the time required for automatic analysis is so much shorter that the two methods can hardly be compared. Therefore, the automatic analysis system proposed in this study is highly efficient, while its results remain highly correlated with those of manual analysis.
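The comparison between human-derived and machine-derived indicator values can be illustrated with a correlation computation. The text does not state which correlation coefficient was used; the sketch below assumes Pearson's r, and the input values are made-up placeholders, not data from Table 4.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Placeholder MLUm-like values: the machine values are offset by a constant.
human = [4.0, 5.0, 6.0, 7.0]
machine = [4.5, 5.5, 6.5, 7.5]
r = pearson_r(human, machine)
```

In this toy case the machine values are all shifted by the same amount, so they disagree with the human values yet preserve every speaker's relative position, and the correlation stays at 1. This is the sense in which consistent relative position can matter more than the raw accuracy of the morphological analysis.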

Figure 1. Configuration of the whole system.

Figure 2. Screen shot of the client part.

$$\text{Total number of utterances} = \text{The number of utterances of the speaker} \tag{1}$$
$$\text{Mean Length of Utterances in morpheme (MLUm)} = \frac{\text{The total number of morphemes in utterances}}{\text{Total number of utterances}} \tag{2}$$
$$\text{Mean Length of Utterances in words (MLUw)} = \frac{\text{The total number of words in utterances}}{\text{Total number of utterances}} \tag{3}$$
$$\text{Total number of words (TNW)} = \text{The total number of words in all the utterances} \tag{4}$$
$$\text{Number of different words (NDW)} = \text{The total number of different words in all the utterances} \tag{5}$$

Figure 4. Result screen of our system. (a) shows the result for an interviewee with a higher language ability than the average for his/her age group, whereas (b) shows the opposite case.

Table 1. Example of the analysis results from a single transcription file.

Table 3. Example of transcription file.

Table 4. Comparison between machine and human analysis.