Artificial Intelligence-Assisted Diagnosis for Early Intervention Patients

Sierra, Ignacio; Díaz-Díaz, Norberto; Barranco, Carlos; Carrasco-Villalón, Rocío

doi:10.3390/app12188953

by Ignacio Sierra ¹,*, Norberto Díaz-Díaz ¹, Carlos Barranco ¹ and Rocío Carrasco-Villalón ²

¹ Intelligent Data Analysis Group (DATAi), Pablo de Olavide University, Ctra. de Utrera, Km 1, 41013 Sevilla, Spain
² Centro Atención Temprana, Hospital San Juan de Dios, 41005 Sevilla, Spain
* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(18), 8953; https://doi.org/10.3390/app12188953

Submission received: 20 December 2021 / Revised: 31 May 2022 / Accepted: 15 July 2022 / Published: 6 September 2022

(This article belongs to the Special Issue Artificial Intelligence Developments in Healthcare: Diagnosis, Rehabilitation and Screening)

Abstract

The use of artificial intelligence to aid decision making is widely adopted today. It is applied in many areas, among which medicine is the most disruptive. However, there are few or no applications in Early Care that help these centers diagnose children and automatically assign them to therapy processes. The objective of this work is to take a first approach to the problem and carry out a real proof of concept demonstrating that this type of system can be useful in Early Care, where the diagnosis and subsequent treatment must be determined by a multidisciplinary team. To measure the quality of this type of technology, different machine learning techniques are applied to a real data set provided by San Juan de Dios Hospital. This study allows us to analyze the behavior of these techniques compared to traditional diagnosis, with the comparison made against the judgment of a qualified specialist in child diagnosis.

Keywords:

artificial intelligence; assisted decision making; early intervention; computer assisted medicine; computer-aided diagnosis

1. Introduction

Early Care provides special care and services to children and young adults with functional diversity or developmental delays. It consists of a series of measures, treatments, and programs aimed at children up to 6 years of age who have, or are at risk of, developmental disorders. The aim is to minimize the potential negative effects of these disorders on the growth and development of these children and to reduce their limitations. It also gives parents and other adults who play key roles in these children's lives tools to access resources and knowledge, in turn giving the children experiences and opportunities to develop skills useful in everyday life. The importance of early intervention has been recognized by international bodies, including UNESCO [1] and the European Agency for Development in Special Needs Education [2]. These works highlight the importance and benefits of early intervention and the complexity of choosing a suitable treatment [3]. As with any medical treatment, accurate diagnosis is essential to ensure that the correct treatment is put in place. However, this process is further complicated by the type of patient (pediatric) and the multidisciplinary nature of Early Care. Cognitive, language, physical, emotional, and behavioral disorders all fall within this field where, unlike in other disciplines, professionals must make a diagnosis based primarily on their subjective experience, observation, and other studies of the child's development, not solely from a pathological perspective. Artificial intelligence can therefore play an important role in helping professionals diagnose and classify patients.

In medicine, artificial intelligence (AI) has been applied in many areas, such as diagnosis, drug discovery, surgery, and personalized medicine [4]. However, as Reddy Rajula [5] highlights, comparatively few of the many studies carried out across medical fields have been devoted to pediatrics.

AI can detect certain illnesses with a precision similar to that of health professionals. Its main advantage is the ability to review thousands of medical files to identify illness patterns, which offers great potential to improve the precision and speed of diagnosis. Although these algorithms have rarely improved on the results of experienced medical teams, they have proved useful in supporting diagnosis by less experienced teams. This type of system is especially useful in complex or multidisciplinary fields such as the diagnosis of psychiatric disorders [4,6], the prediction of Type-II diabetes risk [7], and the selection of genes relevant to the development of cancer, which can help improve genetic diagnoses (Guyon et al. [8]). Studies such as that by Davenport and Kalakota [9] highlight the vast potential of AI in medicine.

The successful application of AI in different fields of medicine suggests that it may also be useful in Early Care, where no approaches of this kind are found. Such solutions would shorten diagnosis periods and help physicians assign the appropriate treatment process to each patient. An accurate initial classification also speeds up the patient's improvement.

This article is addressed both to AI specialists who want to understand a methodology for applying AI systems to complex multidisciplinary processes and to doctors who want to understand the methodology and the results obtained when applying AI to Early Care diagnosis. With the AI system described in this article, a single doctor will be able to make the decision instead of a multidisciplinary team. The hospital will be able to reduce the time needed to diagnose a patient, and treatment assignment will be more homogeneous, as the system will advise different doctors to apply the same treatment to patients with equivalent symptoms.

2. Methodology

The methodology selected for this research is CRISP-DM (CRoss-Industry Standard Process for Data Mining) [10], which proposes six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. This article is based on the Spanish health system and may not fully reflect the systems of other countries, but the methodology can be applied in other multidisciplinary fields and countries. The way the medical community records information makes it quite challenging to develop a standard model, since even recording practices can differ from one medical organization to another. This point is discussed further in the data understanding section.

The approach proposed, taking into account the phases stated above, is detailed below.

2.1. Problem Understanding

The first step was to understand the objectives and requirements of the research and to transform the problem at hand into an artificial intelligence problem. This requires clearly defining the proposed objectives.

A major challenge of Early Care is the need for interdisciplinary work from several specialists, or from a professional qualified in several disciplines, to produce a suitable diagnosis. When the diagnosis is provided by multidisciplinary teams, the assessments of different specialists usually overlap, since at these stages of development the diagnosis is often unclear and evolves over time. Furthermore, even when the responsibilities and fields of the different specialists are clearly defined, their individual interpretations may not match. Approaches for resolving such discrepancies can be found in studies such as Ponce Rodriguez and Carrasco Villalon [11] (Rocio Carrasco Villalon is the Coordinator of the Early Care Centre at San Juan de Dios Hospital in Seville), which specifies a process to be followed to simplify diagnosis. Selecting a single diagnosis ensures that children and their relatives do not receive conflicting information from different specialists and are given a detailed rundown of the steps the patient will follow, which helps relieve the pressure they are under in these situations. For this purpose, diagnoses and treatments are organized into processes, which in turn limits the amount of information provided. Our AI system classifies the patient into one of four diagnoses, each of which implies a specific treatment plan (process). The length of treatment varies from patient to patient.

Four different processes (diagnoses) are specified:

1.: Cognitive, also known as neurocognitive disorders (NCDs). This category covers mental health disorders that fundamentally affect cognitive skills such as learning, memory, perception, and problem solving.
2.: Communication and language disorders: disorders that hinder learning and the use of any type of language, whether spoken, written, signed, etc.
3.: Sensory-motor disorders, which display physical symptoms such as pain, poor strength, and lack of mobility.
4.: Socio-communicative disorders, which are often linked to behavioral disorders.

Assigning a specific process to a patient is therefore an essential part of their treatment. A smart system that assists in the decision-making process could relieve the medical specialist of pressure while increasing the accuracy of the assignment.

The aim of this study is to carry out a proof of concept that analyzes the feasibility of developing an artificial intelligence system to aid decision making when selecting a specific treatment process for a given patient. This classification can be especially complex and require highly experienced specialists, as the patients are still developing cognitive and motor skills, which also leads to changes of process during treatment. The AI system assists in classifying patients into the four groups (processes), each mapped to a specific treatment.

2.2. Data Understanding

Data understanding begins with an initial collection of data to become acquainted with the information available at the outset and to assess its quality.

It should be noted that when the patient arrives at the Early Care center, they have already completed a basic questionnaire for the primary care physician identifying the reasons for their referral to the specialist center. Later, at the Early Care center, the specialist carries out a more in-depth interview with the relatives, which is documented in natural language. Following this interview, the patient is classified under one of the following processes: physical, sensory, cognitive, or behavioral disorders.

The data available can therefore be divided into two groups:

1.: Basic questionnaire with multiple-choice options, completed by the early care physician and easily processed by a machine learning algorithm.
2.: Fields completed in natural language by a qualified specialist who interviews the child's family during the first visit to the Early Care center. These fields are not consistent in format, units, number of records completed, or other aspects of how the information is structured. Therefore, these data must be transformed before they can be used by artificial intelligence techniques.

2.3. Data Preparation

The third phase covers all the activities needed to build the final dataset from the raw data. In this study, it was one of the longest and most tedious phases, given the nature of the initial data.

Fields expressed in natural language required more complex preparation, as the rest were Boolean or categorical. During processing, keywords were identified to detect the meaning of each statement or the basic concept it expresses. For each natural language field, the doctor collaborating on this project helped us define the keywords to look for in the text. We built a script that transforms each free-text field into several Boolean fields, each of which is true if the corresponding keyword is detected in the text and false otherwise.

For example, for the original field describing the patient's family antecedents, the script searches for keywords such as ASD (autism spectrum disorder), ADHD (attention deficit hyperactivity disorder), speech problems, attention issues, mental issues, Asperger's, and Down syndrome. The family antecedents field is thus split into eight Boolean fields, which are later used to train the supervised AI models. The same process is followed for the other fields with these characteristics, such as complications at birth or type of birth.
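The keyword-to-Boolean transformation can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: the field names (`fam_asd`, `fam_adhd`, ...) and the English keyword lists are hypothetical stand-ins for the clinician-defined keywords, and matching is a simple case-insensitive, whole-word search.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword dictionary for the family-antecedents field. In the study,
# the keywords were defined with a clinician; names and terms here are illustrative.
FAMILY_HISTORY_KEYWORDS: Dict[str, List[str]] = {
    "fam_asd": ["ASD", "autism"],
    "fam_adhd": ["ADHD"],
    "fam_speech": ["speech problem", "speech problems"],
    "fam_attention": ["attention issue", "attention issues"],
    "fam_mental": ["mental issue", "mental issues"],
    "fam_asperger": ["Asperger"],
    "fam_down": ["Down syndrome"],
}


def text_to_flags(text: Optional[str], keywords: Dict[str, List[str]]) -> Dict[str, bool]:
    """Turn one free-text field into Boolean fields: True if any of a field's terms occurs."""
    text = text or ""  # a blank field yields all-False flags
    result: Dict[str, bool] = {}
    for field, terms in keywords.items():
        # Case-insensitive, whole-word match, so "ASD" does not fire inside other words.
        result[field] = any(
            re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
            for term in terms
        )
    return result
```

For example, `text_to_flags("Father has ADHD; cousin with Down syndrome.", FAMILY_HISTORY_KEYWORDS)` sets `fam_adhd` and `fam_down` to `True` and the other flags to `False`. Plain keyword matching does not detect negation ("no history of ADHD" would still set `fam_adhd`), which is one reason the choice of keywords with the clinician matters.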

Some fields also required preparation because their units were inconsistent. For instance, newborn weight could be expressed in either grams or kilograms. Conversion to a consistent unit was automated to homogenize these values.
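A minimal sketch of this homogenization for birth weight, assuming all values are converted to grams. When a unit is recorded it is used directly; when it is not, the unit is inferred from the magnitude. The cut-off `KG_THRESHOLD = 20` is a hypothetical choice, not taken from the study: a newborn weighing 20 kg or more is implausible, and so is one weighing 20 g or less, so unitless values at or below it are read as kilograms.

```python
from typing import Optional

# Hypothetical cut-off: unitless values at or below this are read as kilograms.
KG_THRESHOLD = 20.0


def birth_weight_to_grams(value: Optional[float], unit: Optional[str] = None) -> Optional[float]:
    """Return the birth weight in grams, or None if the value is missing."""
    if value is None:
        return None
    if unit is not None:
        u = unit.strip().lower()
        if u in ("kg", "kilogram", "kilograms"):
            return float(value) * 1000.0
        if u in ("g", "gr", "gram", "grams"):
            return float(value)
        raise ValueError(f"unknown unit: {unit!r}")
    # No unit recorded: infer it from the magnitude of the value.
    return float(value) * 1000.0 if value <= KG_THRESHOLD else float(value)
```

With this rule, `birth_weight_to_grams(3.5)` returns `3500.0` and `birth_weight_to_grams(3400)` returns `3400.0`. An explicit unit, when present, always takes precedence over the magnitude heuristic.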

It was also necessary to treat blank values. Any field with more than 25% blank values across all records was eliminated.

For the remaining fields, which have values in at least 75% of the records, blank values were replaced with the mean for numerical variables and with the median for categorical variables.
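The blank-value rules above can be sketched with pandas. The function name and the assumption that categorical variables are stored as integer codes are ours, not the study's: with integer codes a median can be computed, and `interpolation="lower"` keeps the filled value equal to a code that actually occurs in the data.

```python
from typing import List

import pandas as pd

MAX_MISSING_FRACTION = 0.25  # fields with more than 25% blanks are dropped


def handle_blanks(df: pd.DataFrame, numeric_cols: List[str], categorical_cols: List[str]) -> pd.DataFrame:
    """Drop sparse fields, then impute: mean for numeric, (lower) median for coded categorical."""
    missing = df.isna().mean()                          # fraction of blanks per column
    kept = missing[missing <= MAX_MISSING_FRACTION].index
    out = df[kept].copy()
    for col in numeric_cols:
        if col in out.columns:
            out[col] = out[col].fillna(out[col].mean())
    for col in categorical_cols:
        if col in out.columns:
            # "lower" interpolation returns a code that actually occurs in the column.
            out[col] = out[col].fillna(out[col].quantile(0.5, interpolation="lower"))
    return out
```

Dropping columns before imputing matters: the mean or median of a field that is mostly blank would be estimated from too few records to be a reliable fill value.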

Finally, outliers were discarded because, by their nature, they were considered data-entry errors. A value was considered an outlier if it was greater than twice the mean.
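The outlier rule can be sketched as follows, assuming (the text does not say) that a record containing an outlier is removed entirely rather than having only that value blanked. Like the rule described above, it is one-sided: only values greater than twice the column mean are flagged.

```python
from typing import List

import pandas as pd


def drop_outliers(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Drop rows in which any of `columns` exceeds twice that column's mean."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        threshold = 2 * df[col].mean()   # mean over all records; blanks are ignored
        keep &= ~(df[col] > threshold)   # blanks compare as False, so they are kept
    return df[keep]
```

For example, birth weights of 3000, 3200, 3100, and 30000 g have a mean of 9825 g, so the threshold is 19650 g and only the 30000 g record (a likely data-entry error) is removed.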

Table 1 summarizes the fields used, together with the transformations and decompositions carried out. Each of its columns is described below, taking the patient antecedents field as an example:

1.: Medical Report field: ACOIDPEAMTFAMILIARES is the identifier of the field chosen as an example.
2.: Field description: This value is an interpretation of the meaning of the field in question.
3.: Data type: The possible values are: Categorical, Numerical, Free text (equivalent to natural language), or Not relevant.
4.: Completeness: Percentage of values completed in this field.
5.: Transformation: Indicates whether a transformation has been carried out.
6.: Final variable: Contains the final variable type after transformation.
7.: Comment: Specifies any additional aspects to be taken into consideration, e.g., "No comment".
8.: S (selected): Value is 1 if this field was chosen as an explanatory variable to train the model.

After these data preparation tasks, the final list contains 40 explanatory variables, which are used to train the system's supervised learning models.

2.4. Modeling

During the modeling phase, different learning algorithms and parameterizations were applied. The algorithms were selected from among those used successfully in medical applications. The following subsections describe each of them and some of their results in medicine.

2.4.1. Random Forest

This approach [12] combines prediction trees, each of which depends on the values of a random vector sampled independently and with the same distribution for all the trees. This randomization decorrelates the trees, and their outputs are combined by averaging (or, for classification, by majority vote). Models based on decision trees are easy to understand, as individual trees can be represented graphically. These algorithms have been used in several medical fields, for example to predict the risk of Type-II diabetes [7].
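A deliberately small random forest sketch in pure Python. To keep it short, each tree is a single split (a decision stump); Breiman's random forests grow full trees and draw a random feature subset at every split, which for a one-split tree is the same as drawing it once per tree. The ingredients described above are visible: each tree is trained on an independent bootstrap sample with its own random feature subset, and the trees' predictions are combined by majority vote. All names are illustrative.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# A stump is (feature index, threshold, label if <= threshold, label if > threshold).
Stump = Tuple[int, float, Hashable, Hashable]


def _majority(labels: Sequence[Hashable]) -> Hashable:
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: List[List[float]], y: List[Hashable], features: List[int]) -> Stump:
    """Pick the single split, among `features`, with the fewest training errors."""
    best, best_err = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ll, lr = _majority(left), _majority(right)
            err = sum(lab != ll for lab in left) + sum(lab != lr for lab in right)
            if best_err is None or err < best_err:
                best, best_err = (f, t, ll, lr), err
    if best is None:  # no valid split in this sample: always predict the majority label
        m = _majority(y)
        best = (features[0], float("inf"), m, m)
    return best


def fit_forest(X: List[List[float]], y: List[Hashable], n_trees: int = 25,
               n_features: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (with replacement)
        feats = rng.sample(range(d), n_features)        # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: List[Stump], x: List[float]) -> Hashable:
    votes = [ll if x[f] <= t else lr for f, t, ll, lr in forest]
    return _majority(votes)  # majority vote of the trees
```

Because each tree sees a different bootstrap sample, an individual tree may be poor (for example, if its sample contains only one class), but the vote of many such trees is much more stable than any single one.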

2.4.2. Linear Regression (Linear Fitting)

Linear regression analysis is used to predict the value of one variable from the values of others. The variable to be predicted is the dependent variable, and the variables used to predict it are the independent variables. This analysis estimates the coefficients of a linear equation, involving one or more independent variables, that best predicts the dependent variable. Linear regression fits a straight line or (hyper)plane by minimizing the discrepancies between predicted and actual output values, usually the sum of squared differences [13]. In medicine, this technique is applied, for example, in the study by Anne B. Newman et al. [14] on the relationship between sleep-disordered breathing and changes in weight.
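The fitting criterion can be illustrated with a one-variable ordinary least squares fit in pure Python. This only shows how the coefficients are obtained; the text does not state how linear regression was adapted to the four-class problem, and the function name is illustrative.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # covariance over variance
    return mean_y - slope * mean_x, slope  # the line passes through the means
```

For points lying exactly on y = 1 + 2x, such as x = [0, 1, 2, 3] and y = [1, 3, 5, 7], the fit returns `(1.0, 2.0)`. With several independent variables the same criterion leads to the normal equations, solved in matrix form.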

2.4.3. Linear Support Vector Machine (LSVM)

This is a family of supervised learning algorithms developed by Cortes and Vapnik [15], typically used for classification and regression problems. Given a set of labeled training samples, an SVM can be trained to build a model that predicts the class of a new sample. Intuitively, an SVM represents the samples as points in space and separates the classes with a hyperplane chosen so that the margin between the two classes is as wide as possible. The training points closest to this hyperplane on each side are the support vectors, and they alone determine its position. New samples are mapped into the same space and classified according to the side of the hyperplane on which they fall, so a good separation between classes leads to a correct classification. Applications of this approach can be found, for instance, in Guyon et al. [8], who select genes relevant to the development of cancer to help improve genetic diagnoses.
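The geometric idea can be stated compactly with the standard soft-margin formulation of Cortes and Vapnik [15], written here for two classes with labels y_i in {-1, +1}: w is the normal vector of the hyperplane, b its offset, ξ_i the slack allowed to sample i, and C a penalty parameter. Maximizing the margin width 2/||w|| is equivalent to minimizing ||w||². How the binary problem was extended to the four processes (for example, one-vs-rest) is not stated in the text.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\quad \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \qquad y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

f(\mathbf{x}) = \operatorname{sign}\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr),
\qquad \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
```

The support vectors are the training points with y_i(wᵀx_i + b) ≤ 1, that is, those on or inside the margin; moving any other point (without crossing the margin) leaves the hyperplane unchanged.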

2.4.4. C5 Classifier

C5 is a type of decision tree [16] that divides the sample on the field offering the maximum information gain. The subsamples defined by the first division can be divided again, generally on a different field, and the process is repeated until the subsamples cannot be divided further. Finally, the lowest-level divisions are examined, and those that make no significant contribution to the model are pruned.
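The split criterion can be illustrated by computing information gain for a categorical field in pure Python. This is a sketch of the criterion named above, not of the full C5 algorithm; note that C4.5 and C5.0 typically normalize this quantity by the split information (the gain ratio), and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting `labels` on the categories in `values`."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())  # weighted child entropy
    return entropy(labels) - remainder
```

A field that separates two processes perfectly has a gain of 1 bit (for two equally frequent classes), and a field unrelated to the process has a gain of 0. At each node, the tree splits on the field with the highest gain.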

This classifier has been used in medicine, for example, by Ahmadi et al. [17] to predict coronary artery disease.

2.4.5. CHAID: Chi-Square Automatic Interaction Detection

CHAID, presented by Kass [18], uses categorical variables to segment the population into progressively smaller groups based on the best predictor categories. This decision tree technique uses chi-square statistics to measure the strength of association between each predictor and the dependent variable, and the results are presented in an easy-to-interpret tree diagram, read downwards from the root node to the leaf (terminal) nodes. The algorithm first creates a level of decision nodes using the predictor most strongly associated with the dependent variable, automatically grouping the values of this predictor into a manageable number of categories. It then creates another level of decision nodes using the strongest of the remaining predictors, and continues in this way down to the terminal nodes of the tree.
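The association measure at the heart of CHAID is the Pearson chi-square statistic of a contingency table, for example a hypothetical table with "family history of ADHD" (yes/no) as rows and the four processes as columns. The sketch below only computes the statistic; the full algorithm also merges similar predictor categories and compares predictors through adjusted p-values, which are not shown.

```python
from typing import List, Tuple


def chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat, (n_rows - 1) * (n_cols - 1)
```

A table whose rows have identical proportions gives a statistic of 0 (no association), while a strongly diagonal table gives a large value. Because tables of different sizes have different degrees of freedom, predictors are compared through p-values rather than raw statistics.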

An example of the use of this algorithm in medicine is the study by Kuráňová [19], which applies it to the automatic evaluation of diagnostic test (Phadiatop) results.

2.4.6. XGBOOST or eXtreme Gradient Boosting

This approach, introduced by Chen and Guestrin [20], belongs to the decision tree family and aims primarily to reduce bias and variance. Trees are added sequentially: starting from weak trees, each new (or student) tree focuses on the weaknesses (poorly predicted data) of the ensemble built so far. Rather than explicitly reweighting the data, as AdaBoost does, XGBOOST fits each new tree to the gradient (and second derivative) of the loss with respect to the current predictions. Through this self-correction, the sum of the trees converges to a strong model. The algorithm prunes branches whose contribution to reducing the loss does not justify their added complexity. Its objective function also includes a regularization term that penalizes model complexity, which smooths the learning process and reduces the risk of overfitting.
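For reference, the regularized objective of Chen and Guestrin [20] makes the description above concrete. At round t the new tree f_t is added to the previous predictions; T is the number of leaves of a tree, w its vector of leaf weights, γ and λ are regularization parameters, g_i and h_i are the first and second derivatives of the loss l at the current prediction, and I_j is the set of samples falling in leaf j. The last line is the loss reduction of a candidate split into left (L) and right (R) children, where G and H are the sums of g_i and h_i in each child; splits with negative gain are pruned.

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)

\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\bigr) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert \mathbf{w} \rVert^{2}

g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^{2} l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^{2}},
\qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}

\text{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda}
- \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma
```

The γT term charges a fixed cost per leaf and the λ term shrinks the leaf weights toward zero; both are the "penalty on model complexity" mentioned in the text.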

In medicine, this technique has been used by Liu et al. [21] to predict the progression of breast cancer and by Wei and Mooney [22] to detect epileptic seizures in clinical electroencephalograms.

3. Results and Discussion

Our proposal, including the development of an MVP (Minimum Viable Product), was evaluated at San Juan de Dios Hospital (Seville), specifically in the Early Care unit, which treats children up to 6 years of age and works on developing the skills they need for integration into society. The models described in Section 2.4 were trained on the 420 patient files available at the hospital.

The goodness of fit, summarized in Table 2, was calculated by comparing the treatment process automatically proposed by each algorithm with the one proposed by the expert, using an 80–20 training–test split. The field "VN" gives the number of explanatory variables used in each model. As explained in Section 2.3, 40 variables were initially chosen; however, only 10 and 9 were used by the CHAID and C5 methods, respectively, as the other variables did not add value to the result.
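The evaluation protocol can be sketched in pure Python: an 80–20 split of the patient files and the proportion of test cases in which the model's process matches the expert's. The text does not say whether the split was stratified or how it was randomized, so the plain shuffle with a fixed seed below is an assumption, as are the function names.

```python
import random
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the records and return (training set, test set)."""
    items = list(records)
    random.Random(seed).shuffle(items)       # fixed seed for a reproducible split
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def accuracy(predicted: Sequence[Hashable], expert: Sequence[Hashable]) -> float:
    """Fraction of cases where the predicted process equals the expert's process."""
    if len(predicted) != len(expert) or not expert:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == e for p, e in zip(predicted, expert)) / len(expert)
```

With 420 files, this split yields 336 training and 84 test records. A stratified split, which keeps the proportion of each process equal in both sets, would be a reasonable alternative when some processes are rare.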

Based on the precision of the different approaches, XGBOOST displays the best behavior, with a precision of 86.45%, corresponding to 370 correct diagnoses and 58 erroneous ones (370/(370 + 58) ≈ 0.8645).

Because, in this study, it is important to avoid an excessive concentration of errors in particular diagnoses, the XGBOOST confusion matrix is presented in Table 3, where the diagnoses (processes) are coded from I to IV following the description in Section 2.1. The values in Table 3 show that the errors are not concentrated in any particular diagnosis. Some of the errors of the classification system are confusions between a cognitive disorder and a language and communication disorder, which even experienced specialist physicians can find difficult to distinguish at certain ages.
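To illustrate how a confusion matrix such as Table 3 is built and read, here is a minimal pure-Python sketch. The toy labels in the example are invented for illustration and are not the study's data; rows are the expert's process and columns the model's prediction, which may not match the orientation of Table 3. Per-class recall (diagonal count over row total) shows whether errors concentrate in one diagnosis.

```python
from typing import List, Sequence

# Process codes as in Section 2.1: I cognitive, II communication and language,
# III sensory-motor, IV socio-communicative.
PROCESSES = ["I", "II", "III", "IV"]


def confusion_matrix(expert: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str] = PROCESSES) -> List[List[int]]:
    """Count matrix: rows are the expert's process, columns the predicted process."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for e, p in zip(expert, predicted):
        m[index[e]][index[p]] += 1
    return m


def per_class_recall(m: List[List[int]]) -> List[float]:
    """Share of each expert-assigned process that the model classified correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(m)]
```

For instance, with expert labels `["I", "I", "II", "II", "III", "IV"]` and predictions `["I", "II", "II", "I", "III", "IV"]`, the off-diagonal counts sit only in the I/II cells, which is the cognitive versus language confusion pattern described above, and recall is 0.5 for I and II and 1.0 for III and IV.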

Finally, Figure 1 shows the system gain. Gains are defined as the proportion of total hits that occur in each quantile or increment. That is:

\text{Gains} = \frac{\text{number of hits in quantile}}{\text{total number of hits}} \times 100

Notice that our graph can only show results for one categorical value at a time: in our case, processes that were correctly identified. The lines in the figure represent the following:

The thick diagonal red line is the at-chance model.
The blue line is the perfect classifier.
The green line in the middle is our current model.
The area under the green curve represents how well our system is behaving.

In the figures, the Y-axis represents the cumulative percentage of hits (the gain): for example, a value of 20 means that 20% of the cases of a given category have been correctly identified, a value of 60 means 60%, and so on. The X-axis represents percentile groups of cases, ordered by the model's confidence.
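The cumulative gains curve described above can be computed as follows for one target process. In this sketch, `scores` holds the model's confidence that each case belongs to the target process and `is_hit` marks the cases that truly do; both names, and the use of ten percentile groups, are illustrative assumptions.

```python
from typing import List, Sequence


def cumulative_gains(scores: Sequence[float], is_hit: Sequence[bool],
                     n_bins: int = 10) -> List[float]:
    """Cumulative % of all hits captured in the top 1/n_bins, 2/n_bins, ... of cases."""
    total_hits = sum(is_hit)
    if total_hits == 0:
        raise ValueError("no cases of the target category")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)  # most confident first
    n = len(order)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)               # number of cases in the top b groups
        hits = sum(is_hit[i] for i in order[:cutoff])
        gains.append(100.0 * hits / total_hits)
    return gains
```

If 2 of 10 cases belong to the target process and the model ranks both first, the curve is `[50.0, 100.0, 100.0, ...]`: it reaches 100% after 20% of the data, the shape of the perfect classifier line. A model that ranks cases at random would instead track the diagonal, capturing roughly b × 10% of the hits in the top b groups.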

Looking at the perfect prediction line, we can see, for example, that in the sensory–motor process the perfect classifier has found 100% of the cases once approximately 20% of the data has been examined. As the distribution in Table 4 shows, this corresponds to the proportion of sensory–motor cases. The area between the green line and the at-chance line indicates how much better our model is than the at-chance model, and the area between the green line and the blue line indicates where our model can still be improved, as the goal is to come as close as possible to the perfect prediction model.

These results show that the models behave close to the perfect (blue) line and clearly improve on the at-chance model.

Deployment

After training the model and confirming its viability, the next step is to integrate it into the real system. This step, which is in its initial phase, consists of deploying the model as a cloud service that can be accessed by the different centers of San Juan de Dios Hospital (see Figure 2). In our case, we use IBM SPSS Modeler to test and select the model to implement, and we deploy it on IBM Cloud.

The deployment is based on an application programming interface (API), which defines the variables used to call the model and the format of the response. APIs allow distributed systems to be connected easily. This API uses a standard format called JSON (JavaScript Object Notation), a minimal, readable format for structuring data that is used primarily to transmit data between a server and a web application. The advantage of this type of deployment is that it creates a web-based service: anyone with the required access can call the API through an HTTP request or a web browser and, by passing the parameters in the URL, see the result. The system is easy to use, because any hospital can call the API and obtain a prediction by passing only the model parameters, without any private information (PI) about the patient, which protects our patients' PI.

In our case, the JSON API microservice is called with a JSON document containing the values of the forty explanatory variables described in Section 2.3.

The service responds with the predicted classification in another JSON document. The proposed microservice-based architecture would facilitate integration into the workflow by reducing the complete process to a single call.
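A client for such a microservice might look like the following sketch, which uses only the Python standard library. Everything about the wire format is hypothetical: the payload key `features`, the response key `process`, sending the variables as a JSON body via POST, and the `url` argument are illustrative assumptions, not the interface of the actual IBM Cloud deployment, and any authentication the deployment requires is omitted.

```python
import json
import urllib.request
from typing import Dict, Union

N_VARIABLES = 40  # explanatory variables from Section 2.3

Value = Union[int, float, bool]


def build_request(features: Dict[str, Value]) -> str:
    """Serialize the explanatory variables (and nothing else) into a JSON request body."""
    if len(features) != N_VARIABLES:
        raise ValueError(f"expected {N_VARIABLES} variables, got {len(features)}")
    return json.dumps({"features": features})  # hypothetical payload key


def parse_response(body: str) -> str:
    """Extract the predicted process from a JSON response body."""
    return json.loads(body)["process"]  # hypothetical response key


def predict(url: str, features: Dict[str, Value], timeout: float = 10.0) -> str:
    """POST the features to the prediction service at `url` and return the predicted process."""
    request = urllib.request.Request(
        url,
        data=build_request(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

Checking that the request contains exactly the expected variables is a simple way to enforce the property described above: only model parameters, and no patient identifiers, leave the hospital.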

Deploying this type of microservice in the cloud also makes the system easy to scale: if the workload grows, additional computing capacity can be added to the prediction infrastructure.

4. Conclusions and Future Work

This article proposes the use of artificial intelligence techniques in Early Care to help experts make decisions. Specifically, different techniques have been applied to classify real patients into different treatment processes. The data set was provided, and the results validated, by San Juan de Dios Hospital. This initial study indicates that this type of technique can help in diagnosis in early childhood intervention (ECI).

The most challenging part was transforming the natural language text into variables that the AI system can process to make a prediction (simple numerical or categorical variables). As a next step, we propose creating a new form in which, in addition to the natural language (NL) text, the specialist codes the main idea through a drop-down or numeric field. We would like to test whether this way of recording the information improves the results.

Author Contributions

Conceptualization, I.S. and R.C.-V.; Data curation, I.S. and C.B.; Formal analysis, N.D.-D.; Funding acquisition, N.D.-D.; Methodology, N.D.-D. and I.S.; Resources, R.C.-V.; Software, I.S.; Supervision, N.D.-D. and C.B.; Validation, R.C.-V.; Writing—original draft, I.S.; Writing—review and editing, N.D.-D. and C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by Pablo de Olavide University (PPI1901) and by the Junta de Andalucia, under the Andalusian Plan for Research, Development and Innovation, TIC-239.

Institutional Review Board Statement

All participants gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Pablo de Olavide University (22/7-8).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Ministry of Education and Science of Spain. Declaración de Salamanca y Marco de Acción Sobre Necesidades Educativas Especiales. In Proceedings of the World Conference on Special Needs Education: Access and Quality, Salamanca, Spain, 7–10 June 1994; Available online: https://unesdoc.unesco.org/ark:/48223/pf0000098427_spa (accessed on 1 December 2021).
Soriano, V.; Alonso Gutierrez, M.V. Atención Temprana Análisis de la Situación en Europa Aspectos clave y Recomendaciones; European Agency for Development in Special Needs Education: Brussels, Belgium, 2005; Available online: https://www.european-agency.org/sites/default/files/early-childhood-intervention-analysis-of-situations-in-europe-key-aspects-and-recommendations_eci_es.pdf (accessed on 1 December 2021).
Monsalve González, A.; Núñez Batalla, F. La importancia del diagnóstico e intervención temprana para el desarrollo de los niños sordos. Los programas de detección precoz de la hipoacusia. Psychosoc. Interv. 2006, 15, 7–28. [Google Scholar] [CrossRef]
Liu, G.D.; Li, Y.C.; Zhang, W.; Zhang, L. A Brief Review of Artificial Intelligence Applications and Algorithms for Psychiatric Disorders. Engineering 2020, 6, 462–467. [Google Scholar] [CrossRef]
Rajula, H.S.R. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Med. J. 2020, 8, 455. [Google Scholar] [CrossRef] [PubMed]
Tai, A.M.; Albuquerque, A.; Carmona, N.E.; Subramanieapillai, M.; Cha, D.S.; Sheko, M.; Lee, Y.; Mansur, R.; McIntyre, R.S. Machine learning and big data: Implications for disease modeling and therapeutic discovery in psychiatry. Artif. Intell. Med. 2019, 9, 101704. [Google Scholar] [CrossRef] [PubMed]
Xu, W. Risk prediction of type II diabetes based on random forest model. In Proceedings of the 2017 Third International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), Chennai, India, 27–28 February 2017. [Google Scholar] [CrossRef]
Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 2019, 6, 94–98. [Google Scholar] [CrossRef] [PubMed]
Azevedo, A.; Santos, M.F. KDD, SEMMA and CRISP-DM: A parallel overview. In Proceedings of the IADIS European Conference Data Mining, Amsterdam, The Netherlands, 24–26 July 2008; Abraham, A., Ed.; IADIS: Lisbon, Portugal, 2008; pp. 182–185. [Google Scholar]
Ponce Rodriguez, L.; Carrasco Villalon, R. Propuesta y aplicación de una gestión por procesos para la intervención y atención temprana. In Proceedings of the VXII Jornadas de Atención Temprana en Andalucía, Andalucía, Spain, 19 February 2021. [Google Scholar]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 57. [Google Scholar] [CrossRef]
Weisberg, S. Applied Linear Regression, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2005.
Newman, A.B.; Foster, G.; Givelber, R.; Nieto, F.J.; Redline, S.; Young, T. Progression and Regression of Sleep-Disordered Breathing with Changes in Weight: The Sleep Heart Health Study. Arch. Intern. Med. 2005, 165, 2408–2413. Available online: https://jamanetwork.com/journals/jamainternalmedicine/articlepdf/486784/ioi50103.pdf (accessed on 14 November 2021).
Cortes, C.; Vapnik, V. Support Vector Networks. Mach. Learn. 1995, 20, 273–297.
Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
Ahmadi, E.; Weckman, G.R.; Masel, D.T. Decision making model to predict presence of coronary artery disease using neural network and C5.0 decision tree. J. Ambient Intell. Humaniz. Comput. 2018, 9, 999–1011.
Kass, G.V. An Exploratory Technique for Investigating Large Quantities of Categorical Data. J. R. Stat. Soc. Ser. C Appl. Stat. 1980, 29, 119–127.
Kuráňová, P. Evaluation of the Phadiatop test results using CHAID algorithm and logistic regression. In Proceedings of the 2015 International Conference on Information and Digital Technologies, Zilina, Slovakia, 7–9 July 2015; pp. 178–182.
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Liu, P.; Fu, B.; Yang, S.X.; Deng, L.; Zhong, X.; Zheng, H. Optimizing Survival Analysis of XGBoost for Ties to Predict Disease Progression of Breast Cancer. IEEE Trans. Biomed. Eng. 2021, 68, 148–160.
Wei, L.; Mooney, C. Epileptic Seizure Detection in Clinical EEGs Using an XGboost-based Method. In Proceedings of the 2020 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), Philadelphia, PA, USA, 5 December 2020; pp. 1–6.

Figure 1. Gain of our artificial intelligence system (AI) against the at-chance model (AG), together with the specialist diagnosis (SP), for the different processes.
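Figure 1 is described here only by its caption. Assuming it is a cumulative gains chart (a standard way of evaluating classifiers in data mining tools; the section itself does not say how the figure was built), each curve can be computed one process at a time: rank the cases by the model's score for that process, then plot the share of cases examined against the share of that process's true cases captured. A random ranking captures true cases in proportion to the cases examined, which is why the at-chance line is the diagonal. The function name and the toy scores and labels below are hypothetical, not the study's data.

```python
from typing import List, Sequence, Tuple


def cumulative_gains(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Hypothetical helper: points of a cumulative gains curve for one process.

    scores: model score for the process (higher = more likely).
    labels: 1 if the case truly belongs to the process, else 0.
    Returns (fraction of cases examined, fraction of true cases captured).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one true case is needed")
    # Examine cases from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    points = [(0.0, 0.0)]
    captured = 0
    for examined, (_, label) in enumerate(ranked, start=1):
        captured += label
        points.append((examined / n, captured / total_positive))
    return points


# Toy example (hypothetical scores and labels for a single process).
curve = cumulative_gains([0.9, 0.8, 0.7, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0])
for share_examined, share_captured in curve:
    # The at-chance model would capture share_examined of the true cases.
    print(f"{share_examined:.2f} examined -> {share_captured:.2f} captured (chance {share_examined:.2f})")
```

In this toy run, examining the top 50% of cases captures two of the three true cases (about 67%), against 50% for the at-chance diagonal.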

Figure 2. Architectural diagram.

Table 1. Data extracted from the system, with the transformation and selection applied to each field.

Early Care Doctor Report Fields	Field Description	Data Type	Completeness (%)	Transformation	Final variable	Comment	S
ACO_CK_P_DERIVAR	Transfer	Boolean	85		Boolean
ACO_CK_P_REALIZAR_VI	Make diagnosis	Boolean	85		Boolean
ACO_CK_P_SEGUIMIENTO	Follow-up	Boolean	85		Boolean
ACO_CK_PAÑAL_24	Diaper	Boolean	85		Boolean		1
ACO_CK_PAÑAL_NOCHE	Night diaper	Boolean	85		Boolean		1
ACO_CK_PAÑAL_SUCIO	Dirty diaper	Boolean	85		Boolean		1
ACO_CK_RET_MADURATIVO_LENGUA	Language delay disorder	Boolean	85		Boolean		1
ACO_CK_RETRASO_COGNITIVO	Cognitive delay disorder	Boolean	85		Boolean		1
ACO_CK_RETRASO_MOTOR	Motor delay disorder	Boolean	85		Boolean		1
ACO_CK_RETRASO_PSICO	Psychological delay disorder	Boolean	85		Boolean		1
ACO_CK_SEÑALES_DE_ALERTA	Warning sign	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_COGNITIVO	Cognitive disorder	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_LENGUA	Language disorder	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_COMUNICA	Communication disorder	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_MOTOR	Motor disorder	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_PSICO	Psychological disorder	Boolean	85		Boolean		1
ACO_CK_TRASTORNO_SENSORIAL	Sensory disorder	Boolean	85		Boolean		1
ACO_CMB_MEDICOS_SERVICIO	Doctors	Non-Relevant	85			Non-Relevant
ACO_ID_ALERGIAS	Allergies	Non-Relevant	0.1			Barely filled
ACO_ID_AP_ALIMENTACION	Feeding	Natural Language	57	True (eats well)/False (eats badly)	Boolean		1
ACO_ID_AP_COMPLICA	Birth Complications	Free text	67	Respiratory(itis)	Three boolean variables		1
				Convul			1
				Cardio			1
				Other			1
ACO_ID_AP_HISTORIA_FAMI	Family background	Free text	8	Separation	Boolean		1
ACO_ID_AP_HOSPITALIZA	Hospitalization	Free text	56	Boolean	Boolean		1
ACO_ID_AP_TIPO_LACTANCIA	Lactation	Free text	68	Breastfeeding	Numeric	Until when, Numeric	1
ACO_ID_CENTRO_SALUD	Health center	Categorical	85		Categorical		1
ACO_ID_DF_ACTITUD_DIAG	Attitude toward diagnosis	Non-Relevant	1			Barely filled
ACO_ID_DF_COINCIDENC_PROB	Problem awareness	Non-Relevant	10			Barely filled
ACO_ID_DF_MOTIVA_COLABORAR	Collaboration	Non-Relevant	1			Barely filled
ACO_ID_DF_NECES_APOYO	Need help	Non-Relevant	0.2			Barely filled
ACO_ID_DF_REL_FAMI	Family Relationship	Free text	23	Good/Bad	Boolean		1
ACO_ID_EMBARAZO	Pregnancy	Free text	76	With/without problems	Boolean		1
ACO_ID_FECHA_ACOGIDA	Admission date	Non-Relevant	82			Non-Relevant
ACO_ID_FECHA_DERIVACION	Referral date	Non-Relevant	70			Non-Relevant
ACO_ID_HE_CEFALICO	Head diameter	Free text	13	Unit consistency	Numeric	Barely filled
ACO_ID_HE_BIPEDESTA	Standing	Free text	4	Unit consistency	Numeric	Barely filled
ACO_ID_HE_ESFINTERES	Sphincter control	Free text	18	Unit consistency	Numeric	Barely filled
ACO_ID_HE_FRASE	First sentence	Free text	6	Unit consistency	Numeric	Barely filled
ACO_ID_HE_GATEO	Crawling	Free text	38	Unit consistency	Numeric	Barely filled
ACO_ID_HE_INI_MARCHA	Onset of walking	Free text	42	Unit consistency	Numeric	Barely filled
ACO_ID_HE_PRI_PALABRA	First word	Free text	45	Unit consistency	Numeric	Barely filled
ACO_ID_HE_MARCHA	Walking	Free text	15	Unit consistency	Numeric	Barely filled
ACO_ID_HE_SEDESTACION	Sitting	Free text	11	Unit consistency	Numeric	Barely filled
ACO_ID_MEDICO_FAMILIA	Family Doctor	Non-Relevant	77			Non-Relevant
ACO_ID_P_A_TERMINO	End of the process	Non-Relevant	14			Non-Relevant
ACO_ID_P_CON_ANR	Birth weight	Non-Relevant	2			Non-Relevant
ACO_ID_P_DERIVAR_A	Refer to	Non-Relevant	0			Non-Relevant
ACO_ID_P_MOTIVO_ALTA	Reason medical discharge	Non-Relevant	0			Non-Relevant
ACO_ID_P_MULTIPLE	Multiple birth	Non-Relevant	0			Non-Relevant
ACO_ID_P_OBSERVACIONES	Observations	Difficult to extract	0	Non-Relevant		Non-Relevant
ACO_ID_P_PREMATURO	Premature	Free text	17	True/False	Boolean		1
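Table 1 names the transformations (free-text birth complications split into boolean flags, developmental milestones made unit-consistent and numeric) but not the rules behind them. The sketch below shows one way to implement both, assuming keyword matching on Spanish notes (following the table's "Respiratory(itis)" hint) and milestone ages written in months (meses) or years (años). The keyword stems, the regular expression, and the function names are hypothetical; only the target variables (Respiratory, Convul, Cardio, Other; numeric milestone ages) come from the table.

```python
import re
from typing import Dict, Optional

# Hypothetical keyword stems for the ACO_ID_AP_COMPLICA flags in Table 1.
COMPLICATION_KEYWORDS = {
    "respiratory": ("itis", "respirat"),
    "convulsions": ("convul",),
    "cardio": ("cardi",),
}

# Hypothetical pattern: a number (decimal comma allowed) followed by a month or year unit.
AGE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(meses|mes|m|años|año|a)\b")


def complication_flags(note: Optional[str]) -> Dict[str, bool]:
    """Map a free-text birth-complications note to boolean variables."""
    text = (note or "").lower()
    flags = {name: any(stem in text for stem in stems)
             for name, stems in COMPLICATION_KEYWORDS.items()}
    # "other": something was written, but it matched none of the known groups.
    flags["other"] = bool(text.strip()) and not any(flags.values())
    return flags


def milestone_to_months(note: Optional[str]) -> Optional[float]:
    """Normalise a milestone age such as '14 meses' or '1,5 años' to months.

    Only the first number-unit pair is read; None means no usable age.
    """
    if not note:
        return None
    match = AGE_PATTERN.search(note.lower())
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    # Units starting with "a" (año, años) are years; the rest are months.
    return value * 12 if match.group(2).startswith("a") else value


print(complication_flags("Bronquiolitis neonatal"))
print(milestone_to_months("Anduvo a los 1,5 años"))
```

Substring matching keeps the sketch short but is crude (for example, "itis" also matches non-respiratory terms); a real pipeline would need a vocabulary reviewed by the clinical team.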

Table 2. Precision (%) of the AI algorithms considered.

Model	Precision (%)	VN
XGBoost	86.45	40
Random Forest	79.44	40
Linear Regression	73.83	40
LSVM	68.46	40
C5	61.45	9
CHAID	68.46	10
Neural Network	48.131	40

Table 3. Confusion matrix.

Actual (rows) / Predicted (columns)	I	II	III	IV
I (Cognitive)	129	18	1	1
II (Communication and language)	8	121	2	4
III (Sensory–motor)	10	1	73	0
IV (Socio-communicative)	6	7	0	47
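Table 3 gives raw counts only. A short computation, assuming rows are the actual process and columns the predicted one, derives overall accuracy and per-process recall and precision. The 370 correctly classified cases out of 428 give 86.45%, which equals the XGBoost figure in Table 2; this suggests, although the section does not state it, that the matrix belongs to the XGBoost model and that the "Precision" column of Table 2 reports overall accuracy. The function names are illustrative.

```python
from typing import List

Matrix = List[List[int]]

# Counts from Table 3; rows = actual process, columns = predicted (I, II, III, IV).
CONFUSION: Matrix = [
    [129, 18, 1, 1],   # I (Cognitive)
    [8, 121, 2, 4],    # II (Communication and language)
    [10, 1, 73, 0],    # III (Sensory-motor)
    [6, 7, 0, 47],     # IV (Socio-communicative)
]
PROCESSES = ["I", "II", "III", "IV"]


def overall_accuracy(cm: Matrix) -> float:
    """Share of all cases on the diagonal (predicted process = actual process)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recall(cm: Matrix) -> List[float]:
    """Per process: correct predictions / actual cases (row total)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def precision(cm: Matrix) -> List[float]:
    """Per process: correct predictions / cases predicted as that process (column total)."""
    return [cm[i][i] / sum(row[i] for row in cm) for i in range(len(cm))]


print(f"overall accuracy: {overall_accuracy(CONFUSION):.2%}")  # overall accuracy: 86.45%
for name, r, p in zip(PROCESSES, recall(CONFUSION), precision(CONFUSION)):
    print(f"{name}: recall {r:.2%}, precision {p:.2%}")
```

Process IV has the lowest recall (47 of 60 cases, about 78%); most of its misclassified cases are predicted as process I or II.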

Table 4. Distribution (%) of all cases across the different processes.

Process	Distribution (%)
I (Cognitive)	34.81
II (Communication and Language)	31.54
III (Sensory–Motor)	19.63
IV (Socio-Communicative)	14.02
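The percentages in Table 4 can be reproduced from the row totals of Table 3 (149, 135, 84 and 60 cases, 428 in all), which suggests that both tables describe the same evaluation set. The same counts give two simple chance baselines against which the 86.45% of Table 2 can be read: always predicting the most frequent process, and guessing each process at random in proportion to its frequency. Which baseline the at-chance model (AG) of Figure 1 uses is not stated in this section, so both are shown as common conventions, not as the authors' definition; the function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Actual cases per process: row totals of Table 3.
COUNTS: Dict[str, int] = {"I": 149, "II": 135, "III": 84, "IV": 60}


def distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all cases that belong to each process."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


def chance_baselines(counts: Dict[str, int]) -> Tuple[float, float]:
    """Expected accuracy of two naive classifiers.

    majority:     always predict the most frequent process.
    proportional: predict process k with probability p_k, so accuracy = sum(p_k ** 2).
    """
    total = sum(counts.values())
    shares = [Fraction(n, total) for n in counts.values()]
    majority = max(shares)
    proportional = sum(p * p for p in shares)
    return float(majority), float(proportional)


for name, pct in distribution(COUNTS).items():
    print(f"{name}: {pct:.2f}%")
majority, proportional = chance_baselines(COUNTS)
print(f"majority-class baseline: {majority:.2%}")           # majority-class baseline: 34.81%
print(f"proportional random baseline: {proportional:.2%}")  # proportional random baseline: 27.89%
```

Exact fractions are used for the baselines so that the sum of squared shares is not affected by floating-point rounding before the final conversion.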

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Sierra, I.; Díaz-Díaz, N.; Barranco, C.; Carrasco-Villalón, R. Artificial Intelligence-Assisted Diagnosis for Early Intervention Patients. Appl. Sci. 2022, 12, 8953. https://doi.org/10.3390/app12188953

AMA Style

Sierra I, Díaz-Díaz N, Barranco C, Carrasco-Villalón R. Artificial Intelligence-Assisted Diagnosis for Early Intervention Patients. Applied Sciences. 2022; 12(18):8953. https://doi.org/10.3390/app12188953

Chicago/Turabian Style

Sierra, Ignacio, Norberto Díaz-Díaz, Carlos Barranco, and Rocío Carrasco-Villalón. 2022. "Artificial Intelligence-Assisted Diagnosis for Early Intervention Patients" Applied Sciences 12, no. 18: 8953. https://doi.org/10.3390/app12188953

APA Style

Sierra, I., Díaz-Díaz, N., Barranco, C., & Carrasco-Villalón, R. (2022). Artificial Intelligence-Assisted Diagnosis for Early Intervention Patients. Applied Sciences, 12(18), 8953. https://doi.org/10.3390/app12188953

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
