Exploring the Eating Disorder Examination Questionnaire, Clinical Impairment Assessment, and Autism Quotient to Identify Eating Disorder Vulnerability: A Cluster Analysis

Eating disorders are very complicated and many factors play a role in their manifestation. Furthermore, due to the variability in diagnosis and symptoms, treatment for an eating disorder is unique to the individual. As a result, there are numerous assessment tools available, which range from brief survey questionnaires to in-depth interviews conducted by a professional. One of the many benefits of using machine learning is that it offers insight into datasets that researchers may not previously have had, particularly when compared to traditional statistical methods. The aim of this paper was to employ k-means clustering to explore the Eating Disorder Examination Questionnaire, Clinical Impairment Assessment, and Autism Quotient scores. The goal was to identify prevalent cluster topologies in the data, using the truth data as a means to validate identified groupings. Our results show that a model with k = 2 performed the best and clustered the dataset in the most appropriate way. This matches our truth data group labels, and we calculated our model's accuracy at 78.125%, which indicates that the clusters agree reasonably well with the known groups. We see that the Eating Disorder Examination Questionnaire (EDE-Q) and Clinical Impairment Assessment (CIA) scores are, in fact, important discriminators of eating disorder behavior.


Introduction
According to the Diagnostic and Statistical Manual of Mental Disorders 5th Edition (DSM-V), an eating disorder is defined by a "persistent disturbance of eating or eating-related behavior that results in the altered consumption or absorption of food" [1]. Currently, an eating disorder can be categorized into one of six subtypes: pica, rumination disorder, avoidant/restrictive food intake disorder, anorexia nervosa (AN), bulimia nervosa, and binge-eating disorder [1]. The data that were used in this analysis specifically focused on individuals with a previous diagnosis of AN [2].
A study conducted in 2010 found that the lifetime prevalence of eating disorders among US adolescents aged 13-18 was 2.7% [3]. Some researchers report that 1 to 2% of individuals will develop an eating disorder, specifically AN, at some point; among adults, the lifetime prevalence of AN is 0.6% [4]. Furthermore, Hudson and colleagues observed that 56.2% of adult participants who were diagnosed with AN also met the criteria for at least one other disorder. These disorders include anxiety disorders, mood disorders, impulse control disorders, and substance disorders [4].
Mortality rates have been reported at 5 to 8% [5]. There are also many other serious lifetime problems that are associated with eating disorders: heart failure, kidney damage, a compromised immune system, and other serious medical complications [6,7]. Unfortunately, the rate of eating disorders has not decreased in recent years, even though effective treatments have become more available [8,9].
Despite the growing popularity of machine learning techniques, their application to data in the psychology domain remains limited when compared to other disciplines, such as biology and chemistry. However, in recent years, applications of machine learning have proven useful for identifying clinically significant phenomena in areas such as autism spectrum disorder (ASD) [10,11] and eating disorders [12]. Here, we apply unsupervised machine learning to publicly available datasets with the goal of identifying clinically relevant cluster topologies that can be used to better understand eating disorders.

Factors in Manifestation
Eating disorders are very complicated and many factors play a role in their manifestation. There are biological, sociocultural, and psychological components that affect each person differently, and what may manifest as an eating disorder in one person may not manifest itself in another [1,7].

Biological
Certain biological traits are known to be associated with eating disorders. In fact, previous research has shown that as much as 84% could be due to genetic factors [13]. First-degree biological relatives of those diagnosed with an eating disorder are at an increased risk [1]. Historical research shows that females, Caucasian females in particular, are at a much greater risk for developing an eating disorder than any other group [4,5,7,13]. Researchers have also been able to identify brain abnormalities in those diagnosed with AN using functional magnetic resonance imaging (fMRI) and other technologies [1,2]. fMRI scans have shown that participants with a previous eating disorder diagnosis had reduced activation in the part of the brain responsible for social reward processing [2,14]. Additionally, Sweitzer and colleagues [2] found that the longer the person had an eating disorder, the greater the decrease in brain activation. Other researchers hypothesize that eating disorders may be caused by neurochemical and hormonal imbalances, specifically in serotonin and dopamine levels, due to their relationship with reward experience [7,15-19].

Sociocultural
It is no surprise that cultural influences must be considered when examining eating disorders. The highest rates are seen in post-industrialized, high-income cultures, where fears of gaining weight are more intense [1]. Researchers have been able to connect eating disorders with the changing standards of beauty over time, with icons for women getting thinner and thinner [7,20-23]. People in occupations that value thinness, such as modeling and athletics, are also known to be at greater risk of developing an eating disorder [1,7]. Sundgot-Borgen [24] concluded that eating disorder behavior varied, depending on what type of sport was played. Athletes in aesthetic sports or weight-dependent sports, such as gymnastics, figure skating, and wrestling, were more likely to have an eating disorder than athletes in endurance, technical, or ball game sports [24].

Psychological
There are also psychological factors that influence an individual's eating behavior. Some researchers hypothesize that eating disorders may serve as a way to deal with painful emotions; studies show that those who engaged in emotional eating were at a much greater risk for developing an eating disorder [7,21,25,26]. Furthermore, individuals who have anxiety disorders are also at greater risk [1]. Other researchers have suggested that an obsession with appearance is directly related to eating disorder behavior [26,27]. Studies have found that those at greater risk for an eating disorder exhibit more perfectionist and rigid thinking patterns [5,7,28,29]. Moreover, children who exhibit obsessional behaviors are more at risk for developing an eating disorder, particularly AN [1].

Previous Research
There have been many studies conducted in order to determine the effects of cultural and psychological influences on eating disorder behavior. For example, Stice and Shaw [30] concluded that young women who were exposed to images of fashion models reported more depression, insecurity, stress, and body dissatisfaction than those who were not exposed to the images. Another group of researchers found that college-age women who were exposed to a cosmetic surgery makeover show were more likely to feel pressure to be thin than women who were exposed to a home improvement show [31]. Stice, Maxfield, and Wells [32] demonstrated how social pressure can also influence behavior when exposed to others who are dissatisfied about their bodies. College women were more likely to feel dissatisfied with their bodies after they were exposed to someone complaining about weight, discussing extreme exercise routines and restrictive diet behavior [32].
In recent years, several works have examined eating disorders through the lens of machine learning [33]. The literature has focused on detecting anorexia patterns on social media and has had promising results. Paul, Kalyani and Basu [34] determined that the AdaBoost classifier was the best model to predict anorexia and depression, particularly when combined with the bag-of-words model. Additionally, Ramirez-Cifuentes and colleagues [35] compared different machine learning models and found that a logistic regression model detected anorexia behavior with the highest confidence level.

Current Assessment Tools
In order to assess an individual for an eating disorder, he or she first needs a medical examination [36]. After the initial examination, there are many ways of assessing the magnitude of the eating disorder, and it is up to the professional to decide which tools to use. The most accurate and popular form of assessment for eating disorders is a structured interview with a professional, using the most current edition of the DSM [36-38]. However, interviews are costly and time-consuming [36]. There are also many other problems that may arise when diagnosing an eating disorder, including manipulative behavior, reluctance to cooperate, and even denial of the disorder altogether [36,39].
Eating disorders affect all areas of a person's life; some diagnostic tools focus on one specific facet of life, while other tools focus on a range of dimensions [36]. The assessment tools that are currently available can be categorized into five main groups: general measures, DSM questionnaires, screening questionnaires, body image assessments, and quality of life measures [36].
General measures, like the Eating Disorder Examination Questionnaire (EDE-Q), are used as early diagnostic tools and they assess the core pathology symptoms that are related to the disorder, such as interpersonal insecurity, emotional dysregulation, low self-esteem, and perfectionism [36]. There are diagnostic tools that are based on the current DSM criteria and produce categorical results that are parallel to those in the DSM. Screening questionnaires, on the other hand, are much shorter than other self-report measures and they tend to focus on broad symptoms, such as fear of gaining weight and body perception; assessments include the Eating Attitudes Test [40], Bulimia Test [41], and the Clinical Impairment Assessment (CIA) [42]. Body image measures have been developed to evaluate concerns with body shape and size, which commonly focus on an individual's self-evaluation of body size and attitudes about gaining weight [43]. Finally, there are other measures that have been developed to determine the impact of an eating disorder on a person's overall quality of life and aim to assess specific domains of daily life [36,44,45].

Treatment
Treatment for an eating disorder is unique to the individual due to the variability in diagnosis and symptoms. This makes it difficult for professionals because there is no standard treatment plan.
Some individuals recover after one episode, some experience fluctuating weight patterns and relapses over many years, while others may need hospitalization to fully recover [1]. Furthermore, studies show that about one third of patients with an eating disorder "continue to meet diagnostic criteria five years and longer after initial treatment" [46,47], and as many as 40% of those diagnosed with an eating disorder will experience crossover between the various subtypes [1,48]. This presents another difficulty when treating and diagnosing eating disorders, because professionals can only diagnose current symptoms with the DSM [1].
It is still unclear whether the abnormalities seen in those diagnosed with an eating disorder are the consequences or the causes of eating disorders [7]. In addition, most of us experience the same cultural pressures of being thin, though many individuals never struggle with an eating disorder [7]. Some researchers have concluded that those who internalize the thin ideal presented in our culture are more likely to develop eating disorder behavior [32], although there is still much we do not comprehend about why someone does or does not internalize this cultural stigma.

Unsupervised Machine Learning
There are a multitude of different diagnostic tools, treatments, and therapies available today, and which tools are utilized is at the discretion of the professional. Furthermore, better treatment leads to better outcomes. Machine learning can help us to differentiate the tools that perform better.
Clustering is an unsupervised machine learning technique used to find latent patterns and structure in data [49,50]. These models allow us to visualize multi-dimensional data by organizing and grouping observations, where the groupings make some natural sense [50]. Clustering models most often use bottom-up processing, where each observation starts as its own group and they are iteratively grouped together until an optimal and natural number of clusters has been reached. Clustering has been known to improve performance in many applications [50,51]. There are three main types of clustering techniques: hierarchical clustering, Bayesian clustering, and partitional clustering [50,52]. The results in this paper were obtained with a partitional clustering model, k-means.

Dataset
The dataset that was used in this analysis was originally produced in 2018 by Dr. Maggie Sweitzer, Dr. Nancy Zucker, and Savannah Erwin from the Department of Psychiatry and Behavioral Sciences at Duke University School of Medicine. They used a Qualtrics survey to collect the data, Excel to clean the data, and SPSS for their analysis [2]. Researchers also calculated the total scores and subscale scores for the following, which will be discussed later in more depth: EDE-Q global score, CIA global score, and Autism Quotient (AQ) total score.
The dataset originally included 54 participants, ages 19-32. Participants were split into two groups, clinical and control, and they were matched on age, race, education, and medication status [2]. See Figure 1 for the summary statistics. Some observations were removed due to missing data, errors, and other issues [2]. The final dataset used in the analysis included a total of 44 participants, 20 participants in the clinical group and 24 participants in the control group.
The participants in the clinical group were required to have a previous diagnosis of AN, as defined by the DSM-V, while also having maintained a healthy weight for at least six months [2]. Researchers used portions of the Structured Interview for Anorexia and Bulimia [53] as well as the EDE-Q [54] in order to measure onset, course, and duration [2]. The control group participants were required to have no previous history with any form of eating disorder. They were also required to be free of psychiatric disorders, psychosis, substance use, and neurological disorders [2].

Survey Measures
Much of the dataset consists of personal information, such as race, age, years of education, height, and BMI. Additionally, the researchers asked the clinical group about details regarding their eating disorders, including age of onset, lowest weight, duration, and recovery time. They focused on these attributes, as well as fMRI scans, in order to explore social reward processing [2]. We chose to focus on the EDE-Q, CIA, and AQ scores, so that our analysis offered novel results.

EDE-Q
The EDE-Q was developed in 1994 by Fairburn and Beglin and it is based on the Eating Disorder Examination (EDE) that was previously created by Fairburn and Cooper in 1993. The EDE is a structured clinical interview, which is known for its excellent ability to assess eating disorders [37,38]. However, because it is an interview, the EDE is very time-consuming and costly, and it requires a trained professional to administer [55,56]. Therefore, the EDE-Q was developed to allow individuals to self-report on their eating disorder [57]. The original version had 36 items, though newer versions have been developed with 28 items [54]. The EDE-Q includes a global score as well as scores for four subscales: restraint, shape concern, weight concern, and eating concern. It is scored using a 7-point Likert scale; each subscale item is converted to a number and the items are then added and averaged to create one score per subscale [56,57]. Higher scores indicate greater eating disorder expression. Researchers have determined that the EDE-Q is a reliable and accurate self-report measure, specifically on these four subscales [38,58].

CIA
The CIA is a supplemental questionnaire, created by researchers Kristin Bohn and Christopher Fairburn in 2008. This measure was designed to be used alongside the EDE-Q to determine the overall severity of psychosocial impairment in areas that are typically affected by an eating disorder, including mood, self-perception, and work performance [55,59]. The questionnaire comprises 16 items scored on a four-point Likert scale: "Not at all"; "A little"; "Quite a bit"; and "A lot". These answers were scored as 0, 1, 2, or 3, respectively. Each participant's answers were added together to produce the global CIA score as well as three subscale impairment scores: personal, social, and cognitive [55,56,59]. A higher score indicates more psychosocial impairment. Researchers have determined the CIA to be valid [59].
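The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the summation, not the instrument's official scoring procedure; the helper name `cia_global_score` and the example responses are hypothetical.

```python
# Point values for the four response options, as described in the text.
CIA_POINTS = {
    "Not at all": 0,
    "A little": 1,
    "Quite a bit": 2,
    "A lot": 3,
}


def cia_global_score(responses):
    """Sum the point values of the 16 item responses (hypothetical helper)."""
    if len(responses) != 16:
        raise ValueError("the CIA has 16 items")
    return sum(CIA_POINTS[r] for r in responses)


# Hypothetical respondent, invented for illustration only.
example = (
    ["Not at all"] * 4 + ["A little"] * 6 + ["Quite a bit"] * 4 + ["A lot"] * 2
)
score = cia_global_score(example)  # 0*4 + 1*6 + 2*4 + 3*2 = 20
```

With 16 items scored 0 to 3, the global score in this sketch ranges from 0 to 48.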

AQ
The AQ, developed by Simon Baron-Cohen and his fellow researchers in 2001, is a self-reported questionnaire that is designed to characterize participants who may have ASD. The questionnaire consists of 50 questions that assess five different areas: social skill, attention switching, attention to detail, communication, and imagination [60]. The possible responses are: "Definitely agree"; "Slightly agree"; "Slightly disagree"; and "Definitely disagree". There is a rubric to follow for scoring; each item can receive up to one point and the total number of points is the total AQ score [60]. Researchers determined that a score of 32 and above qualifies an individual as having "clinically significant levels of autistic traits" [60,61]. Based on the results of Baron-Cohen's research, the AQ is a valid assessment tool, both for adolescents as well as adults. Though the AQ is not directly related to eating disorder behavior, the original researchers included this score [2] and, therefore, we also included it in our analysis.

Clustering Model
Each participant was labeled in a Group column as clinical or control; however, this column was removed prior to analysis so that our results were not influenced by this attribute. All of the data pre-processing steps, as well as the final analysis, were conducted using the R statistical computing software in RStudio. Once the data were cleaned, scaled, and ready to be analyzed, there were 32 remaining observations that were run through a k-means clustering model. K-means uses an algorithm that aims to partition the data into k sets or groups [62]. It uses an iterative technique with two essential steps: Assignment and Recalculation. Consider a multidimensional data matrix E. Each data point can be thought of as a vector, x_i, where i = 1, 2, ..., n and n is the number of observations, that contains multiple attributes per observation.
The number of clusters must be chosen prior to analysis for k-means, so we chose to run the model for k = 2-5. To begin, E is split into k groups. The mean value is calculated for each group and this value becomes the centroid.

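Assignment (a)
For each point m ∈ E, determine the closest centroid and assign m to this group. The distance is calculated using some distance measure. In this paper, we emulated MacQueen's [62] original application of this algorithm and used Euclidean distance:
$$d(m, c_j) = \sqrt{\sum_{l=1}^{p} (m_l - c_{jl})^2}$$
where c_j is the centroid of group j and p is the number of attributes per observation.

Recalculation (b)
Recalculate the centroid value of each group as the mean of the points currently assigned to it.
These two steps are repeated until the centroids no longer change.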
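The two steps can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the analysis, which was run in R. The random choice of initial centroids, the function names, and the toy data are assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(data, k, seed=0, max_iter=100):
    """Return (labels, centroids) for a simple k-means run."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k observations drawn at random.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments, and therefore centroids, no longer change
        labels = new_labels
        # Recalculation step: each centroid becomes the mean of its group.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a group is empty
                centroids[j] = mean_vector(members)
    return labels, centroids


# Toy two-dimensional data, invented for illustration only.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels, centroids = kmeans(toy, k=2)
```

On the toy data, the first three points end up in one group and the last three in the other. Which group is numbered 0 or 1 depends on the initial centroids, so cluster numbers carry no meaning of their own.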
It is important to note that k-means expects all of the clusters to be similarly, and regularly, shaped. The algorithm can also suffer from pitfalls stemming from the curse of dimensionality for high dimensional datasets in the same way as instance-based supervised learning algorithms, such as k-nearest neighbor. Thus, it is important that the clusters that are produced by k-means are assessed for quality as part of the modeling process.

Validation Measures
In this particular situation, we had access to the truth data, so we know which participants were in the control and clinical groups. We used this information to validate our model by comparing the cluster results with the pre-labeled groups. We formed a confusion matrix and calculated our model's accuracy by adding the correctly labeled data points for each group and dividing by the total number of participants.
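The accuracy calculation described above can be sketched as follows. The labels below are hypothetical and are not the study data; the function names are also assumptions. The sketch presumes that cluster numbers have already been matched to group numbers, since k-means numbers its clusters arbitrarily.

```python
from collections import Counter


def confusion_matrix(truth, clusters):
    """Count how often each (true group, cluster) pair occurs."""
    return Counter(zip(truth, clusters))


def accuracy(truth, clusters):
    """Correctly assigned participants divided by all participants."""
    correct = sum(t == c for t, c in zip(truth, clusters))
    return correct / len(truth)


# Hypothetical labels (1 = control, 2 = clinical), for illustration only.
truth = [1, 1, 1, 1, 2, 2, 2, 2]
clusters = [1, 1, 1, 2, 2, 2, 2, 1]

matrix = confusion_matrix(truth, clusters)
acc = accuracy(truth, clusters)  # 6 correct out of 8 = 0.75
```

By the same arithmetic, an accuracy of 78.125% on 32 participants corresponds to 25 correctly assigned participants, since 25/32 = 0.78125.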
We also used an internal validation measure, the Silhouette Method, in order to further confirm that our results were accurate. The Silhouette Method was developed to validate partitioning techniques [63], and it uses proximities between datapoints to create an easy-to-interpret graphical representation of the data. It utilizes a simple equation to determine a value between −1 and 1, which measures how well each datapoint has been classified [63]. However, unlike other validation measures, the Silhouette Method uses mean score and subtraction to relate compactness and separation, rather than division [64]. The final output is a plot of these values. One simply looks for the value that corresponds to the highest peak in the graph to determine the optimal number of clusters. Rousseeuw [63] believed that the true benefit of this method was its interpretability and validity, specifically with clustering results. Research has shown that the Silhouette Method performs well as compared to other validation measures [64].
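For a single datapoint, the standard silhouette value is s = (b − a) / max(a, b), where a is the mean distance to the other points in its own cluster (compactness) and b is the smallest mean distance to the points of any other cluster (separation). The subtraction relates the two quantities, and dividing by max(a, b) only rescales the result into the range −1 to 1. The Python sketch below illustrates this calculation; it assumes Euclidean distance and at least two clusters, and the function names and toy data are invented for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(data, labels):
    """Silhouette value (b - a) / max(a, b) for every point."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(data):
        # Mean distance from p to the members of each cluster, excluding p.
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(data) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(euclidean(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:
            values.append(0.0)  # convention for a cluster with a single point
            continue
        a = mean_dist[own]  # compactness
        b = min(d for c, d in mean_dist.items() if c != own)  # separation
        values.append((b - a) / max(a, b))
    return values


# Toy data and two candidate labelings, invented for illustration only.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
good = silhouette_values(toy, [0, 0, 0, 1, 1, 1])
poor = silhouette_values(toy, [0, 1, 0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_poor = sum(poor) / len(poor)
```

A labeling that follows the natural grouping of the toy data gives a mean silhouette value close to 1, while the mixed labeling gives a clearly lower mean. Comparing such means across candidate values of k is one way to read the silhouette plot.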

Clusters
See Figure 2 for the results from our first k-means clustering model, k = 2. This model clustered the data as follows:
• Cluster 1: 13 participants
• Cluster 2: 19 participants
For k = 3, the data were clustered as follows: cluster 1, 14 participants; cluster 2, 14 participants; cluster 3, 4 participants. The k = 4 model clustered the data into 16, 6, 8 and 2 participants, respectively. Lastly, the k = 5 model overfit the data as well, with the clusters having 3, 12, 9, 2 and 6 participants, respectively. Our model k = 2 clustered the dataset in the most appropriate way. Table 1 shows a snapshot of the final table that includes group assignment, cluster assignment, and CIA, AQ and EDE-Q scores. We converted the group values to number variables and then compared these values to the cluster assignment values. We created a confusion matrix, which is presented in Table 2. We used this table to calculate our model's accuracy at 78.125%, which indicates that the clusters agree reasonably well with the truth labels. Additionally, the Silhouette plot used to validate our model is shown in Figure 3. The dotted line represents the optimal number of clusters for this dataset, and we see that two clusters is the optimal solution.

Radar Plots
Once we determined that two clusters produced the optimal solution, we used the results and Excel to generate a radar plot representing the two clusters. Figure 4 shows this radar plot. Because the EDE-Q, CIA, and AQ scores are calculated in different ways, the data needed to be scaled. A common practice is to scale between 0 and 1. However, we see in Figure 4 that the AQ score heavily skews the results. Therefore, we rescaled the data using z-scores and re-ran our analysis to have more interpretable results. Figure 5 shows the new radar plot. Now that the scores are scaled more appropriately, we see that the EDE-Q and CIA scores are, in fact, important discriminators of eating disorder behavior. Based on our truth data, we can determine that cluster 1 represents the control group and cluster 2 represents the clinical group. See Table 1 to compare group assignment and cluster assignment. These groups are also shown in the radar plots by the blue and orange lines, respectively. We see that the clinical group is more driven by EDE-Q and CIA scores than the control group, which is to be expected due to the nature of the dataset.
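The two scaling approaches mentioned above can be compared with a short Python sketch. The scores are hypothetical, the function names are assumptions, and the z-score here uses the sample standard deviation, which is an assumption about how the rescaling was done.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values on their mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]


# Hypothetical scores for one measure, invented for illustration only.
scores = [10.0, 12.0, 14.0, 16.0, 18.0]
scaled_01 = min_max_scale(scores)  # [0.0, 0.25, 0.5, 0.75, 1.0]
scaled_z = z_score_scale(scores)   # mean 0, sample standard deviation 1
```

Min-max scaling depends only on the two extreme values of each measure, whereas z-scores express every value relative to the mean and spread of that measure, which puts measures with different distributions on a more comparable footing.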

Discussion
Our results suggest that the EDE-Q and CIA are valid measures when determining eating disorder behavior, even though they are different types of diagnostic tools. Although each psychological test and measure has been tested for basic validity and accuracy before being adopted by professionals, there is not much research to date on whether these tests perform well with real-world datasets. Moreover, the medical field is quite subjective in the sense that each professional decides what resources to use when diagnosing patients. A professional may simply choose not to use certain diagnostic tools, even when they may give the best results. Alternatively, a professional may simply not know there is a better diagnostic tool than the one used. Therefore, it is imperative that researchers begin testing the efficacy of these tools in real-world settings. The analysis in this paper focuses on three of these tools, the EDE-Q, the CIA, and the AQ.
We see a very strong association between the EDE-Q and CIA scores and cluster assignment. Accordingly, if we know a participant's EDE-Q or CIA score, we have a very good chance of assigning them to the correct group. What is interesting, and slightly unexpected, is that the EDE-Q and CIA scores influence the clusters in a very similar way. In fact, based on the radar plot in Figure 5, it would appear as though the two scores affect the clusters to the same degree. Certain implications may be drawn from this conclusion. For example, the CIA is not as costly as the EDE-Q, the CIA is not a formal interview but rather a self-reported questionnaire, and the CIA is able to be completed in a shorter amount of time. For these reasons, the CIA may be a more viable option for teenage patients. Furthermore, if a professional only has access to the CIA and not the EDE-Q, he or she can be confident that the results are accurate and valid.
The dataset had some discrepancies that may have led to mixed results. For example, there were some participants in the control group who reported using disordered driven exercise to control their weight. Similarly, there were participants who reported binge eating, maintaining an unhealthy low weight, and even abusing diuretics to control weight. These are clearly eating disorder behaviors, yet the participants were part of the control group. This is likely because the disordered behavior was at a subclinical level and, therefore, did not get diagnosed. Professionals must identify these outlying cases and determine whether subclinical, yet still reportable, levels need to be considered.
In this work we also scaled the data so that the AQ score would not skew the results, but supplementary research into the relationship between AQ score and eating disorder behavior is a necessary next step. It may be hypothesized that someone who scores higher on the AQ will also score higher on the EDE-Q and CIA, since these measures are indications of mental disorders and mental disorders often occur together. It is unfortunate that the original researchers did not offer any insight as to why they included the AQ score in their analysis [2], so, at this point, we cannot conclude if there is a connection between this dataset and the AQ measure. Regardless, the link between autism and eating disorder behavior is an interesting topic for additional research.
Additionally, our analysis was based on a small sample size, with a total of 32 participants contributing to the results. This limits our ability to generalize our results to the greater community. However, it is a good example of how machine learning algorithms can accurately predict grouping classifications and our results will hopefully be motivation for a larger study in the future.

Conclusions
Eating disorders have become prevalent in our society, yet the research is still very mixed regarding why or how one develops this type of disorder. There are many factors that could play a role in manifestation, which means that there is no one perfect treatment plan for all cases. In addition, eating disorders often co-occur with other disorders, which makes them more complex and not easily recognized or treated. Although more research has been conducted recently, deaths from eating disorders have continued and the rate of eating disorders does not seem to decrease despite better available therapies. It is critical that we begin to dissect this interesting cultural phenomenon, especially here in the United States. This paper presented a novel approach to understanding eating disorder behavior by incorporating machine learning into an otherwise purely statistical field. With a final dataset of 32 participants, we employed a k-means clustering model and found the optimal number of clusters to be two. Our results are supported by the truth data given in the dataset. We also utilized the Silhouette score as a validation measure to justify our results. The EDE-Q and CIA scores seem to influence the results to the same degree, so the correlation of these two scores is a topic for future research. It is unclear after our analysis how AQ is related to eating disorder behavior, so additional research is certainly needed. This paper is but a small introduction into how machine learning can help to detect and predict patterns in many types of psychological data.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: