Predictive Model for Detection of Depression Based on Uncertainty Analysis Methods

Advances in technology have increased the life expectancy of older adults. As a result, a large segment of the world population is 60 years old or over. Depression is an important disease in older adults, which seriously affects the mood and behavior of the elderly. Novel technologies for smart cities allow us to monitor people and prevent problematic situations related to this mental illness. In this paper, we propose a predictive model to automatically detect depression in older adults. The model is based on machine-learning techniques to analyze the data obtained by a sensor that monitors the daily activities of older adults. The model was evaluated, obtaining promising results.


Introduction
Advances in scientific and medical fields have produced very effective medical treatments, and technology has also made it possible to diagnose diseases in very early stages. As a result, life expectancy has increased in recent decades. However, this aging of society has raised new challenges for older adults, such as limited social contact, which can lead to physical and psychological diseases. Many older adults live alone with very limited social contact, which affects their mental health and could lead to depression [1].
Depression is considered to be the most frequent mental illness, and it is among the main causes of disability. Depression is a mood disorder that produces feelings of sadness, loss, anger, and frustration for a long time, interfering with daily life [2]. This mental disorder is characterized by recurrence and remission of symptoms, and it is related to genetic, biochemical, psychological, and social factors [3]. Depression in older adults has been detected at rates ranging from 7% to 36% among outpatients, increasing to 40% among hospitalized patients. Depression affects 10% of the older adults living at home, between 10% and 20% of those hospitalized, between 15% and 35% of those living in nursing homes, and 40% of those with multiple diseases [4].
Several proposals have been made in computer science to support the detection and treatment of depression. For example, programs with natural language processing have been developed to identify loneliness [5] and social isolation [6]. However, a study of the state of the art has revealed a lack of studies that detect depression using the daily activities of older adults. Therefore, the main purpose of this research work is to build a predictive model for automatically detecting depression in older adults. Our proposed model uses machine-learning techniques and fuzzy sets to recognize the behavior patterns of the daily activities of older adults. It is important to highlight that this research work is strongly grounded in psychological theories.
We use fuzzy logic because it allows us to model imprecise and quantitative knowledge, to transmit and manipulate uncertainty, and to support human reasoning in a natural way to a reasonable extent.
The paper is organized as follows: Section 2 presents related work. Section 3 describes our predictive model for detecting depression in older adults. Section 4 describes the model evaluation. Finally, conclusions and future work are discussed in Section 5.

Related Work
Affective states have been recognized as important components in every human activity; therefore, research in several fields has been conducted to detect them automatically. The interest in detecting affective states lies in the need to improve quality of life [7] and processes such as learning and training [8]. An important trend in recent research is to use mobile devices and social networks to detect affective states such as loneliness, social isolation, enjoyment, and anger. In this section, we present some relevant works that relate emotions to depression by means of artificial intelligence techniques.
In the work by Campos [6], a model to identify loneliness in the elderly through smartphones was proposed. This application mainly monitors the geographic location of a person using sensors. The significant variables were obtained through data-mining techniques. The model was constructed using the J48 decision-tree technique.
In the work of Sanchez [8], environmental intelligence and the analysis of social networks were applied to the detection of social isolation. This predictive model monitors activity by means of radiofrequency devices, global positioning systems, and social networks. The feature selection process was performed by means of statistical methods (Spearman correlation). The J48 algorithm was used in the development of the intelligent application. Another approach uses keyboard, mouse, and touchscreen actions to detect emotions such as surprise, enjoyment, and anger [9]. This model relies on support vector machines and fuzzy logic to recognize depression in people older than 18 years.
The works of Guido Silva [10] and Arava and Vamsidhar [11] used image processing and artificial vision techniques for extracting characteristic points from different facial regions such as the eyes and mouth. The emotional states analyzed in these works were happiness, sadness, and surprise. Other studies have investigated and developed systems that focus on automatically detecting depression from either audio or video input, multimodal input, or speech analysis. In [12], the authors investigated the generalizability of a vocal biomarker-based approach to detect depression in clinical interviews. The work of Cavazos-Rehg [13] examines depression-related chatter on Twitter to glean insight into social networking about mental health. The work of Scherer et al. [14] investigates the capabilities of automatic audiovisual nonverbal behavior descriptors to identify indicators of psychological disorders such as depression, anxiety, and post-traumatic stress disorder. They propose a number of nonverbal behavior descriptors that can be automatically estimated from audiovisual signals. These works all try to detect depression in adults and adolescents, but not in older adults.

A Predictive Model for Detecting Depression in Older Adults
The main contribution of this research work lies in the development of a predictive model to detect depression in older adults. Our proposed model is based on an uncertainty analysis method with a set of fuzzy rules for recognizing the behavior patterns of the daily activities of older adults.
Fuzzy sets are used to generate grouping algorithms in order to obtain sets that allow us to differentiate groups with greater expressivity, and also to model algorithms for prediction or regression.
In this section, we describe the process of building our model through four main phases: Phase 1, Collection of data from a representative group; Phase 2, Data cleaning and discretization; Phase 3, Setting of data subsets by variable; and Phase 4, Building a predictive model using classification techniques for each attribute.

Phase 1: Collection of Data
The first stage of model construction involves gathering the information that serves as the basis for the model. This stage involves a group of older adults who are monitored for a period of one week. The participants were informed of their participation and received the necessary resources for data collection. At the end of the monitoring, the Yesavage test was applied to identify the depression level of each participant. The Yesavage test provides three levels of depression: established, average, and without depression [15]. Each element of this study is detailed below: (a) profile and participants of the group; (b) materials and resources used in this study; and (c) variables that were monitored.

Profile and participants:
The participants in the study group were 36 older adults (21 women and 15 men) between 60 and 70 years old. Participants were required to have complete physical and cognitive abilities without mobility impairments, to own a mobile phone, and to be able to use it to make calls or send text messages. They were also required to have no difficulty understanding questions and to sign an informed consent form indicating that they were willing to take part in the research.

Materials and resources:
The following materials and resources were used: (i) an initial questionnaire, which was used to collect the sociodemographic information of the test subjects; (ii) mobile applications designed with the FIWARE ecosystem, an open platform for developing smart services for Smart Cities that provides interoperability and standard data models [16]; (iii) the wrist-worn Fitbit® sensor [17], through which the physiological variables were monitored; and (iv) the Yesavage Scale.
FIWARE is an open innovation ecosystem for the creation of new applications and Internet services, and it provides a set of tools for different functionalities. We developed a module that allows developers to connect mobile devices with FIWARE in order to collect information from the sensors of Android smartphones. The collection of the monitored data and the analysis of the polarity of the texts (namely, whether a message sent through a smartphone is positive, negative, or neutral) are carried out by the bePOSITIVE application and the SWePT service, respectively.

Phase 2. Data Preparation: Data Cleaning and Discretization
The data preparation phase is carried out once all of the data from the different sources have been obtained, i.e., the bracelet information, the smartphone information (Figure 1 shows a screen from the bePOSITIVE app, which was developed in this investigation), the polarity information of the texts (SWePT service), and the level of depression presented by each participant (the Yesavage Scale). The data collected were cleaned and discretized so that they could be analyzed statistically in order to build our predictive model. Table 1 shows a portion of the data collected for each of the variables after the cleaning and discretization of the data from the study group. The complete table is available at the following link: https://drive.google.com/file/d/1oOAMxmxlSzZUO4nD8aJIAy-x_fgXjgO_/view
The data cleaning consisted of analyzing each value of each variable and removing empty or erroneous values from the database, i.e., values that contain incomplete information due to errors in their collection, redundant values, or repeated values. The data discretization consisted of transforming the collected values of non-numeric variables into numeric values. The objective was to have all the variables in a numeric format in order to properly manipulate this information with statistical methods. For example, the depression level variable had a value of High, Medium, or Null. Once discretized, the variable takes the values 2, 1, or 0, respectively. We carried out this analysis manually.
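Although the cleaning and discretization were carried out manually in this study, the same two operations could be automated. The following is only a minimal sketch under stated assumptions: the record layout and the field names (adult, avg_distance_km, depression_level) are hypothetical, the sample values are illustrative and not taken from the study data, and only the High/Medium/Null to 2/1/0 mapping comes from the text.

```python
# Ordinal encoding described in the text: High -> 2, Medium -> 1, Null -> 0.
LEVEL_CODES = {"High": 2, "Medium": 1, "Null": 0}


def clean_records(records):
    """Drop records that contain empty values and exact duplicate records.

    `records` is a list of dicts; the field names are hypothetical.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records with missing or empty values.
        if any(value is None or value == "" for value in rec.values()):
            continue
        # Skip exact repeats of a record that was already kept.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def discretize_level(records, field="depression_level"):
    """Replace the textual depression level with its numeric code."""
    return [{**rec, field: LEVEL_CODES[rec[field]]} for rec in records]


# Illustrative records only (not data from the study).
raw = [
    {"adult": 1, "avg_distance_km": 2.6, "depression_level": "Medium"},
    {"adult": 1, "avg_distance_km": 2.6, "depression_level": "Medium"},  # repeated
    {"adult": 2, "avg_distance_km": None, "depression_level": "High"},   # empty value
    {"adult": 3, "avg_distance_km": 3.1, "depression_level": "Null"},
]
prepared = discretize_level(clean_records(raw))
print(prepared)
```

Detecting erroneous (as opposed to empty or repeated) values would require variable-specific validity checks, which the text does not specify and which are therefore omitted here.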

Phase 4. Building Our Predictive Model Using Fuzzy Theory
In this phase, the classified information was analyzed using statistical techniques that integrate data sets using fuzzy rules. The objective was to obtain a set of fuzzy rules for the detection of depression. These rules could be implemented in a software application. The steps for generating the fuzzy rules are explained in detail below.
Step 1. Building limits from subsets. The conversion of data to a fuzzy scheme requires finding the upper and lower limits of the values of each of the subsets of each variable. In other words, we identify the upper and lower limits of the subsets (high, medium, and null) of each monitored variable. To accomplish this, the statistical formula for confidence intervals (5) was used for each data subset.
The elements of the formula are the following: n = number of elements of the set; z = confidence level; σ = standard deviation; x̄ = average; μ = confidence intervals. For example, for the MEDIUMDSetAvrgDistTrav subset, the values applied to Formula (6) were the following: n = 11; z = 2.575; σ = 0.20; and x̄ = 2.6. Substituting these values, the confidence intervals were the following: 2.44 and 2.74. The upper and lower limits can be visualized using a Gaussian distribution chart [18]. Figure 2 shows graphically the lower and upper limits (2.44 and 2.74, respectively) for the MEDIUMDSetAvrgDistTrav data subset. These values represent the confidence interval for the subset. Figure 2 also shows the lowest value of the subset analyzed, which corresponds to Older Adult x1 = 2.21, and the highest value of this set, which corresponds to Older Adult x2 = 2.75. This process is executed for each of the level subsets (Null, Medium, and High) of each variable.
Step 2. Generation of rules using fuzzy theory. The rules generated for this research work are based on fuzzy set theory [19]. The formulas of the fuzzy sets used to generate rules are specified by taking into account the type of graph generated. In fuzzy logic [19,20], the representative graph for the kind of elements used in this work is similar to a trapezoidal graph.
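The limit computation of Step 1 can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes the standard z-based confidence interval, x̄ ± z·σ/√n, which is consistent with the symbols listed above; the function name confidence_limits is hypothetical.

```python
import math


def confidence_limits(mean, std_dev, n, z):
    """Return (lower, upper) limits of a z-based confidence interval.

    Assumes the standard form mean +/- z * std_dev / sqrt(n); this is an
    assumption, since the original formula is not shown in the text.
    """
    margin = z * std_dev / math.sqrt(n)
    return mean - margin, mean + margin


# Values quoted in the text for the MEDIUMDSetAvrgDistTrav subset.
# z = 2.575 is the two-sided critical value for roughly 99% confidence.
lower, upper = confidence_limits(mean=2.6, std_dev=0.20, n=11, z=2.575)
print(f"lower = {lower:.2f}, upper = {upper:.2f}")
```

With the rounded inputs quoted in the text, this gives a lower limit of about 2.44, matching the reported value, and an upper limit of about 2.76, slightly above the reported 2.74; the small difference may come from rounding of the published inputs.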
The range of values in the graph can be between 0 and 1. In this case, the value 0 means that the value represented by x does not belong to the hypothesis formulated, and the value 1 means that the hypothesis is completely valid.
For our example, the value 0 indicates a person without medium depression, and the value 1 indicates a person with medium depression. The symbols associated with the graph are the following: α = the first value of the set; β = the first value of the set that is completely in the zone of belonging; δ = the last value of the set that is completely in the zone of belonging; φ = the last value of the set.
The values obtained from the confidence intervals are taken as the limits used to generate the rules for each one of the levels. When a new value x is placed in a subset, it can belong to more than one depression level. Each rule has a formula to calculate the level of belonging. The formulas of the fuzzy logic for the trapezoidal graph are shown in Figure 3. Continuing with our example, the data values for the MEDIUMDSetAvrgDistTrav subset are: α = 2.21 (the first value of the subset); β = 2.44 (the lower limit of the confidence interval); δ = 2.74 (the upper limit of the confidence interval); and φ = 2.75 (the last value of the subset). Once the values of the elements of the fuzzy rules have been identified, we substitute these values to obtain the rules for the MEDIUMDSetAvrgDistTrav subset. The rules generated for the MEDIUM subset of the Average distance travelled in the day (km) variable are shown in Table 2.
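The degree of belonging for one subset can be sketched with the usual trapezoidal membership function. The formulas of Figure 3 and the rules of Table 2 are not reproduced in this text, so the piecewise form below is an assumption based on the standard definition; it uses the α, β, δ, and φ values given for the MEDIUMDSetAvrgDistTrav subset, and the function name is hypothetical.

```python
def trapezoidal_membership(x, alpha, beta, delta, phi):
    """Degree of belonging of x to a trapezoidal fuzzy set, in [0, 1].

    Standard piecewise form (an assumption; Figure 3 is not reproduced):
      x outside [alpha, phi]  -> 0
      alpha <= x < beta       -> rising edge
      beta <= x <= delta      -> 1 (fully inside the zone of belonging)
      delta < x <= phi        -> falling edge
    """
    if x < alpha or x > phi:
        return 0.0
    if x < beta:
        return (x - alpha) / (beta - alpha)
    if x <= delta:
        return 1.0
    return (phi - x) / (phi - delta)


# Limits of the MEDIUMDSetAvrgDistTrav subset quoted in the text.
MEDIUM_DIST = dict(alpha=2.21, beta=2.44, delta=2.74, phi=2.75)

for km in (2.0, 2.325, 2.6, 2.8):
    print(km, trapezoidal_membership(km, **MEDIUM_DIST))
```

The four branches correspond naturally to four rules per subset, which agrees with the rule count reported for each level, although the exact wording of the rules in Table 2 may differ.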
This process must be applied for each subset of each variable. Thus, the rules obtained for the HIGHDSetAvrgDistTrav and NULLDSetAvrgDistTrav subsets of the Average distance travelled in the day (km) variable are also shown in Table 2. At the end of this phase, 180 rules were obtained. Twelve rules correspond to each variable analyzed: 4 rules for the high subset, 4 rules for the medium subset, and 4 rules for the null subset. These rules allow the level of depression that corresponds to the user to be determined automatically when a new value is entered for a variable. Each subset can be represented as a trapezoidal graph of fuzzy logic. When a variable obtains a new value "x", this value is evaluated by the rules, and the resulting value is the level of depression that corresponds to the older adult.
Step 3. This step consists of transforming the fuzzy rules of each subset into pseudocode.
Step 4. Implementation of rules. This step consists of implementing all of the generated rules in the software system. The implementation produces values on a scale from 0 to 1. For each variable, the degree of belonging to each level of depression is obtained. This value must be converted to a percentage for its interpretation. Finally, the last column of Table 3 shows the average of all of the percentages obtained for each level of depression. In this way, we can determine the percentage probability of each depression level for an older adult; these values are shown in Table 3.
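The aggregation described in Step 4 can be sketched as follows. This is a minimal illustration, not the implemented system: the variable names and membership degrees are illustrative values rather than study data, and selecting the level with the highest average percentage as the predicted level is an assumption inferred from the evaluation.

```python
LEVELS = ("High", "Medium", "Null")


def level_percentages(memberships):
    """Average the per-variable degrees of belonging for each level.

    `memberships` maps a variable name to {level: degree in [0, 1]}.
    Each degree is converted to a percentage and averaged over variables.
    """
    n_variables = len(memberships)
    return {
        level: 100.0 * sum(m[level] for m in memberships.values()) / n_variables
        for level in LEVELS
    }


def predicted_level(percentages):
    """Pick the level with the highest average percentage (an assumption)."""
    return max(percentages, key=percentages.get)


# Illustrative degrees of belonging for two hypothetical variables.
example = {
    "avg_distance_km": {"High": 0.0, "Medium": 1.0, "Null": 0.0},
    "avg_steps": {"High": 0.5, "Medium": 0.5, "Null": 0.0},
}
percentages = level_percentages(example)
print(percentages, predicted_level(percentages))
```

In the full model, the dictionary would hold one entry for each of the 15 monitored variables, each filled in by evaluating the fuzzy rules of its High, Medium, and Null subsets.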
Finally, the development of the software system is complemented with the results of model creation in order to automatically detect depression using the variables detected from the smartphone sensors and the bracelet.

Model Evaluation
The total sample population was 41 individuals. Of these, 36 were used to generate the model, and tests were carried out with the five remaining subjects. Thus, the larger part of the sample was used in the development of the model. The evaluation of our predictive model was carried out through an experiment with five adults (2 men and 3 women), with an average age of 69 ± 9 years. The participants were monitored for three days. The variables recorded were the same as those shown in Table 4. At the end of the monitoring process, the level of depression was obtained using the Yesavage Scale and compared with the results of the proposed method. Table 4 shows the results obtained after applying our predictive model, together with the level of depression obtained from the Yesavage Scale. The data show that Older Adult number 3 presents a high level of depression according to the model and a medium depression level according to the clinical test. The quantitative data show that there was a small margin (10.6 percentage points) between the percentage of belonging to the high level (52.1%) and the medium level (41.5%). This deviation occurs because some elderly people can behave and interact differently from most when they are in the high, medium, and null depression states. The values monitored for this small portion of individuals will show a trend that differs from the model rule categories.

Conclusions and Future Work
The elderly population is constantly growing and will be the segment of the population with the highest growth rate in the coming years. Medical services will be a priority for the physical and mental care of this segment of the population. It is important to work on social solutions for the health of the elderly. The research work presented in this paper proposes a model for the automatic detection of depression in the elderly. This model uses a computer system that is capable of automatically monitoring human-computer interactions and generating alerts for the care of the elderly. The relevance of this research is that it provides computational health tools that allow an early diagnosis of depression in the elderly in order to give them early attention and provide them with a better quality of life.
The bePOSITIVE application was developed as part of this research for the monitoring and collection of data by means of smartphones. Future work will include the development of an application that is easier to manage and less intrusive. Another objective is to perform intensive tests of the proposed model, comparing the results with other models based on several data mining techniques. In future work, we also want to apply other machine learning techniques, such as neural networks, decision trees, and support vector machines, to the data collected in this research.

Figure 2. Representation of the intervals for the medium depression level of the Average distance travelled in the day (km) variable.

Figure 3. The formulas of the fuzzy logic for the trapezoidal graph.

Table 1. View of the variables and a sample of the data.

Adult Variables Related to the Use of the Mobile Phone
Legends: (1) Average outgoing calls to family (daily calls); (2) Average outgoing calls to friends (daily calls); (3) Average incoming calls from family (daily calls); (4) Average incoming calls from friends (daily calls); (5) Average duration of incoming calls from family (minutes); (6) Average duration of incoming calls from friends (minutes); (7) Average polarity of outgoing messages to family; (8) Average polarity of outgoing messages to friends; (9) Average polarity of incoming messages from family; (10) Average polarity of incoming messages from friends; (11) Average distance travelled (km); (12) Average steps in the day; (13) Average calories spent (calories); (14) Average sleep time in the day (min.); and (15) Average number of active minutes in the day (min.).

Table 2. The rules generated for the Average distance travelled in the day (km) variable.

Table 3. Values converted into percentages for one older adult.