Analysis of Behavioral Characteristics of Smartphone Addiction Using Data Mining

In 2016, the number of mobile phone subscriptions worldwide surpassed the total world population, and the number of smartphone addicts is increasing each year. The objective of this study is therefore to analyze smartphone addiction by considering the differences between smartphone usage patterns and users' perception of their own usage. Instead of relying solely on the existing self-reporting method, our proposed method automatically collects and analyzes data through an app, thereby improving data accuracy and reducing dependence on the reliability of respondents. We observed a significant cognitive bias between the self-reports and the automatically collected data. Applying data mining to the questionnaire (six criteria, 24 items in total) showed that the higher the "recurrence" score, the higher the addiction, and that "withdrawal" item 1 had the largest effect on addiction. In addition, the input variables with the greatest influence on high-risk users were the number of times the screen was turned on and the difference between real use time and cognitive (perceived) use time. However, the amount of data used and the smartphone usage time were not related to addiction. In future work, we will modify the app to obtain more accurate data, on the basis of which we can analyze the effects of smartphone addiction on depression, anxiety, stress, self-esteem, and emotional regulation, among others.


Introduction
According to the Korean Statistical Information Service [1], the number of mobile phone subscriptions worldwide reached 7.5 billion in 2016, surpassing the world's population; in Korea alone, the number of mobile subscribers exceeded 60 million. The same survey indicates that the risk of smartphone addiction is increasing every year. As smartphone use grows, smartphone addiction has become a serious social problem, exemplified by the emergence of "smombies." (Smombie, a compound of "smartphone" and "zombie," refers to people who walk while looking at their smartphones; because they are immersed in their phones and unaware of their surroundings, they face a high risk of accidents.) Despite the large number of studies performed in Korea, there is still no appropriate method to solve this addiction problem [2,3].
In particular, the data provided by the Korea National Statistical Office (KNSO) and various other studies are based primarily on traditional research methods such as questionnaires and interviews. Because these methods rely on the analysis and evaluation of self-reported data, which can be intentionally manipulated, they lead to less reliable results.
Therefore, in recent years, studies have analyzed patterns of smartphone addiction using data collected via apps; however, the software development kit (SDK) used to develop such an app restricts how app usage patterns can be analyzed. Nevertheless, with the approach developed by Park [4], phone usage patterns can be analyzed easily by measuring smartphone usage time, time spent on apps by category, and the number of times apps are executed. Furthermore, Lee [5] compared self-reported and measured smartphone usage and found a significant cognitive bias between the results of the two techniques. Lee [6] concluded that smartphone use should, to some extent, be reconsidered as a measure of addiction, since smartphone use can also be harmless.
Thus, in this study, we combine both approaches: we apply data mining to self-reported data and to usage data collected with a smartphone app called "How often do you use" [7]. The data mining techniques were used to study the difference between the perception and the behavior of smartphone use, as well as the effect of addiction on learning. In summary, the objectives of this study are as follows:
• Understanding smartphone usage patterns by collecting long-term usage data in a manner that cannot be intentionally manipulated;
• Identifying the impact of smartphone addiction in a reliable manner;
• Identifying the differences between self-reported and automatically collected smartphone usage data.
The smartphone industry is growing rapidly, and ever-increasing options encourage users to adopt newer devices and use them more. Thus, by analyzing smartphone usage and addiction through the collection of users' unconscious usage patterns and data mining techniques, rather than through existing survey methods alone, we can find ways to use smartphones efficiently and effectively.

Smartphone Addiction and Data Mining Research
In a related study on addiction based on data mining techniques, Kim et al. [8] proposed a method to diagnose Internet addiction. To avoid intentionally manipulated survey responses, they based their research on data available on the Internet; however, because these data were analyzed through simulation, actual usage was insufficiently measured, which limited the study.
Ahn [9] analyzed college students' "tendency to become addicted to smartphones according to the amount of time they spend on their phones." The study showed that the main smartphone applications used by college students were social networking site (SNS), search engine, music, and call applications. In terms of smartphone addiction, general users accounted for 82.2% of all users, whereas high-risk and potential-risk users accounted for 4.6% and 13.2%, respectively. The study also emphasized that the percentage of addicted women (in both the high-risk and potential-risk groups) was higher than that of men. Most studies on Internet addiction have been conducted in a similar manner and have obtained similar results; however, given the rapid advancement of both phone software (including apps) and hardware, we seek a new, creative method of studying Internet addiction that leads to more robust results.
Kim [10] identified smartphone addiction trends by deriving addiction patterns from the degree of smartphone overuse. The study showed that smartphone addicts suffered from sleep deprivation because they delayed their bedtime past midnight to keep using their phones. However, this study is also limited in that it relied on a survey method and produced predictable results. Table 1 lists the KNSO data for 2017, which are similar to those obtained in the abovementioned study. The similarity between the two sets of statistics highlights that the risk of smartphone addiction continues to increase.

Smartphone Usage Pattern Research
In their study on smartphone usage patterns, Ryu et al. [11] developed a smartphone addiction and disease prevention system based on the collection and analysis of smartphone usage patterns. A smartphone usage pattern is defined as the current state of the smartphone and its user, i.e., the type of smartphone and the user's behavior, posture, and usage time. In their study, usage pattern data were collected through the smartphone's orientation sensor and display activation state. The objective was to prevent addiction and illness caused by smartphone overuse by informing users about their usage patterns on the basis of the collected data. Although the proposed method can predict smartphone addiction disease according to behavior type, it is limited by its simplicity, since it evaluates the disease using only smartphone usage time.
Yu et al. [12] quantitatively analyzed changes in gait pattern caused by using a smartphone while walking; the purpose of their study was to analyze the risks of walking while using a smartphone and to report them as awareness material.
Song et al. [13] developed an app that extracts log data generated by a smartphone, as well as a system that builds a dataset by storing these logs in a server database (DB). In addition, they implemented an algorithm that uses collaborative filtering to find groups of users with similar smartphone usage patterns and predicts each user's Smartphone Addiction Scale (SAS) index; the SAS values predicted by the algorithm were similar to the survey results, supporting the effectiveness of their approach.
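Song et al.'s algorithm is not specified in detail here, so the following is only a minimal sketch of user-based collaborative filtering under our own assumptions: each user is a hypothetical vector of usage features (for example, hours per app category), similarity is cosine similarity, and a new user's SAS index is the similarity-weighted mean of the known scores of the k most similar users. The names `cosine` and `predict_sas` are illustrative, not taken from the cited work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length usage vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_sas(target_usage, known_users, k=2):
    """Predict a SAS index from the k users whose usage is most similar to target_usage.

    known_users: list of (usage_vector, sas_score) pairs (hypothetical format).
    Returns the similarity-weighted mean of the neighbours' scores, falling back
    to the plain mean when all similarities are zero.
    """
    neighbours = sorted(known_users, key=lambda u: cosine(target_usage, u[0]), reverse=True)[:k]
    weights = [cosine(target_usage, vec) for vec, _ in neighbours]
    total = sum(weights)
    if total == 0:
        return sum(score for _, score in neighbours) / len(neighbours)
    return sum(w * score for w, (_, score) in zip(weights, neighbours)) / total


# Usage: vectors are hours per hypothetical app category [SNS, games].
users = [([4.0, 0.5], 38), ([3.5, 1.0], 41), ([0.5, 4.0], 25)]
print(round(predict_sas([4.0, 1.0], users), 1))
```

Weighting by similarity lets close neighbours dominate the prediction; a real system would also need to normalize features measured on different scales.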

Research on Apps to Prevent Smartphone Addiction
New apps for preventing smartphone addiction are developed every year. In particular, apps for elementary, middle, and high school children are emerging as a way for parents to protect their children from the danger of smartphone addiction. Adults also need control apps that let them lock their smartphones or restrict their usage patterns so that they do not become addicted themselves. "Marshmallow" is an example of such an app for self-control.
The "Marshmallow" app [14] aims to encourage children to develop self-control abilities. In particular, it is an app that encourages children between 9 and 15 years of age to earn points for themselves every time they obey the rules set by their parents.
In contrast, the previously mentioned "How often do you use" app [7] lets users record and analyze their own unconscious usage patterns. It graphically visualizes usage by time of day, the most-used types of apps, the amount of user data stored by apps, and the number of times the screen is turned on or off. Furthermore, smartphone add-on applications for preventing addiction are being developed with functions such as app usage time and data usage monitoring, application blocking, usage time limits, and harmful site blocking, among others. In general, when such an application is executed, all selected smartphone functions are blocked for the period set by the user; some of these apps are strict enough that, once activated, the block cannot be cancelled.
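To make the "number of times the screen is turned on" measure concrete, here is a minimal sketch that derives it, together with total screen-on time, from a screen event log. The log format, a chronologically ordered list of (timestamp in seconds, "on"/"off") pairs, is a hypothetical assumption; the text does not describe how the "How often do you use" app stores its data.

```python
def summarize_screen_events(events):
    """Return (screen-on count, total screen-on seconds) from a screen event log.

    events: chronologically ordered (timestamp_seconds, "on" | "off") tuples
    (a hypothetical log format). A repeated "on" while the screen is already on
    is ignored, and a final "on" with no matching "off" is counted but adds no time.
    """
    turns_on = 0
    total_on = 0.0
    on_since = None
    for timestamp, state in events:
        if state == "on" and on_since is None:
            turns_on += 1
            on_since = timestamp
        elif state == "off" and on_since is not None:
            total_on += timestamp - on_since
            on_since = None
    return turns_on, total_on


log = [(0, "on"), (60, "off"), (100, "on"), (130, "off"), (200, "on")]
print(summarize_screen_events(log))  # prints (3, 90.0)
```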

Research Method
In our study, we randomly selected 125 students regardless of major or grade, although most of the selected students were attending computer classes. Although the group of participants was relatively small, managing and manually processing the large amount of real-time data collected continuously over a month was challenging; in addition, we checked twice a week whether data collection was being performed properly.
First, the self-report questionnaire was developed based on the SAS developed and standardized by the Korea Informatization Promotion Agency [15]; for the implementation and validation of our behavioral addiction study, we used a questionnaire based on the six criteria of Griffiths [16]. Then, the smartphone usage patterns (SUP) were analyzed from the data collected with the "How often do you use" app, using the analysis program RapidMiner 7.3.

Self-Report Data Collection
Before collecting smartphone usage data, we used the standardized adult smartphone addiction self-diagnosis scale of KISA [17] as the smartphone addiction self-diagnosis scale in our study. Our questionnaire consisted of 15 questions in four categories: daily life disturbance (five items), withdrawal (four items), tolerance (four items), and virtual world orientation (two items). Based on the scores, respondents were classified into high-risk, potential-risk, and general users. In addition, following Griffiths [16], the behavioral addiction classification was based on six criteria: salience, mood control, tolerance, conflict, withdrawal, and recurrence. Table 2 lists the self-reported addiction results obtained in our study.
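A minimal sketch of the scoring step, assuming responses are stored per factor with the item counts given above. The KISA cut-off scores are not given in this section, so `high_cut` and `potential_cut` are caller-supplied placeholders, and the values in the usage line are arbitrary, not the official thresholds. Only a total-score rule is sketched; the official scale may additionally apply factor-level criteria.

```python
# Item counts per factor, as described in the text.
FACTOR_ITEMS = {
    "daily_life_disturbance": 5,
    "withdrawal": 4,
    "tolerance": 4,
    "virtual_world_orientation": 2,
}


def factor_scores(answers):
    """Sum the Likert-point responses of each factor, checking the item counts."""
    scores = {}
    for factor, n_items in FACTOR_ITEMS.items():
        items = answers[factor]
        if len(items) != n_items:
            raise ValueError(f"{factor}: expected {n_items} items, got {len(items)}")
        scores[factor] = sum(items)
    return scores


def classify(answers, high_cut, potential_cut):
    """Classify a respondent by total score; the cut-offs are placeholders, not KISA's values."""
    total = sum(factor_scores(answers).values())
    if total >= high_cut:
        return "high-risk"
    if total >= potential_cut:
        return "potential-risk"
    return "general"


answers = {factor: [3] * n for factor, n in FACTOR_ITEMS.items()}  # total = 45
print(classify(answers, high_cut=44, potential_cut=40))  # placeholder cut-offs; prints high-risk
```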
Our results show that 6.4% of the participants suffer from smartphone addiction at the level of high-risk use, and that 9 out of 13 students were female. According to the KNSO, 22.5% of students suffered from smartphone addiction in 2016, whereas in our study (conducted in 2017), 29.6% of the participating students belonged to the smartphone addiction potential-risk and high-risk user groups; this increase within only one year suggests the danger that smartphone overuse poses to young people.
Table 2. Respondent distribution based on personal characteristics (Unit: ratio (persons)).
Columns: Addiction group, All, Gender, Grade level. High-risk user group: 8 (6.4%); 3.2%; 7.2%.
In particular, high-risk use was observed mainly among first- and second-year college students. There was no high-risk use among students with an A grade, whereas 13.6% of grade B students were at risk of addiction. The usage measurements thus suggest that high smartphone use is related to academic grade.

Real-Time Data Collection and Preprocessing
To investigate smartphone usage patterns, we asked the 125 participants of the self-report questionnaire study for permission to collect data from their smartphones; only 64 agreed, and their usage patterns were measured and analyzed with data mining. We collected the following data items: total usage time, usage time by day, data usage, number of times the screen was turned on, usage time by app, number of executions by app, and frequently used apps. The individual usage data were standardized and combined with the SAS addiction results to form the final dataset. Data collection lasted one month (10 April 2017 to 10 May 2017); the recorded measurements are shown in Figure 1.
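The standardization and merging step can be sketched with the standard library as follows. The field names (`pid`, `screen_on`, `hours`, `addiction`) are hypothetical, and z-score standardization is our assumption, since the text does not say which standardization was applied.

```python
from statistics import mean, pstdev


def standardize(rows, features):
    """Return copies of rows with each feature converted to a z-score across participants."""
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        sd = pstdev(values)
        stats[f] = (mean(values), sd if sd > 0 else 1.0)  # avoid dividing by zero
    result = []
    for r in rows:
        z = dict(r)
        for f in features:
            mu, sd = stats[f]
            z[f] = (r[f] - mu) / sd
        result.append(z)
    return result


def merge_with_labels(usage_rows, sas_labels):
    """Inner-join usage rows with SAS addiction labels on participant id."""
    return [dict(r, addiction=sas_labels[r["pid"]]) for r in usage_rows if r["pid"] in sas_labels]


usage = [
    {"pid": 1, "screen_on": 100, "hours": 4.0},
    {"pid": 2, "screen_on": 300, "hours": 8.0},
]
dataset = merge_with_labels(standardize(usage, ["screen_on", "hours"]), {1: "general", 2: "high-risk"})
print(dataset)
```

The inner join drops participants without a SAS label, which mirrors keeping only the 64 participants who supplied both kinds of data.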

Data Mining Model
Data mining refers to the process of discovering useful correlations that are hidden among a large amount of data and extracting that information for future predictions or decision making. Smartphone usage data is an example of such massive data that can be mined for useful correlations.
Our research process is depicted in Figure 2; it combines self-reported data (from questionnaires), data collected from users' smartphones, data mining to analyze the effect of smartphone addiction on users, and, finally, an analysis of learning achievement. A decision tree was used as the data mining model. The purpose of this study was to identify the factors that influence addiction rather than to determine whether a user is addicted. Such factor analysis has traditionally relied on correlation, multiple regression, and structural equation model analyses; in recent years, however, decision tree models have been widely used, because decision trees internally apply evaluation criteria such as information gain and the chi-square test to find important variables. We used decision tree analysis to examine the relationship between several input variables and the dependent variable, and we visualized the results as a tree so that the factors with a considerable effect on addiction can be easily understood and applied to decision-making.
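For readers unfamiliar with the split criteria mentioned above, the following sketch computes information gain, and the gain ratio used later for the usage-pattern tree, for a categorical attribute. These are the textbook ID3/C4.5-style definitions, not RapidMiner's internal implementation; the toy attribute and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information (the entropy of the attribute itself)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy example: does a hypothetical "heavy screen use" flag separate addiction levels?
heavy = ["yes", "yes", "no", "no", "no", "yes"]
level = ["high", "high", "general", "general", "general", "potential"]
print(round(information_gain(heavy, level), 3), round(gain_ratio(heavy, level), 3))
```

The gain ratio penalizes attributes with many distinct values, which plain information gain tends to favor.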

Data Mining for Self-Report Questionnaires
To validate our results on the behavioral characteristics of smartphone addiction, the factors of addiction were classified by six criteria (salience, mood control, tolerance, withdrawal, conflict, and recurrence), comprising 3, 3, 2, 1, 2, and 2 items, respectively.
To investigate the relationship between addiction and these six factors, we used multiple linear regression and a decision support tool (a decision tree) for classification analysis. For the multiple linear regression analysis, the SAS scale results served as the dependent variable, and the responses to the behavioral addiction questionnaire served as the independent variables. Because the number of observations was small relative to the number of independent variables, we used the Bootstrapping operator to triple the number of observations. Table 3 lists the results of the multiple linear regression model for the items with a p-value of 0.05 or less.
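The Bootstrapping step can be sketched as sampling rows with replacement until the dataset is three times its original size. This mirrors the idea of RapidMiner's Bootstrapping operator but is not its implementation; `bootstrap_expand` and the example rows are hypothetical. Note that resampled rows are copies of existing observations rather than new data.

```python
import random


def bootstrap_expand(rows, factor=3, seed=0):
    """Draw factor * len(rows) rows with replacement (a seeded, reproducible resample)."""
    rng = random.Random(seed)
    return rng.choices(rows, k=factor * len(rows))


observations = [{"pid": i, "sas": 30 + i} for i in range(10)]  # hypothetical rows
expanded = bootstrap_expand(observations)
print(len(expanded))  # prints 30
```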
The SAS scale results were set as the label, and the behavioral addiction questionnaire items were set as the attributes. Classification analyses of the self-report questionnaires were conducted with RapidMiner's Decision Tree to predict which of the participants' answers (features) indicate addiction; the results are depicted in Figure 3.
The high-risk user group was further divided into four sub-groups along the tree. The first sub-group follows recurrence item 3 and then withdrawal items 1 and 2. The second sub-group also starts from recurrence item 3 but branches on conflict item 1 and then proceeds through withdrawal item 2, salience item 2, and salience item 3. The third sub-group, like the second, starts from recurrence item 3 and conflict item 1 and then proceeds through tolerance item 3, conflict item 2, mood control item 4, and salience item 3. The fourth sub-group follows recurrence item 3 and recurrence item 1. Item 3 is especially important: on this item, users indicated that they had been using smartphones only from time to time. Therefore, all items from 1 to 5 are essential for addiction measurement. The higher the recurrence score, the higher the addiction, and withdrawal item 1 was the most influential item in our addiction measurement. Overall, the decision tree shows that "recurrence 3" had the greatest impact on high-risk users. Among the 24 items of the questionnaire, "recurrence 3," "conflict 1," "withdrawal 2," "withdrawal 1," and "recurrence 1" influenced the high-risk user group, whereas "salience 3," "salience 4," and "tolerance 3" influenced the potential-risk user group. Therefore, we propose weighting each item according to its influence rather than scoring all items equally on a 1-to-5-point scale.
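The proposed item weighting can be sketched as a weighted sum of 1-to-5-point responses. The weights below are hypothetical placeholders (they could, for instance, be derived from the tree's variable importance); the text does not specify how they should be chosen.

```python
def weighted_score(responses, weights):
    """Weighted sum of 1-to-5-point item responses.

    responses: item name -> response (1..5).
    weights: item name -> weight; items without a weight get 1.0, i.e. equal weighting.
    """
    for item, r in responses.items():
        if not 1 <= r <= 5:
            raise ValueError(f"{item}: response {r} is outside 1-5")
    return sum(weights.get(item, 1.0) * r for item, r in responses.items())


answers = {"recurrence 3": 5, "withdrawal 1": 4, "salience 1": 2}
hypothetical_weights = {"recurrence 3": 2.0, "withdrawal 1": 1.5}  # placeholders, not estimated
print(weighted_score(answers, {}))                    # equal weighting: prints 11.0
print(weighted_score(answers, hypothetical_weights))  # prints 18.0
```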

Data Mining for Smartphone Usage Pattern
To confirm the relationship between addiction and smartphone usage patterns, we built a decision tree using the gain ratio as the classification criterion. For supervised learning, the label variable was the addiction level. The input variables were demographic characteristics such as gender and grade, and smartphone usage time in the 0-6 h, 6-12 h, 12-18 h, and 18-24 h bands. Among the 64 users who provided accurate usage pattern data, 4.7%, 18.7%, and 76.7% were high-risk, potential-risk, and general smartphone users, respectively, so general users were the majority. The prediction accuracy was 89.7%, as listed in Table 4. Users were assigned to the high-risk group on the basis of the following splits. First, the screen was turned on more than 110 times per day. Second, users spent more than 6.07 h and less than 8.29 h per day, and more than 72.5 h per week, on their smartphones. In particular, the addicted third-year students turned on their phone screens more than 1971 times and spent most of their time on entertainment applications. For the risk group, the usage time was less than 0.215 h and the screen was turned on fewer than 565 times. The input variables with the greatest influence on the high-risk user group were the "number of screen turns" and "actual use time minus perceived use time"; these results are shown in Figure 4. Applying the decision tree to the app data gives the following results. As the "number of screen turns" increases, the proportion of high-risk users increases. In addition, the "Entertainment" and "Actual use time" variables were associated with high-risk users. The greater the difference between actual use time and perceived use time, the more users fell into the potential-risk group.
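A minimal sketch of the high-risk rule and the perception-gap variable, assuming the thresholds reported above are applied together as a single conjunctive rule; the actual tree in Figure 4 may combine them along different paths. `is_high_risk` and `perception_gap` are hypothetical names, and the example inputs are arbitrary.

```python
def is_high_risk(screen_on_per_day, hours_per_day, hours_per_week):
    """Hypothetical conjunctive rule built from the split thresholds reported in the text."""
    return (
        screen_on_per_day > 110
        and 6.07 < hours_per_day < 8.29
        and hours_per_week > 72.5
    )


def perception_gap(actual_hours, perceived_hours):
    """Actual minus perceived (self-reported) use time; larger values mean more underestimation."""
    return actual_hours - perceived_hours


print(is_high_risk(150, 7.0, 80.0))  # prints True
print(is_high_risk(90, 7.0, 80.0))   # prints False
print(perception_gap(7.5, 3.0))      # prints 4.5
```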
Existing studies treated high usage as a sign of addiction; our analysis, however, identified new variables that influence addiction. In addition, addicted users showed a large difference between their actual use time and their perceived use time: the high-risk users were not aware of how much they used their smartphones.

Conclusions
In this study, we analyzed the problem of smartphone addiction using data collected by a smartphone app. Unlike the present study, most existing studies were based on self-report surveys, whose results were not always satisfactory owing to possible intentional manipulation or bias. Therefore, we analyzed smartphone usage patterns by combining the self-report method with data mining techniques to obtain more accurate results.
Our results show that 6.4% of the participants suffer from smartphone addiction at the level of high-risk use, and that 9 out of 13 students were female. According to the National Statistical Office (NSO), 22.5% of students suffered from smartphone addiction in 2016, whereas in our study (conducted in 2017), 29.6% of the participating students belonged to the smartphone addiction potential-risk and high-risk user groups; this increase within only one year suggests the danger that smartphone overuse poses to young people.
There was also a considerable difference between our results and those of previous studies in terms of smartphone overuse. In 2016, the average weekly smartphone usage was 8.29 h, whereas in our results the average smartphone usage time was more than 6 h per day. In addition, most users turned on their smartphones more than 300 times without being aware of it, and more than 50% of the respondents said that they spend more than half of the day using smartphones, either for learning or for assistance in their daily lives. Thus, although smartphones seem to be an essential tool, their use also needs to be well controlled.
The self-report questionnaire consisted of 13 items covering the six criteria of a behavioral addiction scale for overuse. We used multiple linear regression analysis and decision tree analysis to determine which of the six factors are highly correlated with addiction.
The following results were obtained. First, the higher the "recurrence" score, the higher the addiction; for higher-risk users, "withdrawal," "conflict," and "salience" followed, in that order. Second, "salience" was the most important criterion for potential-risk users, together with some "tolerance" items. Third, among the variables that may affect addiction, the more often the screen was turned on and the greater the difference between actual and perceived use time, the higher the user's risk level. Fourth, potential-risk users showed a large difference between actual and perceived use time. Fifth, high-risk users were unable to identify their actual smartphone usage time.
These results can be interpreted as follows. First, the variables "number of screen turns" and "actual use time minus perceived use time" are more influential on addiction than the amount of smartphone use, which previous research treated as the indicator of addiction. Second, we propose weighting each questionnaire item according to its influence rather than scoring all items equally on a 1-to-5-point scale. Third, whereas existing research simply equated greater usage amount and time with addiction, our results identify more detailed factors and variables.
This study is limited to an experimental group of 125 participants; more participants need to be added. In the future, by combining self-report methods with smartphone usage data and analyzing them with data mining, we can also analyze the effects of overuse on depression, anxiety, stress, self-esteem, and emotional regulation.