Finding College Student Social Networks by Mining the Records of Student ID Transactions

Abstract: Information about college students’ social networks plays a pivotal role in college students’ mental health monitoring and student management. While there have been many studies to infer social networks by data mining, the mining of college students’ social networks lacks consideration of homophily. College students’ social behaviors show significant homophily in the aspect of major and grade. Consequently, the inferred inter-major and inter-grade social ties will be erroneously omitted without considering such an effect. In this work, we aimed to increase the fidelity of the extracted networks by alleviating the homophily effect. To achieve this goal, we propose a method that combines the sliding time-window method with the hierarchical encounter model based on association rules. Specifically, we first calculated the counts of spatial–temporal co-occurrences of each student pair. The co-occurrences were acquired by the sliding time-window method, which takes advantage of the symmetry of the social ties. We then applied the hierarchical encounter model based on association rules to extract social networks by layer. Furthermore, we propose an adaptive method to set co-occurrence thresholds. Results suggested that our model infers the social networks of students with better fidelity, with the proportion of extracted inter-major social ties in entire social ties increasing from 0.89% to 5.45% and the proportion of inter-grade social ties rising from 0.92% to 4.65%.


Introduction
Modern society is full of competition and cooperation. Social networks connect everyone, and interpersonal skills have become one of the most important metrics to measure one's talent. In daily life, individuals' interpersonal behaviors reflect their mental health status. College students (mostly aged between 18 and 24) with psychological disorders, such as anxiety and depression, usually suffer from an interpersonal disorder at the same time [1,2]. The lack of social ties also poses a serious threat to college students' physical and mental development. Students who lack social connection to others have increased failure experience. They might gradually lose self-confidence and become susceptible to psychological disorders [3-5]. Therefore, for better education and management of college students, it is important to know college students' social ties.
At present, using users' daily behavior data to mine their social ties has attracted wide attention. Spatiotemporal data, such as those obtained via GPS and cellular networks, are frequently used to extract geographical similarity and social ties between users, e.g., [6,7]. The strength of such ties can be determined based on users' spatiotemporal co-occurrences, where co-occurrence can be counted using a fixed time slicing method [8]. There are also approaches that use the trajectory of a user's location data, such as in References [9-11]. As such, data are clustered and the accuracy of the prediction can be enhanced.
However, the mining of college students' social networks is not so well-researched. In college students' social network mining, one potentially useful information source is the student ID card. With the development of digital and informational campuses, most Chinese universities have established student ID card systems [12-15]. Student ID cards record students' daily behaviors, including students' dining, shopping, book borrowing, library access history, and other data. There are several studies inferring college social networks from the student ID card records. Yao et al. [16] used the consecutive check-in records of each student pair to infer the students' social network. Liu et al. [17] used the students' dining transaction data and employed a fixed time slicing method. In the latter, they sliced the time into 5-min slots and if two students appeared in the same slot, then one co-occurrence was counted. Based on the co-occurrence data, they inferred the students' social networks. The networks were further distilled with a hypothesis test.
Current data mining studies of college students' social networks lack consideration of the following three issues.
Firstly, although the fixed time slicing method from Reference [17] shows high computational efficiency, transaction data may be sliced into different time slots so that some co-occurrences may get lost.
Secondly, empirical evidence shows that contact between similar people occurs at a higher rate than among dissimilar people. This feature is commonly defined as homophily [18]. Homophily mainly arises from two mechanisms, namely choice homophily and induced homophily [19]. Induced homophily is a passive effect. It arises from the homogeneity of structural opportunities for interaction. Choice homophily is a higher level of homophily. It is mainly a consequence of active choices of individuals. Choice homophily also brings in the social ties that we are interested in. Therefore, in this work, we only considered the alleviation of induced homophily. In the field of education, student social networks also have inherent homophily in terms of race, gender, grade, age, region, and major, among which major homophily and grade homophily are the most significant [20-26]. Students from the same major and grade have a higher probability of co-occurrence due to similar behaviors resulting from the same courses, examination times, and residence locations. We define inter-(intra-)major or inter-(intra-)grade behaviors as inter-(intra-)group behaviors. Students in different groups are less likely to co-occur even if they have social ties. By ignoring homophily, current models lose fidelity in mining the social network, especially in the inter-group social ties.
Lastly, current models lack a theoretical rationale for setting thresholds for the count of co-occurrences that must take place before a social tie is inferred.
To address the above issues, we propose a hierarchical encounter model based on association analysis [27,28] for inferring college student social networks using the spatial-temporal data recorded by the student ID card. Similar to the approach presented in Reference [8], we used spatiotemporal co-occurrence for inferring the strength of social ties. In this work, one co-occurrence means that two students check in at the same location within a short time period. We used a sliding time-window method to calculate the number of co-occurrences. To combat the homophily effect, we propose a hierarchical encounter model, with which we can mine the intra-group and inter-group social ties separately. The difficulty of setting the co-occurrence threshold was tackled with an adaptive method that varies the threshold for each individual.
This paper is organized as follows. Section 3 provides a description of the dataset used to illustrate the results of the procedure. Then we introduce the sliding time-window approach and the hierarchical encounter model based on association rules. In Section 5, we discuss the effectiveness of our method. The last section discusses some limitations of the method and directions for future research.

Ethics Statement
This study complies with the guidelines of the 1975 Declaration of Helsinki. This study has been approved by the Institutional Review Board (IRB) from Central China Normal University (CCNU). Our study involves the data of 662 students, so it is hard to get informed consent from every student. Fortunately, in most Chinese universities, including CCNU, a grade counselor is assigned to supervise the students of every one to two majors. Each major is divided into classes and each class is monitored by a class adviser. Our study was approved by the head of college, all the grade counselors, and all the class advisers. In addition, to protect the privacy of students, all data were anonymized by encrypting student IDs. All subsequent calculations were performed on anonymous data.

Data Collection and Data Description
In this paper, we used a subset of the data from Reference [17], which includes the check-in data at the food courts and the supermarkets of 662 undergraduate students of the College of Physical Science and Technology (CPST), CCNU. These data are recorded by the student ID card system. We considered 17 check-in locations. When students intend to have their meals or shop at these 17 locations, they can swipe their student ID card at the card scanners to complete the transaction. The scanners record the student's transaction information, including the student ID, location, time, money, and item. The scanners then upload the transaction information to the database of the card system. Through accessing the database, we downloaded the transaction data. The raw data contain all the information we just mentioned, but we only used the student ID, the location, and the time information. We processed the raw data by numbering the student ID and location data and re-formatting the time data.
In fact, there is more than one scanner in each food court or supermarket. We assumed that each food court or supermarket only has one scanner so that a co-occurrence will be recorded if two students check in simultaneously in the same location, even if they use two different scanners. We used $D = \{d_z\}_{z=1}^{Z}$ to represent the check-in dataset, where $d_z$ is the $z$-th check-in record and $Z$ is the total number of check-ins. Samples of the dataset are tabulated in Table 1, where one row refers to one check-in record $d_z = \{u_z, l_z, t_z\}$, $d_z \in D$. It means student $u_z$ has one check-in event at location $l_z$ at time $t_z$. Here $u_z \in \{1, \ldots, N\}$ represents a student entity, where $N$ is the total number of students. For any student $i$, it is possible for multiple $z$ to satisfy $u_z = i$. The location is represented by $l_z \in \{1, \ldots, L\}$, with $L$ ($L = 17$) the total number of locations, and $t_z$ is the timestamp for event $d_z$ (the total seconds since 1 January 2014). For example, the first row in Table 1 refers to a check-in for a student whose ID is 153 at location 1 at 1,008,798 s after 1 January 2014. We also gathered some personal information of students, including grade, major, gender, and residence location.
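The record structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `CheckIn` and the helper `group_by_location` are hypothetical, and every row except the one quoted from Table 1 is invented for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class CheckIn(NamedTuple):
    """One check-in record d_z = {u_z, l_z, t_z} (hypothetical container)."""
    student: int    # u_z, in 1..N
    location: int   # l_z, in 1..L
    timestamp: int  # t_z, seconds since 1 January 2014


def group_by_location(records):
    """Return {location: [CheckIn, ...]}, each list sorted by timestamp."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.location].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: r.timestamp)
    return dict(grouped)


records = [
    CheckIn(153, 1, 1_008_798),  # the example row quoted from Table 1
    CheckIn(20, 1, 1_008_700),   # invented row
    CheckIn(77, 2, 1_009_000),   # invented row
]
by_location = group_by_location(records)
```

Grouping by location and sorting by time up front mirrors the per-location, time-ordered view of the data that the co-occurrence counting relies on.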

Mining of the Social Network
To mine the social network, our method mainly consists of two parts, namely the co-occurrence data acquisition and the hierarchical encounter model based on association rules. Details of the method are introduced in the following sections.

Co-Occurrence Acquisition
The first step in mining the students' social network is to obtain the co-occurrence data. In this section, we give our definition of the co-occurrence and compare two co-occurrence acquisition methods, the fixed time slicing method and the sliding window method.

Co-Occurrence and Its Definition
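To infer the social network, we first obtained the co-occurrence dataset between any student pair, which reads:
$$C = \{\sigma(i \cup j)\}, \quad i, j \in \{1, \ldots, N\},\ i \neq j,$$
where $i$ and $j$ are the indices of students and $\sigma(i \cup j)$ is the count of the co-occurrence between students $i$ and $j$.
Consider the check-in data $D^{(l)} = \{d^{(l)}_z\}_{z=1}^{Z^{(l)}}$ of location $l$, where $d^{(l)}_z$ is the $z$-th check-in record at location $l$ and $Z^{(l)}$ is the total number of check-in records at location $l$. The time-sorted version of $D^{(l)}$ is the input to both counting methods described below, in which the count $\sigma(i \cup j)$ accumulates a summation term $w^{(l)}_{m,n}$ over pairs of records $m$ and $n$ at each location $l$.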

The Fixed Time Slicing Method
In the fixed time slicing method, time is sliced into consecutive, non-overlapping time slots of equal length ∆t [17]. Here the summation term $w^{(l)}_{m,n}$ equals 1 when the two check-in records fall in the same time slot and 0 otherwise. That is, if two students check in during one time slot at location $l$, it is considered as one co-occurrence between these students. However, if they check in within time ∆t at location $l$, but in two different time slots, no co-occurrence will be counted. For example, the time interval between the timestamps $t^{(l)}_3$ and $t^{(l)}_4$, for event $d^{(l)}_3$ and event $d^{(l)}_4$, may be less than ∆t while the events are in different time slots. In that case, the fixed time slicing method will erroneously omit this co-occurrence.
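The omission described above is easy to reproduce numerically. The following is a minimal sketch, not the implementation of Reference [17]: the function `same_slot`, the slot origin at zero, and the two timestamps are assumptions chosen for illustration, while the 5-min slot length is the one reported for the fixed time slicing method.

```python
def same_slot(t_a, t_b, dt, origin=0):
    """True when both timestamps fall in the same fixed slot of length dt."""
    return (t_a - origin) // dt == (t_b - origin) // dt


DT = 300           # 5-minute slots, in seconds
t3, t4 = 290, 310  # invented timestamps, only 20 s apart

within_dt = abs(t3 - t4) <= DT   # the two check-ins are close in time
counted = same_slot(t3, t4, DT)  # but they straddle a slot boundary
```

Here `within_dt` is true while `counted` is false, so a pair of check-ins only 20 s apart contributes no co-occurrence under fixed slicing.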

The Sliding Time-Window Method
An alternative to the fixed time slicing method is the sliding time-window method. In the latter method, the summation term $w^{(l)}_{m,n}$ is determined by a time-window of fixed length ∆t. If student $i$ checks in at time $t^{(l)}_m$ and student $j$ checks in at time $t^{(l)}_n$ within one time-window at location $l$, then it is considered as one co-occurrence. Moreover, there are two special cases in which only one co-occurrence will be recorded even if there are multiple co-occurrences.
The cases are illustrated in Figure 1. For any student pair, at most one co-occurrence will be recorded in one time-window, and one recorded co-occurrence will not be recorded again in the next time-window.
The selection of the starting point and step size in the sliding time-window is crucial. We take the first time point $t^{(l)}_1$ as the starting point and set a variable step size δ equal to the interval between consecutive check-in times, so that the window slides from $t_n$ to $t_{n+1}$ in each iteration, where $n \in \{1, \ldots, Z^{(l)} - 1\}$. Algorithm 1 shows the steps to acquire co-occurrence data using the sliding time-window method. For example, if records $d^{(l)}_1$ and $d^{(l)}_2$ are within ∆t, student $\tilde{u}_1$ and student $\tilde{u}_2$ are counted as one co-occurrence, provided that $\tilde{u}_1$ and $\tilde{u}_2$ do not refer to the identical student. Note that if one student has multiple check-in records within ∆t, they are treated as one effective check-in record. As a result, the effect of the consecutive check-ins of one student is mitigated. Table 2 gives samples of the co-occurrence data, where the first row means that student 53 and student 57 had 120 co-occurrences.
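The sliding time-window counting can be sketched as follows. This is a simplified reading of the description above, not a transcription of Algorithm 1: the function name, the tuple layout, and the way repeated check-ins are collapsed (a check-in within ∆t of the same student's previous kept check-in is dropped) are assumptions made for this illustration.

```python
from collections import Counter


def sliding_window_cooccurrences(records, dt):
    """Count co-occurrences from (student, location, timestamp) tuples.

    Returns a Counter keyed by the unordered student pair (i, j), i < j.
    """
    by_location = {}
    for student, location, t in records:
        by_location.setdefault(location, []).append((t, student))

    counts = Counter()
    for recs in by_location.values():
        recs.sort()  # time-sorted records of one location
        # Collapse repeated check-ins of one student within dt into one
        # effective record (assumed handling of special case (a)).
        last_kept = {}
        effective = []
        for t, student in recs:
            if student in last_kept and t - last_kept[student] <= dt:
                continue
            last_kept[student] = t
            effective.append((t, student))
        # The window starts at each record in turn and covers
        # [t_start, t_start + dt]. Pairing the start record only with later
        # records counts each record pair once (special case (b)).
        for start, (t_start, u_start) in enumerate(effective):
            for t_other, u_other in effective[start + 1:]:
                if t_other - t_start > dt:
                    break
                if u_other != u_start:
                    counts[tuple(sorted((u_start, u_other)))] += 1
    return counts


demo = [(1, 1, 0), (2, 1, 100), (1, 1, 150), (3, 1, 1000), (2, 2, 50)]
counts = sliding_window_cooccurrences(demo, dt=300)  # Counter({(1, 2): 1})
```

Because the window is anchored at actual check-in times rather than at fixed slot boundaries, two check-ins at 290 s and 310 s are counted as one co-occurrence with `dt=300`, which is the case fixed slicing misses.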

The Hierarchical Encounter Model Based on Association Rules
Having acquired the co-occurrence data, the hierarchical encounter model based on association rules was then used to extract the social ties data, in which each tie from student $i$ to student $j$ carries $c(i \to j)$, the level of the association of students $i$ and $j$.

The Hierarchical Encounter Model
College students' social networks have significant homophily: intra-group students have larger co-occurrence counts, while inter-group students have smaller co-occurrence counts even if they have social ties. In References [16,17], it is assumed that all students are mutually independent and the homophily effect is ignored. Under this assumption, inter-group social ties are buried. Therefore, we propose the hierarchical encounter model to mine the social ties of inter-group students and those of intra-group students separately.
This model is illustrated in Figure 2. It is divided into four levels according to major, college, and grade. Here we use the major homophily as an example, as other types of homophily follow the same steps. For a student, this model mines the intra-major social ties at the first level, and then inter-major social ties at the second level. Detailed steps for the model can be found in Algorithm 2. The social ties of students between colleges within a university or between grades can be mined using the same method.
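The first step of the layered mining, separating intra-group pairs from inter-group pairs so that each layer can be thresholded on its own, can be sketched as below. The function `split_by_group`, the dictionary layout, and the sample values are hypothetical; the sketch does not reproduce Algorithm 2.

```python
def split_by_group(cooccurrences, group_of):
    """Split pair counts into intra-group and inter-group dictionaries.

    cooccurrences: {(i, j): count}; group_of: {student: group label}.
    """
    intra, inter = {}, {}
    for (i, j), count in cooccurrences.items():
        target = intra if group_of[i] == group_of[j] else inter
        target[(i, j)] = count
    return intra, inter


# Invented counts and major labels, for illustration only.
cooc = {(1, 2): 30, (1, 3): 4, (2, 3): 2}
major = {1: "EE", 2: "EE", 3: "Physics"}
intra_major, inter_major = split_by_group(cooc, major)
```

Passing grade labels instead of major labels as `group_of` gives the corresponding split for grade homophily.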

Social Ties Mining With Association Analysis
Based on the co-occurrence data, it is common to use a hypothesis test or a shuffling test to decide how many co-occurrences a pair must have before they are said to be friends [29,30]. The problem is that both approaches require the assumption that student co-occurrences are mutually independent, which is hard to reconcile with the homophily effect. Our main idea is that the co-occurrence count of two students is proportional to the probability that these two students are friends. Therefore, we used association rules to further extract the social ties from the co-occurrence data. Specifically, we calculated the support and confidence to find the association rules, i.e., to find students with strong correlation (social ties) by analyzing the association of each student pair.
For mining intra-major social ties, we used traditional association analysis, which mines association rules in the form of $i \to j$. A social tie from student $i$ to student $j$ exists if the association rule from student $i$ to student $j$ exists. We defined support $s(i \to j)$ and confidence $c(i \to j)$ of the check-in data as:
$$s(i \to j) = \frac{\sigma(i \cup j)}{Z}$$
and:
$$c(i \to j) = \frac{\sigma(i \cup j)}{\sigma(i)},$$
respectively, with $\sigma(i)$ the total number of check-ins for student $i$, $\sigma(i \cup j)$ the number of co-occurrences of student $i$ and student $j$, and $Z$ the total number of check-ins in the entire dataset. Support $s(i \to j)$ and confidence $c(i \to j)$ describe the frequency of the co-occurrence of student $i$ and student $j$ relative to the total number of check-ins in the entire dataset and to student $i$'s check-in records, respectively. Confidence $c(i \to j)$ can be used to represent the level of association of student $i$ and student $j$.
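A short numeric sketch may help here. It assumes, following the verbal description above, that support is the co-occurrence count divided by the total number of check-ins $Z$ and that confidence is the count divided by student $i$'s own number of check-ins. The count of 120 is the Table 2 example; the other numbers are invented.

```python
def support(cooc_ij, total_checkins):
    """s(i -> j): co-occurrences of i and j relative to all check-ins Z."""
    return cooc_ij / total_checkins


def confidence(cooc_ij, checkins_i):
    """c(i -> j): co-occurrences of i and j relative to i's own check-ins."""
    return cooc_ij / checkins_i


sigma_ij = 120       # co-occurrences of students 53 and 57 (Table 2 example)
sigma_i = 400        # invented: check-ins of student 53
sigma_j = 600        # invented: check-ins of student 57
Z = 100_000          # invented: check-ins in the whole dataset

s_ij = support(sigma_ij, Z)
c_ij = confidence(sigma_ij, sigma_i)  # rule 53 -> 57
c_ji = confidence(sigma_ij, sigma_j)  # rule 57 -> 53
```

Under these assumptions the support is the same in both directions, while the confidence differs (0.3 versus 0.2 here), so a rule $i \to j$ can pass a confidence threshold even when $j \to i$ does not.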
For mining inter-major social ties, we propose the "quasi-association analysis" method. While traditional association analysis can be used to mine rules in the form of $i \to j$ and also rules in the form of $j \to i$, quasi-association analysis divides the student set $U$ into two separate subsets $A$ and $B$ and mines the association rules in the form of $(i \in A) \to (j \in B)$.
Support $s(i \to j)$ and confidence $c(i \to j)$ are important metrics in association analysis. Association rules with low support might come from random co-occurrence and are usually meaningless. It is common to use a support threshold $h_s$ to sift out those meaningless items and to keep the frequent items. For a certain rule $i \to j$, higher confidence means a larger probability that $j$ appears in the event related to $i$. The confidence threshold $h_c$ is commonly used to extract high confidence items from frequent items, which are the students' social ties. Support and confidence thresholds are usually set by users or experts in the traditional approach and are usually fixed [31,32]. However, in mining the college students' social network, different students have different numbers of check-in records. It is therefore unreasonable to use the same threshold for every student. This motivated an adaptive threshold determination approach.

Threshold Determination
There are three thresholds in this paper, namely $h_{s1}$, $h_{s2}$, and $h_c$, among which $h_{s1}$ and $h_{s2}$ are the support thresholds. Threshold $h_{s1}$ is used to sift out the students with small numbers of check-ins, since it is hard to mine the social ties of students who seldom use the ID card. Threshold $h_{s2}$ is designed to delete pairs of students with very small numbers of co-occurrences. If the co-occurrence count of two students is less than $h_{s2}$, then these two students are considered as entirely uncorrelated. There exists a trade-off between computational complexity and fidelity: a smaller threshold leads to identification of more social ties at the cost of more computation. Algorithms 3 and 4 illustrate the steps for determining the frequent itemset for students from a specific major and that for students from other majors through support thresholds $h_{s1}$ and $h_{s2}$, respectively.
Algorithm 3: Determining frequent itemset for students from a specific major
The last threshold, $h_c$, is the confidence threshold. It is used to extract rules with high confidence. Taking student $i$ as an example, let $\{\tilde{\sigma}_{i,k}\}_{k=1}^{N-1}$ be the result of sorting all the co-occurrence counts $\{\sigma(i \cup j)\}_{j=1}^{N}$ between student $i$ and the others in descending order. Figure 3 is the plot of $\{\tilde{\sigma}_{i,k}\}_{k=1}^{N-1}$, where the x-axis is the index $k$ and the y-axis is the number of co-occurrences $\tilde{\sigma}_{i,k}$. Figure 3a,b shows the co-occurrences of the intra-major students and of the inter-major students, respectively. In both the intra-major and the inter-major case, there exist apparent inflection points in the figures. Only a few students have large co-occurrence counts with student $i$, with a large count being one that is higher than the count at the inflection point. We considered these students to have social ties with student $i$. Most students have small co-occurrence counts with student $i$, and we considered these co-occurrences to be the result of randomness. Figure 3 provides the basis for the determination of the threshold.
For student $i$, only when the confidence $c(i \to j)$ is no less than $h_c$ do we consider there is a social tie between student $i$ and student $j$. Threshold $h_c$ is defined from the sorted count $\tilde{\sigma}_{i,k}$ at the inflection point, which is located using a parameter ∆σ, the threshold of difference. From this definition, we know that student $i$ has social ties with the first $k$ sorted students, with the confidence being the strength of the social ties. When mining the social ties of different students, $h_c$ is set to different values so that $h_c$ adapts to different students. In this work, we set the parameter ∆σ based on a known social sub-network. We ran the experiment and adjusted the value of the parameter so that our inferred sub-network fit the known sub-network. Algorithm 5 details the way to obtain social ties data through confidence thresholds.
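One way to read the adaptive rule is sketched below. The exact criterion is an assumption: the inflection point is taken here as the last position in the descending sequence where the drop between consecutive counts is at least ∆σ, and $h_c$ is taken as the confidence at that position. The function name `adaptive_ties` and all numbers are invented for illustration and do not reproduce Algorithm 5.

```python
def adaptive_ties(cooc_with_i, checkins_i, delta_sigma):
    """Return (h_c, ties) for one student under the assumed cut rule.

    cooc_with_i: {j: co-occurrence count with student i}
    checkins_i:  total number of check-ins of student i
    """
    ranked = sorted(cooc_with_i.items(), key=lambda kv: kv[1], reverse=True)
    cut = 0
    for k in range(len(ranked) - 1):
        # Assumed inflection test: a drop of at least delta_sigma.
        if ranked[k][1] - ranked[k + 1][1] >= delta_sigma:
            cut = k + 1
    kept = ranked[:cut]
    h_c = kept[-1][1] / checkins_i if kept else None
    ties = {j: count / checkins_i for j, count in kept}
    return h_c, ties


# Invented counts for one student with 400 check-ins.
cooc_i = {2: 120, 3: 95, 4: 80, 5: 12, 6: 10, 7: 9}
h_c, ties = adaptive_ties(cooc_i, checkins_i=400, delta_sigma=20)
```

With these numbers the large drop from 80 to 12 marks the cut, so students 2, 3, and 4 are kept and the threshold equals the confidence of the weakest kept tie. Because the cut is found per student, a student with few check-ins and a student with many can end up with different values of $h_c$.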

Results and Discussion
We collected the student ID card transaction records of 335 students (113 from the electrical engineering (EE) major and 222 from other majors) from the 2012 grade and 327 students from the 2013 grade of CPST, CCNU. Our results were verified by questionnaires or interviews with the investigated students (for details of the verifications, see Appendix A). Most students often interact with their roommates, friends from the same major, and friends from other majors or grades. The results of the verification showed that our method can extract social ties with better accuracy.

Mitigation of the Homophily Effects
We compared the social networks extracted using the original model and the networks extracted with our new model. Figure 4 illustrates the results, where nodes are the students and the connection lines are the social ties, with their width representing the strength of the connection. In Figure 4a,b, we first investigated the major homophily effect. We labelled the networks from Figure 4a,b by $Net_{major}$ and $Net'_{major}$, respectively. There are few connections among the inter-major students when the major homophily is not considered. In $Net_{major}$, the proportion of extracted inter-major social ties amongst all social ties is only 0.89%, while with the hierarchical encounter model, shown in $Net'_{major}$, the same proportion grows to 5.45%. We further tested our model by extracting social ties of students from different grades. Social networks inferred with the original model and the new model are illustrated in Figure 4c,d, respectively (labelled $Net_{grade}$ and $Net'_{grade}$). Similar to the previous scenario, the ratio of the inter-grade social ties rises from 0.92% to 4.65% with our new model applied.
To further analyze the inferred networks, we introduced two commonly used measurements [33]. The first one is path length $L$, which reflects the global characteristic of a network. It represents the average length of every path in a network. The path length is calculated by:
$$L = \frac{1}{N} \sum_{i=1}^{N} d_i,$$
where:
$$d_i = \frac{1}{N-1} \sum_{j \neq i} d_{ij}$$
is the average distance from one node to other nodes, with $d_{ij}$ the shortest-path distance between nodes $i$ and $j$ and $N$ here the number of nodes in the network. From the above equations, we can see that the path length satisfies $L \geq 1$. When $L = 1$ the network is the most centralized and is strongly connected.
For convenience, we used the closeness centrality $\eta_i = 1/d_i$, which is the reciprocal of the average distance $d_i$ of a node, to analyze the network. The closeness centrality is unitless and $\eta_i$ takes values in $[0, 1]$. When $\eta_i = 1$, the node is directly connected to all other nodes in the network. Another measurement is the clustering coefficient, which describes the local characteristic of a network. It shows the degree of convergence of a friend cluster. For a node $i$ with degree $k_i$ (the number of connected adjacent nodes), its clustering coefficient is defined as:
$$C_i = \frac{2 E_i}{k_i (k_i - 1)},$$
where $E_i$ is the number of lines among the $k_i$ neighbors. The clustering coefficient $C_i$ is also unitless and ranges from 0 to 1. A node with $C_i = 1$ has all its neighbors mutually directly connected.
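The three measurements can be computed on a toy network as follows. This is a minimal sketch under stated assumptions: distances are hop counts found by breadth-first search, the average distance of a node is taken over the nodes it can reach, and the function names and the four-node graph are invented for illustration.

```python
from collections import deque


def average_distance(adj, source):
    """Mean shortest-path length from source to the nodes it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else float("inf")


def closeness(adj, node):
    """Closeness centrality: reciprocal of the average distance."""
    return 1.0 / average_distance(adj, node)


def clustering(adj, node):
    """Clustering coefficient 2 * E_i / (k_i * (k_i - 1))."""
    nbs = list(adj[node])
    k = len(nbs)
    if k < 2:
        return 0.0
    links = sum(
        1 for a in range(k) for b in range(a + 1, k) if nbs[b] in adj[nbs[a]]
    )
    return 2.0 * links / (k * (k - 1))


def path_length(adj):
    """Network path length: mean of the per-node average distances."""
    return sum(average_distance(adj, n) for n in adj) / len(adj)


# Invented undirected toy network: a triangle 1-2-3 plus a pendant node 4.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
eta_1 = closeness(adj, 1)
c_1 = clustering(adj, 1)
big_l = path_length(adj)
```

Node 1 touches every other node, so its closeness centrality is 1, while only one of the three possible links among its neighbors exists, giving a clustering coefficient of 1/3.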
In Figure 5a,b, we show the closeness centrality and the clustering coefficient of networks $Net_{major}$ and $Net'_{major}$, respectively. Recall that the network with the prime label is the one extracted using our method. Without considering the major homophily, the distribution of the closeness centrality of the social network is more dispersed. Compared with $Net_{major}$, the closeness centrality of $Net'_{major}$ increases significantly. We further calculated the path length for the networks $Net_{major}$ and $Net'_{major}$. The path length for the network without considering the major homophily is 10.97, while that value is 8.82 when we consider the major homophily. A shorter path length results from an increase in the number of social ties, indicating that more inter-major social ties are found. As illustrated in Figure 5b, since more inter-major social ties are found, the clustering coefficients of network $Net'_{major}$ are smaller and more concentrated than those of $Net_{major}$.
The average clustering coefficients for a network with or without considering the major homophily are 0.33 and 0.35, respectively. The cluster is slightly more scattered when considering the major homophily, also illustrating that more social ties are extracted.
Similarly, we analyzed networks $Net_{grade}$ and $Net'_{grade}$ by the two measurements. The results are illustrated in Figure 5c,d. In Figure 5c, the closeness centrality of $Net'_{grade}$ is larger than 0.1 except for some extreme outliers, which indicates that $Net'_{grade}$ has more nodes with a better observation horizon for information flow. Figure 5d has roughly the same distribution as Figure 5b. With our new model, the path length for the network decreases from 8.96 to 8.20, and the average clustering coefficient drops from 0.39 to 0.37.
Having considered the effect of homophily, our model can mine more genuine inter-major and inter-grade social ties and can infer the social ties of students with better fidelity. We also found that inter-group social ties are far fewer than the intra-group social ties. To facilitate communication between inter-group students and to boost major and grade crossing, the college can organize more activities to help to improve this situation.

The Effect of the Adaptive Threshold Method
We further investigated the effect of the adaptive threshold determination method. Using the records of the 113 EE students (2012 grade) as examples, we employed the community mining algorithm from Reference [31]. Figure 6a,b illustrates the community network extracted using the traditional fixed threshold method and the adaptive threshold method, respectively. The two networks are labeled $Net_{fixed}$ and $Net_{adap}$. In the figures, the nodes represent the students; the sub-networks with various colors and numbers of nodes are the communities. The isolated nodes scattered across the figures are students without a connection to others. From $Net_{fixed}$, we found that there are many connections between communities and there may exist false social ties. By contrast, in $Net_{adap}$, social ties are much more explicit. The numbers of communities of $Net_{fixed}$ and $Net_{adap}$ are 12 and 14, respectively. In $Net_{fixed}$, the maximum number of students in one community is 24, while for $Net_{adap}$, it is 18. The number of communities with 3 to 5 members increases from 6 to 9, indicating that our model rejects some false social ties. For $Net_{fixed}$, the modularity is 0.741, and for $Net_{adap}$, the modularity reaches 0.858. This increase further supports the advantage of our model. We analyzed the distribution of the degree value of the two networks $Net_{fixed}$ and $Net_{adap}$, and the results are illustrated in Figure 7. The degree distribution for the adaptive threshold method tends to concentrate around small values of degree, while that for the fixed threshold method centers around large values of degree. Moreover, the average values of the degree for the adaptive threshold method and for the fixed threshold method are 2.13 and 2.91, respectively. Usually, a large degree value results from false social ties, so the adaptive threshold method shows higher fidelity.
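For readers who want to reproduce this kind of comparison, the sketch below computes the average degree and a modularity score for a partitioned network. It uses the standard Newman modularity, $Q = \sum_c [L_c/m - (d_c/2m)^2]$; the source does not state which formula or software was used, so this choice, the function names, and the toy graph are assumptions.

```python
def modularity(edges, community_of):
    """Newman modularity of an undirected, unweighted partitioned graph."""
    m = len(edges)
    internal = {}    # L_c: edges with both ends inside community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def average_degree(edges, n_nodes):
    """Mean number of ties per node."""
    return 2 * len(edges) / n_nodes


# Invented toy network: two triangles joined by a single tie.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community_of = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
q = modularity(edges, community_of)
k_mean = average_degree(edges, n_nodes=6)
```

Removing the bridging tie (3, 4) from the toy network raises the modularity and lowers the average degree, the same direction of change reported between the fixed and the adaptive threshold networks.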

Conclusions
To alleviate the homophily effect, we proposed a social network mining method based on the student ID card: the hierarchical encounter model based on association analysis. For the social ties of one particular student $i$, we proposed the following mining procedure. Firstly, calculate the co-occurrences between student $i$ and others using the sliding window method. The traditional association analysis model should then be employed to extract the intra-group students' social ties. This is followed by the extraction of the inter-group students' social ties using quasi-association analysis. The adaptive threshold method is used to sift out weak social ties. Lastly, by combining the intra-group students' social ties and the inter-group students' social ties, we obtain the complete social ties of student $i$. We used the data of 662 college students from CPST to verify the effectiveness of our model regarding the homophily and the adaptive threshold method. Having considered the homophily effect, our new model extracted more inter-group social ties, while with the adaptive threshold method, our model could sift out more false social ties. Results suggested that our model can infer the social ties of students with better fidelity. The method proposed here requires the setting of somewhat arbitrary thresholds (the threshold of difference ∆σ) whose selection influences the final result. To determine the influence of the thresholds, a researcher can perform sensitivity analyses, such as those shown in Figures 6 and 7, to assess how the threshold choices affect results.
The model we proposed considers issues such as co-occurrence, homophily, and threshold determination. However, several issues remain for future research. Firstly, we did not investigate the setting of the length of the time-window. A length too large risks false social ties, while a length too small may lead to underestimation of the number of social ties [34-37]. Secondly, we only considered one characteristic, co-occurrence; it is, however, possible to increase the number of characteristics [8,38,39], taking into account characteristics such as the number of locations, location entropy, and time entropy. One major limitation of the data comes from the fact that the ID card is exclusive to the food courts and the supermarkets in the university. As more and more students tend to have their meals or shop outside the university, especially through increasingly popular online orders, the breadth and amount of the data will shrink as time goes by. Data from other irreplaceable and commonly visited locations, such as the libraries and the classrooms, can be used to compensate for such an effect.
The applications of our research are manifold. Universities can use our method in a dynamic way. They can infer student social networks periodically so that the networks are up to date. With the knowledge of students' social networks, universities can take proactive actions to alleviate possible mental health problems of students and prevent possible harm. If contact with a student is lost, universities can contact the friends of this student in the first place. Although there are issues remaining for further study, ID card data may prove a useful tool for mapping social networks based on geographic and temporal proximity when GPS data are unavailable. Because most college campuses have student ID cards, the approach seems particularly useful for monitoring social ties in higher education, particularly in settings where ID cards are frequently used for a variety of activities, such as dining, shopping, rentals, online access, admission to campus activities, and/or borrowing library books. We admit that the applicability of our approach might vary across different countries, since for some countries, informed consent of students is hard to get. Nevertheless, our approach can be directly adopted by other Chinese universities where data collection is relatively easy in the ethical sense due to the special reason we stated in Section 2. Other organizations such as high schools and companies can also adopt our method to monitor the mental status of their students or employees in a similar fashion.

Algorithm 1: Acquiring co-occurrence data using the sliding time-window method
Input: dataset $D = \{d_z\}_{z=1}^{Z}$ and fixed time interval ∆t
Output: co-occurrence dataset $C$
for each location $l = 1$ to $L$ do
    dataset $D^{(l)}$ = check-in data of location $l$;
    dataset $\tilde{D}^{(l)}$ = time-sorted version of $D^{(l)}$

Figure 1. Two special cases in the sliding time-window method, where (a) one student labeled with i checks in multiple times in one time-window and (b) the same two check-in records of two students labeled with j and k appear in two successive time-windows.

Figure 2. The hierarchical encounter model. The arrows here represent the statistical process across different levels.

Algorithm 4: Determining frequent itemset for students from other majors
Input: dataset of students from a specific major $D_1 = \{d_z\}_{z=1}^{Z}$, dataset of students from other majors $D_2 = \{d_t\}_{t=1}^{T}$, thresholds $h_{s1}$ and $h_{s2}$
Output: frequent itemset $F_1$ and

Figure 3. The scatter plot of co-occurrences between students (1 to 3) and others. The numbers of co-occurrences here are sorted in descending order for (a) intra-major social ties and (b) inter-major social ties.

Figure 4. Social ties revealing alleviation of homophily effects. Here, (a,b) are the social networks of 335 students, where red nodes represent students from electrical engineering (EE) and blue nodes the others, and (c,d) are the social networks of 662 students, where green nodes represent students from grade 2012 and orange nodes students from grade 2013. Networks in (a,c) are extracted with the original model, where the homophily effects are not considered. Networks in (b,d) are extracted by our new model.

Figure 5. The boxplot for closeness centrality (a,c) and for clustering coefficient (b,d). Here, the label on the horizontal axis represents networks extracted with the original model or with our new model. Boxplots (a,b) are for networks $Net_{major}$ and $Net'_{major}$, and boxplots (c,d) are for networks $Net_{grade}$ and $Net'_{grade}$. For each box in the graph, the red '+' represents the extreme outliers.

Figure 6. Social ties inferred from different threshold determination methods with (a) fixed determination method and (b) adaptive determination method. Here, nodes with different colors represent students from different communities.

Figure 7. The distribution for the degree value of the networks mined by the fixed threshold method (dark blue) and by the adaptive threshold method (light red). Here the dark red area represents the overlap between the two areas.

Table 1. Examples of check-in data.

Table 2. Examples of the number of co-occurrences.