1. Introduction
Social relationships have an important impact on people’s lives, so studying them helps us understand a person’s life. Understanding a student’s social relationships, especially on campus, can help administrators better serve that student. For example, we should pay more attention to lonely students who have few associations with others, and provide timely and useful psychological guidance if necessary. In addition, students who have many associations with others can play a positive leading role in team building. Hence, it is worth studying how to analyze the social relationships among students on campus.
Students’ social relationships, also called associations, are objectively reflected in their behavior. When two students are closely associated, their behavior on campus should overlap significantly, so we can study this relationship based on their behavior data. With the improvement of hardware and software on the “smart campus,” various daily activities of students are recorded, such as having meals, showering, borrowing books, and visiting the library. These data describe students’ daily behavior on campus from many aspects, which makes multi-source behavior data available for in-depth analysis of their associations. Previous work based on behavior data has addressed topics such as constructing student social networks [1,2], predicting academic performance [3,4,5,6,7], and forecasting career choices [8]. These studies showed that it is possible to analyze students’ lives through their behavior data.
To accurately and intuitively analyze associations, we propose a visual analytic method in which an association network is constructed based on the similarity of students’ behavior. However, as the number of students grows, the association network becomes too complex to observe clearly. To overcome this issue, we use the Louvain algorithm [9], a widely adopted community detection algorithm based on modularity optimization, to hierarchically partition the association network into communities. Students belonging to one community have a higher probability of being friends with each other than with students of other communities. Identifying student communities offers insight into how the association network is organized. It allows us to focus on regions of interest and helps to classify students based on their role with respect to the communities they belong to. For example, we can distinguish students who are embedded within a community from students at its boundary; the two types of students play different roles in the community and may need different attention. Finally, we use visualization charts to express these associations intuitively. The main contributions and advantages of our method are as follows.
The multi-source behavior data of students are collected from different information management systems, and then they are fused to construct an activity sequence for each student, which is the basis for analysis of association among students.
Three kinds of similarity operators are proposed to compute the degree of similarity of behavior data among students. We take the similarity value as the edge weight to generate an association network representing social ties among students.
Student communities are discovered by applying the Louvain algorithm to the association network. The hierarchical community structure further reveals how the association network is organized.
A visual analytic system was developed that interactively exhibits the association among students and allows us to intuitively explore the social relationships of specific students or communities of interest.
The remainder of this paper is organized as follows. Section 2 reviews the related work from three aspects: how to infer social ties from spatiotemporal behavior data, detect communities in a complex network, and apply visualization techniques to community detection. Section 3 introduces the framework of our method. Section 4 highlights privacy issues. Section 5 describes the experimental dataset and seven extracted behavior features. Section 6 explains how to construct the association network with three similarity operators and discover student communities via the Louvain algorithm. Section 7 describes the visual expression of association. We conducted four experiments on the dataset and explain the results in Section 8. Finally, we conclude this paper in Section 9.
2. Related Work
We introduce related work from three perspectives.
2.1. Constructing Social Networks via Spatiotemporal Data
Much current research infers social ties between people from spatiotemporal behavior data, based on the assumption that two people should know each other if they have been in the same location at the same time on multiple occasions. However, is this assumption true? To answer this question, Crandall et al. [10] proposed a probabilistic model to quantify the correlation between the number of co-occurrences and the strength of social ties. Their experimental results showed that the number of co-occurrences carries significant information for inferring social ties, which lays a foundation for related work.
To address the shortcomings of straightforward methods (richness and frequency), Huy et al. [11] proposed an entropy-based model that infers the social connections and quantifies their strength by analyzing people’s spatiotemporal co-occurrences. Their method uses Renyi entropy to measure the diversity of co-occurrences. A parameter controls how much coincidences contribute to diversity, and a weighted frequency can increase the impact of co-occurrences in uncrowded locations on the strength of social connections. It utilizes location semantics to improve the model. The experimental results show that it outperforms its counterparts and can be applied in networks, such as computing social strength in the social internet of things [12] and recommending friends in location-based social networks [13].
Based on the co-occurrences of behavior, He et al. [14] proposed that the impact of a given location on social strength should vary by time period. They exploited four spatiotemporal features, including fine-grained temporal features, weekday and weekend features, fine-grained location weight, and co-occurrence features, to infer friendship. Zhou et al. [15] proposed a theme-aware social strength inference method that extracts themes from spatiotemporal behavior data and analyzes the contribution of each behavior theme to social strength.
To analyze the social ties among a large number of students over prolonged periods of time, Tao Liu [2] proposed an unsupervised statistical validation method based on the spatiotemporal behavior data on campus, taking spatiotemporal co-occurrence between two students as a measure to judge whether they have a tie, and validating ties against the null hypothesis that all co-occurrences are due to coincidence. They analyzed the structure of a network based on several metrics including degree distribution, degree assortativity, and attribute assortativity. Several differences exist between their work and our method, although both infer social ties from spatiotemporal behavior data. First, they only collected the check-in records in canteens and stores, while our behavior dataset also includes records of library entrance, book borrowing, and showering. Second, the community detection algorithms of the two methods are different. Our method uses the Louvain modularity optimization algorithm, while they used OSLOM. Third, the visualizations differ. We developed a visualization system, while they used Gephi to draw a force-directed graph.
2.2. Detecting Communities via Modularity
Identifying communities in networks, especially complex networks with huge numbers of nodes and edges, can offer insight into how a network is organized, and it has been a hot topic in various fields. There are many algorithms, which can be categorized by different criteria. In [16], the categories include spectral methods and methods based on statistical inference, optimization, and dynamics. Among them, optimization techniques have attracted the most attention. They aim to find an extremum of a quality function, which indicates the quality of clustering. Modularity is the most popular quality function. Newman [9] proposed an algorithm to partition a complex network into densely connected subnetworks using the gain value of modularity. The algorithm iteratively removes edges that are selected by “betweenness” measures, and defines the strength of the community structure to provide an objective metric to determine the number of communities into which a network should be divided.
However, tuning the resolution parameter is usually computationally intractable and time-consuming. To circumvent this issue, Blondel et al. [17] proposed the Louvain method to extract the community structure, which hierarchically performs a greedy optimization of modularity. Their experimental results show that the Louvain algorithm can accurately identify communities in networks with millions of nodes in a short time, and can uncover the hierarchical community structure to facilitate observation at the desired resolution.
Many papers [16,18,19,20,21] compare the performance of widely adopted community detection algorithms on real-world and artificial datasets. Most conclude that the Louvain algorithm outperforms the others or achieves similar detection results in terms of general metrics such as modularity, computing time, precision, and recall.
Due to its advantages, the Louvain algorithm is used in many applications. The modified Louvain algorithm was used in an online social network to discover the opinion leader [22], in economics to delineate the regional economic geography [23], and in location-based networks to improve recommendation accuracy [24].
2.3. Visualization in Community Detection
Visualization can effectively represent data, especially those with a complex structure and huge volume, where it can facilitate the intuitive discovery of hidden information. Some work has incorporated visualization techniques in community detection.
To choose the most appropriate community detection algorithms for a scenario, Claudio et al. [25] proposed a statistical visual methodology using visual interactive charts to present the detection results of commonly used community detection algorithms, enabling users to observe the distribution of nodes in a region of interest by zooming and panning. Crampes et al. [26] proposed a unified community detection method to handle bipartite graphs, directed graphs, and overlapping communities using the Louvain algorithm. Visualization is used to observe community partitioning, overlapping, and possible assignment contradictions. These visualizations help to understand the processes and results of community detection.
3. Proposed Methodologies
We propose a visual analytic method that constructs an association network of students by computing the degree of similarity of daily living behavior for pairs of students, and discovers communities using the Louvain algorithm. Students’ behavior in a community should be similar, indicating high correlation. Visualization charts ensure an intuitive understanding of associations among students.
Figure 1 shows the method’s framework, with the three stages of data collection and pre-processing, qualitative analysis of associations, and visualization analysis of associations.
3.1. Data Pre-Processing
Different types of student behavior data are usually stored in independent information systems maintained by different departments. For example, book-borrowing records are stored in the library’s database, and meal records paid via smart card are stored in the smart card system supported by the information support center. These “information islands” make it difficult to analyze students’ overall behavior. To completely describe each student’s behavior trajectory, we collect records of book borrowing, library entrance, showering, and meals using extraction-transformation-loading (ETL) tools; we call these behaviors activities. After denoising, we sort the activities by time to construct the activity sequence for each student, in which each activity Z is recorded together with the identifier of the student and the time and location of the activity. The activity sequence lays a foundation for subsequent association analysis.
3.2. Qualitative Analysis of Association
We propose three operators to measure the similarity of students’ activity sequences. The higher the similarity, the stronger the association. We construct an association network of N students, using the similarity value of two students as the weight of the edge connecting them. From this network, we can understand the overall relationship among students. However, the network becomes increasingly complex as the number of students grows. To solve this problem, we use the Louvain algorithm to discover a hierarchical community structure, which reduces the complexity of the network while preserving the main information.
3.3. Visual Analysis of Association
Compared with qualitative analysis of association, visualization analysis can make the association more intuitive. We introduce some visualization techniques, such as chord diagram and Fruchterman–Reingold (FR) layout, which are widely used to represent association patterns such as in social and biological networks. These help us to intuitively and interactively understand the association among students.
4. Privacy Protection
The student services department provided support regarding privacy. During the student enrollment period, they asked students whether they would like to share their campus activity data for analysis. With students’ consent, we took additional measures to protect their privacy. We created a mapping table between real student IDs and encoded student IDs; every real ID was encoded as a unique, anonymous, alphanumeric identifier. During data collection, real student IDs in every data source were replaced by their corresponding encoded identifiers, assuring anonymity throughout the experiment. A day was divided into 144 10-min bins. Behavior time was therefore encoded as a number in the range [1,144], and we could not obtain the precise behavior time. Verification was performed by the student services department; they invited some students as volunteers to verify the experimental results. Finally, the researchers involved in these experiments signed a confidentiality agreement.
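As a hedged illustration of the two encoding steps described above, the following Python sketch maps a clock time to one of the 144 ten-minute bins and builds a mapping table from real IDs to random alphanumeric codes. The function names `time_to_bin` and `build_id_mapping`, the use of `secrets.token_hex`, and the eight-character code length are assumptions made for illustration; the text does not specify how the identifiers were generated.

```python
import secrets
from datetime import time

BINS_PER_DAY = 144  # 24 h * 60 min / 10 min per bin


def time_to_bin(t: time) -> int:
    """Map a clock time to its 10-minute bin, numbered 1..144."""
    return (t.hour * 60 + t.minute) // 10 + 1


def build_id_mapping(real_ids):
    """Assign each distinct real ID a unique random alphanumeric code.

    Hypothetical helper: the code format (8 hex characters) is an assumption.
    """
    mapping = {}
    used = set()
    for real_id in real_ids:
        if real_id in mapping:
            continue  # keep one code per real ID
        code = secrets.token_hex(4)
        while code in used:  # regenerate on an (unlikely) collision
            code = secrets.token_hex(4)
        used.add(code)
        mapping[real_id] = code
    return mapping
```

Under this binning, 07:30 falls in bin 46, and only the bin number, not the precise time, is kept with each record.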
5. Data Collecting and Feature Extracting
Our dataset includes records of book borrowing, showering, library entrance, and meals of 8685 students enrolled in the Spring 2017 semester.
To compute the behavior similarity among students, we extracted the following seven features from the original behavior data.
Number of smart card transactions. In most Chinese universities, smart cards serve as the unique payment medium in canteens, bathrooms, and libraries. Therefore, smart card consumption records can describe the activities of each student on campus. We count the number of transactions via smart card at given locations in a specified period. The higher this number, the more active the student is at the specified location and time.
Amount of transactions via smart card. The sum of transaction amounts can evaluate the consumption level of a student in a specified location and period (e.g., a semester or a month).
Consumption frequency during peak period. A peak period is a time interval during which most students perform the same activity. According to the teaching schedule, we define three peak consumption periods: 7 a.m. to 9 a.m. for breakfast, 11 a.m. to 1 p.m. for lunch, and 5 p.m. to 8 p.m. for dinner. We defined a triple to store the frequency of consumption during the three peak periods, through which we can measure the level of orderliness of each student.
Number of days that students have a regular lifestyle on campus. A regular lifestyle connotes consumption frequency during peak periods in the normal range.
Entropy of activity locations. The degree of dispersion of activity locations reflects a student’s level of disorder in a period. We acquire all the activity locations and compute the entropy as Equation (1), where $L_u$ denotes all activity locations visited by student $u$, $L_{u,l}$ represents the activity locations in the specified area $l$ visited by student $u$, $|L_{u,l}|$ is the total number of visits to area $l$ by student $u$, and $P_u(l) = |L_{u,l}| / |L_u|$ is the probability that student $u$ visits area $l$.

$$E_{loc}(u) = -\sum_{l} P_u(l) \log P_u(l) \quad (1)$$
Entropy of activity time. This measures the degree of dispersion of each student’s activity time. Similar to the entropy of activity locations, we acquired all activity times and locations, and computed this entropy as Equation (2), where $T_u$ is the overall time distribution of visits to the specified area by student $u$, $T_{u,t}$ is the specified part of $T_u$, $|T_{u,t}|$ is the number of times student $u$ visits the area in the specified time interval, $|T_u|$ is the total number of visits by student $u$ to the given area, and $P_u(t) = |T_{u,t}| / |T_u|$ is the probability that student $u$ visits the given area in time interval $t$.

$$E_{time}(u) = -\sum_{t} P_u(t) \log P_u(t) \quad (2)$$
Number of books borrowed from library. This can be computed through the book-borrowing records, which are filtered by borrowing time and book type.
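Two of the features above lend themselves to a short sketch: the peak-period frequency triple and the entropy of visited areas or time intervals. The code below is illustrative only. The names `peak_frequency` and `visit_entropy`, the half-open peak intervals, the representation of times as fractional hours, and the natural logarithm are all assumptions; the text does not state the interval boundaries convention or the logarithm base.

```python
import math
from collections import Counter

# Peak periods in hours: breakfast, lunch, dinner (taken from the text).
PEAKS = ((7, 9), (11, 13), (17, 20))


def peak_frequency(hours):
    """Count transactions in each peak period (assumed start <= h < end)."""
    return tuple(sum(1 for h in hours if lo <= h < hi) for lo, hi in PEAKS)


def visit_entropy(visits):
    """Shannon entropy of a list of visited areas (or visited time bins).

    Each distinct value is treated as one outcome whose probability is its
    share of all visits; natural log is an assumption.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A student who always visits the same area has entropy 0, while visits spread evenly over many areas give the largest value, which matches the reading of entropy as a measure of disorder.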
6. Qualitative Analysis of Association among Students
When two students’ activity sequences are similar, we generally consider them closely associated. To measure this similarity, we propose spatial similarity, spatiotemporal similarity, and behavior features-based similarity operators, through which we can compute the degree of association among students, and take this as the weight of the edge in the association network. However, the network becomes increasingly complicated as the number of nodes grows, so to clearly understand the association structure, we introduce the Louvain algorithm [17] to discover the student communities.
6.1. Three Similarity Operators
6.1.1. Similarity of Spatial Patterns
The behavior spatial pattern refers to students’ spatial movement mode on campus, and does not consider the activity time. To express this pattern, we built a vector for each student to represent the frequency of activity at the main locations on campus. When two students are often active in the same locations, we consider their behavior spatial patterns to be similar. We compute the similarity by Equation (3), where SpaSim is the similarity operator applied to two students u and v. Its inputs are the set of activity locations of each student, the frequency of visits by each student at the ith activity location, and len, the cumulative visiting frequency over all activity locations of a student. If the set of locations co-visited by u and v is not empty, SpaSim is positive; when the co-visited set is empty, SpaSim is zero.
6.1.2. Similarity of Spatiotemporal Patterns
Compared to the spatial pattern, the spatiotemporal pattern considers both the activity location and time. To express this pattern, we constructed the activity sequence including all activities. We divided a day into 144 time bins, so an activity time can be mapped to a discrete value. The activity location was labeled according to the campus map. After these pre-processing steps, we sorted the activity set by time and constructed the activity sequence. When the sequences of two students overlap significantly, the similarity value of their spatiotemporal patterns should be high. We formulated this as Equation (4), where Act is the frequency of activities in which students u and v participated simultaneously and ActSim is the spatiotemporal similarity between u and v. Equation (4) also uses the activity sequence of each student, the total number of activities participated in by all students, and len, the number of individuals appearing in the common sequence when a shared activity occurs.
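The co-occurrence count that underlies the spatiotemporal operator can be sketched as follows. This is only an illustration of counting activities that two students share in the same time bin and location; it does not reproduce the normalization of Equation (4). The function name `co_occurrences` and the `(day, bin, location)` tuple layout are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(sequences):
    """Count, for every pair of students, the slots they share.

    sequences: dict mapping student id -> iterable of (day, bin, location)
    tuples (assumed layout). Returns a Counter keyed by sorted id pairs.
    """
    slots = {}
    for student, activities in sequences.items():
        for slot in set(activities):  # a student counts once per slot
            slots.setdefault(slot, set()).add(student)
    pair_counts = Counter()
    for students in slots.values():
        for u, v in combinations(sorted(students), 2):
            pair_counts[(u, v)] += 1
    return pair_counts
```

The size of each `students` set is the number of individuals present in a shared slot, so a weighting that discounts crowded slots could be applied inside the inner loop.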
6.1.3. Similarity of Behavior Features
The above two operators use spatiotemporal co-occurrences to measure similarity. However, can we also use behavior features to infer social association between students? To test this hypothesis, we propose a behavior features-based similarity operator built on the features described in Section 5, and formulate it as Equation (5), where u and v denote two different students, each described by a feature vector whose dth component is the corresponding behavior feature. We compute the Euclidean distance FeaDis between the two feature vectors, and the similarity of behavior features FeaSim, which is the negative exponential function of FeaDis. A tradeoff factor, defined in terms of the maximum number of pairwise students, restrains FeaSim to the range [0,1].
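A minimal sketch of a feature-based similarity of this kind is given below: a Euclidean distance between two feature vectors, passed through a negative exponential. The parameter `scale` is a hypothetical stand-in for the tradeoff factor, whose exact definition is not reproduced here, and the function names are assumptions.

```python
import math


def fea_dis(fu, fv):
    """Euclidean distance between two equal-length feature vectors."""
    if len(fu) != len(fv):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fu, fv)))


def fea_sim(fu, fv, scale=1.0):
    """Negative exponential of the distance; `scale` is an assumed parameter.

    The result lies in (0, 1], with 1 for identical vectors.
    """
    return math.exp(-fea_dis(fu, fv) / scale)
```

Because the seven features have very different units (counts, amounts, entropies), some normalization of each feature before computing the distance would probably be needed in practice.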
6.2. Discovering Student Communities Based on Modularity Optimization
Based on the three operators, we can compute the similarity value among students, and then construct the association network. However, it becomes more difficult to retrieve the association information from the network as the number of nodes increases. A promising approach is to decompose the network into communities, whose nodes are highly interconnected and among which there are fewer connections [27]. Researchers have proposed many algorithms to partition the network reasonably. Among them, the Louvain algorithm is one of the most widely adopted community detection algorithms, and it has superior performance compared with its competitors, as stated in [16,18,19,20,21]. We use this algorithm to discover student communities.
6.2.1. Definition of Modularity
Newman highlighted the property of community structure and proposed algorithms to divide the network into communities based on modularity optimization [9,27,28]. Modularity in a weighted network is defined as Equation (6) [29], where $A_{ij}$ is the weight of the edge connecting nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of edges connected to node $i$, and $c_i$ is the community to which node $i$ is assigned. The function $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise, and $m = \frac{1}{2}\sum_{i,j} A_{ij}$. The modularity $Q$ is between −1 and 1, and it can be used as an objective function to look for the divisions with high modularity over all possible partitions:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \quad (6)$$
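To make the modularity definition concrete, the sketch below evaluates it for a given partition of an undirected weighted network. It uses the equivalent per-community form, in which each community contributes its internal edge weight divided by m, minus the square of its total degree divided by 2m. The function name `modularity` and the edge-dictionary input format are assumptions for illustration.

```python
def modularity(weights, community):
    """Modularity of a partition of an undirected weighted network.

    weights: dict {(i, j): w} listing each undirected edge once (assumed format).
    community: dict {node: community label}.
    """
    m = sum(weights.values())  # total edge weight in the network
    if m == 0:
        return 0.0
    internal = {}  # edge weight inside each community
    total = {}     # summed node degrees of each community
    for (i, j), w in weights.items():
        ci, cj = community[i], community[j]
        total[ci] = total.get(ci, 0.0) + w
        total[cj] = total.get(cj, 0.0) + w
        if ci == cj:
            internal[ci] = internal.get(ci, 0.0) + w
    return sum(
        internal.get(c, 0.0) / m - (tot / (2.0 * m)) ** 2
        for c, tot in total.items()
    )
```

For two triangles joined by a single edge of unit weight, splitting the network into the two triangles gives Q = 5/14, whereas placing all six nodes in one community gives Q = 0.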
6.2.2. Discovering Student Communities Using the Louvain Algorithm
The Louvain algorithm has two phases that are repeated iteratively. For a weighted network of $N$ nodes, it first views each node as an independent community. Then, for each node $i$, it considers the neighbors $j$ of $i$ and evaluates the gain of modularity from placing node $i$ in the community of node $j$; node $i$ is placed in the community for which the gain is maximal, but only if this gain is positive; otherwise, node $i$ remains in its original community. The algorithm executes this operation repeatedly on all nodes in a specified order until a local maximum of the modularity is attained. The gain in modularity $\Delta Q$ obtained by moving node $i$ to community $C$ can be computed as Equation (7) [17]:

$$\Delta Q = \left[ \frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right] \quad (7)$$

where $\Sigma_{in}$ is the sum of the weights of the edges in community $C$, $\Sigma_{tot}$ is the sum of the weights of the edges connected to nodes in community $C$, $k_i$ is the sum of the weights of the edges connected to node $i$, $k_{i,in}$ is the sum of the weights of the edges from node $i$ to nodes in community $C$, and $m$ is the sum of the weights of all the edges in the network.
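When the gain of moving an isolated node i into community C is expanded, the internal weight of C cancels and the gain reduces to the weight from i into C divided by m, minus the product of the total degree of C and the degree of i divided by 2m². The sketch below implements this simplified form; the function name `modularity_gain` and its argument names are assumptions for illustration.

```python
def modularity_gain(k_i_in, sigma_tot, k_i, m):
    """Change in modularity when an isolated node i joins community C.

    k_i_in:    total weight of edges from i to nodes in C
    sigma_tot: total weight of edges incident to nodes in C
    k_i:       total weight of edges incident to i
    m:         total weight of all edges in the network
    """
    return k_i_in / m - sigma_tot * k_i / (2.0 * m * m)
```

In a network of two triangles joined by one unit-weight edge (m = 7), moving the bridge node of one triangle (degree 3) into the community formed by its two triangle neighbors (total degree 4, connecting weight 2) gives a gain of 8/49, which equals the difference in modularity between the two partitions.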
The second phase of the Louvain algorithm constructs a new network, whose nodes represent the communities discovered during the first phase. The weight of the edges between the new nodes is the sum of the weights of the edges between nodes in the corresponding communities, and the new nodes may have self-loops, whose weights are the sum of the weights of all the edges in the corresponding communities. When the second phase is completed, we can reuse the first phase in the new network; this iterative operation does not converge until the maximum of modularity is attained or the gain of modularity does not exceed a specified threshold. Through the iterative operation, the Louvain algorithm constructs the hierarchical community structure.
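The aggregation step of the second phase can be sketched as below: edges between communities are summed into a single weighted edge, and edges inside a community become a self-loop on the corresponding new node. The function name `aggregate` and the edge-dictionary format are assumptions for illustration, and community labels are assumed to be mutually comparable (for example, all integers).

```python
def aggregate(weights, community):
    """Build the community-level network used in the second phase.

    weights: dict {(i, j): w} listing each undirected edge once (assumed format).
    community: dict {node: community label}.
    Returns {(a, b): w} over labels with a <= b; an entry (a, a) is a
    self-loop holding the total internal edge weight of community a.
    """
    new_weights = {}
    for (i, j), w in weights.items():
        a, b = community[i], community[j]
        key = (a, b) if a <= b else (b, a)
        new_weights[key] = new_weights.get(key, 0.0) + w
    return new_weights
```

The total edge weight is unchanged by aggregation, so the modularity of a partition of the new network equals that of the corresponding partition of the original network, provided a self-loop is counted twice in the degree of its node.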
To discover student communities using the Louvain algorithm, we constructed a weighted network G = (V, E) of 8685 nodes, where V is the set of nodes representing the students, and E is the set of edges, whose weights are computed using the above similarity operators. The process is shown as Algorithm 1.
Algorithm 1 Algorithm to discover student communities.
Require: Weighted association network G = (V, E)
Ensure: Student communities
view each node as an independent community
Flag = false
while !Flag do
    Flag = true
    for each node i in V do
        maxgain = 0
        index = none
        for each neighbor j of node i do
            calculate the gain ΔQ of placing node i in the community of node j by using Equation (7)
            if ΔQ > maxgain then
                maxgain = ΔQ
                index = j
            end if
        end for
        if maxgain > 0 then
            place node i in the community of the corresponding node index
            Flag = false
        end if
    end for
    if !Flag then
        construct the new network whose nodes are the discovered communities
        replace G with the new network
    end if
end while
We implemented this algorithm in Java and ran it in multi-threaded mode. The network can be divided into k non-intersecting subnets. Based on the results, we can identify the association subnet to which each student belongs.
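For readers who want to experiment, the local moving step of the first phase can be sketched in a few lines. The sketch is in Python for brevity and is single-threaded, whereas the implementation described here is in Java and multi-threaded. It follows the common formulation in which a node is first removed from its community and then reinserted where the gain is largest, so it is not a line-by-line transcription of Algorithm 1. The function name `local_moving` and the edge-dictionary input are assumptions, and self-loops are assumed to be absent.

```python
def local_moving(weights, nodes):
    """Repeat phase-one sweeps until no node changes community.

    weights: dict {(i, j): w}, each undirected edge once, no self-loops
             (assumed format). nodes: list fixing the visiting order.
    Returns {node: community label}.
    """
    m = sum(weights.values())
    community = {n: n for n in nodes}  # each node starts alone
    if m == 0:
        return community
    neighbors = {n: {} for n in nodes}
    for (i, j), w in weights.items():
        neighbors[i][j] = neighbors[i].get(j, 0.0) + w
        neighbors[j][i] = neighbors[j].get(i, 0.0) + w
    degree = {n: sum(neighbors[n].values()) for n in nodes}
    sigma_tot = dict(degree)  # total degree of each community

    def gain(k_in, tot, k_i):
        # modularity gain of inserting an isolated node into a community
        return k_in / m - tot * k_i / (2.0 * m * m)

    moved = True
    while moved:
        moved = False
        for i in nodes:
            old = community[i]
            links = {}  # weight from i to each neighboring community
            for j, w in neighbors[i].items():
                links[community[j]] = links.get(community[j], 0.0) + w
            sigma_tot[old] -= degree[i]  # take i out of its community
            best = old
            best_gain = gain(links.get(old, 0.0), sigma_tot[old], degree[i])
            for c, k_in in links.items():
                g = gain(k_in, sigma_tot[c], degree[i])
                if g > best_gain:  # strict, so ties keep the current choice
                    best, best_gain = c, g
            sigma_tot[best] += degree[i]
            community[i] = best
            if best != old:
                moved = True
    return community
```

On two triangles joined by a single unit-weight edge, the sweeps converge to one community per triangle. A full implementation would alternate this step with network aggregation until the modularity stops improving.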
7. Visual Analysis of Association
Unlike qualitative analysis, visual analysis provides a more intuitive way for educational administrators to understand a student’s social association. We introduce visualization techniques, such as chord diagrams and the FR algorithm, to express the association among students. We next explain the usage of the chord diagram and the FR algorithm; see Section 8.5 for their specific application.
7.1. Chord Diagram
A chord diagram is a popular visualization chart that can express the relationship among nodes in complex social networks. It consists of nodes and chords, where a chord connects two nodes that have a relationship. We represent the chord diagram as a tuple [30] with the following components: G, a social network graph; N, the set of nodes; the set of chords; a weight function that assigns a positive real number, known as a weight, to each chord; the radius of the graph; a function that computes the size of each node in the chord diagram; and a function that computes the width of each chord.
We used the chord diagram to develop an association visualization view representing the relationship among students, and took the spatiotemporal similarity operator as the weight function. When the similarity value exceeds a threshold, a chord connects the two nodes. The node-size function is proportional to the number and weight of the chords connected to the node, and the chord-width function is identical to the weight function. As shown in Figure 2, one node denotes a student identified by a student ID. For example, the large size of the node representing student 150302cf indicates that this student has many associations with others, while student 150301ai has fewer associations. For a better visual effect, we used prior knowledge, such as class or grade, to categorize students, and we marked the corresponding nodes and their chords in different colors. Users can also interact with the chart: when a user mouses over a node, the connected chords and their weights are highlighted.
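The data preparation behind such a chord view can be sketched as follows: keep only pairs whose similarity exceeds the threshold, and derive a size for each node from its chords. Using the sum of chord weights as the node size is an assumption; the text only says that the size is proportional to the number and weight of the connected chords. The function name `chord_data` is also hypothetical.

```python
def chord_data(similarity, threshold):
    """Select chords and compute node sizes for a chord diagram.

    similarity: dict {(u, v): value} with each pair listed once.
    Returns (chords, size): the pairs whose value exceeds the threshold,
    and for each connected node the sum of its chord weights (assumed rule).
    """
    chords = {pair: w for pair, w in similarity.items() if w > threshold}
    size = {}
    for (u, v), w in chords.items():
        size[u] = size.get(u, 0.0) + w
        size[v] = size.get(v, 0.0) + w
    return chords, size
```

Nodes without any chord above the threshold do not appear in `size`, so a renderer would need to give them a default minimum size.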
7.2. FR Algorithm
When the number of nodes increases, the edges in the chord diagram intersect more and more often. In this case, it becomes difficult to efficiently observe the associations of each node, especially those with fewer associations. To address this problem, Di Battista conducted a bibliographic survey on algorithms to understand the data-presentation theories and applications [31], and proposed many algorithms to draw general graphs, which seek to avoid edge crossings, avoid bends in edges, keep edge lengths uniform, and distribute vertices uniformly. These algorithms can overcome the problems of the chord diagram.
One of these, the FR algorithm [32], uses a force-directed layout with an elastic and energy operator to distribute nodes evenly. The operator is defined as Equation (8) in terms of the Euclidean distance between two nodes, the natural length of the spring between them, and the weight of the edge connecting them. We used the algorithm to lay out the nodes, and we set the elastic coefficient k to 0.05. Figure 3 is a sample layout of the FR algorithm; nodes with less association are arranged on the periphery, nodes with more association are in the center, and the visual effect is clearer than in the chord diagram.
8. Experimental Results
To evaluate the performance of the proposed method, we carried out a set of experiments on a real dataset, and provided the experimental results to the student services department for verification.
8.1. Comparison of Similarity Operators
We compared the accuracy of three similarity operators and their weighted combinations to select the best one. The six operators being compared are listed in Table 1. We used each operator separately to calculate the similarity values among students, and took the top 50 most similar individuals for each student. We randomly invited some undergraduates as volunteers to label whether they were familiar with the 50 students, and computed the ratio. Table 2 presents partial experimental results. For example, 78% of the 50 most similar students computed through the spatiotemporal operator are labeled by student 151441af as being familiar; this ratio is the highest among the results of the six operators, which indicates that the spatiotemporal similarity operator is the best one. We took the results of these volunteers as samples, and calculated the average accuracy ratio for each operator. This was 87% for the spatiotemporal similarity operator, which is the highest, so we used it to compute the similarity in subsequent experiments.
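The evaluation protocol described above amounts to a top-k selection followed by a ratio. A minimal sketch, with the hypothetical helper names `top_k_similar` and `familiarity_ratio` and an assumed tie-breaking rule (by student ID), is:

```python
def top_k_similar(similarity_to_others, k=50):
    """Return the k student IDs with the highest similarity values.

    similarity_to_others: dict {student id: similarity to the volunteer}.
    Ties are broken by student ID, which is an assumption.
    """
    ranked = sorted(similarity_to_others.items(), key=lambda kv: (-kv[1], kv[0]))
    return [student for student, _ in ranked[:k]]


def familiarity_ratio(candidates, familiar):
    """Share of candidate students that the volunteer labeled as familiar."""
    if not candidates:
        return 0.0
    familiar = set(familiar)
    return sum(1 for s in candidates if s in familiar) / len(candidates)
```

Averaging `familiarity_ratio` over all volunteers for one operator would give that operator's average accuracy ratio.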
8.2. Discovering Communities via Louvain Algorithm
To illustrate the process of discovering student communities via the Louvain algorithm, we carried out five iterative operations; the results are shown in Table 3. After the first iteration, there are 1083 communities, the largest community has 630 people, and the smallest has 1. As the iterations proceed, the number of communities decreases and the number of people in the largest community increases. After the fifth iteration, the gain of modularity approaches zero, indicating that the algorithm has reached a stable state. We further combined the basic information of students, such as grade and class information, to analyze the discovery results, and concluded that students with lower grades tend to form large communities and those with higher grades tend to form small communities. This is consistent with common sense.
In addition, to manually verify the experiment, we constructed an association network of 88 students from a designated major, and discovered the community structure via Algorithm 1. After two iterations, the iterative operation converged and the community structure stabilized at nine communities, as shown in Table 4. Students belonging to the same community should have similar spatiotemporal patterns. Community no. 1 consists of two students, 150301ci and 150301dg, who live in the same room and often have meals and attend classes together; the similarity value between them is 0.82. Community no. 2 has 42 members belonging to the same class. By communicating with the counselor, we found that the performance of this class is excellent. The 11 students in community no. 3 often play games and have meals together. Community no. 4 also has two members, 150301ab and 150301bd, who are a couple, with a similarity value of 0.63; they often have meals and participate in campus activities together. Community no. 5 also contains two students, 150302cj and 150241de, with a similarity value of 0.78; although they have different majors, they are very familiar with each other and often do things together. The 14 students in community no. 6 are from the same major, with a similarity value of 0.83; they belong to a learning team and hold regular study activities in the library twice a week. Communities no. 7 and no. 8 are alike: each has two female students who are close friends. Community no. 9 has 11 students from different majors, with a similarity value of 0.69; these students belong to one basketball team, and they often shower and have meals together after training. Through these offline verifications, we showed that the Louvain algorithm can effectively discover the associations among students based on their activity sequences.
8.3. Features of Communities
For discovered communities whose members have specific behavior features that differ from those of other communities, we used a parallel coordinate graph to describe the distributions of the feature values simultaneously.
Figure 4 is a parallel coordinate graph that shows the features of one group. From this graph, we can see that the members of this group have a lower social activity index and a lower regular consumption index, which indicates that these students have an irregular lifestyle, take part in fewer social activities, and are often off campus.
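The data preparation behind such a graph can be sketched as follows: each feature becomes one vertical axis, values are min-max scaled per axis, and each student becomes a polyline across the axes. The two feature names follow the text; the student labels, the values, and the function name `parallel_coordinates` are hypothetical, and the scaling choice is an assumption rather than the authors' stated procedure.

```python
def parallel_coordinates(records, features):
    """Return {student: [(axis_index, scaled_value), ...]}.

    Each feature is min-max scaled to [0, 1] across all students, so that
    axes with different units can be drawn side by side.
    """
    lo = {f: min(r[f] for r in records.values()) for f in features}
    hi = {f: max(r[f] for r in records.values()) for f in features}
    lines = {}
    for student, r in records.items():
        lines[student] = [
            # A constant feature is placed at the middle of its axis.
            (i, 0.5 if hi[f] == lo[f] else (r[f] - lo[f]) / (hi[f] - lo[f]))
            for i, f in enumerate(features)
        ]
    return lines


FEATURES = ["social_activity_index", "regular_consumption_index"]

# Hypothetical feature values for three students.
RECORDS = {
    "s1": {"social_activity_index": 0.2, "regular_consumption_index": 0.3},
    "s2": {"social_activity_index": 0.9, "regular_consumption_index": 0.8},
    "s3": {"social_activity_index": 0.5, "regular_consumption_index": 0.6},
}

LINES = parallel_coordinates(RECORDS, FEATURES)
```

A group whose polylines stay near the bottom of both axes corresponds to the low-index pattern described for Figure 4.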
8.4. Exploring Associations via Chord Diagram
We used chord diagrams to visualize the association among students, as shown in
Figure 5 and
Figure 6. The thickness and direction of a chord are important factors that represent the similarity of activity sequences of students. Through these diagrams, we can explore each individual’s social association. In
Figure 5, for example, the node representing student 150304bi only connects to one node belonging to the same category with a thicker chord, which means that this student has a close relationship with student 150304bd. They should have highly similar spatiotemporal activity sequences, but student 150304bi seldom gets involved in activities with other classmates. We suggest that this type of student should expand his or her social association by participating in various activities. In
Figure 6, node 150304bf has more associations with others, and these associated nodes belong to different classes, which means that this student has much richer social relationships; we may recommend students of this type as team leaders.
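A chord diagram is typically drawn from a square matrix whose entries give the chord thickness between two nodes. The sketch below builds such a matrix from pairwise similarities and lists, for each student, the classes reached by at least one chord. It treats chords as symmetric and drops pairs at or below a cut-off; the cut-off, the student labels, the class labels, and the function names are assumptions for illustration, and chord direction is not modeled.

```python
def chord_matrix(students, similarity, threshold=0.5):
    """Symmetric matrix whose entry [i][j] is the chord thickness (similarity)."""
    index = {student: i for i, student in enumerate(students)}
    matrix = [[0.0] * len(students) for _ in students]
    for (u, v), value in similarity.items():
        if value > threshold:  # weak pairs get no chord (assumed cut-off)
            matrix[index[u]][index[v]] = value
            matrix[index[v]][index[u]] = value
    return matrix


def partner_classes(students, matrix, class_of):
    """For each student, the set of classes reached by at least one chord."""
    return {
        s: {class_of[t] for t, value in zip(students, row) if value > 0}
        for s, row in zip(students, matrix)
    }


# Hypothetical students, classes, and similarity values.
STUDENTS = ["a", "b", "c", "d"]
CLASS_OF = {"a": "class1", "b": "class1", "c": "class2", "d": "class3"}
SIMILARITY = {("a", "b"): 0.9, ("b", "c"): 0.6, ("b", "d"): 0.7, ("c", "d"): 0.3}

MATRIX = chord_matrix(STUDENTS, SIMILARITY)
REACH = partner_classes(STUDENTS, MATRIX, CLASS_OF)
```

Here student `a` has a single thick chord to a classmate, while student `b` reaches three classes, mirroring the two patterns discussed for Figures 5 and 6.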
8.5. Exploring Associations via FR Algorithm
We used the FR layout to visualize the association network. When the similarity of the activity sequences between two students exceeds a threshold, there is an edge connecting the corresponding nodes. As shown in
Figure 7, the node representing student 150303ba is at the periphery and has no connection with others, which means that this student’s activity sequence is quite different from others in the specified group.
Figure 8 is an enlargement of the center area of
Figure 7; the nodes located in this area have many more connections with others, which means that the corresponding students have rich social associations. Through the FR layout, we can clearly observe the structure of the network. For students who have few associations with others, we can guide them in building social ties; for students who have many associations, the layout makes clear which students they could build connections with.
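The two steps described in this section, thresholding similarities into edges and placing nodes with a force-directed layout, can be sketched in plain Python. This is a simplified Fruchterman-Reingold-style layout and not the authors' implementation: the geometric cooling schedule, the unit frame, the threshold value, and the student labels are assumptions.

```python
import math
import random


def threshold_edges(similarity, threshold):
    """Keep an edge only when the pairwise similarity exceeds the threshold."""
    return [pair for pair, value in similarity.items() if value > threshold]


def isolated_nodes(nodes, edges):
    """Nodes that take part in no edge."""
    linked = {n for edge in edges for n in edge}
    return [n for n in nodes if n not in linked]


def fr_layout(nodes, edges, iterations=50, size=1.0, seed=0):
    """Simplified force-directed placement inside a size x size frame."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal edge length, sqrt(area / |V|)
    temperature = size / 10           # maximum move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push
        # Attraction along edges: distance^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull
        # Move each node, capped by the temperature and clamped to the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / length * step))
        temperature *= 0.95  # geometric cooling (assumed schedule)
    return pos


# Hypothetical students and similarity values.
NODES = ["s1", "s2", "s3", "s4", "s5"]
SIMILARITY = {
    ("s1", "s2"): 0.8, ("s1", "s3"): 0.7, ("s2", "s3"): 0.75,
    ("s3", "s4"): 0.6, ("s4", "s5"): 0.2,
}
EDGES = threshold_edges(SIMILARITY, 0.5)
ISOLATED = isolated_nodes(NODES, EDGES)
POSITIONS = fr_layout(NODES, EDGES)
```

In this toy case `s5` ends up with no edges, which is the situation described for the peripheral node in Figure 7; only repulsion acts on such a node, so it tends to drift away from the connected group.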
9. Conclusions and Future Work
We have proposed a visual analytic method to explore the associations among students based on their spatiotemporal behavior data on campus. We collected multi-source behavior data from different departments to construct a relatively complete activity sequence, which is superior to single-behavior data. We proposed three similarity operators with which to construct an association network. In addition to the commonly used spatiotemporal co-occurrence operators, we introduced an operator based on behavior features, although its performance is far lower than that of the spatiotemporal operators. Based on the best spatiotemporal similarity operator, with 87% accuracy, we constructed an association network among students and further discovered the student communities using the Louvain algorithm. In an experiment with 88 students, most detected communities were consistent with the ground truth. Finally, we visualized the association network through chord diagrams and the FR layout. The experimental results indicate that the proposed method can be helpful to educational administrators. For example, they can obtain clues about outlier students who have few associations with others, especially with classmates or roommates, and find the leaders at the centers of the communities. We plan to continue our research in the following directions: (1) integrate more types and longer periods of behavior data to construct a dynamic association network; (2) detect communities in the dynamic network to understand the changes in students’ social relationships; and (3) analyze the correlations among measures such as social association, academic performance, and mental health.
Author Contributions
Conceptualization, Y.Z. and X.L.; methodology, X.L., Q.Y. and Y.Z.; software, X.L. and Q.Y.; validation, Q.Y., J.D. and X.L.; formal analysis, X.L. and Q.Y.; investigation, Y.Z.; resources, B.Y. and Y.Z.; data curation, X.L. and Q.Y.; writing–original draft preparation, X.L., Q.Y. and J.D.; writing–review and editing, X.L. and Y.Z.; visualization, Q.Y. and J.D.; supervision, Y.Z.; project administration, B.Y.; funding acquisition, B.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China under Grants 61632006 and U19B2039.
Acknowledgments
The authors thank the volunteers for validating the experimental results.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Xu, J.; Liu, T.; Yang, L.; Davison, M.L.; Liu, S. Finding College Student Social Networks by Mining the Records of Student ID Transactions. Symmetry 2019, 11, 307.
- Liu, T.; Yang, L.; Liu, S.; Ge, S. Inferring and analysis of social networks using RFID check-in data in China. PLoS ONE 2017.
- Ghosh, S.; Ghosh, S.K. Exploring the association between mobility behaviours and academic performances of students: A context-aware traj-graph (CTG) analysis. Prog. Artif. Intell. 2018, 7, 307–326.
- Cao, Y.; Gao, J.; Lian, D.; Rong, Z.; Shi, J.; Wang, Q.; Wu, Y.; Yao, H.; Zhou, T. Orderliness predicts academic performance: Behavioural analysis on campus lifestyle. J. R. Soc. Interface 2018, 15.
- Zhang, X.; Sun, G.; Pan, Y.; Sun, H.; He, Y.; Tan, J. Students performance modeling based on behavior pattern. J. Ambient Intell. Hum. Comput. 2018, 9, 1659–1670.
- Wu, Y.; Gong, R.; Cao, Y.; Wen, C.; Teng, Z.; Pu, J. eduCircle: Visualizing Spatial Temporal Features of Student Performance from Campus Activity and Consumption Data. In Proceedings of the 13th International Conference, CDVE 2016, Sydney, NSW, Australia, 24–27 October 2016; pp. 313–321.
- Jo, I.; Park, Y.; Kim, J.; Song, J. Analysis of Online Behavior and Prediction of Learning Performance in Blended Learning Environments. Educ. Technol. Int. 2014, 15, 71–88.
- Nie, M.; Yang, L.; Sun, J.; Su, H.; Xia, H.; Lian, D.; Yan, K. Advanced forecasting of career choices for college students based on campus big data. Front. Comput. Sci. 2018, 12, 494–503.
- Newman, M.E.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
- Crandall, D.J.; Backstrom, L.; Cosley, D. Inferring social ties from geographic coincidences. Proc. Natl. Acad. Sci. USA 2010, 52, 22436–22441.
- Huy, P.; Shahabi, C.; Liu, Y. Inferring Social Strength from Spatiotemporal Data. ACM Trans. Database Syst. 2016, 71, 1–47.
- Jung, J.; Chun, S.; Jin, X.; Lee, K. Quantitative Computation of Social Strength in Social Internet of Things. IEEE Internet Things J. 2018, 5, 4066–4075.
- Rafailidis, D.; Crestani, F. Friend Recommendation in Location-based Social Networks via Deep Pairwise Learning. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018; pp. 421–428.
- He, C.; Peng, C.; Li, N.; Chen, X.; Guo, L.Y. Exploiting Spatiotemporal Features to Infer Friendship in Location-Based Social Networks. In Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, 28–31 August 2018; pp. 395–403.
- Zhou, N.N.; Zhang, X.; Wang, S. Theme-Aware Social Strength Inference from Spatiotemporal Data. In Proceedings of the 15th International Conference on Web-Age Information Management (WAIM), Macau, China, 16–18 June 2014; pp. 498–509.
- Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep.-Rev. Sect. Phys. Lett. 2016, 659, 1–44.
- Blondel, V.D.; Guillaume, J.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory E 2008, 2008, 10008–10012.
- Yang, Z.; Algesheimer, R.; Tessone, C.J. A Comparative Analysis of Community Detection Algorithms on Artificial Networks. Sci. Rep. 2016, 6.
- Lancichinetti, A.; Fortunato, S. Community detection algorithms: A comparative analysis. Soft Comput. A Fusion Found. Methodol. Appl. 2009, 2.
- Mothe, J.; Mkhitaryan, K.; Haroutunian, M. Community detection: Comparison of state of the art algorithms. In Proceedings of the 2017 Computer Science and Information Technologies (CSIT), Yerevan, Armenia, 25–29 September 2017; pp. 125–129.
- Chejara, P.; Godfrey, W.W. Comparative analysis of community detection algorithms. In Proceedings of the 2017 Conference on Information and Communication Technology (CICT), Gwalior, India, 3–5 November 2017; pp. 1–5.
- Jain, L.; Katarya, R. Discover opinion leader in online social network using firefly algorithm. Expert Syst. Appl. 2019, 122, 1–15.
- Wu, K.; Tang, J.; Long, Y. Delineating the Regional Economic Geography of China by the Approach of Community Detection. Sustainability 2019, 11, 6053.
- Cai, W.; Wang, Y.; Lv, R.; Jin, Q. An efficient location recommendation scheme based on clustering and data fusion. Comput. Elect. Eng. 2019, 77, 289–299.
- Linhares, C.D.G.; Ponciano, J.R.; Pereira, F.S.F.; Rocha, L.E.C.; Paiva, J.G.S.; Travencolo, B.A.N. Visual analysis for evaluation of community detection algorithms. Multimedia Tools Appl. 2020.
- Crampes, M.; Plantie, M. A Unified Community Detection, Visualization and Analysis Method. Adv. Complex Syst. 2014, 17.
- Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2001, 99, 8271–8276.
- Newman, M.E.J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
- Newman, M. Analysis of weighted networks. Phys. Rev. E 2004, 70, 056131.
- Jalali, A. Supporting Social Network Analysis Using Chord Diagram in Process Mining. Int. Conf. Bus. Inform. Res. 2016, 16–32.
- Battista, G.D.; Eades, P.; Tamassia, R.; Tollis, I.G. Algorithms for drawing graphs: An annotated bibliography. Comput. Geom. 1994, 4, 235–282.
- Fruchterman, T.M.J.; Reingold, E.M. Graph drawing by force-directed placement. Softw. Pract. Exp. 1991, 21, 1129–1164.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).