Identifying Human Daily Activity Types with Time-Aware Interactions

Abstract: Human activities embedded in crowdsourced data, such as social media trajectories, represent individual daily styles and patterns, which are valuable in many applications. However, the accurate identification of human activity types (HATs) from social media is challenging, possibly because interactions between posts and users at different times are overlooked. To fill this gap, we propose a novel model that introduces the interactions hidden in social media and synthesizes a Graph Convolutional Network (GCN) for identifying HATs. The model first characterizes interactions among words, posts, dates, and users, and then derives a Time Gated Human Activity Graph Convolutional Network (TG-HAGCN) to predict the HATs of social media trajectories. To examine the performance of the proposed model, we built a new dataset from the open Yelp dataset that includes interactions between post content, post time, and users. Experimental results show that exploiting the interactions hidden in social media to recognize HATs achieves state-of-the-art performance with high accuracy. The study indicates that modeling interactions in social media promotes the ability of machine learning in social media data mining and intelligent applications, and offers a reference solution for fusing multi-type heterogeneous data in social media.


Introduction
Urban spaces, where citizens live, move, and engage in different activities, are socialized and dynamic [1]. Understanding the complexities underlying the emerging behaviors of human travel patterns at the city level is essential for making informed decisions pertaining to urban transportation infrastructure [2]. Emerging crowdsourced data provide available sources for capturing human behaviors and modeling urban systems. The human activity types (HATs) (e.g., sports, shopping, and travel) in a human trajectory characterize the user's daily behaviors and therefore reflect human lifestyles and patterns [3,4]. A trajectory with HATs contributes not only to understanding human society and urban systems, but also to performing many tasks for different purposes, such as customer recommendation, traffic forecasting, travel demand modeling, and urban planning. However, identifying HATs from trajectories at a large scale by experts is labor-intensive and time-consuming. Hence, it is important to develop an approach that can automatically and accurately identify HATs at scale.

This study aims to improve the prediction accuracy of HATs by characterizing time-aware interactions in social media. To the best of our knowledge, this study is the first to highlight the interactions in social media and introduce a graph convolutional model for predicting HATs. The original contributions of this study are as follows:

1) To effectively model the interactions hidden in social media, this study models the implicit interactions from two views: a text view and a time view. With the two views, the model not only connects post content and users, but also helps capture the user's activity patterns at different times with a date encoding method.
2) To synthesize time-aware interactions and trajectory texts in social media, a Time Gated Human Activity Graph Convolutional Network (TG-HAGCN) is developed to fuse the embeddings of two GCN components through a gating mechanism. It offers a reference solution for how to fuse multi-type heterogeneous data with multiple views.
3) The proposed method achieves state-of-the-art performance in identifying HATs, with high accuracy and less time expenditure. The results indicate that time-aware interactions in social media promote the ability of machine learning in social media data mining and intelligent applications.

The rest of this paper is organized as follows. Section 2 overviews related works. Section 3 demonstrates the proposed approach in detail. We introduce the datasets and experimental results in Section 4, and discuss the effects of hyperparameters and network components in Section 5. Section 6 concludes this study.

Related Work
One goal of activity recognition research is to identify physical activities [7][8][9][10][11]. The results can be used for human trajectory tracking, recommender systems, regions-of-interest detection [12,13], and so on. Researchers use popular machine learning methods, such as support vector machines (SVMs), convolutional neural networks (CNNs), and long short-term memory networks (LSTMs), to identify human activities, typically treating activities as sequential text or multi-dimensional images, or using context-sensitive grammars to identify physical activities. However, this line of research heavily depends on video cameras, wearable sensors, or monitors, and human behavior at the urban scope is difficult to capture in this way because data collection is time-consuming and may conflict with people's privacy.
Different from physical activities and events, human daily activities are high-level logical activities that can be obtained from many types of crowdsourced data. These data are easily obtained and closely connected with human behavior and urban systems. In this paper, we aim to identify human daily activities (e.g., sports, shopping, and travel) from social media trajectory. User's daily logical activities can be inferred from social media through several methods. These works can be classified into two groups, with and without Point of Interest (POI).
In works with POI, the trajectory data under study contain or correlate directly with POI data. Given the close correlation between POIs and human activities, information from POIs can be leveraged to infer HATs. Some studies used activities and POIs interchangeably [14] because activities performed at a location usually define its meaning to some degree [15] (e.g., café is a place where people drink coffee). A common practice is to use semantic location labels as a proxy to high-level activities (e.g., office →working) [16,17]. Dominant functional places, such as home or work can be detected from people's daily routines [18]. Thus, such an approach is an understandable simplification. Nevertheless, a considerable number of locations can be a scene of more than one activity at the same moment (e.g., at a shopping mall), and one activity can be a characteristic of various locations (e.g., "taking picture"). Defu Lian [19] attempted to discover the relationship categories between POIs and activities. However, most trajectory data in social media do not contain location or POIs because of limitations in privacy, devices, and so on.
Works without POIs mostly regard HAT identification as a text classification task in machine learning. Marco Aurelio Beber integrated trajectory data and social media to reveal individual activities in daily life [20]; however, their approach requires extracting knowledge from Twitter and matching trajectories to a knowledge base. Weerkamp tried to establish a set of activities that will occur at a future time, such as tonight, tomorrow, and next week, by using a future time-window and keywords related to activities [21]. They only extracted unexpected activities, but ignored frequently recurring activities. Song proposed a novel collaborative boosting framework comprising a text-to-activity classifier for each user to recognize user activities [22]. Zhu built a multi-label classifier using tweets manually annotated with activities to predict up to three activities [23]. The classifier considers the tweet text, the tweet posting time, the POI type from Foursquare, and the POI name from Foursquare. Their approach relies on crowdsourced annotations, with the limited scalability of the human labor involved in the process. Mazumder interpreted the concept of activities as verb-noun phrase pairs (e.g., drinking coffee), which they extracted using a pattern-based paradigm [24]. These methods, applying various supervised machine learning or natural language processing (NLP) techniques, can capture semantic information in local consecutive word sequences well. However, the texts of trajectories in social media are short and are not always connected directly with HATs, which makes it extremely difficult to accurately infer a user's HAT by using only text features.
HATs should be connected with interactions in social media; for example, the posts of one user may share the same text style. However, the above studies overlook such interactions, which may contribute to identifying HATs. To fill this gap, this study combines texts and interactions in social media trajectories to improve the performance of HAT prediction. Compared with text classification works based on graph neural networks (GNNs), HAT-related interactions involve users and time, in addition to interactions between documents and words. This study constructs two GCN modules from two different views, and finally fuses them through a gating mechanism to achieve HAT recognition. The details are introduced in Section 3.

Methods
This study proposes a model that introduces the interactions in social media into HAT identification. As shown in Figure 1, most objects in social media belong to one of four types: words, posts, users, and dates. We first build interactions under a text view, including those between words and posts (word-post interaction), between words and words (word-word interaction), and between a user and a post (user-post interaction). Through these interactions, we obtain the "user-post-word" graph under the text view and the "user-date-word" graph (built by replacing the post nodes in the "user-post-word" graph with date nodes) under a time view. Then, a graph neural network-based method for human activity recognition is described, which is derived from the Graph Convolutional Network (GCN) [25] and named the Time Gated Human Activity Graph Convolutional Network (TG-HAGCN). In the following subsections, we explain why and how to model the three types of interactions and how to build the TG-HAGCN model to identify human daily activity types.

Modeling Words Interaction Based on Point-Wise Mutual Information (PMI)
The study uses word-word interactions to represent the relationships between words. In classical NLP, the co-occurrence of two words reflects a close connection between them, especially when the two words co-occur at nearby locations. This is highly similar to the first law of geography [26], which assumes that everything is related to everything else, but near things are more related to each other. To gather word co-occurrence information, we use a fixed-size sliding window over each post. Then, the weight of a word-word interaction is calculated by PMI [27], a popular measure for word associations, as follows:

PMI(i, j) = log( p(i, j) / (p(i) p(j)) ), where p(i, j) = #W(i, j) / #W and p(i) = #W(i) / #W,     (1)

where i and j denote two different words, PMI(i, j) is the weight of the interaction between words i and j, #W(i) is the number of sliding windows in posts that contain word i, #W(i, j) is the number of sliding windows that contain both words i and j, and #W is the total number of sliding windows in the whole dataset. If PMI(i, j) is larger than 0, a high semantic correlation exists between words i and j, and vice versa. Because the point-wise mutual strategy relates all word pairs, many interactions would have a negative weight, and more interactions mean more time cost for follow-up operations. Thus, we only add an interaction between two words when their PMI value is positive, because a negative PMI indicates little or no relation between the two words.
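As an illustration, the window counting and PMI pruning described above can be sketched in Python as follows; the tokenization and window-stepping details are our assumptions, not the paper's exact implementation:

```python
from collections import Counter
from itertools import combinations
from math import log

def pmi_weights(posts, window_size=20):
    """Count sliding windows over each tokenized post and compute PMI(i, j)
    for word pairs, keeping only positive-PMI pairs as edges."""
    single = Counter()   # #W(i): windows containing word i
    pair = Counter()     # #W(i, j): windows containing both i and j
    total = 0            # #W: total number of windows
    for words in posts:
        # every post yields at least one window, even if shorter than window_size
        for s in range(max(1, len(words) - window_size + 1)):
            window = set(words[s:s + window_size])
            total += 1
            single.update(window)
            pair.update(combinations(sorted(window), 2))
    weights = {}
    for (i, j), c_ij in pair.items():
        pmi = log((c_ij / total) / ((single[i] / total) * (single[j] / total)))
        if pmi > 0:  # prune negative-PMI pairs, as described in the text
            weights[(i, j)] = pmi
    return weights
```

Words that co-occur more often than chance predicts receive positive weights; independent or anti-correlated pairs are dropped, which keeps the graph sparse.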

Modeling Word-Post Interaction Based on Term Frequency-Inverse Document Frequency (TFIDF)
The post-word interaction is built between a word and a post when the word occurs in the post. The number of word occurrences in a post reflects the relation between posts and words, and can be used to derive the weight of the post-word interaction. In practice, each word in a post may contribute differently to human activity recognition, because some words occur in many posts, whereas others occur only in a few posts. A word that occurs in only a few posts should contribute more semantics than others [28].
Based on this idea, we employ the Term Frequency-Inverse Document Frequency (TFIDF) model to calculate the weight of the post-word edge. Here, term frequency (TF) is the number of times that the word appears in the post, and inverse document frequency (IDF) is the logarithmically scaled inverse fraction of the number of posts that contain the word. Using the TFIDF weight is better than using term frequency alone. The mathematical expressions of TF, IDF, and TFIDF are defined as follows:

TF(i, j) = n_{i,j},  IDF(i) = log( N / |{j : word i ∈ post j}| ),  TFIDF(i, j) = TF(i, j) × IDF(i),     (2)

where n_{i,j} is the number of occurrences of word i in post j, and N is the total number of posts.
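A minimal sketch of this TF-IDF edge weighting, assuming raw term counts for TF and the plain log(N/df) variant of IDF (the paper does not specify which variant it uses):

```python
from collections import Counter
from math import log

def tfidf_weights(posts):
    """Weight each post-word edge by TF * IDF.

    `posts` is a list of tokenized posts; returns a dict mapping
    (post_index, word) -> weight."""
    n_posts = len(posts)
    df = Counter()                 # document frequency: posts containing each word
    for words in posts:
        df.update(set(words))
    weights = {}
    for idx, words in enumerate(posts):
        tf = Counter(words)        # raw term frequency within this post
        for word, count in tf.items():
            weights[(idx, word)] = count * log(n_posts / df[word])
    return weights
```

A word that appears in every post gets IDF = 0 and thus a zero-weight edge, matching the intuition that such words carry little activity-specific semantics.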

Modeling User-Post Interaction Based on Frequency
Each user has his or her own activity pattern and special expression style in text, which means that the activity in a post could be inferred better by introducing the related user information. Thus, we build a user-post interaction between a post and its owner.
All of the words in the posts are collected, forming a vocabulary set. The weight of a user-post interaction is calculated from the number of words in the post that appear in the vocabulary set, which reflects the importance of the post to the user. The equation to calculate the user-post interaction weight is derived as follows:

UC(i, j) = S_i(j) / Σ_{k=1}^{m} S_i(k),     (3)

where S_i(j) represents the function that counts the number of vocabulary words in post j published by user i, n is the number of users, and m is the number of posts published by the same user i.
In the study, the user only links to post, and no edges exist between users and words, because the relationship between users and words could be indirectly associated with connected posts.
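The user-post weighting can be sketched as follows; the per-user normalization is our reading of Equation (3) and should be treated as an assumption:

```python
def user_post_weights(user_posts, vocabulary):
    """Weight each user-post edge by the count of the post's words that
    appear in the vocabulary, normalized over all of that user's posts.

    `user_posts` maps a user id to a list of tokenized posts; returns a
    dict mapping (user, post_index) -> weight."""
    weights = {}
    for user, posts in user_posts.items():
        # S_i(j): vocabulary-word count for each post j of user i
        counts = [sum(1 for w in post if w in vocabulary) for post in posts]
        total = sum(counts) or 1      # avoid division by zero
        for j, c in enumerate(counts):
            weights[(user, j)] = c / total
    return weights
```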

Identifying HATs
The unique capability of graphs enables capturing the structural relations among posts, thereby allowing us to harvest more insights compared with analyzing post texts in isolation. As shown in Figure 1, a heterogeneous graph module can be formed by taking the words, posts, dates, and users as nodes and the interactions between them as edges. Through the three types of interactions defined in Sections 3.1-3.3, we can build the "user-post-word" graph. The number of nodes in the graph is the number of posts (dataset size) plus the number of unique words (vocabulary size) and the number of users. Then, the "user-date-word" graph module can be built by replacing the post nodes in the "user-post-word" graph with the date nodes. The built "user-post-word" graph and "user-date-word" graph can be defined as G_post = (V_post, E_post) and G_date = (V_date, E_date), respectively, where V and E are the sets of nodes and edges.
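A sketch of assembling the "user-post-word" adjacency from the three weighted edge sets defined above; the node indexing scheme (posts first, then words, then users) is an assumption chosen for illustration, not something the paper fixes:

```python
def build_hetero_adjacency(n_posts, n_words, n_users, ww, pw, up):
    """Assemble the heterogeneous adjacency as a dict of dicts.

    ww, pw, up are the weighted edge dicts from the word-word (PMI),
    post-word (TFIDF), and user-post (UC) interaction models; keys are
    index pairs within each node type. Self-loops get weight 1, matching
    'the adjacency matrix whose diagonal elements are 1'."""
    n = n_posts + n_words + n_users
    adj = {i: {i: 1.0} for i in range(n)}   # diagonal = 1

    def connect(a, b, w):
        adj[a][b] = w
        adj[b][a] = w                        # undirected edges

    for (wi, wj), w in ww.items():           # word-word edges
        connect(n_posts + wi, n_posts + wj, w)
    for (pi, wi), w in pw.items():           # post-word edges
        connect(pi, n_posts + wi, w)
    for (ui, pi), w in up.items():           # user-post edges
        connect(n_posts + n_words + ui, pi, w)
    return adj
```

The "user-date-word" graph would be assembled the same way, with date nodes taking the place of post nodes.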
Recently emerging GCN [25] model can capture the local features and classify nodes in graphs effectively, which is a multiple layer neural network that generally operates a graph network and generates embedding vectors for nodes based on the properties of their neighbor nodes. It has made a significant improvement in node classification and provides a new opportunity to achieve our task. Thus, the local and global node co-occurrence can be explicitly modeled. Meanwhile, the relation between four types of nodes (users, posts, dates, and words) can be modeled. As a result, human activity types can be examined from local and global factors.
In this paper, the built graphs (the "user-post-word" graph under the text view and the "user-date-word" graph under the time view) are fed into the time-gated activity GCN (TG-HAGCN), as illustrated in Figure 2. Thus, the identification of the human activity in a post can be regarded as a node classification task.
The TG-HAGCN model contains two GCN modules, which process the "user-post-word" graph and the "user-date-word" graph, respectively. The post embeddings and date embeddings obtained from the two GCN modules are then fused through a gating mechanism. Finally, the fused embeddings are fed to a softmax classifier for predicting HATs.
The model defines X ∈ R^{n×m} as a feature matrix containing n nodes with their feature vectors, where m is the dimension of the feature vectors. The initial features of date nodes are their corresponding date vectors. Specifically, each date vector is encoded by four parts: a 4-digit season, a 12-digit month, 1-digit working-day and weekend-day flags, and a 7-digit week, where each part is represented by a one-hot vector. The initial feature vectors of words and users for the GCN input are randomly initialized. The initial features of post nodes are the mean of the vectors of all the words they contain.
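The 25-dimensional date encoding described above can be sketched as follows; the index conventions (which position each season occupies, weekday 0 = Monday) are assumptions for illustration:

```python
def encode_date(season, month, weekday):
    """One-hot date vector: 4-dim season + 12-dim month + 1-dim working-day
    flag + 1-dim weekend flag + 7-dim weekday = 25 dimensions.

    season in 0..3, month in 0..11, weekday in 0..6 (0 = Monday)."""
    vec = [0.0] * 25
    vec[season] = 1.0                        # season block: positions 0..3
    vec[4 + month] = 1.0                     # month block: positions 4..15
    is_weekend = weekday >= 5
    vec[16] = 0.0 if is_weekend else 1.0     # working-day flag
    vec[17] = 1.0 if is_weekend else 0.0     # weekend flag
    vec[18 + weekday] = 1.0                  # weekday block: positions 18..24
    return vec
```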
The GCN modules, which capture information from neighbor nodes, are two-layer GCNs and can be defined as follows:

L^{(j+1)} = ρ( Ã L^{(j)} W_j ),  Ã = D^{−1/2} A D^{−1/2},

where j is the layer number and L^{(0)} = X; Ã is the symmetric normalized adjacency matrix; A is the adjacency matrix of graph G, whose diagonal elements are 1; D is the degree matrix of G, where D_ii = Σ_j A_ij; W_j ∈ R^{m×k} is a weight matrix; and ρ is an activation function.
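A minimal dense-matrix sketch of one propagation step L^{(j+1)} = ρ(Ã L^{(j)} W_j), using ReLU for ρ; a real implementation would use sparse matrix operations for graphs of this size:

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN propagation step over a dense adjacency with self-loops.

    adj: n×n list of lists (diagonal = 1); features: n×m; weight: m×k.
    Returns the n×k ReLU-activated output."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # symmetric normalization: D^{-1/2} A D^{-1/2}
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # message passing: aggregate features from (normalized) neighbors
    m = len(features[0])
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(m)] for i in range(n)]
    # linear transform followed by ReLU
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(m)))
             for o in range(out_dim)] for i in range(n)]
```

Stacking two such layers lets each node see its two-hop neighborhood, e.g. a post node aggregates from both its words and the users/posts sharing those words.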
By taking the "user-post-word" graph G_post and the "user-date-word" graph G_date as the inputs of the two-layer GCNs, the embeddings of the post nodes and date nodes, L_post and L_date, can be obtained, respectively:

L_post = f_post_mask( f(G_post) ),  L_date = f_date_mask( f(G_date) ),

where f denotes the filter function, f_post_mask is used to extract the post embeddings from all the output embeddings, and f_date_mask is used to extract the date embeddings from the output embeddings.
Then, a gating mechanism-based fusion σ(L_date) ⊙ L_post is proposed to fuse the post embeddings L_post and date embeddings L_date, where σ is an approximated gating function such as the sigmoid and ⊙ is the element-wise product. The softmax classifier Z can then be defined as follows:

Z = softmax( σ(L_date) ⊙ L_post ),

where softmax(x_i) = exp(x_i) / ẑ and ẑ = Σ_i exp(x_i). Finally, the loss function is defined as the cross-entropy over all labeled posts:

Loss = − Σ_{d ∈ y_D} Σ_{f=1}^{F} Y_{df} ln Z_{df},

where y_D is the set of posts that have labels, F is the dimension of the output features, which has the same size as the number of types, and Y is the activity type indicator matrix. The weight matrices W_0 and W_1 are trained using the gradient descent method.
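The gated fusion and softmax can be sketched per node vector as follows (a minimal illustration of the operations, not the trained model):

```python
import math

def gated_fuse(post_emb, date_emb):
    """sigma(L_date) ⊙ L_post: the date embedding, squashed by a sigmoid,
    gates each dimension of the post embedding element-wise."""
    return [p * (1.0 / (1.0 + math.exp(-d)))
            for p, d in zip(post_emb, date_emb)]

def softmax(x):
    """softmax(x_i) = exp(x_i) / sum_j exp(x_j), with max-subtraction
    for numerical stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    z = sum(exps)
    return [e / z for e in exps]
```

A strongly positive date dimension lets the corresponding post dimension pass through almost unchanged, while a strongly negative one suppresses it; this is how the time view modulates the text view.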

Dataset
To examine the proposed model's performance, we built a new dataset extracted from the open Yelp dataset (https://www.yelp.com/dataset) that includes interactions on post content and users. We selected 300 posters (users) from the downloaded dataset and removed two users whose attribute data were missing. Then, the remaining 298 user items and their posts were collected, yielding 23,701 posts published by the 298 users. Finally, each post was labeled with one activity type out of 14 types, namely "Food", "Beauty&spa", "Entertainment", "Travel", "Shopping", "Service", "Sports", "Health", "Car", "Nightlife", "Pets", "Education", "Religious", and "Mass media", and the dataset was built.
As illustrated in Table 1, each item is composed of a post text, a date, an activity type, and the id of the user who published the post. The average word count per post was 84.51. Social media activity is closely connected with human daily behavior, which means that the number of posts per activity type is not balanced. As shown in Table 2, food and shopping are the two most popular activities. The unbalanced distribution across types challenges the accurate identification of HATs and requires a powerful identification model.

Experiment Settings
We used a random sampling method to extract 70% of the items from the dataset built in the last section for training, and the remaining 30% for testing, i.e., 16,592 records for training and 7109 records for testing.
For the proposed TG-HAGCN model, the embedding size m of each node in the first convolution layer was set to 200, and the sliding window size in Section 3.1 was set to 20. The learning rate and the dropout rate were set to 0.1 and 0.5, respectively. The proposed model was trained for 300 epochs using the Adam optimizer [29]. Several classic models, CNN and the activity LSTM with dictionary embedding (ALSTM-DE) model [6], were selected as baselines in the experiment. In CNN and ALSTM-DE, the word embedding vector of each word was trained using the Word2Vec model. Training ran for up to 300 epochs with early stopping if the validation loss did not improve within the last 10 epochs.
The widely used metrics, including macro/micro-average precision, recall, and F1 score, were used to evaluate the proposed method and the baseline models. The metrics were calculated as follows:

Mic_Precision = TP / (TP + FP),  Mic_Recall = TP / (TP + FN),
Mic_F1 = 2 × Mic_Precision × Mic_Recall / (Mic_Precision + Mic_Recall),
Mac_Precision = (1/n) Σ_{i=1}^{n} P_i,  Mac_Recall = (1/n) Σ_{i=1}^{n} R_i,
Mac_F1 = 2 × Mac_Precision × Mac_Recall / (Mac_Precision + Mac_Recall),

where TP, FP, and FN represent the true positives, false positives, and false negatives over all samples, respectively; n represents the number of activity types; and P_i and R_i are the precision and recall for each activity type, calculated in the same way as Mic_Precision and Mic_Recall but over the samples of each activity type.
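These metrics can be computed as sketched below; micro scores pool TP/FP/FN over all classes, while macro scores average the per-class values:

```python
from collections import Counter

def macro_micro_metrics(y_true, y_pred, labels):
    """Return micro and macro (precision, recall, F1) for a
    single-label multiclass prediction."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but it was t
            fn[t] += 1   # missed the true label t

    def prf(TP, FP, FN):
        prec = TP / (TP + FP) if TP + FP else 0.0
        rec = TP / (TP + FN) if TP + FN else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return prec, rec, f1

    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class = [prf(tp[l], fp[l], fn[l]) for l in labels]
    macro = tuple(sum(s[i] for s in per_class) / len(labels) for i in range(3))
    return {"micro": micro, "macro": macro}
```

Note that on an unbalanced dataset like this one, macro scores are pulled down by poorly predicted minority classes, which is why the paper reports both.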

Results
The built graph structure has 41,917 nodes, including 17,918 word nodes, 23,701 post nodes, and 298 user nodes. These nodes are linked by 12,436,864 word-word edges, 1,666,595 word-post edges, and 23,701 user-post edges. We conducted experiments on the built dataset using the baseline models and the proposed model. The initial feature vectors of unknown words were randomly initialized during evaluation, and the evaluation results of these models on the test dataset are shown in Table 3.

Table 3. Comparison with baseline models (where "Acc" means accuracy, "F1" means F1-score, and "Mac" and "Mic" are prefixes of "Acc" and "F1" indicating macro/micro-average accuracy and F1-score). For each row, the best Acc and F1 are indicated in bold.

As shown in Table 3, the TG-HAGCN model achieves the best performance on both Mic accuracy and Mac accuracy. It should be noted that the CNN models achieve a much lower Mac accuracy, which may be caused by the imbalance of the training samples.

To observe the misclassified items, we plotted the confusion matrices of the different models in Figure 3. The figure shows that our model performed best in dealing with category confusion in the unbalanced dataset. We argue that the TG-HAGCN model may also perform better in other tasks. Comparatively, the proposed model captures the network structure (interactions) between nodes and edges by exchanging features with their neighbors in the training process. In this way, the samples in each category are no longer trained independently, hence achieving the best Mac accuracy among these models. In addition, the TG-HAGCN model has enough features from interactions to make accurate activity classifications on a small dataset, and it performs well in recognizing activity types in the unbalanced dataset. The results suggest that interactions and date information can supplement additional information, and this information, together with methods to fuse it, should be considered when improving model performance in related works.

Training Time and Convergence
We also calculated the time consumption of the different models. Detailed results are presented in Table 4. Our model has a time consumption similar to the CNN model and obtains the best accuracy compared with conventional machine learning methods. We illustrate the accuracy and loss curves in Figure 4 to examine the performance changes of the different models during the training process. As shown in Figure 4a, the model accuracy gradually increases as the training epoch increases, demonstrating that the recognition ability of our model improves continuously. Figure 4b shows that the model loss gradually decreases as the training epoch increases, demonstrating that the errors in our model decrease gradually. Most importantly, our two models converge before 150 training epochs. Although other models have less loss and higher accuracy before 30 training epochs, their values always show small fluctuations. This result indicates that the training process of the proposed methods is stable.

Figure 4. Curves of (a) accuracy and (b) loss of different models.

Effect of Word Vector Size and Sliding Window Size
We performed two sets of experiments to evaluate the influence of different embedding dimensions and sliding window sizes.
The sliding window size defines the number of words associated with each word. If the sliding window size is too small, the model cannot capture enough information; if it is too large, the model may introduce excessive noise. In the first set of experiments, the word vector dimension was set to 200, and the sliding window size ranged from 5 to 20. As shown in Table 5, changing the sliding window size has only a slight influence on the final experimental results, suggesting that our model is insensitive to the sliding window size.

The dimension of the word vector directly determines the number of parameters in the hidden layer. When the word vector dimension is small, the trained model may fit poorly; when it is large, the model is likely to overfit. In this set of experiments, the sliding window size was 20, and the word vector dimension ranged from 100 to 300. As shown in Table 6, the model performs best at 200 dimensions, with only a slight improvement over the other dimensions. In short, the results are insensitive to the word vector dimension and the sliding window size when the two hyperparameters are set to values commonly used in existing studies, suggesting that the proposed model is robust.


Effect of Different Edge Weights
In Section 3, we designed three strategies, PMI, TFIDF, and UC, to quantitatively model the interactions among the three types of nodes. A simple alternative is to set every edge weight to the constant 1. To examine the three strategies, we built the edges among the three types of nodes using PMI, TFIDF, and UC in our model, as described in Section 3. To verify whether the edges built by PMI, TFIDF, and UC outperform a fixed constant, we set every edge weight in the whole graph to 1 and compared the result with our model. The experimental results are listed in Table 7. As shown in Table 7, the edges built by PMI, TFIDF, and UC yield a better final F1 score than constant weights. This shows that using different models to build different types of edges captures informative features between nodes. When all the edge weights are set to 1, the model attains the best macro (Mac) precision, but its F1 score is worse. The reason is that under this setting the differences between nodes are weakened, resulting in high classification accuracy for categories with many samples and poor results for the others. When the post-word (P_W) weight is set to 1, the result is worse than the other settings. This finding illustrates that computing weights with TFIDF is better than using the constant 1, and that the edges between posts and words have the greatest influence on the final result, because TFIDF accurately indicates the relation between posts and words.
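As a rough sketch of two of these weighting strategies, the snippet below computes a PMI word-word weight and a TF-IDF post-word weight from simple counts. The exact formulas used in the paper's Section 3 (and the UC user-based weight) are not reproduced here; these are the standard textbook definitions, with function names chosen for illustration.

```python
import math

def pmi_weight(pair_count, count_i, count_j, n_windows):
    """PMI edge weight between two words from sliding-window counts.

    PMI = log( p(i, j) / (p(i) * p(j)) ); a non-positive PMI means the
    words co-occur no more than chance, so no edge is created.
    """
    pmi = math.log((pair_count / n_windows) /
                   ((count_i / n_windows) * (count_j / n_windows)))
    return max(pmi, 0.0)

def tfidf_weight(term, post_tokens, all_posts):
    """TF-IDF edge weight between a post and one of its words."""
    tf = post_tokens.count(term) / len(post_tokens)       # term frequency
    df = sum(1 for p in all_posts if term in p)           # document frequency
    idf = math.log(len(all_posts) / df)                   # inverse doc freq
    return tf * idf

posts = [["great", "food"], ["bad", "food"], ["great", "service"]]
w_pw = tfidf_weight("great", posts[0], posts)   # post-word edge
w_ww = pmi_weight(2, 3, 2, 4)                   # word-word edge
```

Setting every weight to 1 instead collapses these distinctions, which matches the observed drop in F1.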

Ablation Study
To further examine the benefit brought by each component of TG-HAGCN, an ablation study was performed. The experimental results are listed in Table 8. As shown in Table 8, the "TG-HAGCN" model is better than the "TG-HAGCN /Date & /User" model, and the "TG-HAGCN /Date & /User" model is better than the "TG-HAGCN /Date" and "TG-HAGCN /User" models. This shows that both time and users are important for the TG-HAGCN model. When only time or only users are considered, the model does not significantly improve the prediction performance. The reason may be that the HATs of different users are diverse, and the distribution of HATs over time is varied.

Table 8. Precision and F1 score of the ablation study. "TG-HAGCN /Date" means that the date nodes are removed from the TG-HAGCN model, so the input graph is only the "user-post-word" graph. "TG-HAGCN /User" means that the user nodes are removed from the TG-HAGCN model, so the input graphs are the "post-word" graph and the "date-word" graph. "TG-HAGCN /Date & /User" means that the input graph is only the "post-word" graph.
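The ablation variants above amount to deleting all nodes of a given type from the heterogeneous input graph. A minimal sketch of that operation on an adjacency matrix, assuming a flat node-type labelling (the helper name and toy graph are illustrative, not the paper's code):

```python
import numpy as np

def ablate_nodes(adj, node_types, drop_types):
    """Remove all nodes of the given types from a heterogeneous graph.

    adj        : (N, N) adjacency matrix over word/post/date/user nodes
    node_types : length-N list of type labels for each node
    drop_types : set of type labels to ablate (e.g. {"date"} for /Date)
    Returns the induced subgraph's adjacency and the kept type labels.
    """
    keep = np.array([t not in drop_types for t in node_types])
    sub = adj[np.ix_(keep, keep)]          # keep rows and columns jointly
    kept = [t for t, k in zip(node_types, keep) if k]
    return sub, kept

# Example: the "TG-HAGCN /Date" variant drops all date nodes,
# leaving the "user-post-word" graph.
adj = np.ones((4, 4))
types = ["word", "post", "date", "user"]
sub, kept = ablate_nodes(adj, types, {"date"})
```

Dropping both `{"date", "user"}` in the same way yields the "post-word"-only variant.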

(Table 8 body not recovered; columns: Model, Precision (Mic/Mac), F1.)

The Relationship between Activity Types and Related Terms
While the proposed method trains the samples, various vectors, including post, word, and user vectors, are obtained, corresponding to the posts, words in posts, and users.
To examine the relationships between activity types and related text words, we treated the word vectors as samples and evaluated their semantic similarity. We first manually selected a key word from each activity type name as the center word for that activity. The dataset contains 14 activity types: "Food", "Beauty&spa", "Entertainment", "Travel", "Shopping", "Service", "Sports", "Health", "Car", "Nightlife", "Pets", "Education", "Religious", and "Mass media". The selected 14 center words are "Food", "Beauty", "Entertainment", "Travel", "Shopping", "Service", "Sports", "Health", "Car", "Nightlife", "Pets", "Education", "Religious", and "Mass media". Then, we calculated the cosine distances from the vectors of all words to the vectors of the 14 center words and listed the top 7 closest words for each center word in Table 9.

Table 9. Words that have the closest semantics to the selected center words (semantic distance increases from left to right).
Food: drink, foods, supplies, meat, fresh, eating, fish
Beauty: psyche, beautiful, beast, truth, her, pleasure, happiness
Entertainment: theater, movie, stage, museum, acrobatics, film, dancer
Travel: travels, trip, journey, explore, go, visit, visitors
Shopping: mall, restaurant, retail, shops, entertainment, stores, attractions
Service: clean, customer, carpet, realtor, professional, fix, technician
Sports: sport, teams, football, sporting, racing, clubs, basketball
Health: medical, care, mental, health, treatment, disease, benefits
Car: vehicle, oil, auto, repair, kia, drive, mechanic
Nightlife: bar, drink, night, club, dj, beers, pub
Pets: dogs, cat, cats, mouse, horses, mice, boss
Education: school, teacher, class, student, college, learning, instructor
Religious: political, religion, social, christian, spiritual, secular, moral
Mass media: radio, news, music, listen, channel, listener, tv

As shown in Table 9, the top 7 closest words in each row usually appear in the same activity scene, as words belonging to the same activity type are correctly mapped into the same region of the vector space. This suggests that the training process has successfully captured the semantic relationships between activity types and text words, and that the identified activity types are meaningful.
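The nearest-neighbour lookup behind Table 9 can be sketched as a cosine-similarity ranking over the learned word vectors. The function name and the toy two-dimensional vectors below are illustrative; the paper's actual embeddings are 200-dimensional.

```python
import numpy as np

def closest_words(center, vectors, k=7):
    """Return the k words whose vectors are most cosine-similar to `center`.

    `vectors` maps word -> embedding. The center word itself is excluded,
    mirroring how Table 9 lists neighbours of each activity's center word.
    """
    c = vectors[center]
    c = c / np.linalg.norm(c)
    sims = {}
    for w, v in vectors.items():
        if w == center:
            continue
        sims[w] = float(c @ (v / np.linalg.norm(v)))  # cosine similarity
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy example: "drink" points in nearly the same direction as "food".
vecs = {
    "food":  np.array([1.0, 0.1]),
    "drink": np.array([0.9, 0.2]),
    "car":   np.array([0.1, 1.0]),
}
```

Running `closest_words("food", vecs, k=2)` ranks "drink" ahead of "car", the same ordering logic that produces each row of Table 9.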

Conclusions
To fill the gap of overlooking the interactions that contribute to identifying HATs, this study proposes a novel Time Gated Human Activity Graph Convolutional Network (TG-HAGCN). The model identifies human activities from social media trajectories by synthesizing two GCN modules. Experimental results show that the proposed model significantly outperforms classical models, especially on macro (Mac) accuracy on unbalanced datasets. Furthermore, detailed analyses suggest that all three strategies, PMI, TFIDF, and UC, contribute to the improvement of the model. The ablation study shows that both users and time are important for the identification of human activities. The results of this study indicate that modeling the interactions in social media promotes the ability of machine learning in social media data mining and intelligent applications. Benefiting from the fact that a GCN can link different types of objects, the model easily synthesizes information from heterogeneous objects and is expected to serve a variety of domain-specific applications. The fruits of this study not only highlight the importance of the interactions hidden in social media and promote the ability of machine learning in social media data mining and intelligence, but also offer a reference solution for fusing multi-type heterogeneous data in social media.
In the future, other interactions, such as friend relationships between users, will be incorporated. The spatial information of posts can also be used as supplementary information. Finally, the performance of the word, post, and user vectors needs to be examined in downstream applications.