Extracting Knowledge from the Geometric Shape of Social Network Data Using Topological Data Analysis

Topological data analysis is a novel approach to extracting meaningful information from high-dimensional data and is robust to noise. It is based on topology, which aims to study the geometric shape of data. In order to apply topological data analysis, an algorithm called mapper is adopted. The output from mapper is a simplicial complex that represents a set of connected clusters of data points. In this paper, we explore the feasibility of topological data analysis for mining social network data by addressing the problem of image popularity. We randomly crawl images from Instagram and analyze the effects of social context and image content on an image's popularity using mapper. Mapper clusters the images using each feature, and the ratio of popularity in each cluster is computed to determine the clusters with a high or low possibility of popularity. Then, the popularity of images is predicted to evaluate the accuracy of topological data analysis. This approach is further compared with traditional clustering algorithms, including k-means and hierarchical clustering, in terms of accuracy, and the results show that topological data analysis outperforms the others. Moreover, topological data analysis provides meaningful information based on the connectivity between the clusters.


Introduction
These days, social networks have attracted billions of users to generate, consume and propagate content every day. In 2016, Twitter had about 313 million monthly active users, who shared more than 500 million tweets each day [1]. By the end of 2016, Facebook had an average of 1.23 billion daily active users [2]. In 2016, Instagram had 300 million users who were active on a daily basis and shared more than 95 million images and videos daily, which attracted more than 4 billion likes every day [3]. The huge number of users, posts and interactions has allowed social networks to become a powerful source of information. However, finding meaningful data from social networks can be challenging because social network data can be high dimensional and noisy [4][5][6]. Therefore, extracting meaningful information from such data has become more critical.
We investigate topological data analysis as an alternative approach for mining social network data. Topological data analysis is an approach based on applied mathematics that analyzes data using a set of techniques from topology [7,8]. It analyzes high dimensional data by analyzing the geometric shape of the data and has been shown to be robust to noise [7][8][9][10], which will be further discussed in Section 3. Topological data analysis has been adopted in many areas of study, such as biology [9][10][11], image processing [12], and financial analysis [13,14].
In this paper, a topological data analysis approach is used to address the problem of image popularity on social networks, specifically on Instagram, to investigate the adaptability of topological data analysis to social network data. Our contributions are as follows: 1. we investigate the feasibility of topological data analysis for social network analysis and mining, for which it has not previously been studied, in order to address the issues arising from the nature of social network data, and 2. in order to apply topological data analysis to social network data, we address the problem of image popularity on social networks. Our results show that topological data analysis outperforms traditional data mining techniques in terms of accuracy.
The rest of this paper is organized as follows: topological data analysis is explained in Section 2. In Section 3, the problem of image popularity is discussed. Section 4 shows how topological data analysis can be adopted to analyze image popularity on social networks. We present our dataset in Section 5 and present the results in Sections 6 and 7. Results are discussed in Section 8. The discussion is provided in Section 9, and the conclusions are presented in the final section.

Topology
Topology is a branch of mathematics that is concerned with qualitative geometric information, e.g., identifying the connected components of a space and, more generally, connectivity and homology [8]. Topology studies the properties of a space that are invariant under continuous deformation (i.e., properties that stay unchanged when the space is stretched or bent without tearing or gluing) [24]. Topology has two main tasks: shape measurement and representation. A topology can be defined as follows.
Definition 1. Assume a set X and a collection τ of subsets of X. τ is a topology on X if it has the following properties [24]: 1. both ∅ and X are in τ, 2. the union of the elements of any subcollection of τ is in τ, and 3. the intersection of the elements of any finite subcollection of τ is in τ.
If τ is a topology of X, then the ordered pair (X, τ) is called a topological space. Moreover, a subset u of X is called an open set, if u ∈ τ. The following example illustrates the concept of topology and topological space.
Example 1. Set X contains three elements, X = {a, b, c}. Many possible topologies τ of X can be found. For example, one topology is {∅, X}, and another topology is {∅, X, {a, b}, {c}}, as shown in Figure 1.
A set of points, together with a set of neighboring points for each point, forms a topological space [11]. Two topological spaces (E, τ_E) and (N, τ_N) are homeomorphic if there is a function f between them that is continuous, bijective, and has a continuous inverse. The two topological spaces then have the same topological type and are essentially the same in terms of topology. A widely known example of a homeomorphism is the one between a donut and a mug.
Homology measures connectivity by counting the number of holes, connected components, faces, and triangles [25]. It relates a series of algebraic objects to a topological space. A simplex is a topological space made of points, line segments, triangles, or their n-dimensional counterparts. A simplicial complex consists of multiple simplices and/or complexes, as shown in Figure 2.

Topological Data Analysis
Topological data analysis is a set of techniques invented to extract insight from data by studying its shape, motivated by the fact that data has a shape, and a shape has meaning [26]. Topological data analysis is based on algebraic topology, a subfield of topology that aims to quantify shapes using persistent homology. Persistent homology is used to compute the topological features of data at different resolutions by considering different radii around the data points [27]; as the radius increases, more data points become connected. Persistent homology provides stability and robustness against noise because noise does not persist across resolutions [28]. Topological data analysis studies shapes that have three main properties [29]: 1. the shapes are not dependent on specific coordinates, 2. the shapes are not changed under any transformation without tearing the shape apart, and 3. the shapes are produced in a compressed representation that contains infinite distances.
In topological data analysis, high dimensional data in a point cloud is represented by distances, which are one-dimensional information. Therefore, the analysis is independent of the dimensionality of the data, which makes topological data analysis a powerful technique for high dimensional data, as the following example shows.
Example 2. On Twitter, let us have two users called U1 and U2. Each user uses a profile image to represent his/her visual identity. For each user, a 1000-dimensional vector is used to store the pixels of the user's profile image. For users U1 and U2, we store their images in vectors A and B, respectively. Cosine similarity is one metric for evaluating distance or closeness. The cosine similarity between A and B provides the distance or closeness of the two users based on their profile images, and it is one-dimensional information.
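Example 2 can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the two 1000-dimensional vectors are random stand-ins for pixel data, and `cosine_similarity` is a helper written for this sketch, not a function from any particular library.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: random values stand in for the pixels of two profile images.
random.seed(0)
A = [random.random() for _ in range(1000)]
B = [random.random() for _ in range(1000)]

# Two 1000-dimensional vectors are reduced to a single number.
similarity = cosine_similarity(A, B)
```

Whatever the dimensionality of A and B, the comparison yields one scalar, which is the only information passed on to the later steps.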
In order to perform topological data analysis, the mapper algorithm is adopted [8,12,30]. Mapper aims to extract, simplify and visualize high dimensional data. It takes an inter-point distance matrix $D \in \mathbb{R}^{N \times N}$, where N is the number of data points, as the input. As parameters, users specify a filter function f, which is computed for each data point and used to partition the data (e.g., a density estimate); a clustering algorithm (e.g., hierarchical clustering); and a cover method, which divides the range of the filter function values into intervals according to the number of intervals S and the overlap ratio p. The overlap is needed to determine the connectivity between two intervals, since data points that fall into the overlap belong to both intervals. All data points in one cluster are in the same interval. All data points in one interval, however, are not necessarily in the same cluster.
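The cover step can be illustrated with a small sketch. It assumes one common convention, namely equal-length intervals in which each interval shares a fraction p of its length with the next one; the mapper package used in this paper may define the overlap differently, and the names `build_cover` and `interval_members` are hypothetical.

```python
def build_cover(values, num_intervals, overlap):
    """Split [min(values), max(values)] into equal-length overlapping intervals.

    Assumption: `overlap` in [0, 1) is the fraction of an interval's length
    that it shares with the next interval.
    """
    lo, hi = min(values), max(values)
    length = (hi - lo) / (num_intervals - (num_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(num_intervals)]


def interval_members(values, cover):
    """For each interval, list the indices of the filter values it contains."""
    return [[i for i, v in enumerate(values) if start <= v <= end]
            for start, end in cover]


# Hypothetical filter values for three data points.
filter_values = [0.0, 2.5, 10.0]
cover = build_cover(filter_values, num_intervals=5, overlap=0.5)
members = interval_members(filter_values, cover)
```

Here the value 2.5 lies in the overlap of the first two intervals, so the corresponding data point is a member of both; shared members of this kind are what later produce edges between clusters.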
Mapper generates a simplicial complex that represents clusters of data points and the relationships between them. The simplicial complex consists of nodes and edges. Each cluster is represented by a node, while edges represent the connectivity between the clusters (edges can only appear if p > 0, since without overlap no two clusters share a data point). Clustering algorithms are used to move from a topological version to a statistical one, and mapper is not dependent on a specific clustering algorithm. A summary of mapper is presented below; for an in-depth description, refer to [12].
Let $U = \{U_\alpha\}_{\alpha \in A}$ be a finite covering of the space X, so that the set A is finite. We define the simplicial complex $N(U)$ whose vertex set is the indexing set A, and where a family $\{\alpha_0, \alpha_1, \ldots, \alpha_k\}$ spans a k-simplex in $N(U)$ if and only if the corresponding clusters have a point in common. It is necessary to generate reference maps $f : X \to Z$, where X is a given point cloud and Z is the reference metric space. With the reference maps, the subsets $X_\alpha = f^{-1}(U_\alpha)$ are constructed. Different filters can be used: density estimation, eccentricity, and graph Laplacians [12].
A simple example of a circle using mapper is shown in Figure 3. The left figure is a point cloud of a circle with random variation, X, and the right figure is the simplicial complex of the point cloud, N(U). We arbitrarily selected four levels for this example. The colors represent the filter values of the data. In this example, a density estimator is used to filter the data (red being the most dense and blue being the least dense). Edges show the connectivity of clusters of the point cloud. If this were an example of the image popularity analysis, then the left figure would be a point cloud of the social media image dataset, and the right figure would be the clustering output of the image dataset from mapper. In addition, the output of mapper can be interpreted in such a way that the shape of the point cloud is a circle, and closer clusters may have higher similarity.
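The circle example can be reproduced with a minimal, self-contained sketch of the mapper pipeline. It is not the Python package used in this paper and it makes several simplifying assumptions: the point cloud is a noise-free circle, the filter is the x-coordinate rather than a density estimate, and the clustering step is a plain single-linkage grouping with a fixed distance threshold. All function names are hypothetical.

```python
import math


def single_linkage(indices, dist, threshold):
    """Group indices into connected components of the graph dist <= threshold."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in indices:
        for b in indices:
            if a < b and dist[a][b] <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(dist, filter_values, num_intervals, overlap, threshold):
    """Return (nodes, edges): clusters per interval, linked if they share a point."""
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / (num_intervals - (num_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    nodes = []
    for k in range(num_intervals):
        start = lo + k * step
        # Close the last interval exactly at the maximum to avoid rounding gaps.
        end = hi if k == num_intervals - 1 else start + length
        members = [i for i, v in enumerate(filter_values) if start <= v <= end]
        nodes.extend(single_linkage(members, dist, threshold))
    edges = [(a, b)
             for a in range(len(nodes))
             for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]]
    return nodes, edges


# 40 evenly spaced points on the unit circle (assumption: no noise).
points = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]
dist = [[math.dist(p, q) for q in points] for p in points]
filter_values = [p[0] for p in points]  # x-coordinate as a stand-in filter

nodes, edges = mapper(dist, filter_values, num_intervals=4,
                      overlap=0.3, threshold=0.4)
degree = {n: 0 for n in range(len(nodes))}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
```

With four intervals, the two outer intervals each yield one cluster and the two middle intervals each split into an upper and a lower arc, so the resulting graph is a closed loop in which every node has exactly two neighbors, mirroring the circular shape of the point cloud.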
McParlane et al. [18] predicted the popularity of images on Flickr using an image's orientation and size, number of faces in an image, most dominant color and the image's scenes; they classified the images according to a number of scenes based on the image content. They measured popularity using the number of comments and views. Khosla et al. [23] predicted the number of views that images receive on Flickr using images' colors, gists, textures, and gradients. Can et al. [19] predicted the popularity of images posted on Twitter and Flickr using hash tags, users' ages, and color histogram; they measured the popularity of images on Twitter using the number of favorites and retweets, and number of views and comments on Flickr. Yamaguchi et al. [41] employed users' identities, number of posts, number of followers, tags, and images' colors to predict the popularity of images on Chictopia (a fashion-based social network); they measured popularity based on the number of votes. Totti et al. [42] predicted the popularity of images using aesthetics and users' information on Pinterest; they measured popularity using the number of repins. Niu et al. [43] predicted the popularity of images on Flickr using network-based features, such as centrality analysis; the number of views was used as a popularity measurement. Gelli [44] used visual sentiments, and users' information to predict the normalized number of views of images on Flickr. Aloufi et al. [40] used users' information, number of groups that users belong to, number of tags, images' colors, gists, and sentiments to predict the popularity of images on Flickr.
Previous works have not addressed noise when building their predictive models. In addition, most studies have only considered low dimensional data and have focused on prediction accuracy. Therefore, in this paper, we address the issues arising from the nature of social network data (i.e., high dimensional and noisy data) as well as prediction accuracy.

Popularity Threshold
As mentioned earlier in this paper, the number of likes is selected as a popularity measurement. However, popularity is subjective. Therefore, we classify images as popular or unpopular according to their number of likes using the Pareto principle, as employed in [16][17][18]. The Pareto principle (80/20) is used in many fields of study, such as business. It states that roughly 20% of the causes produce roughly 80% of the effects. For example, many companies have found that 80% of their income comes from 20% of their customers. In our dataset, we observed that 20% of the images receive 99% of the total number of likes, which shows that only 20% of the images attract almost all of the interactions. Using the Pareto principle, a popularity threshold is defined to classify images as popular or unpopular.
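One way to turn the 80/20 rule into a concrete threshold is sketched below. The exact procedure is an assumption: here the threshold is taken as the like count of the image that sits at the top-20% rank, and images with at least that many likes are labeled popular. The like counts are made up for illustration.

```python
def popularity_threshold(likes, top_fraction=0.2):
    """Like count of the image at the top-`top_fraction` rank (assumed rule)."""
    ranked = sorted(likes, reverse=True)
    cutoff_index = max(1, int(len(ranked) * top_fraction)) - 1
    return ranked[cutoff_index]


def label_popular(likes, threshold):
    """1 for popular (likes >= threshold), 0 for unpopular."""
    return [1 if n >= threshold else 0 for n in likes]


# Hypothetical like counts for ten images.
likes = [0, 1, 2, 3, 5, 8, 13, 40, 120, 900]
threshold = popularity_threshold(likes)
labels = label_popular(likes, threshold)

# Share of all likes collected by the images labeled popular.
popular_share = sum(n for n, y in zip(likes, labels) if y) / sum(likes)
```

In this toy data the two most-liked images are labeled popular and collect most of the likes. When several images tie at the threshold, slightly more than 20% of the images end up labeled popular.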

Problem Statement
In this paper, the research problem is formalized differently to fit the topological data analysis approach. We consider it as a clustering problem, where a set of images is clustered based on a set of features. The percentage of popular images is computed in each cluster to determine the possibility of popularity in that cluster. Then, the popularity of images is predicted based on their closeness to the clusters' centroids. The approach is discussed in the following sections. The problem is formalized as follows: given a set of images $IM = \{im_1, \ldots, im_m\}$, each image is represented using a set of features $FC = \{fc_1, \ldots, fc_n\}$, and the popularity of each image is labeled as 1 (popular) or 0 (unpopular) by comparing its number of likes with the popularity threshold. The set of images IM is clustered using the features FC. The ratio of popularity in each cluster is computed to determine the clusters with a high or low ratio of popularity. Then, in order to predict the popularity of an image, the image is assigned to the cluster whose centroid is closest to the image.

Features
In the past, many research papers have shown that image popularity is highly correlated with users' information, i.e., number of followers of users who uploaded the images [18,19,22,41,42,44], while other research papers showed that popularity can also be related to the content of images [16,17,23,44].
Therefore, we investigate the effects of users' information and image content on image popularity. In order to represent the users' information, the normalized number of followers of the users who uploaded the images is selected, while captions are used to represent the images' contents.

Image Content
Oglesbee [45] states that "Looking at a picture without a caption is like watching television with the sound turned off". Understanding the meaning of an image can be challenging because the image's semantic is subjective. Therefore, photographers can describe images using captions, which can provide meanings to images. A caption is a description of an image that accompanies the image. In this paper, we extract the semantics of images using their captions.
In order to extract semantics from captions, a natural language processing technique, Word2vec [46], is used. Word2vec aims to map words that have similar meanings to nearby points in a continuous vector space. When enough data, usage and contexts are provided, Word2vec can infer a word's meaning based on past appearances using a neural network, which learns distributed representations of words; it represents each word in the vector space using a 300-dimensional vector [46]. These vectors can be used to establish a word's association with other words in terms of the similarity between the words' meanings. For example, apple is to fruit as orange is to fruit.
In our approach, we first tokenize the image's caption. Then, we remove stop words (such as "with") and special characters. Since a caption can contain a number of words and each word has its own contribution to the image, all word vectors from a caption are averaged to make one representative caption vector that considers the contributions of all words for one image. After this, each image has one caption vector with 300 dimensions, CC, which is computed as follows:
$$CC = \frac{1}{n} \sum_{i=1}^{n} V_i,$$
where n represents the number of words, and $V_i$ represents the 300-dimensional vector of the i-th word.
For example, let us have an image with the caption "kitchen with refrigerator and oven". First, we tokenize the caption, which gives five words: [kitchen, with, refrigerator, and, oven]. Then, the stop words [with, and] are removed. The three remaining words are converted to numerical form using Word2vec; each word is represented by a 300-dimensional vector, called v. Finally, we compute the average of the three vectors to represent the image content, $CC = \frac{1}{3}(v_{kitchen} + v_{refrigerator} + v_{oven})$. This example is illustrated in Figure 4.
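The caption-averaging step can be sketched as follows. To keep the snippet self-contained, a toy dictionary of 3-dimensional vectors stands in for the 300-dimensional Word2vec model, and the stop-word list is a small illustrative set; both are assumptions, as is the choice to skip words that are missing from the vocabulary.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOPWORDS = {"with", "and", "the", "a", "of"}


def caption_vector(caption, embeddings):
    """Average the word vectors of a caption after removing stop words."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", caption.lower())
              if t not in STOPWORDS]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words in the caption
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


# Toy 3-dimensional embeddings standing in for Word2vec vectors.
toy_embeddings = {
    "kitchen": [1.0, 0.0, 2.0],
    "refrigerator": [0.0, 3.0, 1.0],
    "oven": [2.0, 0.0, 0.0],
}
cc = caption_vector("kitchen with refrigerator and oven", toy_embeddings)
```

The regular expression also discards punctuation and other special characters, so only the three content words contribute to the average.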

Social Context
As mentioned before, a number of research papers found that the popularity of the user who uploads an image is correlated with the image's popularity [18,19,22,41,42,44]. In order to represent the users' popularity, the normalized number of followers is selected. We normalize the number of followers because we want to focus on the order of magnitude of the followers; that is, the ratios among the numbers of followers are more important than the exact numbers of followers. The normalized number of followers, i.e., S, is computed as follows:
$$S_i = \frac{fol_i}{Max(\#Fol)},$$
where $S_i \in [0, 1]$ and $fol_i$ is the number of followers of user i, while $Max(\#Fol)$ is the maximum number of followers in the dataset.

Clustering
Topological data analysis can be generalized to solve various problems. As mentioned earlier, the input to mapper is a distance matrix, while the output is a set of clusters.
A distance matrix is a square matrix that represents the distances between the elements in a set [47]. Since there are many problems that can be solved using clustering algorithms, topological data analysis can be adapted. Moreover, any distance metric can be used, such as Euclidean or cosine similarity.
For the content feature, we compute the distance between any two images i and j using the cosine similarity [48], called CD, of their 300-dimensional caption vectors, i.e., CC, which is calculated as follows:
$$cc_i \cdot cc_j = \|cc_i\| \, \|cc_j\| \cos\theta, \qquad CD(cc_i, cc_j) = \cos\theta = \frac{cc_i \cdot cc_j}{\|cc_i\| \, \|cc_j\|}.$$
Cosine similarity is used because the similarity between $cc_i$ and $cc_j$ is expressed by the directions of the two vectors.
For the social context feature, we compute the distance between any two images i and j using the Euclidean distance [49] of their one-dimensional feature, called D, which is calculated as follows:
$$D(S_i, S_j) = \sqrt{(S_i - S_j)^2} = |S_i - S_j|.$$
Euclidean distance is used because the distance between any two users based on their number of followers is expressed by the difference between the numbers of followers each user has.
With these distances, a distance matrix M is created for each feature. Then, each distance matrix is supplied separately to mapper to cluster the data and analyze the relationship between the popularity of images and each feature.
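The construction of the two matrices can be sketched as below. The follower counts and caption vectors are invented for illustration, and the function names are hypothetical. Note that the content matrix holds cosine similarities, as in the text; whether a package expects similarities or a dissimilarity such as one minus the similarity is an assumption that should be checked against the tool that consumes the matrix.

```python
import math


def normalize_followers(followers):
    """Divide each follower count by the maximum (assumes the maximum is > 0)."""
    top = max(followers)
    return [f / top for f in followers]


def euclidean_matrix(values):
    """Pairwise |S_i - S_j| for a one-dimensional feature."""
    return [[abs(a - b) for b in values] for a in values]


def cosine_matrix(vectors):
    """Pairwise cosine similarity CD(cc_i, cc_j)."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.sqrt(sum(x * x for x in u)) *
                      math.sqrt(sum(y * y for y in v)))
    return [[cos(u, v) for v in vectors] for u in vectors]


# Hypothetical follower counts and toy 2-dimensional caption vectors.
S = normalize_followers([100, 50, 1000])
M_social = euclidean_matrix(S)
M_content = cosine_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Both matrices are square and symmetric, with one row and one column per image, which is the input shape described above.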
Because the 80/20% rule was used to determine popularity, the ratio of popular images in each cluster is normalized by 0.2. Therefore, if the normalized ratio of popular images in a cluster is 1.0, then the effects of the feature on the popularity of the cluster were neutral. However, if the ratio of popularity is greater than 1.3, the popularity ratio is considered high, while if the ratio is less than 0.70, it is considered as a low ratio of popularity.
Regarding the images' popularity, the clusters can be classified into three groups: low possibility of popularity, Gr1; neutral, Gr2; and high possibility of popularity, Gr3, based on the criteria discussed above. If an image falls into Gr3, then it can be said that the image has a higher possibility of becoming popular, and if an image falls into Gr1, it has a lower possibility of becoming popular. Note that the ratio of popularity in each cluster is computed for three time frames: during the first hour, after the first day, and after the first week. Therefore, an image can belong to Gr1 in the first hour and then belong to Gr3 after the first day, if the ratio of popularity in its cluster increases after one day.
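The grouping rule can be written down directly from the criteria above: the share of popular images in a cluster is divided by 0.2, and the result is compared with 0.70 and 1.3. The function name and the example labels below are illustrative.

```python
def assign_group(labels_in_cluster, base_rate=0.2, low=0.70, high=1.3):
    """Map a cluster's popularity labels (1/0) to Gr1, Gr2 or Gr3."""
    share = sum(labels_in_cluster) / len(labels_in_cluster)
    normalized_ratio = share / base_rate
    if normalized_ratio > high:
        return "Gr3"  # high possibility of popularity
    if normalized_ratio < low:
        return "Gr1"  # low possibility of popularity
    return "Gr2"      # neutral


# Hypothetical clusters of ten images each.
neutral_cluster = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 20% popular
popular_cluster = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 50% popular
unpopular_cluster = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 10% popular

groups = [assign_group(c) for c in
          (neutral_cluster, popular_cluster, unpopular_cluster)]
```

Because the labels change with the time frame, the same function would be applied once per time frame with the labels for the first hour, the first day, and the first week.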

Prediction
Our mechanism predicts image popularity based on the cluster with the nearest centroid, which is determined by computing the distance between each image and each cluster's centroid. The centroid of a cluster d, i.e., $C_d$, is computed as follows:
$$C_d = \frac{1}{N} \sum_{i=1}^{N} x_i,$$
where N represents the number of images in the cluster d, while $x_i$ denotes the i-th image in the cluster, represented using either of the two features discussed earlier.
For prediction using the image content feature, the cosine similarity is used to compute the distance. Therefore, in order to predict the popularity of an image, the nearest cluster centroid is determined by finding the cluster whose centroid has the highest cosine similarity with the image's content. The objective function is computed as follows:
$$y = \arg\max_{d} \, CD(C_d, cc),$$
where cc is the caption vector of the image and y represents the cluster with the highest cosine similarity to the image's content. On the other hand, for predicting the popularity of images using the social context feature, Euclidean distance is used. In this case, the image belongs to the cluster whose centroid has the shortest distance to the image's social context, and the objective function changes slightly:
$$y = \arg\min_{d} \, D(C_d, S),$$
where S is the normalized number of followers associated with the image and y represents the cluster with the shortest Euclidean distance to the image's social context.
Moreover, the images in our dataset are already labeled as popular or unpopular using the Pareto principle as discussed in Section 3.2; therefore, we use these labels to determine whether the images are assigned to the correct clusters, i.e., Gr1 or Gr3. For example, if an image is popular and is assigned to one of the Gr3 clusters, it is correctly identified as popular. If an image is assigned to one of the Gr1 clusters and the image is unpopular, it is correctly identified as unpopular. However, if a popular image is assigned to Gr1, it is not correctly identified as a popular image. In our experiments, we predict both the popular and unpopular images.
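The nearest-centroid rule can be sketched as follows. The centroids, group assignments and test images are invented for illustration, the helper names are hypothetical, and returning no prediction for an image that lands in a neutral (Gr2) cluster is an assumption of this sketch, since the text only discusses Gr1 and Gr3.

```python
import math


def centroid(points):
    """Component-wise mean of the vectors in one cluster."""
    n = len(points)
    return [sum(component) / n for component in zip(*points)]


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(y * y for y in v)))


def predict_by_content(cc, centroids):
    """Index of the centroid with the highest cosine similarity to cc."""
    return max(range(len(centroids)),
               key=lambda d: cosine_similarity(centroids[d], cc))


def predict_by_social(s, centroids):
    """Index of the centroid with the smallest |C_d - s|."""
    return min(range(len(centroids)), key=lambda d: abs(centroids[d] - s))


def predicted_label(cluster_index, cluster_groups):
    """1 (popular) for Gr3, 0 (unpopular) for Gr1, None for Gr2 (assumption)."""
    group = cluster_groups[cluster_index]
    return {"Gr3": 1, "Gr1": 0}.get(group)


# Social context: hypothetical one-dimensional centroids and their groups.
social_centroids = [0.02, 0.4, 0.9]
social_groups = ["Gr1", "Gr2", "Gr3"]
label_a = predicted_label(predict_by_social(0.85, social_centroids), social_groups)
label_b = predicted_label(predict_by_social(0.05, social_centroids), social_groups)

# Image content: toy 2-dimensional caption vectors.
content_centroids = [centroid([[1.0, 0.0], [0.8, 0.2]]),
                     centroid([[0.0, 1.0], [0.2, 0.8]])]
nearest = predict_by_content([0.7, 0.3], content_centroids)
```

Comparing the predicted labels with the Pareto-based labels of the test images then gives the counts of correct and incorrect assignments used in the evaluation.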

Instagram Dataset
We crawled our dataset from Instagram using user IDs. Based on our experiments with the Instagram API, we observed that user IDs are simply numbered from one to millions. Therefore, we randomly selected more than 1,000,000 IDs and used them to query the Instagram API for users. We found 149,520 users with public privacy settings. Among these users, 89,093 had shared at least ten images; we use these users because they are active. We retrieved 69,000 images that were uploaded during the first hour in which we queried the API. After preprocessing, we had 49,045 images. Then, the same images were checked again after one day and after one week to track the changes in the number of likes. After applying the 80/20 rule, the popularity thresholds of the dataset for one hour, one day and one week were measured. The data was randomly split into training and testing datasets: 70% for training and 30% for testing. The training dataset contains 32,920 images, while the testing dataset contains 14,108 images.
Table 1 shows the popularity thresholds for the different time frames after applying the Pareto principle to the number of likes. The popularity thresholds are: 45 during the first hour, 69 after the first day, and 75 after the first week. Any image that receives a number of likes greater than or equal to these thresholds is considered popular during the corresponding time frame. For example, if an image received 50 likes in one hour, 70 likes after one day and 71 likes after one week, then this image was popular during the first hour and after the first day, but became unpopular after one week.
Figure 5 shows the plots of the number of images with respect to the number of likes in the one hour data, one day data and one week data. Both axes are log scaled. The y-axis represents the number of images; it is normalized by the maximum number of images, so the peak is at one. The x-axis represents the normalized number of likes. The likes are normalized by the popularity thresholds from Table 1, so the threshold lines overlap each other and form one vertical dotted line. The normalized distributions shown in Figure 5 are similar to each other, and the popularity thresholds are also aligned within the distributions. The figure shows that the distributions of images and likes over the different time frames exhibit similar trends; this suggests that early information about popularity may be usable to predict future popularity. It also indicates that the popularity of images saturates within the first hour after upload. In addition, in the one hour set, the peak at the low number of likes partly deviates from the others, which may be because popularity is still maturing at that point.

Implementation
As mentioned earlier, mapper [30] is used to perform topological data analysis. It is available through a Python package. We used density estimation as the filter function.
In order to convert captions to numerical form, Gensim, a Python library that implements Word2vec (https://code.google.com/archive/p/word2vec/) is used [50]. The Word2vec model is trained using 100 billion words from Google News and achieves an accuracy rate of 73%.
In order to compare topological data analysis and clustering algorithms, k-means and hierarchical clustering are implemented. Hierarchical clustering is implemented using Scikit-learn [51]; we used average linkage, and for connectivity, we employed the kneighbors_graph algorithm. In order to determine the cut-off, we used the parameter n_clusters in [51].
In addition, k-means is implemented using the Natural Language ToolKit [52]. For selecting the initial means, k-means++ is used [53]. Both packages are implemented in Python. The number of clusters varies between 5 and 15 to observe their effects on popularity; however, only experiments with five clusters are presented since the results are almost identical.

Evaluation
In order to evaluate the accuracy of the three approaches, the F-score is computed. The F-score combines precision and recall into a single measure of accuracy, namely their harmonic mean. It is computed as follows:
$$F = 2 \cdot \frac{precision \cdot recall}{precision + recall}.$$
We compute the F-score for both prediction classes: popular and unpopular images.
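A per-class F-score can be computed as in the following sketch. The label lists are invented for illustration, and treating an undefined precision or recall as zero is a convention chosen for this sketch.

```python
def f_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the class `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical true and predicted labels (1 = popular, 0 = unpopular).
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]

f_popular = f_score(y_true, y_pred, positive=1)
f_unpopular = f_score(y_true, y_pred, positive=0)
```

Calling the function once with each class as the positive class gives the two scores reported per feature and time frame.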

Empirical Results Using Topological Data Analysis
In this section, we discuss the experiments and results for topological data analysis. We employ topological data analysis using the two features discussed earlier to cluster the images in the training dataset and then compute the ratio of popularity in each cluster to identify clusters with high or low ratios of popularity. The number of intervals used in the experiment is five as mentioned earlier. Then, we predict the popularity of images using the proposed approach.

Clustering
First, we applied mapper using the image content feature. The results show that the ratio of popular images increases from cluster 1 to cluster 5. Cluster 1 has the lowest ratio, 30% lower than neutral, and cluster 5 has the highest, 55% higher than neutral, while clusters 2-4 have neutral ratios of popularity. Therefore, we assigned cluster 1 to Gr1, clusters 2-4 to Gr2 and cluster 5 to Gr3.
Next, we supplied the social context feature to mapper, and the results show that the ratios of popularity deviate from neutral far more strongly than with the image content feature. In this experiment, the ratios of popularity decrease from cluster 1 to cluster 5, which produces a monotonically decreasing relationship along the clusters. Cluster 1 has the highest ratio of popularity, 305% higher than neutral, while cluster 5 has the lowest ratio of popularity, 95% lower than neutral. No cluster with a neutral ratio of popularity is observed in this experiment. Clusters 1 and 2 are assigned to Gr3, while the remaining clusters are assigned to Gr1.

Prediction
In this experiment, we predicted the popular and unpopular images using the two features. Using the image content, topological data analysis achieved an accuracy of 23% for predicting the popular images during the first hour. The prediction accuracy stayed the same for the first day and first week periods. As for the prediction of unpopular images, topological data analysis achieved an accuracy of 68% for the first hour prediction. The accuracy then decreased to 31% for the first day and first week periods.
On the other hand, the accuracy increases significantly when the social context is used. During the first hour, topological data analysis achieves an accuracy of 67% for predicting the popular images. For predicting the unpopular images, the accuracy increases to 82%. For the prediction of both popular and unpopular images, the accuracy stayed the same over the first day and first week periods. The results are summarized in Table 2. For both features, the accuracy rates for the prediction of unpopular images are higher than the accuracy rates for the prediction of popular images because 80% of the images in our dataset are unpopular based on the Pareto principle.
Table 2. Accuracy of topological data analysis for predicting the popular and unpopular images using the image content and social context.

Empirical Results Using Clustering Algorithms
In order to compare the topological data analysis approach with the clustering algorithms, we employed k-means and hierarchical clustering. In addition, the same distance metrics that are used for topological data analysis are used for k-means and hierarchical clustering.

k-Means
K-means [54] is one of the most popular clustering algorithms. It partitions data into k clusters, assigning each data point to the cluster with the nearest mean. In k-means, connectivity has no meaning. Therefore, there are no relationships between clusters. K-means is employed using both features.

Clustering
First, we employed the image content feature, and the results show that clusters 2-5 have neutral ratios of popularity and are assigned to Gr2. However, cluster 1 has a low ratio of popularity, 6% lower than neutral, and therefore is assigned to Gr1.
Second, the social context feature is used. The ratios of popularity have increased significantly as observed using topological data analysis. The result shows that clusters 2 and 3 have low ratios of popularity, 28% lower than neutral and 66% lower than neutral, respectively. They are assigned to Gr1. Other clusters have high ratios of popularity. Cluster 4 has a perfect ratio of popularity, at 100%. Cluster 5 has a popularity ratio that is 58% higher than neutral, while cluster 1 has a ratio that is 232% higher than neutral. Clusters 1 and 4-5 are assigned to Gr3.

Prediction
As discussed in the previous subsection, k-means failed to find any cluster with a high ratio of popularity when the image content is employed. Therefore, the accuracy rate for predicting the popular images is 0%. For predicting the unpopular images, k-means achieved an accuracy rate of 39% for the first hour, which decreased to 31% for the first day and week.
In contrast, the accuracy rate for predicting the popular images using the social context increased significantly to 63%. Moreover, the accuracy rate for predicting the unpopular images increased to 85%. For both predictions, the accuracy rates stayed the same over the three time frames. The results are summarized in Table 3.

Table 3. Accuracy of k-means for predicting the popular and unpopular images using the image content and social context.

Hierarchical Clustering
Hierarchical clustering [55] works differently from k-means: it builds a hierarchy of clusters. In hierarchical clustering, connectivity exists, so relationships exist between clusters.
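To make the idea of a hierarchy concrete, the following sketch performs bottom-up (agglomerative) clustering and records every merge. Single linkage is an assumption made for the example, since the linkage criterion is not stated here, and the function and parameter names are hypothetical.

```python
def single_linkage(points, dist, n_clusters=1):
    """Agglomerative clustering with single linkage (assumed criterion).

    Returns (clusters, merges): clusters holds lists of point indices, and
    merges records each merge as (left_members, right_members, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges
```

The recorded merges are what distinguish this output from that of k-means: each one states which clusters are joined and at what distance, so relationships between clusters can be inspected.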

Clustering
Using the image content feature, the results show a new case in cluster 4: its popularity ratio is 0, which means that an image in this cluster has a 0% chance of becoming popular. This cluster is assigned to Gr1. The remaining clusters have neutral ratios of popularity and are assigned to Gr2. However, the ratio of popularity in cluster 1 becomes higher than neutral after the first hour. Therefore, cluster 1 is assigned to Gr3 for the first day and week periods. As for connectivity, no meaningful trend is detected.
Next, we employed the social context feature. As in the other experiments based on this feature, the ratios of popularity increase significantly. Cluster 4 has a popularity ratio of 0. Cluster 1 has a ratio that is 32% lower than neutral. Both clusters are assigned to Gr1. Clusters 3, 4 and 5 have high ratios of popularity: 140%, 180%, and 295% higher than neutral, respectively. They are assigned to Gr3. Moreover, the connectivity between these clusters appears as a monotonic increase in the ratios of popularity along the connected clusters.

Prediction
As discussed in the previous subsection, hierarchical clustering failed to find any cluster with a high ratio of popularity during the first hour using the image content feature. Therefore, the accuracy rate for predicting the popular images is 0% during the first hour. However, as mentioned earlier, the ratio of popularity in cluster 1 becomes higher than neutral after the first hour; therefore, hierarchical clustering predicted the popular images with an accuracy rate of 19% for the first day and week time frames. For predicting the unpopular images, hierarchical clustering achieved an accuracy rate of 49% during the first hour, and 18% for the first day and week.
As for the social context feature, the accuracy rate for predicting the popular images increased significantly to 66%. Moreover, the accuracy rate for predicting the unpopular images increased to 81%. For both predictions, the accuracy rates stayed the same over the three time periods. The results are summarized in Table 4.
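The prediction step and the per-class accuracy rates can be illustrated with a short sketch. It assumes that an image is assigned to its nearest cluster centroid, that clusters in Gr3 predict "popular" and clusters in Gr1 predict "unpopular", that Gr2 clusters yield no prediction, and that the accuracy for a class is the share of images of that class that were predicted correctly. These rules, the one-dimensional feature, and all names are assumptions for illustration.

```python
def predict(x, centroids, groups):
    """Predict popularity from the group of the nearest centroid.

    x and centroids are scalars here (e.g., a normalized follower count);
    groups[i] is the group ("Gr1", "Gr2" or "Gr3") of cluster i.
    """
    nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
    if groups[nearest] == "Gr3":
        return "popular"
    if groups[nearest] == "Gr1":
        return "unpopular"
    return None  # neutral cluster: no prediction (assumed rule)


def per_class_accuracy(y_true, y_pred, cls):
    """Share of images whose true class is cls that were predicted as cls."""
    relevant = [pred for true, pred in zip(y_true, y_pred) if true == cls]
    if not relevant:
        return None
    return sum(pred == cls for pred in relevant) / len(relevant)
```

Under these assumptions, a clustering with no Gr3 cluster never predicts "popular", so its accuracy for the popular class is 0, which is consistent with the 0% rates reported when no cluster with a high ratio of popularity is found.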

Comparison
In this section, we compare the performance of the three approaches in terms of accuracy using the two features. Figure 6 plots the accuracy rates for predicting the popular and unpopular images using the three approaches. The results show that topological data analysis outperforms the other approaches for predicting both the popular and unpopular images. This indicates that topological data analysis performs better than traditional data mining techniques when a high dimensional feature, i.e., image content, is employed. In terms of changes in the prediction accuracy rates over time, the three approaches achieved high accuracy rates for predicting the unpopular images during the first hour, but the accuracy rates decreased after that. For predicting the popular images, in contrast, the approaches have the same accuracy rates over the different time frames, except for hierarchical clustering, which, as discussed before, could not find any cluster with a high ratio of popular images during the first hour. The results show that the popularity of images is saturated during the first week.

Social Context
The three approaches have very similar accuracy rates for predicting the popular and unpopular images. For predicting the popular images, topological data analysis is slightly more accurate, by 1% over hierarchical clustering and by 4% over k-means. For predicting the unpopular images, k-means is slightly more accurate, by 3% over topological data analysis and by 4% over hierarchical clustering. No changes in the accuracy rates are observed over time. The results show that when a low dimensional feature, i.e., social context, is used, traditional data mining techniques perform as well as topological data analysis.
The results show that social context achieves higher accuracy than image content, which supports the results of other studies indicating that users' information has a large impact on images' popularity [16,18-22]. The results are plotted in Figure 7.
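As a check on the arithmetic, the differences quoted for the social context feature can be recomputed from the accuracy rates reported in the prediction subsections (67% and 82% for topological data analysis, 63% and 85% for k-means, 66% and 81% for hierarchical clustering). The dictionary layout and the helper name are illustrative only.

```python
# Accuracy rates (in percent) for the social context feature, as reported above.
social_context_accuracy = {
    "topological data analysis": {"popular": 67, "unpopular": 82},
    "k-means": {"popular": 63, "unpopular": 85},
    "hierarchical clustering": {"popular": 66, "unpopular": 81},
}


def gaps_to_best(table, cls):
    """Return the best approach for a class and its lead over each other approach."""
    best = max(table, key=lambda method: table[method][cls])
    gaps = {
        method: table[best][cls] - table[method][cls]
        for method in table
        if method != best
    }
    return best, gaps


for cls in ("popular", "unpopular"):
    print(cls, gaps_to_best(social_context_accuracy, cls))
```

The largest gap in either class is 4 points, which is the basis for saying that the three approaches perform similarly on the low dimensional feature.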

Discussion
In this paper, we explored the feasibility of topological data analysis for mining social network data. We addressed the problem of image popularity by analyzing the effects of image content and social context on image popularity. To do so, we randomly crawled images along with their metadata from Instagram. We first converted the images' captions to numerical vectors using Word2vec. In addition, the normalized number of followers was used to represent the social context. Then, we calculated the distances for each feature and supplied them to mapper. The same features were then applied to k-means and hierarchical clustering in order to compare topological data analysis with clustering algorithms. Finally, we predicted both the popular and unpopular images based on how close the images are to the centroids of the clusters. The results exhibited several outcomes:
1. Topological data analysis is feasible for social network analysis and mining;
2. Image content and social context are correlated with image popularity;
3. Topological data analysis significantly outperformed traditional clustering algorithms using the high dimensional feature, i.e., image content. It achieved higher accuracy rates than the k-means and hierarchical clustering algorithms. It also generated a meaningful connectivity between the clusters, i.e., a monotonic increase in the popularity ratio along the connected clusters;
4. For predicting the popularity of images using the low dimensional feature, i.e., social context, traditional data mining techniques perform as well as topological data analysis;
5. Using the social context feature improves the accuracy rates significantly, which confirms that the popularity of images is highly related to users' popularity;
6. For the changes of popularity over time, a trend is only observed for the prediction of popular images using the image content;
7. Lastly, the popularity of images is saturated in a short period of time.

Conclusions
In conclusion, topological data analysis outperformed traditional clustering algorithms on high dimensional and noisy data. The results also show that the geometric shape of data matters and can be used to produce meaningful information. As future work, it would be interesting to investigate feature integration using topological data analysis, since topological data analysis relies on distances.