Global trends show an increasing tendency for people to get news through social media. The Reuters Institute Digital News Report 2017 reports that around 63% of the public gets news from social media [1]. Social media platforms increasingly mediate the relationship between readers and publishers. Social media helps the public learn about the latest trends faster and helps news outlets reach wider audiences [2
This development shows that social media platforms have changed the way we access information and form opinions. Moreover, the emergence of various new phenomena in the realm of social media, such as opinion polarization and echo chambers, is closely related to how the public consumes news [5]. This situation raises the need to better understand news audience formation in social media environments.
On social media, users interact with news sources by commenting on and sharing the news published by news outlet accounts. In an information-rich environment, these interactions collectively contain important information about how the public consumes news: user preferences are reflected in the news outlets they follow [6], while the presence of shared followers between news outlets indicates a common readership base that collectively forms a pattern of interconnection between news media [9
In this study, we use a network approach to map the news media landscape on one of the most popular social media platforms, Twitter. Twitter is useful for studying the media landscape because of its large number of users, its open access to data, and, most importantly, the fact that all interactions from both the news media and the audience are recorded online [10]. So far, Twitter has become a laboratory for various studies related to the dynamics and propagation of information through social networks [11
More specifically, this study proposes a methodology to build and analyze the network of online news media on Twitter based on follower overlap data. As shown in previous studies of audience networks [9], audience overlap among news sources offers important insights into how audiences navigate the many news sources that are available online. In the context of the Twitter environment, follower overlap between news media accounts can be interpreted as follows. First, as a measure of similarity: the more followers any two news media accounts share, the closer those news outlets are in terms of their audience base. Second, as a measure of the diversity of people consuming a given news outlet: if a media outlet shares followers with many other media outlets, it means that the followers of that outlet have a wide range of interests.
Our proposed methodology adopts the weighted network model proposed in [19] and extends its application to news consumption behavior in the social media environment. However, in contrast to [19], in this study we employ the Disparity Filter method [25], as used in [21], to identify the most significant overlaps in the network. We demonstrate how to apply the proposed methodology by examining the news media landscape in three countries, namely Indonesia, Malaysia, and Singapore.
In general, considering the development of social media as the main news distribution platform, this research contributes to enriching the literature on empirical studies of news consumption behavior in the social media environment. Specifically, this study makes three contributions. First, it proposes a framework for building and analyzing the news media landscape on Twitter using follower overlap data. Second, it explores the characteristics of news outlets on Twitter and confirms the findings of previous studies regarding the distribution pattern of audience size. Third, it measures the structural position of news outlets and identifies the prominent media in the news media landscape of the three countries analyzed. The analysis shows that the audience size of a media outlet does not always reflect how central the outlet is in the media network.
The remainder of this paper is organized as follows: news consumption studies using a network approach are reviewed in Section 2. The steps for implementing a network approach to news media follower overlap data are proposed and explained in Section 3. In Section 4, we report the results of implementing the proposed methodology using news media data from three countries, namely Indonesia, Malaysia, and Singapore. In Section 5, we discuss the relevance of the network approach in providing new insight into the news media landscape in the social media space. Section 6 concludes this paper.
2. Related Work
The concept of audience overlap has a relatively long history in media studies [11], but the potential of overlap data for building a network of media was uncovered only recently [9]. A network is a natural representation of the news media landscape. In addition, network analysis provides the analytical tools needed to investigate a number of issues in media studies, such as fragmentation in the news market, patterns of media clustering, media centrality, and structural comparison of media networks between countries.
] is the first study that proposes analyzing audience duplication data as a network, where the nodes represent media outlets and the level of duplication between their audiences is represented by ties or edges. A number of other studies that used the same methodology soon followed this work [15]. However, these studies disregard important aspects: they ignore edge weights, which represent the strength of the overlap, and they do not assess the statistical significance of the observed overlap. More recent research has proposed some methodological improvements to the original approach [9
The pattern of relationships in media networks depends on how we measure the relationship between media nodes. Mukerjee et al. [19] proposed the phi coefficient as a metric to measure the strength of audience overlap between news outlets. Unlike the metrics used in previous studies [14], the phi coefficient considers not only the size of the overlapping audience between news outlets, but also the relative size of the audience of each outlet. In addition, Mukerjee et al. [19] applied a statistical test, namely the t-test, to filter out insignificant overlaps between media outlets. With this approach, Mukerjee et al. were able to draw a convincing conclusion regarding the structural characteristics of the analyzed media networks.
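As a rough illustration of how the phi coefficient weighs the overlap against the audience size of each outlet, the following minimal sketch computes it from two follower sets using the standard 2×2 contingency-table formula. The function name, the toy follower IDs, and in particular the choice of `population_size` (the total number of users considered) are assumptions made for this example; the studies cited above may define the reference population differently.

```python
import math

def phi_coefficient(followers_a, followers_b, population_size):
    """Phi coefficient between two outlets, computed from their follower sets.

    population_size is the assumed total number of users in the sample.
    """
    n11 = len(followers_a & followers_b)     # users following both outlets
    n10 = len(followers_a) - n11             # users following only A
    n01 = len(followers_b) - n11             # users following only B
    n00 = population_size - n11 - n10 - n01  # users following neither
    if n00 < 0:
        raise ValueError("population_size is smaller than the union of followers")
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # undefined when a row or column of the table is empty
    return (n11 * n00 - n10 * n01) / denom

# Toy data: identical audiences, disjoint audiences, and a chance-level overlap.
same = phi_coefficient({1, 2}, {1, 2}, population_size=4)
disjoint = phi_coefficient({1, 2}, {3, 4}, population_size=4)
chance = phi_coefficient({1, 2, 3, 4}, {3, 4, 5, 6}, population_size=8)
print(same, disjoint, chance)
```

In the third case the two outlets share two followers, yet the coefficient is zero because that overlap is exactly what independent audiences of this size would produce.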
Majó-Vázquez et al. [21] made a methodological contribution by using the Disparity Filter algorithm to identify the most significant overlaps in audience networks. The Disparity Filter algorithm is a network reduction technique developed by Serrano et al. [25] to identify the ‘backbone’ structure of a weighted network. As opposed to the fixed threshold of the t-test used in [19], this algorithm uses a null model to accept or reject edges based on the distribution of their weights at the node (egocentric) level. In [21], Majó-Vázquez et al. show that this method can sufficiently reduce the network density without destroying the multi-scale nature of the network.
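The node-level test behind the Disparity Filter can be sketched as follows. For an edge of weight w attached to a node with degree k and strength s (the sum of its edge weights), the null model of uniformly distributed weights gives the p-value (1 − w/s)^(k − 1). The sketch below assumes an undirected network with positive weights and keeps an edge when it is significant for at least one of its endpoints; the function name, the edge-dictionary layout, and the toy network are illustrative and are not taken from [21] or [25].

```python
from collections import defaultdict

def disparity_filter(edges, alpha=0.05):
    """Return the edges of an undirected weighted graph that pass the filter.

    edges: dict mapping (u, v) pairs to positive weights.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        degree[u] += 1
        degree[v] += 1

    def p_value(node, w):
        k = degree[node]
        if k <= 1:
            return 1.0  # a single edge cannot be tested from this endpoint
        return (1.0 - w / strength[node]) ** (k - 1)

    # Keep an edge if it is significant from the viewpoint of either endpoint.
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if min(p_value(u, w), p_value(v, w)) < alpha
    }

# Toy network: hub "A" has one heavy edge and nine light ones.
toy_edges = {("A", "B"): 100.0}
toy_edges.update({("A", f"C{i}"): 1.0 for i in range(9)})
backbone = disparity_filter(toy_edges)
print(backbone)
```

Because the test is relative to each node's own strength, a weak edge can still survive if it dominates the connections of a small node, which is what preserves the multi-scale structure.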
However, most of the studies mentioned above use web traffic data to build audience networks. We highlight the work of An et al. [10] and Hahn et al. [24], who used a network approach to explore the news media landscape on Twitter. An et al. [10] use a directed weighted network to represent the news media landscape on Twitter and apply a closeness metric, defined as the probability that a random follower of media B also follows media A, to measure the closeness of media A from media B. For all directed pairs of media sources, An et al. calculate the closeness of media A to all other media sources and examine which ones appear the closest. However, compared to the Disparity Filter method, the network reduction technique used in [10] is limited because it ignores the multi-scale nature of media networks. In contrast to [10], Hahn et al. [24] use a symmetric form of the closeness metric to measure similarity between news outlets and represent the news media landscape as a weighted undirected network. Furthermore, Hahn et al. [24] did not make any effort to reduce the density of the resulting media network.
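The directed closeness metric described above can be illustrated with a short sketch. It assumes that follower lists are available as Python sets of user IDs; the function name and the toy data are hypothetical and are not taken from [10].

```python
def closeness(followers_a, followers_b):
    """Share of B's followers who also follow A (closeness of A from B)."""
    if not followers_b:
        return 0.0  # undefined for an outlet without followers; treated as 0 here
    return len(followers_a & followers_b) / len(followers_b)

# Toy data: a large outlet and a small outlet whose followers all follow the large one.
large_outlet = set(range(10))
small_outlet = {0, 1}

a_from_b = closeness(large_outlet, small_outlet)
b_from_a = closeness(small_outlet, large_outlet)
print(a_from_b, b_from_a)
```

The two directions give different values, which is why this metric leads to a directed network, whereas a symmetric similarity measure leads to an undirected one.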
In the same spirit as [10] and [24], in this study we use the phi coefficient to measure the similarity between two news outlets based on the number of common followers. We further apply the Disparity Filter method, as used in [21], to eliminate insignificant edges and to sparsify the resulting media networks. In general, we propose a framework for applying a network approach to Twitter follower data, from data collection strategies to network analysis using node centrality indicators.
In the early stages of data collection, this study collected 215 Indonesian online news media, 111 Malaysian online news media, and 58 Singaporean online news media. One of the main sources is the list of the most popular sites in a country published by Alexa [30]. This study used the Alexa rank, a site rating system based on web traffic data, to ensure that the analyzed media outlets have a sizable audience. The collected data was then further filtered using the two criteria described earlier. Based on the final list, the follower extraction process was then carried out to retrieve the followers of the news outlets on Twitter.
The number of online news media varies across the three analyzed countries. As shown in Table 2, the number of news portals in Indonesia is far greater than in the other two countries, both before and after the selection process. This is not surprising given the size of Indonesia’s population, its increasing internet access, and, most importantly, the fact that press freedom in Indonesia is better than in Malaysia and Singapore [31
The readership of news outlets in the three countries analyzed is also very diverse. As shown in Figure 2, the histogram of followers of news outlets on Twitter has a right-skewed shape with a long right tail. This means that there are only a handful of news outlets with a very large number of Twitter followers, while the majority of news outlets have only a relatively small number of followers. This indicates that, despite the variety of news media available online, most consumers look to the most prominent media outlets [15]. This distribution pattern was also found by previous studies that analyzed news consumption on different media platforms [9], which further confirms the classic long-tail argument in media research that online news sources are very heterogeneous in terms of audience size [33
Based on the follower data, this research builds news media networks in which the edges tell us how many followers any two news sources on Twitter share. Table 3 shows the statistics of the news media networks for the three countries analyzed. Before edge removal, the news media networks are very dense, that is, most news outlets share followers with most other outlets. As shown in Table 3, the density of the initial networks, that is, the ratio of edges to the total possible number of edges in the network, almost reaches the maximum value: 0.912 for Indonesia, 0.839 for Malaysia, and 0.927 for Singapore. However, differences in the number of followers shared and in the total followers of each news outlet (popular outlets have a larger audience to share than smaller outlets) cause the strength of the overlap to vary across media pairs. Consequently, not all edges are significant. An edge is statistically significant only when the observed overlap would be very unlikely under the null hypothesis of random overlap distribution. An edge-filtering step using the Disparity Filter method is carried out to eliminate edges that do not reach the significance level of p < 0.05. On average, about 94% of all original edges were removed from the three networks analyzed, i.e., 11,592 edges in the Indonesian media network, 3,441 edges in the Malaysian media network, and 752 edges in the Singapore media network. As a result, as shown in Table 3, the density values of the three filtered networks are very small, indicating that the final networks have a sparse structure.
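The density measure used above can be written out directly. The following is a minimal sketch, assuming an undirected network without self-loops; the function name and the toy numbers are illustrative and are not taken from Table 3.

```python
def network_density(num_nodes, num_edges):
    """Ratio of observed edges to possible edges in an undirected simple graph."""
    if num_nodes < 2:
        return 0.0
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges

# Toy example: 5 outlets, 9 of the 10 possible pairs share followers.
density_before = network_density(5, 9)
# Suppose edge filtering leaves only 2 significant edges.
density_after = network_density(5, 2)
print(density_before, density_after)
```

A density close to 1 means that almost every pair of outlets is connected, which is why the unfiltered overlap network carries little structural information on its own.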
Network analysis provides tools and metrics to evaluate the structural characteristics of a news media network. Node degree, the number of edges connected to a node, is a node-level measure used to assess the centrality of a node in the network [35]. In the context of the news media landscape, the degree of a node becomes a proxy for the diversity of people consuming a given media outlet [18]. For example, if node A has strong relations with five other nodes and node B has a strong relation with only one other node, we can infer that media A attracts people with a wider range of interests than media B. This research found that the news outlets with the largest node degrees are Antara News in the Indonesian media network, Free Malaysia Today in the Malaysian media network, and Yahoo Singapore in the Singapore media network. Figure 3 shows the top five online news media ranked by degree value and by number of Twitter followers. The node degree of a news outlet represents the number of other news outlets that share audiences with this outlet, while the number of followers represents the size of the audience of this outlet on Twitter. As shown in Figure 3
, the two indicators produce rather different rankings. The news outlets with the largest audience in their respective countries, namely Detik in Indonesia (15,070,895 followers), Astro Awani in Malaysia (1,686,834 followers), and Straits Times in Singapore (1,008,205 followers), do not hold the top position in the ranking based on degree centrality. Furthermore, in the Indonesian media network, the lists of the top five news outlets based on these two indicators do not intersect at all. Meanwhile, Malaysia and Singapore each have three news outlets that combine a large readership with a high degree centrality, namely The Star, Malaysia Kini, and Harian Metro in the Malaysian media network, and Straits Times, Channel News Asia, and The New Paper in the Singapore media network. This shows that the centrality of a news outlet in a media network is not always reflected in the size of its following. However, the more followers an outlet has, the more likely it is to share an audience with other news outlets.
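The comparison between the two rankings can be sketched in a few lines. The example below assumes that the filtered network is available as a list of undirected edges and that follower counts are stored in a dictionary; the outlet names, counts, and helper functions are hypothetical placeholders and do not reproduce the data behind Figure 3.

```python
from collections import Counter

def node_degrees(edges):
    """Count how many edges touch each node of an undirected edge list."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees

def top_k(scores, k):
    """Names of the k highest-scoring items, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]

# Hypothetical filtered network and follower counts.
edges = [
    ("outlet_a", "outlet_b"),
    ("outlet_a", "outlet_c"),
    ("outlet_a", "outlet_d"),
    ("outlet_b", "outlet_c"),
]
followers = {"outlet_a": 1000, "outlet_b": 50000, "outlet_c": 20000, "outlet_d": 500}

by_degree = top_k(node_degrees(edges), 2)
by_followers = top_k(followers, 2)
shared = set(by_degree) & set(by_followers)
print(by_degree, by_followers, shared)
```

In this toy case the outlet with the highest degree has few followers, so the two top lists only partly coincide. Outlets without any remaining edge do not appear in the edge list and would need to be added with degree zero.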
The number of followers of a news outlet on Twitter gives us an overview of the size of the audience that the outlet has on social media. However, in order to get a complete understanding of audience behavior and reveal the pattern of news consumption at the population level, we must view the news media landscape as a collection of interconnected news sources based on overlapping audiences. In this study, we used Twitter follower data to build online news media networks, where the nodes represent news outlets and the edges tell us how much overlap any two news outlets have in the audience they share. In other words, we wanted to map the media landscape as it emerges from the preferences of social media users for news sources available online. Figure 4, Figure 5 and Figure 6 show the visual representation of the online news media networks in the three countries analyzed in this study. The networks were built using Gephi 0.9.1, an open-source software for network analysis [36], with a list of edges between media outlets as input. By representing the landscape of online news media as a network, we can get a clear and measurable picture of the configuration of relationships between news outlets and the position of various news outlets in the media landscape.
There are several ways to build a news media network based on follower overlap data. We can measure the similarity between news outlets using phi coefficients, as done in this study and in [19], or using closeness metrics [10], and then represent all the ties between news outlets collectively as undirected weighted networks [19] or directed weighted networks [10]. However, we argue that the resulting media network can look very different depending on how we filter the insignificant edges in the network. A weighted graph can easily be reduced to a sub-graph in which every edge weight is larger than a given threshold. This global weight threshold technique has been applied in [19] using the t-test and in [10] using a simple strategy: choose the strongest one. The shortcoming of this method is that it overlooks nodes with small strength. Meanwhile, other studies analyze media networks as they are [24
]. This can lead to wrong conclusions when carrying out network analysis, given that the media networks constructed in these studies are correlational networks, which naturally have a dense structure. In the analysis of media networks, network density needs to be reduced first to bring out the actual structure so that the media network can be analyzed reliably. Furthermore, in media networks, as shown in [19], both the strength and weight distributions in general follow heavy-tailed distributions that span several orders of magnitude. Applying a simple cutoff on weight removes all the information below the cutoff. In this study, we show the effectiveness of the Disparity Filter method in extracting the relevant connection backbone of media networks. Compared to other studies that use the t-test, the density of the final media networks in this study, as in [19], is very low, which shows that this method can sufficiently reduce the network density without destroying its multi-scale nature.
After we represent the media landscape as a network, we can use the various metrics available in network analysis to summarize what the network reveals about news consumption. In this study, we applied the degree centrality metric to measure how central the position of each news outlet is in the network. Traditionally, audience size has been the measure of the prominence of a media outlet. However, in this study we show that how strategic the position of a news outlet is in the network is not always reflected in the number of followers it has. In fact, compared to the size of their direct followings, media outlets reach a considerably larger audience through indirect exposure via social links [10]. An et al. [10] show that, based on indicators of social network exposure, a number of unconventional news sources appear in the list of top media, along with several established media sources with millions of direct followers. In Figure 4, the centrality of a media outlet's position in the network is visualized by the size of its node, which is proportional to the centrality value of the outlet: the larger the node, the more central the outlet it represents.
So far, the intersection between social media and news media mainly occurs on Facebook and Twitter. However, these two platforms are used for vastly different purposes when it comes to news consumption, due to the different features of the respective sites. Research on social media shows that consumers use Twitter for breaking news or for searching for information about their interests, but this is not the case for Facebook. Facebook members want to stay connected with their offline social network and, as such, tend to get news about their social environment through those networks [37]. However, news consumption on social media is strongly affected by the tendency of users to limit their exposure to a few sites [37]. That is why the structural characteristics of the news media landscape on the two platforms are relatively similar [38
The news media network on Twitter presented in this study should be evaluated with two limitations in mind. First, not all followers of news media accounts on Twitter are true audiences. It is known that about 15% of Twitter accounts are bot accounts that are controlled by software [39]. Second, not all news media followers are active users. Future research should consider building a news media network based on followers who actively interact with news media accounts.
In this study, we used network analysis to represent and analyze the landscape of online news media in the social media environment based on follower overlap data. For this purpose, we developed the implementation stages of the proposed method, from data collection and modeling to data analysis, and applied the methodology to online news media data in three countries, namely Indonesia, Malaysia, and Singapore. In the data collection stage, we showed a long-tailed distribution pattern in the size of the media audience on Twitter. This indicates that the news media in the three countries analyzed are very heterogeneous in terms of audience size: a small number of news outlets have very large Twitter followings, while the majority of news outlets have only a few followers. This finding confirms a classic argument in media studies that public attention tends to be concentrated in a handful of news outlets. In the modeling stage, we showed the need for edge removal so that the news media network can be analyzed reliably. A comparison of network statistics before and after the elimination of non-significant edges indicates that edge removal has an impact on the structure of the networks. In the analysis stage, we used network indicators to evaluate the structural characteristics of the news media networks and showed the ranking of news media outlets in the three countries analyzed based on the diversity of their audiences. In general, this study shows that the use of network analysis on follower overlap data can offer relevant insights into media diets and the way audiences navigate the various news sources available on social media. Future work will pursue a deeper analysis of fragmentation phenomena in the pattern of news consumption on social media.