An Exploratory Data Analysis of the #Crowdfunding Network on Twitter

Together, social media and crowdsourcing can help entrepreneurs to attract external finance and early-stage customers. This paper investigates the characteristics and discourse of an issue-centered public on Twitter organized around the hashtag #crowdfunding through the lens of social network theory. Using a dataset of 2,732,144 tweets published during a calendar year, we apply exploratory data analysis to generate insights and hypotheses on who the users in the #crowdfunding network are, what they share, and how they are connected to each other. To do so, we adopt a range of descriptive, content, and network analytics techniques. The results suggest that platforms, crowdfunders, and other actors who derive income from the crowdfunding economy play a key role in creating the network. Furthermore, latent ties (strangers) play a direct role in disseminating information, investing, and sending signals to platforms that further raise campaign prominence. We also introduce a new type of social tie, the “computer as a social actor”, previously unaddressed in the entrepreneurial network literature, which plays a role in sending signals to both platforms and networks. Our results suggest that homophily is a key driver in the formation of network sub-communities built around specific platforms, project types, domains, or geography.


Introduction
It is widely accepted that social media has had, and continues to have, a profound impact on how individuals and organizations engage with each other. A key part of the success of social networking sites is the ability of users to create and consume content from other users, and to identify and interact with others with similar and opposing views [1,2]. Social media and the so-called "power of the crowd" are also being used to solve problems for organizations through contests, collaborative communities, crowd complementors, and crowd labor markets [3]. For early-stage start-ups, the combination of social media and crowdsourcing is solving two problems: attracting external finance, and attracting early-stage, often pre-product, customers [4]. Since its emergence in 2010, crowdfunding has expanded in terms of the volume, variety, and value of transactions to which it is applied [5]; crowdfunding investments worldwide are expected to reach US$6.8 billion in 2019 through more than 8.6 million campaigns [6].
Despite the growing importance of crowdfunding, academic research is still limited and has typically focused on understanding decision-making in relation to the form of crowdfunding to adopt or engage in, and on the characteristics of successful campaigns [7]. Although crowdfunding is heavily based on and associated with social media, there is only a limited understanding of the dynamics of crowdfunding on social media [8]. It is well established that social links determine what information people are exposed to. Similarly, previous studies have highlighted the prominent role of social media and online communication in the context of crowdfunding; however, these studies have focused on the relationship between discrete social media activity or engagement and campaign success [8][9][10][11]. To the best of our knowledge, this is the first study examining the structure and discourse of the wider crowdfunding public on a social network, in this case the #crowdfunding public on Twitter. Following Herdagdelen et al. [12], by understanding the individual choices of users to interact with each other in this #crowdfunding public, we seek to gain insights into how the crowdfunding community behaves as a whole. Furthermore, while there is a well-established literature on the role of strong and weak ties in entrepreneurial networks, this paper extends our understanding of the role and value of strangers in such networks.
The aim of this paper is to investigate the network characteristics of an issue-centered public on Twitter organized around the hashtag #crowdfunding for the calendar year 2014 through the lens of social network theory. This paper leverages Exploratory Data Analysis (EDA) techniques including network analytics to examine the degree of connectedness and prominence of sub-networks, and key brokers within these sub-networks; text mining to examine the topics being discussed; and peak detection analytics to identify content that triggered abnormal levels of activity in the network.
The remainder of the paper is organized as follows. Section 2 elaborates further on the crowdfunding, social media, and related literature. Section 3 presents the dataset and the methodology, while Section 4 presents our findings. The paper concludes with a discussion of our research, its limitations, and avenues for further research.

Crowdfunding
Crowdfunding is an open call, mostly through the internet, for the provision of financial resources from a group of individuals or organizations, either in the form of a donation or in exchange for a future product or some form of reward, to support initiatives for specific purposes [4]. At its core, instead of raising funds from a small group of professional or sophisticated investors, firms obtain small amounts of money from a large number of individuals, typically non-professionals, i.e., the "crowd" [4,13]. In this way, crowdfunding is a two-sided market that links capital-seekers (crowdfunders) and capital-givers (investors), enabled by a crowdfunding intermediary (platforms) [14,15]. Crowdfunding platforms do not borrow, pool, or lend money on their own account, but they enable investors to pledge funds, often on an all-or-nothing basis (pledges are collected only if the funding goal is reached) or a keep-it-all basis (the crowdfunder keeps whatever is raised) [14,16]. The economic model for these platforms is typically a commission based on funds raised or donations received [17]. These platforms cater to a wide range of projects, including products, experience goods, and social initiatives [9,18,19]. Commonly cited reports estimate the number of crowdfunding platforms at over 1,250 worldwide in 2014 [20], and over 510 platforms operating in the European Union in a similar period [21].
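The practical difference between the all-or-nothing and keep-it-all pledge models can be made concrete with a short sketch. It assumes a single flat commission charged on funds actually collected; the 5% rate used below is a hypothetical figure, not one reported in this paper, and real platforms vary their fees (some charge different rates for each model).

```python
def settle_campaign(pledges, goal, model, commission_rate):
    """Return (payout_to_crowdfunder, platform_fee) when a campaign closes.

    Assumes one flat commission on funds actually collected; the rate is a
    hypothetical parameter, not a figure from any specific platform.
    """
    if model not in ("all-or-nothing", "keep-it-all"):
        raise ValueError(f"unknown funding model: {model!r}")
    raised = sum(pledges)
    if model == "all-or-nothing" and raised < goal:
        # Goal missed: pledges are not collected, so no commission is earned.
        return 0.0, 0.0
    fee = raised * commission_rate
    return raised - fee, fee


pledges = [50, 120, 80]  # hypothetical pledges totalling 250, below a goal of 300
print(settle_campaign(pledges, goal=300, model="all-or-nothing", commission_rate=0.05))
print(settle_campaign(pledges, goal=300, model="keep-it-all", commission_rate=0.05))
```

Under all-or-nothing, a campaign that falls short yields nothing for either the crowdfunder or the platform, whereas keep-it-all pays out (and earns commission on) whatever was pledged.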
Crowdfunding can be differentiated from traditional venture capital investment by the characteristics of investors, the investment model, and the type of relationship that investors have with the investee. Firstly, unlike traditional investors, the overwhelming majority of crowdfunding investors are not professionals. Indeed, early crowdfunding investors tend to be friends and family, and are influenced by signals from these close social ties rather than by professional screening and due diligence processes [9]. Crowdfunders also differ significantly from traditional venture capital investors in their investment motivation: Gerber et al. [22] found that, in addition to profit sharing, crowdfunders were motivated by early or preferential access to products and by other community benefits, such as feelings of connectedness to a community. Secondly, crowdfunding investment models are more varied, and four main categories exist: donation, reward and pre-purchase, lending, and equity-based [23]; the first two are sometimes referred to as crowd sponsoring, while the latter two are sometimes referred to as crowd investing [24]. Other variants include invoice trading and hybrid crowdfunding [25]. The various investment models differ in investment motivation and benefits. For example, donation-based models may offer highly issue-centered, personal, or community benefits [22]. In contrast, pre-purchase models offer more consumer-oriented benefits, such as price discrimination and/or self-esteem. Such intangible benefits contrast with the simple financial model of traditional investment. Thirdly, the relationship between investors and investees in crowdfunding models differs from that in traditional investment [26]. There is a well-established literature on how venture capitalists control agency problems and mitigate risk throughout the investment process [27].
Ley and Weaven [26] examine the investment process using an ex ante and ex post approach and identify a number of potential differences in agency dynamics between crowdfunding and traditional venture capital investment. Ex ante factors include deal screening, deal referrals, information sensitivity, and due diligence, whereas ex post factors include contractual rights, board representation, value-adding capability, economic life, and exit options [26]. In each case, the "crowd" aspects of crowdfunding affect the extent to which these factors can be maximized or minimized throughout the investment process and, in some models (e.g., donation and reward and pre-purchase), these factors may not be relevant at all. Indeed, ex ante strategic involvement by investors, considered by many as a critical success factor for start-ups [28][29][30], may not be feasible, appropriate, or welcome in crowdfunding.
Recent studies in the entrepreneurship literature focus on two main aspects of crowdfunding: (i) the incentives and motivations for starting or taking part in crowdfunding projects, and (ii) the factors associated with successful campaigns. Gerber et al. [22] found that initiators and funders are motivated by both extrinsic and intrinsic factors. Specifically, extrinsic factors include fundraising (initiators) and consuming products or experiences (funders), while intrinsic factors include social interactions, reinforcing commitment to an idea on the basis of feedback (creators), and connecting with a community with similar interests and ideals (funders). Studies on the characteristics of successful campaigns suggest that project quality [7], spatial proximity between initiators and funders [9], and entrepreneurs' internal social capital [31] play a critical role in attracting both early capital and early backers, hence influencing the success of crowdfunding campaigns. Among these factors, the geography of crowdfunding has attracted particular attention, given some counterintuitive findings from previous studies (e.g., [9,32]). One of the major potential benefits of crowdfunding is that it may eliminate geographical boundaries between entrepreneurs and investors. In other words, investors can fund projects, and projects can identify investors, in their own country as well as in any other country in the world without additional effort. However, empirical evidence suggests that the physical distance between investors and entrepreneurs still plays a significant role, with local investors investing relatively early [9] and very limited cross-border activity [32].

Entrepreneurial Networks and Social Network Theory
Emerging entrepreneurial firms, due to their nascent stage of development, face the challenge of attracting the resources needed to survive and to achieve and sustain economic success [33]. In the entrepreneurship literature, social networks are defined as "the composite of relationships in which small firms are embedded, which serve to link or connect small firms to the environments in which they exist and conduct their businesses" [34]. There is well-established theory on the role that these social networks play in acquiring strategic resources for emerging entrepreneurial firms [35][36][37][38][39]. Such networks can provide a form of social support, create opportunities, identify threats, and address resource issues [34,40,41]. Extant research suggests that social relationships influence business outcomes [42], and are the source of, and basis for, business opportunities and entrepreneurial ventures [43].
Social network theory differentiates between relationships in social networks based on the strength of the ties between dyads. These ties may be strong, where both parties benefit and the relationship represents a significant emotional investment, e.g., friends and family or business advisors, or weak, e.g., customers or suppliers [44]. A third group, which has been explored in less depth, is referred to as "strangers". Aldrich et al. [45] characterize strangers as contacts (i) with whom the entrepreneur has no prior ties, (ii) about whom very little is known, and (iii) with whom relationships are entered into for pragmatic reasons, are fleeting, and require little or no emotional involvement. The nature of strangers poses obvious research challenges, both in identifying such actors and in mapping their relationships and impact. As such, our understanding of the value and importance of strangers is limited [45].
Unsurprisingly, most research focuses on exploring the role of strong and weak (loose) ties within entrepreneurial networks and their impact on success. These ties are typically fully identifiable social relationships [40,46]. While such networks are initially based on the founder, over time the entrepreneurial team plays a critical role in expanding the size and value of these networks [47,48]. Hite [39] suggests that relationally-embedded network ties demonstrate three components: personal relationship, dyadic economic interaction, and social capital. For Hite [39], personal relationships demonstrate three main attributes: personal knowledge, affect, and sociality. Dyadic economic interaction is defined not by the content of a specific exchange, but rather by attributes related to the extent, effort, ease, and quality of an economic interaction [39]. While relationally-embedded network ties would seem to influence the economic actions and outcomes of the firm, they are governed not by traditional market mechanisms but by relational trust [49]. Consequently, the relational basis of these ties may constrain the actions and decisions of the entrepreneurial team due to perceived obligations and expectations, outside of the economic relationship, that may affect the personal relationship [50,51]. Such constraints are unlikely to arise in the case of strangers.

Social Media
Social media includes a set of web-based and mobile tools and applications that allow users to create (consume) content that can be consumed (created) by others and that enable and facilitate connections [2]. Social media platforms differ from each other by the extent to which users (i) reveal themselves, (ii) know if others are available, (iii) relate to each other, (iv) know the social standing of others and content, (v) form communities, (vi) communicate with each other, and (vii) exchange, distribute, and receive content [52]. Twitter is a particular form of social media platform, called a microblogging site. Microblogging services allow users to send and read short posts instantaneously [53]. Originally limited to text, microblogging sites such as Twitter now support images, audio, live and recorded video, URLs, and other resources. The brevity of posts combined with the instantaneous nature of microblogging differentiates it from blogging and other social media, resulting in higher update frequencies and more real-time updates [54].
Twitter has a number of additional noteworthy characteristics and functions. Firstly, Twitter is largely an open network, so interactions are largely in the public domain. Users can follow other users, view their posts, and send public messages to other users without requiring permission (unlike Facebook or LinkedIn, where connections typically require approval); it is largely a social network that connects strangers. Secondly, posts (tweets) can feature hashtags. Hashtags connect a post to a particular theme and act as a coordinating mechanism for users to organize all of the tweets on that theme and to identify users with similar or opposing views, thereby facilitating the formation of ad hoc and calculated publics [55]. Thirdly, a retweet is a way to forward the message of another user to the followers of the user who retweets. This last characteristic represents a powerful mechanism for information sharing [56].
Previous studies have noted the opportunities and benefits of Twitter for entrepreneurs and small businesses [57][58][59][60]. In the venture financing literature, recent research suggests that there is a relationship between the social media activity of entrepreneurs, start-up engagement, and venture financing [61]. Extant literature on crowdfunding highlights the prominent role of social media and online communication in the context of crowdfunding [8][9][10][11]. For example, numerous studies have noted a relationship between the number of social media followers and crowdfunding success [7,62,63]. A smaller number of studies explore the role of strangers in crowdfunding success. Hui et al. [64] find that strangers play a significant role in crowdfunding projects. Focusing on perceptual relations, Davison and Poor [65] find that the perceived proportion of known funders negatively predicts project success. More recently, Borst et al. [37] found that project initiators needed to vary the type of social media messaging to attract funding from latent or distal ties, i.e., strangers, depending on the social networking site. They found that messages featuring solicitations were more effective on Facebook, and informative messages more effective on Twitter. Furthermore, their results suggest that sending more tweets negatively affected weak and latent ties, potentially explained by a bystander effect [37].
While previous studies of both entrepreneurial networks (e.g., [39]) and crowdfunding (e.g., [66]) have focussed on dyadic relationships, few have explored crowdfunding and social media through the lens of network theory. Borst et al. [37] explore the interplay of social media messages and social ties based on a sample of ten crowdfunding projects on the "Voordekunst" platform for arts funding. Like Borst et al. [37], we draw on network theory as a lens through which to explore the use of social networking sites in crowdfunding; however, our focus is not at the campaign level, but rather on the sub-network created on Twitter around the hashtag #crowdfunding. In this way, we address calls for further research using social media data and on the role of social media in open innovation, product development, and crowdfunding [67,68]. We thus contribute to understanding the wider crowdfunding community on Twitter, rather than platform-specific or campaign-specific messages or relationships, expanding the literature and theory relating to both social networks and crowdfunding.

Data and Methodology
The aim of this paper is to investigate the characteristics of the #crowdfunding network on Twitter by identifying who the users in the #crowdfunding network are, what they share, and how they are connected to each other. Our analysis is based on a dataset of 2,732,144 tweets featuring the hashtag #crowdfunding that were posted from 1 January 2014 to 31 December 2014 by 386,461 Twitter screen-names (accounts). The dataset was generated using Datasift, a commercial data aggregator. For each tweet, the text of the message, timestamp, username, geographical location, URLs, and whether the message was an original tweet, a retweet, or a reply were collated. The dataset contained 317,975 replies (11.6% of tweets); 2,430,006 tweets (89%) contained a URL; and the tweets featured 136,317 distinct hashtags.
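Descriptive statistics of the kind reported above reduce to a few counters. The `Tweet` record below is a hypothetical minimal schema: Datasift's actual output fields are not described here. Hashtags are matched with a simple `#\w+` pattern, which only approximates Twitter's own hashtag rules.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

HASHTAG = re.compile(r"#\w+")

@dataclass
class Tweet:
    # Hypothetical minimal record; these field names are illustrative and
    # do not reproduce Datasift's actual output schema.
    screen_name: str
    text: str
    kind: str  # "original", "retweet" or "reply"
    urls: list = field(default_factory=list)

def describe(tweets):
    """Descriptive statistics of the kind reported for the 2014 dataset."""
    n = len(tweets)
    kinds = Counter(t.kind for t in tweets)
    # Hashtags are case-folded so #Crowdfunding and #crowdfunding count once.
    hashtags = {tag.lower() for t in tweets for tag in HASHTAG.findall(t.text)}
    return {
        "tweets": n,
        "accounts": len({t.screen_name for t in tweets}),
        "reply_share": kinds["reply"] / n,
        "url_share": sum(1 for t in tweets if t.urls) / n,
        "distinct_hashtags": len(hashtags),
    }

# Applied to the counts reported above, the same ratios give:
print(f"{317_975 / 2_732_144:.1%} replies")     # 11.6% replies
print(f"{2_430_006 / 2_732_144:.1%} with URL")  # 88.9% with URL
```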
Given the novelty of this research area, EDA was deemed an appropriate initial analytical technique for data-driven discovery [69]. EDA has been found to be useful for identifying patterns, trends, correlations, or relations among data in order to generate insights or hypotheses [70]. The use of multiple analytical and visualization methods is a key element of EDA, enabling the data "to speak for themselves" and providing more intuitive insights about "models" of relationships not expected a priori [71]. This paper sought to examine the characteristics of the #crowdfunding public, including the degree of connectedness and prominence of sub-networks, and key brokers within these sub-networks. By their nature, social networking sites can be examined through the lens of network theory. Twitter has clearly defined nodes, i.e., screen-names, and edges, in our case replies, which establish links between users in the network. Network analytics is useful for discovering influential accounts, communities, and the key brokers within communities, including Twitter communities specifically [72][73][74]. Network analytics was carried out using the Gephi open graph visualization platform. More specifically, the ForceAtlas-2 algorithm was used to construct a graphical representation of the overall network topology. ForceAtlas-2 is a force-directed algorithm, meaning that the node layout is determined by forces pulling vertices together or pushing them apart. Attractive forces occur between adjacent vertices only, whereas repulsive forces occur between every pair of vertices in the graph [75]. This makes the algorithm well suited to visualizing social networks, since densely connected groups of users are drawn together while weakly connected groups are pushed apart. In this paper, the network topology was designed by grouping all the vertices into the communities to which they belong, using Blondel et al.'s community detection algorithm (the Louvain method) in Gephi [76].
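The forces described above can be illustrated with a simplified, hypothetical sketch in the spirit of ForceAtlas-2. Following the published ForceAtlas2 model, attraction between adjacent vertices grows linearly with their distance d, and repulsion between every pair is kr·(deg(a)+1)·(deg(b)+1)/d. Gravity, adaptive speeds, and the Barnes-Hut approximation that Gephi uses on large graphs are omitted, and the step size, displacement cap, and kr value are illustrative choices. This is not Gephi's implementation, and it runs in quadratic time per iteration.

```python
import math
import random

def forceatlas2_sketch(edges, iterations=200, kr=1.0, step=0.1, max_disp=1.0, seed=0):
    """Simplified ForceAtlas2-style layout for a small undirected graph.

    Attraction (adjacent pairs only) grows linearly with distance d;
    repulsion (every pair) is kr * (deg(a) + 1) * (deg(b) + 1) / d.
    Gravity, adaptive speed and Barnes-Hut approximation are omitted.
    """
    nodes = sorted({n for edge in edges for n in edge})
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {n: len(adj[n]) for n in nodes}
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = kr * (deg[a] + 1) * (deg[b] + 1) / d  # repulsion (> 0 pushes apart)
                if b in adj[a]:
                    f -= d  # attraction between adjacent vertices only
                ux, uy = dx / d, dy / d  # unit vector pointing from b towards a
                force[a][0] += f * ux
                force[a][1] += f * uy
                force[b][0] -= f * ux
                force[b][1] -= f * uy
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_disp:  # cap each move to keep the iteration stable
                mx, my = mx * max_disp / length, my * max_disp / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos

# Two separate reply pairs: each pair settles close together, the pairs drift apart.
layout = forceatlas2_sketch([("a", "b"), ("c", "d")])
```

With two disconnected reply pairs, each pair settles at a distance of about 2 (where attraction d balances repulsion 4/d for two degree-1 vertices), while the two pairs keep drifting apart because no edge pulls them together. This is the visual separation of communities that the layout exploits.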
As part of the network analytics, centrality analysis was undertaken to measure betweenness centrality and in-degree, metrics that are commonly used to identify the hubs and influencers in a social network. Betweenness centrality (BC) is a measure of centrality in a graph based on shortest paths; the higher the betweenness centrality, the more frequently the node lies on the shortest paths between other nodes. In our case, users with high betweenness centrality act as hubs that connect other influential nodes in the network, thereby acting as bridges between users. In the social graph constructed from the dataset, vertices have an in-degree, i.e., the number of replies received by a specific user, and an out-degree, i.e., the number of replies sent by a specific user. The higher the in-degree of a user, the higher their influence. To supplement the analysis of influential users, the most active and most visible users were identified in line with Cha et al. [77]. The activity (visibility) of a user was determined by the number of tweets (replies and mentions) contributed (received) by that user. To better understand the #crowdfunding public, we used content analytics and peak detection analysis to conduct a preliminary analysis of the topic discourse over the calendar year. Content analytics was carried out by cross-referencing content and structural features to identify usage patterns. Items of interest were identified using three peak detection algorithms, as per Healy et al. [78]. Word analysis (n-grams) and hashtag analysis were used to extract intelligence from the dataset, while spatial analysis was used to identify the geographical distribution of users in the network.
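The hub and influence measures defined above can be sketched on a directed reply graph with one edge from each replier to the account replied to. The sketch uses Brandes' algorithm for unnormalized betweenness. Counting every reply toward in-degree while computing betweenness on distinct reply links is an assumption of this sketch; Gephi may treat repeated replies differently.

```python
from collections import Counter, deque

def reply_graph(replies):
    """Build a directed graph from (replier, replied_to) pairs.

    In-degree counts every reply received; betweenness is computed on the
    unweighted graph of distinct reply links (an assumption of this sketch).
    """
    in_degree = Counter(target for _, target in replies)
    succ = {}
    for source, target in replies:
        succ.setdefault(source, set()).add(target)
        succ.setdefault(target, set())
    return succ, in_degree

def betweenness(succ):
    """Brandes' algorithm for unnormalized betweenness on an unweighted digraph."""
    bc = dict.fromkeys(succ, 0.0)
    for s in succ:
        order, preds = [], {v: [] for v in succ}
        sigma = dict.fromkeys(succ, 0)  # number of shortest paths from s
        dist = dict.fromkeys(succ, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(succ, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

replies = [("x", "hub"), ("y", "hub"), ("hub", "z"), ("x", "hub")]
succ, in_degree = reply_graph(replies)
print(betweenness(succ)["hub"])  # 2.0: hub lies on the x->z and y->z paths
print(in_degree.most_common(1))  # [('hub', 3)]
```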

Topological Data Analysis
As discussed, network analytics helps in understanding the nature of the underlying network and is also useful for identifying influencers, key hubs, and communities in the network. It also aids in understanding properties of the social network such as network density and degree of cohesion (how close or separated the users are). The crowdfunding network had 128,293 nodes and 134,113 edges. The average degree of the network, i.e., the number of edges per node, was 1.045, indicating that users are, on average, connected to very few other users. This low level of cohesion may be attributed to the presence of many users with very low levels of engagement, which is consistent with the bystander effect noted by Borst et al. [37]. The network diameter, i.e., the greatest shortest-path distance between any two nodes in the network, was 25. The average path length, i.e., the average distance between any two connected nodes, was 7.795. The short average path length relative to the diameter suggests the presence of powerful hubs that connect different users in the network, thereby acting as facilitators. Figure 1 presents a graphical representation of the network topology, constructed using the ForceAtlas-2 algorithm in Gephi. Each color in the graph identifies a different sub-community in the network. The modularity score (0.892), which measures the strength of the division of a network into communities or clusters, indicates that interactions within communities are much more frequent than interactions between communities. Finally, the average clustering coefficient, which measures the extent to which nodes tend to cluster together, is 0.0052, indicating that the network is sparse. This is consistent with the low average degree. Additional analysis was undertaken to examine the network topologies of the five largest communities identified in the dataset.
These are presented in Figure 2, where the sub-communities are labeled SC1 to SC5 in order of size, with SC1 being the largest. As can be seen in Figure 2, the topologies of the sub-communities vary. SC1 and SC5 have topologies that reflect the influence of one node pulling the majority of other nodes. For example, in SC1, Phundee (@havephun), a reward crowdfunding platform for entertainment and arts, pulls 97% of the other nodes and has an out-degree of 5836. Similarly, SC5 is dominated by @Agromaniacs, a science fiction project supported by crowdfunding; @Agromaniacs pulls 75% of users (2311) in that sub-community. In contrast, SC2 and SC4 are sub-communities with more than one influencer, of varying strength, as represented by the distribution in the topologies. In addition to reflecting the existence of centralized and distributed sub-communities, the topological analysis suggests that these communities can be categorized by actor type or location. As mentioned earlier, SC1 is organized around a platform, SC5 around one project (@Agromaniacs), and SC2 around multiple projects (@TheAutismMovie, @JenniferStratn, @QuipoProject and @KSPublishingSC). SC3 is organized around a different set of actors focussing on knowledge sharing and project promotion, e.g., @_Ayudos, @CrowdfunderUK, @StartupsMap, and @CrowdfundingBro. SC4 is regionally focused on crowdfunding in France; the top influencers include @Eurosolidaire, @LendyFr, @Crowd2win, @Cultureuse, and @Babyloan. Notably, none of the sub-communities is organized around capital-givers.
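Two of the summary figures above can be checked with a short sketch. The reported average degree of 1.045 equals the number of edges divided by the number of nodes. Modularity can be computed for any given partition. For modularity, the sketch assumes an undirected graph, which is a simplification. It only evaluates a partition and does not search for one, as Blondel et al.'s (Louvain) method does. The two-triangle graph is a hypothetical toy example.

```python
from collections import defaultdict

def average_degree(num_nodes, num_edges):
    # For a directed graph, mean in-degree = mean out-degree = edges / nodes.
    return num_edges / num_nodes

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    Q = sum over communities c of [L_c / m - (D_c / 2m)^2], where m is the
    number of edges, L_c the edges inside c and D_c the total degree of c.
    """
    m = len(edges)
    inside, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

print(round(average_degree(128_293, 134_113), 3))  # 1.045

# Toy check: two triangles joined by a single bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
labels = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(round(modularity(edges, labels), 3))  # 0.357
```

Placing every node in one community gives Q = 0. Values close to 1, such as the 0.892 reported above, arise only when almost all edges fall inside communities that each account for a small share of total degree.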

Centrality Analysis
Centrality analysis also reveals the influencers, i.e., accounts that attract a high number of inbound connections (replies). Influencers are typically identified using two metrics: (i) in-degree, defined above, and (ii) PageRank, which considers both the link propensity of a node (i.e., user) and its centrality [79]. The topological analysis supports the presence of important hubs at the aggregate level and within sub-communities, and centrality analysis further confirms this finding. Table 1 lists the top 10 hubs and top 10 influencers for the overall dataset, together with the corresponding BC and PageRank scores, respectively. Top hubs and influencers differ in nature. The hubs are dominated by accounts that operate in the crowdfunding economy, providing expertise, tools, or platforms for crowdfunding. The top hub is a specialist crowdfunding marketing platform, Krowdster. Interestingly, while six platforms feature among the top hubs, the largest crowdfunding platforms, e.g., Kickstarter, IndieGoGo, and GoFundMe, are absent. Seven accounts relate to crowdfunding campaigns. In contrast, the top influencers, as measured by PageRank, feature a wider mix of participants, notably the media, e.g., VentureBeat. At a more granular level, centrality analysis can help to explain the thematic orientation of a sub-community. For example, in SC1, the main influencer and hub is @phundee, consistent with the topological data analysis; however, the secondary hubs provide greater explanatory value: @mygirlfilm (5918) and @FilmFestDoctor (5915) reflect the film funding orientation of this community. Again consistent with the topological data analysis, the SC2 centrality analysis reflects a wider distribution of stakeholders and interested parties in the crowdfunding economy. @BizLoanFinder is the top influencer in SC2 (PageRank: 0.00226), followed by @YouTube (0.00191), @rustyrockets (0.00178), and @TheEllenShow (0.00158).
The SC2 hubs again provide better indicators of the thematic inflection of the sub-community. For example, @TheAutismMovie (BC: 868), @JenniferStratn (330), and @OuipuProject (261) are the most powerful hubs within SC2 and largely relate to socially-oriented projects. The influencers and hubs that result from the analysis of SC3 suggest a more general crowdfunding sub-community, with many crowdfunding-specific influencers, such as accounts associated with platforms (@IndieGoGo and @GoGoSlava) and other community sites (e.g., @Crowdclan). Similarly, the SC3 hubs are accounts representing platforms, information resources, etc. SC3's main hubs reflect its structure well, with multiple accounts holding sway over the community. Unsurprisingly, SC4 is geo-specific and dominated by French influencers and hubs. Finally, SC5 reflects the interests of the Agromaniacs project, with influencers and hubs related to art, comics, and science fiction.
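PageRank scores of the kind listed in Table 1 can be approximated with a minimal power-iteration sketch on a reply graph. Each reply is an edge from the replier to the account replied to, so rank flows towards accounts that receive replies. The damping factor of 0.85, the tolerance, and the uniform redistribution of rank held by accounts that never reply (dangling nodes) are assumptions of this sketch; Gephi's implementation and settings may differ.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    Rank mass of nodes with no outgoing links (dangling nodes) is spread
    uniformly, which is one common convention; other tools may differ.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: set() for n in nodes}
    for u, v in edges:
        out[u].add(v)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not out[u])
        # Teleport share plus the redistributed dangling mass, for every node.
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Replies point from replier to the account replied to, so rank flows to the latter.
ranks = pagerank([("x", "hub"), ("y", "hub"), ("z", "hub"), ("hub", "x")])
print(max(ranks, key=ranks.get))  # hub
```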

Activity and Visibility
Cha et al. [77] suggest that the most active and visible users may indicate influence on Twitter. The most active users in this dataset are predominantly accounts offering services in the crowdfunding economy (CE). The content posted by these accounts includes promotion of services and of specific campaigns. Table 2 summarises the activity and visibility of the 10 most active users in the dataset. An analysis of the top 10 most active accounts suggests that they exhibit the characteristics of accounts operated by automated software programs, commonly known as bots. Ferrara et al. [80] suggest that bots on Twitter retweet more than humans, have longer usernames, generally produce fewer tweets, replies, and mentions, and are retweeted less than humans. While such bots can be benign, for example, when used to share news or updates [81], they are commonly used for spamming and/or manipulative marketing [82,83]. Extant literature suggests that between 10.5% and 16% of Twitter accounts exhibit a high degree of automation [81,84]. To explore this phenomenon further, the top 200 most active accounts in the dataset were examined to determine (a) whether Twitter had suspended the account, and (b) whether the account reflected the behavior of a bot. Twitter typically suspends accounts for three primary reasons: spam, security risks, or abusive content. By May 2017, 36.5% of these 200 accounts (50% of the top 100) had been suspended by Twitter. The Bot or Not? classifier (also known as "Botometer") was then used to assess the presence of bots among the remaining accounts. After excluding the suspended accounts, a further nine accounts had a Bot or Not score of 80% or higher, indicating a high probability of automation, and a further 41 accounts had a score of between 60% and 79%.
In contrast, the 10 most visible accounts include leading crowdfunding platforms, such as Indiegogo and Kickstarter, and media accounts. A number of highly visible accounts relate to crowdfunding "experts" who have typically published books, run events, or commented in the media. In line with similar studies in other domains, Tables 2 and 3 show that the most active users are not the most visible users and vice versa [85,86]. This suggests that visibility may be a better predictor of influence than activity.
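Cha et al.'s activity and visibility measures, and the screening of highly active accounts for suspension and automation, can be sketched together. The tweets, account names, and bot scores below are hypothetical. The sketch assumes replies carry an @mention of the account replied to, and mentions inside retweets ("RT @user") also count toward visibility here, which may differ from the paper's exact counting rules. The screening function only buckets scores that are supplied to it and does not call the Botometer service.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def activity_and_visibility(tweets):
    """Activity = tweets posted; visibility = replies and mentions received.

    `tweets` is an iterable of (author, text) pairs. Replies are assumed to
    carry an @mention of the account replied to, so one regex covers both.
    """
    activity, visibility = Counter(), Counter()
    for author, text in tweets:
        author = author.lower()
        activity[author] += 1
        for name in {m.lower() for m in MENTION.findall(text)}:
            if name != author:  # ignore self-mentions
                visibility[name] += 1
    return activity, visibility

def top_overlap(activity, visibility, k=10):
    """Accounts in both the top-k most active and the top-k most visible lists."""
    most_active = {name for name, _ in activity.most_common(k)}
    most_visible = {name for name, _ in visibility.most_common(k)}
    return most_active & most_visible

def screen_accounts(accounts, high=80, moderate=60):
    """Bucket (screen_name, suspended, bot_score) tuples as in the screening step.

    bot_score is a percentage (0-100), or None for suspended accounts; the
    scores are assumed inputs, and this sketch does not call Botometer itself.
    """
    buckets = {"suspended": [], "high": [], "moderate": [], "low": []}
    for name, suspended, score in accounts:
        if suspended:
            buckets["suspended"].append(name)  # excluded before bot scoring
        elif score >= high:
            buckets["high"].append(name)       # high probability of automation
        elif score >= moderate:
            buckets["moderate"].append(name)
        else:
            buckets["low"].append(name)
    return buckets

tweets = [
    ("promo_bot", "Back this campaign! #crowdfunding"),
    ("promo_bot", "Last days to pledge #crowdfunding"),
    ("promo_bot", "New project on @Kickstarter"),
    ("alice", "Great guide from @Kickstarter and @Indiegogo"),
    ("bob", "@Kickstarter thanks for the tips"),
]
activity, visibility = activity_and_visibility(tweets)
print(activity.most_common(1), visibility.most_common(1))
print(top_overlap(activity, visibility, k=1))  # set(): most active != most visible
```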

Spatial Analysis
Spatial analysis was performed on the subsample of tweets for which geographical location was available. Even though this subset represents only one percent of the full sample, it is still large enough (27,321 tweets) to provide indicative results. Over 137 countries and 3,972 places (cities, towns, and other granular geo-locations) can be identified in the overall dataset, and these figures should be treated as a minimum. Table 4 lists the 20 countries with the largest volume of tweets, while Table 5 provides an overview of the number of countries and places detected in the five largest sub-communities. The United States (US) was the most active country with 48.66% of the total tweets (17,402 tweets), followed by the UK (11.04%), France (7.09%), and Spain (6.28%). Although these results may be partly driven by the fact that our dataset only includes English-language tweets, or by the population distribution across countries, it is worth noting that the most active countries in terms of volume of tweets are also those with the largest market share in terms of crowdfunding volume [87]. Figure 3 provides a spatial visualization of activity in terms of the number of tweets generated around crowdfunding. The data are based on the geocodes available as augmentations to tweets with a valid geographic description.
Table 5. Spatial Analysis of Sub-Communities.
The number of countries and places varies at the sub-community level. Despite the low number of tweets with available geographical location, the analysis of the top 20 places per sub-community provides some interesting insights, particularly when combined with the earlier topological data analysis. Two sub-communities, SC2 and SC4, present evidence of geo-location concentration.
85% of the tweets in the top twenty places identified in SC2 were generated by one account in Wisconsin; therefore, it is not very generalizable, while 97% of the tweets from the top twenty users in SC4 were posted from French metropolitan areas, dominated by Paris (63%), with a wider distribution of users.
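The country percentages are shares of the geotagged subsample, not of the full dataset. The following minimal sketch shows that computation. It assumes each geotagged tweet has already been resolved to a country code and that tweets without a location have been dropped. The codes and counts are hypothetical.

```python
from collections import Counter


def country_shares(country_codes, top=5):
    """Rank countries by their share of geotagged tweets.

    country_codes: one country code per geotagged tweet (tweets without a
    location are assumed to have been removed beforehand).
    Returns (country, tweet_count, percent_of_geotagged_tweets) tuples.
    """
    counts = Counter(country_codes)
    total = sum(counts.values())
    return [
        (country, n, round(100 * n / total, 2))
        for country, n in counts.most_common(top)
    ]


# Hypothetical sample of ten geotagged tweets.
codes = ["US"] * 6 + ["GB"] * 2 + ["FR", "ES"]
print(country_shares(codes, top=2))
# → [('US', 6, 60.0), ('GB', 2, 20.0)]
```

Applying the same function to the tweets of a single sub-community, or to place names instead of country codes, produces the kind of concentration figures discussed for SC2 and SC4.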

Text Mining-Content Analytics
Content analytics is primarily concerned with uncovering patterns hidden in text. Specifically, we performed word analysis and hashtag analysis on the tweets generated by the most influential users, as they can drive discussions and influence the opinions of other users [56,77,88]. Analyzing their tweets should therefore give a representative picture of the most relevant topics discussed in the network and of users' attitudes.
The results presented in Table 6 reveal that "crowdfunding campaign" and "equity crowdfunding" were the most frequently co-occurring word pairs, with frequencies of 3903 and 3322, respectively. The most frequent words and hashtags (Table 7) fall into a small number of categories: general crowdfunding-related terms (e.g., investor, project, equity), marketing and knowledge sharing (e.g., marketing, experts), platforms (Kickstarter and IndieGoGo), and support sites (e.g., LinkedIn, Crowdclan). In terms of investment focus, only one project theme stands out, namely film (e.g., #indiefilm and #cf4filmmakers).
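The word-pair and hashtag frequencies can be illustrated with a simple counting sketch. This is a simplification, not the paper's pipeline. Co-occurrence is approximated by adjacent word pairs (bigrams), and tokenization is a plain regular expression. URLs and @mentions are dropped, and hashtags are also treated as ordinary words when pairs are formed. The stopword list and tweets are hypothetical.

```python
import re
from collections import Counter

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"[a-z0-9']+")


def top_terms(tweets, stopwords=frozenset(), n=10):
    """Count hashtags and adjacent word pairs (bigrams) across tweets."""
    hashtags = Counter()
    bigrams = Counter()
    for text in tweets:
        # Lowercase and strip URLs and @mentions before any counting.
        cleaned = URL_OR_MENTION.sub(" ", text.lower())
        hashtags.update(HASHTAG.findall(cleaned))
        # Treat "#crowdfunding" as the plain word "crowdfunding" for pairing.
        words = [w for w in WORD.findall(cleaned.replace("#", " ")) if w not in stopwords]
        # Stopwords are removed first, so pairs are adjacent after filtering.
        bigrams.update(zip(words, words[1:]))
    return hashtags.most_common(n), bigrams.most_common(n)


tweets = [
    "Back our #crowdfunding campaign now! https://t.co/x",
    "Equity crowdfunding is growing #crowdfunding",
    "New crowdfunding campaign on #Kickstarter @friend",
]
stop = {"back", "our", "now", "is", "new", "on"}
print(top_terms(tweets, stopwords=stop, n=1))
# → ([('#crowdfunding', 2)], [(('crowdfunding', 'campaign'), 2)])
```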

Peak Detection Analysis
Peak detection analysis was performed on the full dataset to identify whether specific topics or events trigger abnormal interest from, and activity in, the Twitter crowdfunding community. To improve robustness, three peak detection algorithms were used, following Healy et al. [78]. Du et al.'s [89] continuous wavelet transform (CWT) algorithm identified 11 true peaks. Palshikar's [90] peak detection algorithm (S1) identified 13 true peaks, ten of which were also identified by CWT. Finally, Lehmann et al.'s [91] algorithm (Lehmann) did not identify any true peaks. Figure 4 presents a visual summary of the peaks in the analysis period, along with the timestamp of each peak and the number of tweets it contains.
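The following minimal sketch implements the S1 function based on our reading of Palshikar's description [90]. Each point is scored by the mean of its largest rise over its k left neighbours and its largest drop to its k right neighbours. Points with positive scores more than h standard deviations above the mean positive score are kept as candidates, and nearby candidates are merged. The edge handling, the values k = 3 and h = 1.0, and the daily counts are assumptions for illustration, not the settings used in this study.

```python
from statistics import mean, stdev


def s1_scores(x, k):
    """Palshikar's S1 peak function for every point of the series x.

    S1 is the mean of the largest rise from the k left neighbours and the
    largest drop to the k right neighbours. At the series edges a missing side
    contributes 0 (an assumption; edge handling is not specified here).
    """
    n = len(x)
    scores = []
    for i in range(n):
        left = [x[i] - x[j] for j in range(max(0, i - k), i)]
        right = [x[i] - x[j] for j in range(i + 1, min(n, i + k + 1))]
        max_left = max(left) if left else 0.0
        max_right = max(right) if right else 0.0
        scores.append((max_left + max_right) / 2)
    return scores


def detect_peaks_s1(x, k=3, h=1.0):
    """Return indices of points whose S1 score is unusually high.

    A point is a candidate if its score is positive and exceeds the mean of the
    positive scores by more than h standard deviations; candidates within k
    positions of each other are merged, keeping the larger value.
    """
    scores = s1_scores(x, k)
    positive = [s for s in scores if s > 0]
    if len(positive) < 2:
        return []
    m, s = mean(positive), stdev(positive)
    candidates = [i for i, a in enumerate(scores) if a > 0 and a - m > h * s]
    peaks = []
    for i in candidates:
        if peaks and i - peaks[-1] <= k:
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


# Hypothetical daily tweet counts with two bursts of activity.
counts = [10, 12, 11, 13, 80, 12, 11, 10, 12, 11, 9, 95, 10, 12, 11]
print(detect_peaks_s1(counts, k=3, h=1.0))
# → [4, 11]
```

Because the threshold is relative to the distribution of positive scores, the same series can yield more or fewer peaks as h varies. This sensitivity is one reason for cross-checking S1 against other detectors such as CWT.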
The tweets in each peak were manually investigated to identify the trending topics. The results in Table 8 show that the peaks mainly relate to the success of crowdfunding campaigns, changes to government policies on crowdfunding, or the launch of new crowdfunding platforms. There is very limited evidence of spamming: only two peaks are related to spam, which supports the earlier evidence presented in the analysis of highly visible accounts.

Discussion
The network analysis of the #crowdfunding dataset identifies a large, fragmented crowdfunding public on Twitter, which we characterize as a sparse network. The findings highlight the presence of powerful hubs in the network that create connections between users and play a valuable bridging role. These users fall into three categories: (i) platforms, (ii) crowdfunders, and (iii) individuals or organizations who derive income from the wider crowdfunding economy. It is worth noting that the larger platforms are not present as hubs, although the visibility and content analyses show that they are clearly active. Notwithstanding this, the sparsity of the network suggests that campaigns are not fully realizing or optimizing their activities to capitalize on the wider interest in crowdfunding.
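This section does not specify how sparsity and hubs were measured. As an illustration only, the following sketch uses two common proxies: the density of a directed interaction graph, which is the share of possible ties actually present, and total degree for identifying hubs. The edge list is hypothetical. At the scale of the full dataset, a graph library such as NetworkX, which provides `networkx.density`, would normally be used instead.

```python
from collections import Counter


def density_and_hubs(edges, top=3):
    """Density and highest-degree nodes of a directed interaction network.

    edges: (source, target) pairs, e.g. mentions or retweets between accounts.
    Duplicate edges and self-loops are ignored. Density is m / (n * (n - 1)),
    the share of all possible directed ties that are present.
    """
    unique = {(u, v) for u, v in edges if u != v}
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    density = len(unique) / (n * (n - 1)) if n > 1 else 0.0
    degree = Counter()
    for u, v in unique:
        degree[u] += 1  # out-degree contribution
        degree[v] += 1  # in-degree contribution
    return density, degree.most_common(top)


# Hypothetical mention ties, including one duplicate.
edges = [("hub", "a"), ("hub", "b"), ("c", "hub"), ("a", "b"), ("hub", "a")]
print(density_and_hubs(edges, top=1))
# → (0.3333333333333333, [('hub', 3)])
```

In a network with millions of tweets but far fewer distinct ties, density falls well below this toy value. This is why a low density is read as sparsity.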
Analysis also reveals a number of sub-communities, five of which are presented in this paper. This analysis suggests that participants in the #crowdfunding discourse on Twitter tend to interact more with other participants in their own sub-community than with users in other sub-communities. In line with network theory, this can be explained by homophily, the tendency for people to be attracted to others similar to themselves [92]. In the crowdfunding dataset, the sub-communities provide evidence of self-categorization and similarity attraction. The use of common hashtags (specifically #crowdfunding) and interest in a particular form of collective action (e.g., supporting independent film) suggest a significant homophily effect through self-categorization. Shen and Monge [93] suggest that, because social attributes cannot easily be identified on Twitter, homophily is more likely to operate through attributes that are easily identified on the platform, e.g., popularity and geolocation. The analysis of sub-communities finds evidence supporting such behavior. For example, SC1 and SC5 demonstrate the power of individual Twitter accounts, representing different actors, to draw a sub-community together. In addition to homophily effects, the network structures suggest influence heterogeneity: more influential accounts clearly connect with less influential accounts. The motivations are unclear, although information sharing is evident from the activity. SC2, by contrast, demonstrates how popular actors engage in a community. From a project perspective, targeting and gaining the support of these influencers may be a critical success factor in building awareness of a project.
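Krackhardt and Stern's E-I index is one standard way to quantify the tendency to interact within, rather than across, sub-communities. It is defined as (E - I)/(E + I), where E counts ties that cross sub-community boundaries and I counts ties that stay within them. The paper does not report this index. The following sketch is an illustrative measure on a hypothetical partition, with labels named after two of the sub-communities.

```python
def ei_index(edges, community):
    """Krackhardt and Stern's E-I index for a partitioned network.

    edges: iterable of (u, v) interactions between accounts.
    community: dict mapping each account to its sub-community label.
    Returns (external - internal) / (external + internal); -1 means every tie
    stays inside a sub-community, +1 means every tie crosses a boundary.
    """
    internal = external = 0
    for u, v in edges:
        if u == v:
            continue  # self-loops carry no information about grouping
        if community[u] == community[v]:
            internal += 1
        else:
            external += 1
    total = internal + external
    if total == 0:
        raise ValueError("no ties between distinct accounts")
    return (external - internal) / total


# Hypothetical partition and ties: four internal ties, one cross-community tie.
communities = {"a": "SC1", "b": "SC1", "c": "SC1", "d": "SC4", "e": "SC4"}
ties = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
print(ei_index(ties, communities))
# → -0.6
```

A value well below zero, as here, is consistent with homophily. In practice, the observed value would typically be compared against a baseline such as randomly permuted community labels, because unequal group sizes alone shift the expected index.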
Two pieces of evidence suggest a homophily effect of geolocation. While the spatial analysis has limitations, it reveals that around 60 percent of the tweets in the dataset with geo-location data came from users in the US and the United Kingdom (UK), and that the vast majority of active users are concentrated in the US and Europe. This finding provides additional evidence of the geographical concentration of crowdfunding, but also of its increasing geographic distribution. At the sub-community level, we also present evidence of a homophily effect of geolocation (and language), consistent with Wang and Chu [94], in SC4, which has a narrow focus on the French crowdfunding market. This is supported by the sub-community centrality analysis, which provides explanatory and confirmatory value. The geographic concentration in the US and UK at the aggregate level and the evidence of concentrated regional sub-communities (SC4) lend some support to the proposition that successful crowdfunding remains primarily a relatively local activity [9,32]. This spatial analysis, combined with the use of jurisdiction-specific hashtags such as #jobsact, suggests the need for more cross-border harmonization of regulations, consistent with both industry and academic literature [95,96]. Notwithstanding the location of investors and the ultimate role of geographic location in the crowdfunding decision, the results of this paper suggest that crowdfunding, as evidenced by the discourse and community on Twitter, does reduce the geographic barriers between crowdfunding stakeholders, increases knowledge sharing, and facilitates the identification of both projects and investors. Participation in the crowdfunding discourse on Twitter may signal support for a project and thus attract further interest.
Wang and Chu [94] also suggest that strategic selection may play a role in explaining network behavior, with more influential and more active accounts being more likely to be mentioned. Our results suggest that there is a significant difference in impact between visible and active users. This is consistent with Wang and Chu [94], who found that while more influential accounts are mentioned more, activity does not equate to legitimacy. The question of legitimacy is highlighted by the post hoc suspension of a large percentage of the most active accounts in the dataset, and by the high probabilities of bot activity that machine classification assigned to the most active accounts. The use of bots and other automated techniques provides evidence of a new type of social tie previously unidentified in the entrepreneurial networks literature: the computer as a social actor (CASA). The motivation for such activity may be benign, although the suspension analysis indicates otherwise. What is evident is that the use of such techniques (including multiple accounts for the same service, e.g., @PromotCrowdFund and @CrwdFndPlanning) reflects an appreciation, by Twitter users and stakeholders in the crowdfunding economy, of the importance of social media and entrepreneurial networks for building awareness and, in the case of campaigns, mobilizing support. It seems more likely that such active accounts are designed to increase their prominence in Twitter feeds and, as a result, attract followers. From there, an account can pursue a number of strategies. For example, followers can be targeted with specific calls to action, which may be malicious or benign. As these accounts are often related to crowdfunding economy activities, e.g., selling marketing or consulting services to projects seeking funding, this may be viewed as an online version of direct marketing or cold calling.
Alternatively, by sending signals to the network, these accounts may be used to impact topic (and hashtag) concentration, or build momentum and, in some cases, result in trending on Twitter and the associated visibility benefits attached to trending. This approach could be used by crowdfunding projects to achieve prominence and provide signals to investors in line with extant literature. However, the key issue here is not whether a promoter could use these tactics, but whether they should.
Our analysis suggests that sub-communities form within hashtag networks and, in this case, can form around different themes, e.g., platforms, projects, and topics. This suggests that it is important for entrepreneurs to (i) pay close attention to the specific platform on which they wish to raise funding, (ii) identify the hashtags associated with their project, e.g., #indiefilm, and (iii) most importantly, collaborate with different stakeholders to generate awareness, drive support, and ultimately secure investment for their projects. In particular, attention should be given to the role and value of loose or distal ties, i.e., strangers. The network analysis provides evidence of strangers interacting in entrepreneurial networks, as shown by some of the longer path lengths in both the #crowdfunding network and its sub-communities. Furthermore, there is evidence that strangers play a role not only in information dissemination and knowledge sharing but also in the promotion of specific campaigns. In particular, the signals that each user provides to Twitter through various interactions, for example, retweets, mentions, and tweet engagements, help determine the prominence that the Twitter algorithm gives a specific user, tweet, or topic in user content feeds and, in some cases such as abnormal peaks, may result in trending. The peak detection findings illustrate this, in particular the case of Jolla. On 19 November 2014, Jolla launched a crowdfunding campaign on IndieGoGo to produce its new Jolla tablet running its Sailfish operating system. On 28 November 2014, it announced that it had reached 343 percent of its original funding goal and was increasing its stretch funding goal. Both events, the launch of the campaign and the announcement of stretch goals, resulted in peaks of discussion on Twitter involving a variety of stakeholders, including the promoters, the media, investors, and strangers interested in crowdfunding. Jolla raised more than US$2.5 million from some 21,600 backers, but ultimately withdrew its tablet product in 2016 [97].
This paper addresses calls by Leenders and Dolfsma [67] and Stanko et al. [68] for further research on open innovation using social media data to understand innovation communities and the role of social networks and their participants in crowdfunding and product development. In particular, we address calls for greater use of network analysis in this context [67]. Chesbrough et al. [98] define open innovation as "an innovation model that emphasizes purposive inflows and outflows of knowledge across the boundary of a firm in order to leverage external sources of knowledge and commercialization paths, respectively". Research suggests that crowdfunding platforms can play an important role in open innovation through (a) product, strategy, and market knowledge, and (b) network ties with stakeholders [99]. Indeed, Di Pietro et al. [99] suggest that startups that exploit crowd network ties are more likely to be successful two years later, in terms of both survival rates and fundraising achievements, than startups that do not gain knowledge from the crowd. In this way, crowd network ties can be a source of competitive advantage. While acknowledging the importance of diversity in the crowd, platform selection, and networking quality, Di Pietro et al. [99], like others [100,101], tend to place more emphasis on the post-funding contribution of investors (backers) to open innovation. Our analysis provides supporting evidence that pre-investment social media activities can play a critical role in aggregating a diverse, high-quality network of investors. It also highlights that the selection and targeting of appropriate sub-communities is critical from both a categorization perspective (e.g., SC2 and SC5) and a geographic perspective (e.g., SC4), suggesting the need for careful research on appropriate linguistic features (e.g., hashtags, language) and influencer engagement. Similarly, researchers have noted that selecting the right crowdfunding platform for a given project is critical for both crowdfunding success and subsequent post-investment innovation [68,101]. Again, our analysis shows how social media can inform both strategy and market knowledge. SC3, in particular, illustrates how specific sub-communities on Twitter can provide valuable knowledge on both the crowdfunding process and crowdfunding platforms.
Drawing on the Theory of Practice, Smith et al. [102] suggest that social networking sites such as Twitter represent a field in which the online networks being established form a new habitus for entrepreneurs' social capital. To exploit the opportunities that social networks represent, in this case for crowdfunding, entrepreneurs need to understand and acquire the capabilities to both develop and mobilize these new entrepreneurial networks. Our analysis suggests that the norms of the #crowdfunding community on Twitter require both a broad and a deep appreciation of the network structure and dynamics related to crowdfunding platforms, sectoral domains, and geographies, as well as of the acceptable behaviors within these networks. At the same time, entrepreneurs need to walk a fine line between innovative exploitation of the functional building blocks of social networks, ethics, and the loss of reputation and presence that results from suspension.

Conclusions, Limitations and Avenues for Future Research
In this paper, we present a study of the network characteristics of an issue-centered public on Twitter organized around the hashtag #crowdfunding for the calendar year 2014. Using network analytics, we establish that the #crowdfunding public is a sparse network comprising multiple sub-communities, hubs, and influencers. At a high level, it has the characteristics of both an information network and a social network. At a more granular level, as in all communities, the sub-communities in the #crowdfunding network are more nuanced, built on social ties to a project, platform, geographic region, or domain area. Each sub-community may have one or more hubs, brokers, and associated influencers, as well as behavioral norms. To build and mobilize an online network within this public, entrepreneurs need to understand and design a multi-level strategy to exploit the opportunity that such online networks represent. At the same time, policymakers may need to consider supports for developing entrepreneurs' digital capabilities and skills so that they can leverage the potential of these new platforms and networks, as well as the harmonization of cross-border regulation to harness the power of the global crowd.
This paper makes a number of theoretical contributions to the social network theory, crowdfunding, and open innovation literatures. First, we extend the literature on the direct and indirect role and value of latent ties (strangers). Strangers play a direct role in disseminating information, sharing knowledge, and investing in crowdfunding campaigns, but they also play an indirect role by providing signals to social networks and social networking site algorithms that further raise the prominence of campaigns. Second, we identify the use and role of automated software (bots) in crowdfunding. This phenomenon introduces a new type of social tie, the computer as a social actor, which has not previously been identified in the entrepreneurial networks literature. Third, we extend our understanding of the wider passive crowdfunding community and provide insights into its structure, which comprises sub-networks organized around the platforms, geographic regions, and domain areas in which a project might be situated. To this end, we provide additional evidence on the geographical concentration of crowdfunding in specific areas or communities, contributing to the ongoing debate on the effectiveness of crowdfunding in lowering geographical financial barriers between entrepreneurs and investors. Finally, we address calls for research on open innovation using social media data, and specifically network analysis techniques, to understand innovation communities and the role of social networks and their participants in crowdfunding [67,68]. This paper is not without limitations, which also represent avenues for future research. Our focus was on understanding the network structure and characteristics of the crowdfunding public on Twitter. The analysis was based on one year (2014), one language (English), one keyword and one hashtag (crowdfunding), and one social network (Twitter).
The fact that the data for this study were generated in 2014 is a limitation, but also an opportunity for other researchers to test whether our findings apply to more recent datasets. Similarly, excluding non-English tweets clearly introduces a cultural bias, but it also presents an opportunity for further research on non-English messaging. Not all stakeholders are necessarily represented in the dataset, as some may not use the crowdfunding hashtag. Similarly, only a small proportion of their tweets will feature the hashtag or keyword. As such, including all tweets from specific accounts, e.g., the platforms and projects, may provide additional insights. In addition, a longitudinal study over multiple years would provide greater insight into how the discourse and network have evolved, and whether sub-communities were compact or loose, transient or lasting. While this paper focuses on Twitter, the data featured content from and links to other networks, including Facebook, YouTube, and LinkedIn. Comparative analysis of network structure across different social networking sites, and inter-network analysis, including simultaneous messaging and flows between networks, would be novel and may provide further insights. The content and success of particular communication strategies, in terms of electronic word-of-mouth impact and funding success, are not captured in this study and would clearly be of interest to researchers and practitioners alike. This paper identified one set of tactics that might be classified as covert or manipulative. The prevalence and impact of such behaviors are worthy of examination, not only in an entrepreneurial context but also in wider socio-political, communications, and business contexts, and specifically in the finance domain.
Funding: This work is partially funded by the Irish Institute of Digital Business, and by the Irish Centre for Cloud Computing and Commerce (IC4), an Enterprise Ireland/IDA technology centre.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: