Network Patterns of Inventor Collaboration and Their Effects on Innovation Outputs

The purpose of this study is to examine how the collaboration structure among inventors in an R and D organization affects its capability to create impactful innovations. Specifically, the study examines whether particular network mechanisms found in inventor collaboration enhance the future impact of collaboration outputs, as represented by the forward citations of the resulting patents. To this end, co-invention networks for R and D organizations are constructed from an inventor-patent database, and three structural patterns are measured using network analytic constructs: structural holes, strength of ties, and centralization. The results show that the presence of structural holes and of strong ties is positively associated with forward citations, and that decentralized collaboration also has a positive impact. The findings offer support for both the structural hole and the network closure perspectives on social capital, which have been considered contradictory in the literature.


Introduction
Innovation is widely recognized as a process of identifying opportunities for unconventional recombination of diverse technology options that already exist [1][2][3]. The process of recombination leads R and D personnel to search beyond their own boundaries for knowledge and skills to complement their capabilities [4]. Typically, innovation processes involve teams of researchers who work together on the same project. While exchanging ideas and sharing information, participants of a research team carry over their knowledge to other members of the same team or to other projects in which they are involved. Whenever researchers collaborate with coworkers, they create knowledge spillovers. The quality and impact of outputs from collaboration processes are inextricably related to who is working with whom; that is, to how knowledge spills over among members of an R and D organization. Therefore, knowledge spillover is a causal mechanism linking network structures to organizational performance.
The innovation literature has reported empirical evidence on the relationship between collaboration structures and organizational performance. These studies find that the transfer of knowledge across boundaries within firms (e.g., [5]), the combination of technologies from heterogeneous technological origins [3,6,7], and knowledge spillover among researchers with different roles (e.g., [8]) are closely associated with organizational performance and output quality.
In particular, previous research has repeatedly stressed the importance of inter-firm alliances and networks for organizational learning and knowledge flows in knowledge-intensive industries. Indeed, numerous studies have found that firms use R and D alliances as an instrument to acquire new skills and to source specialized know-how (see [9] for a review). However, these studies were interested in the effects of collaborative R and D on subsequent innovation performance, without emphasizing micro-level interactions, that is, how individual inventors collaborate and which co-working structures are more productive. Rather, they were interested in the fact that collaboration had taken place (as opposed to exclusively in-house R and D) in a certain network structure, and in some cases they distinguished between the types of partner involved.
This study examines how individual inventors in an R and D organization collaborate with other co-workers in the same organization. We focus on various network mechanisms identified in intra-organizational collaborative invention based on social capital theories. Studies on social capital suggest a variety of possible explanations for the empirically observed relationship between various network mechanisms and organizational performance. We combine these streams of literature and extend previous research in at least two major ways.
First, we examine in-house R and D collaboration based on social capital theories and focus on how micro-level R and D collaboration among individual inventors affects organizational performance. Second, we explicitly separate three important network mechanisms (strength of ties, dense connectivity, and network centralization) and examine how they operate independently and interactively. Although the effects of these network factors have been well studied in the literature, little has been done to disentangle one mechanism from the others [10]. Specifically, this study examines whether a firm's patent stock produced under a particular collaboration structure has more impact on subsequent innovations than patent stock produced under other structures. To this end, co-invention networks for R and D organizations are constructed from a patent database, and their structural patterns are examined using network analytic constructs. Based on the three distinct constructs that represent collaboration patterns, our study seeks to disentangle the effects of two leading network mechanisms, structural holes and tie strength, which have been considered to contradict each other in the extant literature.

Hypothesis Development
Although there has been a variety of interrelated definitions of social capital (for works with different foci, see, e.g., [11][12][13]), most definitions have two elements in common: social capital is embedded in some aspect of social structures, and it facilitates certain actions of actors within the structure [14]. In this sense, social capital refers to the collective value of all social networks and the inclinations that arise from these networks to do things for one another (e.g., [15,16]).
The importance of social capital as an antecedent of innovation has received much theoretical attention over the last few years [17]. It has been shown that social capital and learning have a positive relationship because social capital directly affects the combine-and-exchange process and provides relatively easy access to network resources [18,19]. Relatedly, the overall hypothesis of social capital theory in the matter of innovation is that firms with a large stock of social capital will have a competitive advantage to the extent that social capital helps reduce many forms of communication inefficiency (e.g., transaction costs, bargaining costs, search costs, and policing costs), causes agreements and cooperation to be honored, and enables employees to share tacit knowledge and place negotiations on the same wavelength [20].
Social capital can take different forms, primarily trust, norms, and networks. However, the most distinguishable and most easily measurable form is the network structure of relations between and among actors. Social capital is neither completely fungible nor exchangeable, and it may be specific to certain organizations or activities. Depending on diverse organizational characteristics such as culture, routines, and demographic composition, the social capital of an organization inheres in the distinctive structure of collaboration among its members. Thus, the structural features of collaboration must be closely associated with the organizational capability of creating innovations.
Although the contribution of social capital to innovation has been well recognized in the literature, empirical support is scarce, owing mainly to (1) a lack of agreement regarding the content of the concept of social capital and the appropriate way of measuring it [21,22], and (2) a lack of empirical research in the area [23]. This paper aims to fill this research gap by focusing on two issues. First, we separate different network mechanisms, in particular distinguishing tie strength (which is at the dyadic level) from density (which is at the network level), and examine how they affect innovation outputs both independently and interactively. Second, we provide empirical evidence in the context of inventors' collaboration based on a large-scale analysis of co-patenting behavior. In the following sections, we identify three network constructs that can characterize an R and D organization's social capital and develop hypotheses regarding the structure-performance relationship.

Structural Holes
The basic idea of a structural hole is that a lack of ties among alters in an ego's social network benefits the ego in terms of access to diverse information. In social capital theory, actors who develop ties with disconnected groups are believed to gain access to a broader array of knowledge than those who are connected to a cohesive one [24]. Actors who are in a position to bridge structural holes, or gaps between alters, have opportunities to access and assimilate different streams of knowledge and are thus likely to play a key role in creating novel ideas [25]. Therefore, the presence of structural holes in a collaboration network indicates that collaboration occurs among R and D personnel with different knowledge backgrounds, providing a greater opportunity for knowledge brokerage that can bring together more diverse knowledge streams and lead to richer contents [26].
Structural holes are also related to information efficiency. When frequent and intense interaction among actors forms a dense communication structure, much of the information circulating in the system is redundant. In contrast, an inventor who spans a structural hole benefits by brokering and controlling the flow of information between unconnected inventors who have not previously collaborated. Such an inventor is in a position of control since she or he is the only one connected to the other actors in an efficient way, which economizes on the number of ties. This means that inventors who value speed in their search for knowledge have to rely on the focal inventor. Consequently, the presence of structural holes in a collaboration network implies that diverse and non-overlapping knowledge is shared, and that knowledge exchange occurs efficiently around the inventors who play the role of knowledge brokers, which results in greater creativity and productivity. This leads to the following hypothesis:
H1: All other things being equal, the presence of structural holes in an inventor collaboration network will be positively associated with creating knowledge with a future impact.

Strength of Ties
The structural hole perspective focuses on the benefits of transferring and assimilating diverse knowledge (e.g., [27]) but does not address the problematic nature of such transfers. Presumably, people at opposite ends of a structural hole have less experience of working together than direct co-workers do, which can impede knowledge transfer. In contrast, individuals who communicate with others frequently or who have a strong emotional attachment to others are more likely to share knowledge than those who communicate infrequently or who are not emotionally attached [28]. For example, frequent communication can be more effective through the development of relationship-specific heuristics [29].
This view, known as the closure view of social capital [30], focuses on the risks associated with incomplete information in the presence of structural holes. Specifically, closure in a collaboration network is argued to affect access to information and to facilitate sanctions that make it less risky for people in the network to trust each other. Research adopting this view has inferred the network effect on knowledge transfer from the association between tie strength and knowledge transfer [29,31,32,33]. These studies primarily focus on how the social dynamics within two-way interactions (e.g., reciprocity and commitment) influence knowledge transfer. Tie strength is also believed to facilitate the transfer of tacit knowledge [28,33]. Hansen [33] argued that strong ties promote the transfer of complex knowledge more effectively than weak ties [29,33,34], because they are more likely to be embedded in a dense web of trustworthy relationships [11,20].
The closure and structural hole views have developed in parallel in the literature on social capital and are usually presented as contradictory. This disagreement originates from a lack of distinction between strong (weak) ties and a dense (sparse) network in the process of operationalization. In fact, research adopting the closure view usually assumes that a dense network represents social cohesion in which most members communicate frequently. Strong ties and social cohesion can be structurally correlated, but it is a mistake to equate their effects because they are conceptually distinct. Burt [35] made a clear conceptual separation between the strength and the density of ties. It is important to acknowledge this, since it is conceivable that sparse ties may be strong and dense ties may be weak [28]. Specifically, a strong tie can occur in either a cohesive group or a sparse group [35,36]. Therefore, the disagreement can be resolved only by investigating tie strength and cohesion simultaneously.
In this study, we clearly distinguish between tie strength and network closure. The former is related to the frequency, depth, or duration of collaboration within a pair of partners, while the latter is associated with the degree or density of overall connectivity; the former therefore needs to be observed at the dyadic level, and the latter at the network level. With the two separated, the following hypothesis does not contradict, but complements, the previous hypothesis:
H2: All other things being equal, an inventor collaboration network with many strong ties will be positively associated with creating knowledge with a future impact.

Centralization
The third hypothesis developed in this study is about the effects of network centralization. In social network theories, researchers have used the concept of centrality to indicate the status, power, and social capital captured by the location of an actor in a network [37][38][39]. Unlike centrality, centralization is a network-level measure that examines the extent to which a whole network has a centralized structure. Centralization can tell us whether a network, as a whole, is organized around its most central points.
Centralization is related to cohesion, but provides more information than cohesion. In effect, the concepts of cohesion and centralization refer to differing aspects of the overall "compactness" of a network. Cohesion describes the density of connections within a network, while centralization describes the extent to which such dense connections are organized around particular focal points. Centralization and cohesion, therefore, are important complementary measures.
As a result, in highly centralized networks, there are a few clusters of inventors that form strongly cohesive relationships. Research has examined the effect of such central groups on others within the same organization [38,40,41,42]. Centrality is often perceived as a signal of quality [41]; as a result, central groups of inventors attract others to select their knowledge for use in their own inventive activities. Additionally, central groups have a topological advantage in that they have greater access to other parts of the network than less central ones. The expanding effects of a few central groups may weaken the activities of groups that are local and independent, which reduces a firm's capability to diversify its technology base and R and D portfolio.
In the innovation literature, the capability of technological diversification has been considered a critical dimension of impactful innovation creation for many R and D organizations [43][44][45][46]. Since many innovations are designed to solve unrelated problems, companies that are more technologically diversified capture more opportunities and technical possibilities; as a result, they benefit largely from their own research activities [47]. Organizational learning theory also suggests the benefits of a diverse knowledge base. One such benefit is that technological diversification may play a preventive role against core rigidities [48] by generating and renovating technological trajectories and by taking advantage of cross-fertilization effects between different technologies [49,50].
Many empirical studies have provided evidence supporting these arguments. Ahuja and Lampert [7] demonstrate that, for the chemical industry, experimenting with diverse emerging technologies is a way for organizations to overcome core rigidities and is associated with the subsequent number of inventions. Katila and Ahuja [51] also report empirical evidence from the robotics industry, which shows that there is a linear and positive relationship between technological search scope and product innovation. A study by Nicholls-Nixon and Woo [52] examines the relationship between the breadth of technological knowledge and technical output (number of products and patents) in a sample of established pharmaceutical companies. More recently, Nesta and Saviotti [53] state that the scope and coherence of the knowledge base contribute positively to innovation performance, which is estimated by the number of patent applications.
Based on this review, we hypothesize that a highly centralized organization of R and D activities hampers technological diversification, which leads to weaker performance in knowledge creation. This leads to the next hypothesis:
H3: All other things being equal, a highly centralized organization of R and D activities will be negatively associated with creating knowledge with a future impact.

Additional Test: Interaction Effects
The hypotheses developed so far assume that each network mechanism operates independently. However, since the three factors are captured from a single network, they are likely to be structurally correlated and, thereby, to operate interactively. Given the complexity of interactions among the network factors, it is not easy to predict whether one network mechanism boosts or weakens the others. A test for interaction effects between tie strength and the two other factors is particularly meaningful: significant interaction effects would indicate that these factors work through distinguishable network mechanisms and need to be treated separately. For instance, it is possible that the effects of strong ties diminish as the density of a network increases or as the network becomes more centralized. Although we hypothesize that connection strength by itself is positively associated with performance, when both clustering coefficient and connection strength are high, the collaboration network becomes exceedingly cohesive, which has negative effects on performance. In this situation, strong connections at the dyadic level may aggravate the negative effects of a cohesive structure at the network level rather than compensate for them, depending on the topological structure of the collaboration network. Similarly, although we hypothesize that decentralized collaboration networks have a positive effect on innovation outputs, if a collaboration network has many strong ties, the effects of decentralization may show diminishing returns on performance. This leads to the following hypothesis:
H4: All other things being equal, the network factors will interact with each other as they affect the firm's capability of creating impactful knowledge.

Research Sample
The research sample is constructed using the patent database recently developed by Li et al. [54]. Unlike the original United States Patent and Trademark Office (USPTO) database, the patent database by Li et al. includes unique inventor identifiers for patents granted from 1975 through 2010. For each firm, which is identified by a unique assignee code in the inventor database, we first construct a two-mode network consisting of two types of nodes: its patents and its inventors. The two-mode network is then converted into a one-mode network of inventors, as shown in Figure 1. The nodes of the converted network are distinct inventors, and there is a link between two different inventor nodes if they have filed at least one patent together within a given time window. Note that each link may carry a value greater than one if the pair of inventors has filed more than one patent jointly.
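The conversion from the two-mode patent-inventor network to the valued one-mode inventor network can be sketched as follows. This is a minimal illustration, not the procedure used to build the actual sample: the function name `co_invention_network` and the toy patent records are hypothetical, and the link value is assumed to be the number of jointly filed patents.

```python
from collections import Counter
from itertools import combinations


def co_invention_network(patent_inventors):
    """Project a patent-inventor (two-mode) mapping onto inventor pairs.

    patent_inventors: dict mapping a patent id to an iterable of inventor ids.
    Returns a Counter mapping a sorted inventor pair to the number of patents
    the two inventors filed together (assumed here to be the link value).
    """
    edges = Counter()
    for inventors in patent_inventors.values():
        # Deduplicate and sort so that each pair has one canonical key.
        for a, b in combinations(sorted(set(inventors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data: three patents, four inventors.
patents = {
    "P1": ["ann", "bob"],
    "P2": ["ann", "bob", "cho"],
    "P3": ["cho", "dee"],
}
network = co_invention_network(patents)
print(network[("ann", "bob")])  # 2
```

In this toy example the pair (ann, bob) receives a link value of 2 because the two inventors share patents P1 and P2, while every other pair has a value of 1.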
Each co-invention network represents a firm-year observation. Each network changes over time as a firm's patent stock accumulates. Time is needed for a co-invention network to grow to a meaningful size so that network analysis can be applied; as a result, it is necessary to set a sufficient time window to obtain a single firm-level observation. In this study, a four-year time window has been set for each network, following Rappa and Garud (1992) [55]. More specifically, for each firm, co-invention networks are constructed every four years using the patents granted in the four years since the end of the previous time window.
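As a rough sketch of the windowing step, patents can be assigned to consecutive four-year windows by grant year. The helper names below are hypothetical, and the start year of 1991 and the non-overlapping windows are assumptions inferred from the 1991-2010 sample period with five windows per firm.

```python
def window_index(grant_year, start_year=1991, window_length=4):
    """Return the zero-based index of the window containing grant_year.

    ASSUMPTION: windows are consecutive and non-overlapping, starting in 1991.
    """
    if grant_year < start_year:
        raise ValueError("grant year precedes the sample period")
    return (grant_year - start_year) // window_length


def group_by_window(patent_years, start_year=1991, window_length=4):
    """Group patent ids by window index.

    patent_years: dict mapping a patent id to its grant year.
    """
    windows = {}
    for patent_id, year in patent_years.items():
        idx = window_index(year, start_year, window_length)
        windows.setdefault(idx, []).append(patent_id)
    return windows


# Hypothetical grant years for four patents of one firm.
grouped = group_by_window({"P1": 1991, "P2": 1994, "P3": 1995, "P4": 2010})
print(grouped)  # {0: ['P1', 'P2'], 1: ['P3'], 4: ['P4']}
```

Under these assumptions the windows are 1991-1994, 1995-1998, 1999-2002, 2003-2006, and 2007-2010, and one co-invention network would be built from the patents in each group.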
The sample firms are selected by the following procedure. We first counted the total number of patents of each firm during the total time window (35 years) and selected 500 firms in order of patent count. We then built their co-invention networks and sorted them again in order of the edge counts in the networks. By examining the individual firm networks, we excluded firms that filed no patents in at least one year of the total time window. Specifically, a firm is included in the sample only when it has a record of filing patents every year, which means that every firm in the sample has filed at least one patent in each year of the total time window. By doing this, we intend to narrow our focus to firms whose propensity to patent is persistent during the sample period and whose R and D organization is of sufficient size. In this way, we finally selected 50 firms that filed more than one patent during 1991-2010. Since we consider five time windows for each firm during the 20-year period, the research sample consists of 250 observations. Information on the sample firms is included in Appendix A. The sample firms have filed 669,332 patents, which account for 21.4% of the total patents during the period.

Dependent Variable: Patent Citations
The output quality of an R and D organization is measured by the forward citations that its patent stock has received from subsequent inventions until 2010. Patent citations have been considered excellent measures of technological impact and performance [39,56,57]. We use the total number of citations a patent receives from the time it is granted until the end of 2010 as an indicator of its impact on future knowledge creation. These citations are received from the entire universe of patents, which includes a sample of more than 4,000,000 patents used in this study.
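The citation count described above can be sketched as a simple tally. The record layout (citing patent, cited patent, grant year of the citing patent) and both function names are hypothetical, and summing over a firm's patents is only one possible way to aggregate to the firm level; the text does not specify the aggregation.

```python
from collections import Counter


def forward_citation_counts(citations, cutoff_year=2010):
    """Count forward citations per cited patent up to the end of cutoff_year.

    citations: iterable of (citing_patent, cited_patent, citing_year) tuples.
    """
    counts = Counter()
    for citing_patent, cited_patent, citing_year in citations:
        if citing_year <= cutoff_year:
            counts[cited_patent] += 1
    return counts


def firm_citations(firm_patents, counts):
    """Sum forward citations over a firm's patents (ASSUMED aggregation)."""
    return sum(counts[p] for p in firm_patents)


# Hypothetical citation records; the 2012 citation falls after the cutoff.
records = [
    ("C1", "P1", 2005),
    ("C2", "P1", 2009),
    ("C3", "P2", 2010),
    ("C4", "P2", 2012),
]
counts = forward_citation_counts(records)
print(firm_citations(["P1", "P2", "P3"], counts))  # 3
```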

Measures for Structural Holes: Average Degree and Clustering Coefficient
The size of structural holes is measured by average degree and clustering coefficient, which are, in fact, measures of network cohesion. We associate network cohesion with dense connectivity among inventors; as a result, network cohesion is the opposite of a structural hole. Thus, more cohesive networks span fewer structural holes, so a firm's performance should have a negative association with cohesion according to H1. A popular measure of cohesion is network density, which indicates directly how densely inventors are connected to each other. However, network density has a scale problem, in that it underestimates cohesion when a network is very large. To overcome this problem, average degree is usually considered as a substitute measure of cohesion. Since the average degree does not depend on network size, network cohesion can be compared across networks of different sizes ([58], p. 74). The clustering coefficient is calculated at both the network level (global) and the node level (local).
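The scale problem of density can be illustrated with the standard definitions for an undirected, unweighted network with n nodes and m links: density is 2m / (n(n - 1)) and average degree is 2m / n. The sketch below uses hypothetical ring networks, in which every inventor has exactly two collaborators regardless of network size.

```python
def density(n_nodes, n_edges):
    """Share of possible links that are present in an undirected network."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def average_degree(n_nodes, n_edges):
    """Mean number of links per node in an undirected network."""
    if n_nodes == 0:
        return 0.0
    return 2 * n_edges / n_nodes


# A ring with n nodes has n links: local connectivity is identical in both
# networks, yet density shrinks as the network grows.
for n in (10, 1000):
    print(n, average_degree(n, n), density(n, n))
```

Both rings have an average degree of 2.0, while density falls from about 0.22 for 10 nodes to about 0.002 for 1,000 nodes, which is why density is hard to compare across firms of different sizes.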
The global clustering coefficient is the number of closed triplets (triplets that form complete triangles) over the total number of triplets. The local clustering coefficient [59] is given by the number of links between nodes within a focal node's neighborhood divided by the maximum number of links that could exist among them. In this study, we combine the two clustering coefficients into one index by using a principal component analysis.
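A minimal sketch of the two clustering coefficients on an unweighted network is given below. The function names and the toy network are hypothetical, nodes with fewer than two neighbors are assigned a local coefficient of 0 by assumption, and the principal component step that combines the two coefficients is not shown.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency mapping from a list of (u, v) links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Links among the node's neighbors over the maximum possible number."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # ASSUMPTION: undefined cases are treated as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def global_clustering(adj):
    """Closed triplets over all connected triplets in the network."""
    closed = 0
    total = 0
    for neighbors in adj.values():
        k = len(neighbors)
        total += k * (k - 1) // 2  # triplets centred on this node
        closed += sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return closed / total if total else 0.0


# Hypothetical network: a triangle a-b-c with one pendant inventor d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(global_clustering(adj))                         # 0.6
print(local_clustering(adj, "a"))                     # 1.0
print(local_clustering(adj, "d"))                     # 0.0
```

Here three of the five connected triplets are closed, so the global coefficient is 0.6, while inventor c, who links the triangle to d, has a local coefficient of one third.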

Measures for Centralization: Weighted Degree Centralization, Number of Components, and Component Concentration
Centralization is a macro-level characteristic of a network, which is calculated from each node's centrality, a node-level characteristic. Centralization indicates how unequal the distribution of node centrality is in a network, or how much variance there is in that distribution. There are as many centralization measures as centrality measures. This study considers only degree centralization. Note that we use weighted degree centralization (WDC), since a co-invention network is a valued one. To calculate a WDC index, we first calculate the sum of the differences in degree centrality between the most central actor, A, and all the other actors in the network. The sum is then divided by its maximum under the largest possible centralization (that is, a star network). Another important measure for centralization is the number of components. A component is a part of a network (that is, a sub-network) that is connected within but disconnected from the other parts of the network. If a firm's co-invention network has many components, this means that its R and D is conducted by many independent groups of inventors.
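The weighted degree centralization described above can be sketched as follows. The text does not state how the star-network maximum is defined for valued links, so the normalization used here is an assumption: the benchmark is a star on the same number of inventors in which every link carries the largest observed link value. The function name is hypothetical, and inventors without any co-inventor are not represented in this input.

```python
def weighted_degree_centralization(edge_weights):
    """Freeman-style degree centralization using weighted degree.

    edge_weights: dict mapping an (u, v) inventor pair to its link value.
    """
    strength = {}
    for (u, v), w in edge_weights.items():
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
    n = len(strength)
    if n < 3:
        return 0.0  # the star benchmark is degenerate below three nodes
    top = max(strength.values())
    # Sum of differences between the most central actor and all others.
    observed = sum(top - s for s in strength.values())
    # ASSUMPTION: the maximum is taken from a star whose links all carry the
    # largest observed value; the source does not specify this choice.
    w_max = max(edge_weights.values())
    maximum = (n - 1) * (n - 2) * w_max
    return observed / maximum


# Hypothetical networks: a star around inventor c, and an even triangle.
star = {("c", "a"): 1, ("c", "b"): 1, ("c", "d"): 1}
triangle = {("a", "b"): 2, ("b", "c"): 2, ("a", "c"): 2}
print(weighted_degree_centralization(star))      # 1.0
print(weighted_degree_centralization(triangle))  # 0.0
```

The star reaches the benchmark exactly, while the triangle, in which every inventor has the same weighted degree, has no centralization at all.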
The number of components does not take into account differences in component sizes. Given networks with the same number of components, the distribution of component sizes in each network may vary significantly. Some networks may have a giant component and many small components, while others have only components of a similar size. The former may be considered centralized in that many inventors are connected to form a giant component. However, such a network can also be considered decentralized, because many small components conduct R and D activities independently of the inventors in the largest component. To quantify this difference, we use component concentration, which is represented by the Herfindahl-Hirschman index (HHI) for the number of inventors in the components.
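The number of components and the component concentration can be sketched together. The function names and toy data are hypothetical, and the HHI is assumed to be computed on shares between 0 and 1 (the sum of squared shares of inventors per component) rather than on a 0 to 10,000 scale.

```python
def connected_components(nodes, edges):
    """Return the components of an undirected network as lists of nodes."""
    adj = {node: set() for node in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from start.
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(members)
    return components


def component_concentration(components):
    """HHI of inventor shares across components (ASSUMED 0-1 share scale)."""
    total = sum(len(c) for c in components)
    if total == 0:
        return 0.0
    return sum((len(c) / total) ** 2 for c in components)


# Hypothetical firm: one group of four inventors and one group of two.
inventors = ["a", "b", "c", "d", "e", "f"]
links = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
comps = connected_components(inventors, links)
print(len(comps), round(component_concentration(comps), 3))  # 2 0.556
print(component_concentration([[1, 2, 3], [4, 5, 6]]))       # 0.5
```

Both toy networks have two components, but the one with unequal group sizes has the higher concentration, which is the difference the number of components alone cannot capture.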
directly how densely inventors are connected to each other. However, network density has a scale problem, in that it underestimates cohesion when a network is very large. To overcome this problem, average degree is usually considered as a substitute measure of cohesion. Since the average degree does not depend on network size, cohesion can be compared across networks of different sizes ([58], p. 74). The clustering coefficient is calculated at both the network level (global) and the node level (local). The global clustering coefficient is the number of closed triplets (triplets that form complete triangles) over the total number of triplets. The local clustering coefficient [59] is the number of links between nodes within a focal node's neighborhood divided by the maximum number of links that could exist among them. In this study, we combine the two clustering coefficients into one index by using a principal component analysis.
Centralization is a macro-level characteristic of a network, which is calculated from each node's centrality, a node-level characteristic. Centralization indicates how unequal the distribution of node centrality is in a network, that is, how much variance there is in that distribution. There are as many centralization measures as centrality measures. This study considers only degree centralization. Note that we use weighted degree centralization (WDC), since a co-invention network is a valued one. To calculate a WDC index, we first calculate the sum of the differences in degree centrality between the most central actor, A, and all the other actors in the network. The sum is then divided by its maximum under the largest possible centralization (that is, a star network).
Another important measure for centralization is the number of components. A component is a part of a network (that is, a sub-network) that is connected within, but disconnected from the other parts of the network. If a firm's co-invention network has many components, this means that its R and D is conducted by many independent groups of inventors.
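The cohesion and centralization measures described above can be illustrated with a minimal pure-Python sketch. The function name `network_measures` and its input layout are hypothetical. The two clustering coefficients are reported separately rather than combined by principal component analysis. The text does not spell out how the star-network maximum is defined for a valued network, so the sketch assumes a star whose ties all carry the largest observed weight; that normalization is an assumption, not the authors' stated formula.

```python
from itertools import combinations


def network_measures(inventors, weights):
    """Cohesion and centralization indices for one co-invention network.

    inventors: list of inventor ids (isolates included), at least three.
    weights:   dict mapping a (u, v) pair to the number of co-invented
               patents; each pair appears once, with u != v.
    """
    neighbors = {i: set() for i in inventors}
    strength = {i: 0 for i in inventors}  # weighted degree of each inventor
    for (u, v), w in weights.items():
        neighbors[u].add(v)
        neighbors[v].add(u)
        strength[u] += w
        strength[v] += w
    n, m = len(neighbors), len(weights)

    # Density divides by all possible pairs, so it depends on network size;
    # average degree divides by the number of inventors only.
    density = 2 * m / (n * (n - 1))
    average_degree = 2 * m / n

    closed = triplets = 0
    local = []
    for nbrs in neighbors.values():
        pairs = len(nbrs) * (len(nbrs) - 1) // 2  # triplets centred on this node
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        closed += links
        triplets += pairs
        local.append(links / pairs if pairs else 0.0)
    global_clustering = closed / triplets if triplets else 0.0
    mean_local_clustering = sum(local) / n

    # ASSUMPTION: the maximum is taken from a star network whose n - 1 ties
    # all have the largest observed weight, giving (n - 1)(n - 2) * w_max.
    s_max = max(strength.values())
    w_max = max(weights.values(), default=1)
    wdc = sum(s_max - s for s in strength.values()) / ((n - 1) * (n - 2) * w_max)

    return {
        "density": density,
        "average_degree": average_degree,
        "global_clustering": global_clustering,
        "mean_local_clustering": mean_local_clustering,
        "wdc": wdc,
    }
```

Under this assumed normalization, a unit-weight star yields a WDC of 1 and a triangle with equal weights yields 0, which matches the intuition that WDC captures inequality in weighted degree.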
The number of components does not take into account the differences in component sizes. Given networks with the same number of components, the distribution of component sizes in each network may vary significantly. Some networks may have a giant component and many small components, while others have only components of a similar size. The former may be considered centralized in that many inventors are connected to form a giant component. However, the same network can also be considered decentralized, because many small components conduct R and D activities independently of the inventors in the largest component. To quantify this difference, we use component concentration, which is represented by the Herfindahl-Hirschman index (HHI) for the number of inventors in the components.
Given the number of components n and the number of inventors I in a co-invention network, component concentration is calculated as follows:

$$\mathrm{HHI} = \sum_{i=1}^{n} \left( \frac{C_i}{I} \right)^{2} \tag{1}$$

In Equation (1), $C_i$ represents the number of inventors in the i-th component. A lower component concentration implies that inventors are more evenly distributed over the components in the network. In contrast, if inventors are connected in a few large components within a network, then component concentration becomes close to 1. In Table 2, the network in Table 2b has more components, but its component concentration is smaller than that of the network in Table 2a, in which there is a giant component.
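As a rough illustration of component concentration, the following sketch finds component sizes with a union-find pass and then applies the Herfindahl-Hirschman index to the inventor shares. The function names and the two toy networks are hypothetical and are not the networks shown in Table 2.

```python
def component_sizes(inventors, edges):
    """Sizes of the connected components, largest first (union-find)."""
    parent = {i: i for i in inventors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for i in inventors:
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def component_concentration(sizes):
    """HHI of component sizes: sum of squared shares of inventors."""
    total = sum(sizes)
    return sum((c / total) ** 2 for c in sizes)


inventors = list(range(10))
# Hypothetical network with a giant component of 8 inventors and 2 isolates.
giant = component_sizes(inventors, [(i, i + 1) for i in range(7)])
# Hypothetical network with four components of similar size (3, 3, 2, 2).
even = component_sizes(inventors, [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (8, 9)])
hhi_giant = component_concentration(giant)
hhi_even = component_concentration(even)
```

In this toy case the second network has more components (four versus three) but a lower concentration, which is the pattern the text describes for Table 2.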

Measure for Strength of Ties: The Ratio of Dyads with Multiple Links
Tie strength represents the frequency, depth, and duration of collaboration, and is measured by the number of patents that two inventors have co-invented. Since tie strength is a value for each dyad, it needs to be converted into a firm- or network-level index. We consider the ratio of dyads with multiple links (that is, links having a weight larger than 1) to the total number of links in a given network. Specifically, it is calculated as follows:

$$\text{Strength of ties} = \frac{\text{Number of valued (weighted) edges}}{\text{Number of total edges}} \tag{2}$$
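The ratio of dyads with multiple links can be sketched as follows, assuming the input is simply a list of inventor lists, one per patent. That layout and the function name `strength_of_ties` are illustrative assumptions rather than the structure of the inventor-patent database used in the study.

```python
from collections import Counter
from itertools import combinations


def strength_of_ties(patents):
    """Share of co-invention ties whose weight exceeds 1.

    patents: iterable of inventor-id lists, one list per patent.
    """
    weights = Counter()
    for inventors in patents:
        # Every pair of co-inventors on a patent gains one unit of tie weight.
        for u, v in combinations(sorted(set(inventors)), 2):
            weights[(u, v)] += 1
    if not weights:
        return 0.0  # no ties at all, e.g. only single-inventor patents
    repeated = sum(1 for w in weights.values() if w > 1)
    return repeated / len(weights)


# Hypothetical example: A and B co-invent twice; all other pairs only once.
example = strength_of_ties([["A", "B"], ["A", "B", "C"], ["C", "D"]])
```

Here four ties exist (A-B, A-C, B-C, C-D) and only A-B is repeated, so the ratio is one quarter.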

Control Variables
To remove truncation effects due to different time horizons, we include period as a control variable. As noted earlier, each period variable represents a four-year time window. For instance, period one refers to the period from 1991 to 1995. The difference in the propensity to patent across industries also needs to be controlled. We include an industry control that takes one of the following six values based on SIC classification codes: (1) construction; (2) manufacturing; (3) transportation, communication, electric, gas, and sanitary services; (4) wholesale trade; (5) finance, insurance, and real estate; and (6) services. Finally, we control for the effects of firm size by including the number of inventors and the number of patents as control variables.

Descriptive Analysis
Descriptive statistics of the variables, along with a correlation matrix, are presented in Table 3. In our paper, the symbol # denotes "the number of" (e.g., variable (9) in Table 3 is the number of patents). The table shows that an average co-invention network has about 2600 inventors, who file a similar number of patents. The average degree is 4.58, which means that, on average, an inventor collaborates with about 4.58 other inventors. Notably, an average network has about 500 components, implying that each component has only about 5-6 nodes on average. If we consider that component concentration is rarely zero, most components would have an even smaller size. Finally, the ratio of multi-valued edges is about 27.6% on average, which suggests that repeated collaboration is an unusual event. Remarkably, the variables of network centralization have a much larger variation than average degree or connection strength. This suggests that network centralization is a more effective factor for explaining the differences in collaboration structures.

Estimation Result
The proposed hypotheses are tested by using negative binomial regression with time dummies. The dependent variable, citation counts, takes on only non-negative whole-number values. The use of a linear regression model on such data can yield inefficient, inconsistent, and biased coefficient estimates. These data, like most count data, exhibit over-dispersion, that is, the variance is greater than the mean. Negative binomial regression explicitly accommodates this over-dispersion by allowing the variance to be greater than the mean.
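A quick way to see what over-dispersion means is to compare the sample mean and variance of the counts. The sketch below also backs out a method-of-moments dispersion parameter under the common NB2 variance function, Var(Y) = mu + alpha * mu^2. The text does not state which parameterization was estimated, so NB2 is an assumption here, and the citation counts are made up for illustration.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (mean, sample variance, moment estimate of NB2 alpha)."""
    m = float(mean(counts))
    v = float(variance(counts))
    # Under NB2, Var = m + alpha * m**2, so alpha = (v - m) / m**2.
    # alpha is floored at 0, which corresponds to the Poisson case.
    alpha = max(0.0, (v - m) / m ** 2) if m > 0 else 0.0
    return m, v, alpha


# Hypothetical citation counts: most patents uncited, one heavily cited.
m, v, alpha = dispersion_summary([0, 0, 0, 12])
```

In this toy sample the variance (36) is far above the mean (3), the situation in which a Poisson model, which forces the two to be equal, would be too restrictive.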
We use a time fixed-effects estimation model without firm dummies, unlike typical fixed-effects estimations. Fixed-effects estimation with firm dummies uses only within-firm differences (which have been pooled in our case), essentially discarding information about differences between firms. In our application, where the within-firm variation is small relative to the between-firm variation, a negative binomial model with only time dummies is more suitable. Moreover, patenting behavior (and thus patent citations) is often affected by unobserved time-related factors that affect all firms, such as macro-economic, sociological, or technological conditions (e.g., the IT bubble in the late 1990s). By adding only time dummies, we can estimate between-firm variation while controlling for such unobserved time-related factors.
Table 4 displays the regression results, where patent citations are regressed on variables for co-invention network structures. Model 1 contains only the control variables. In this model, the negative coefficients on the period dummies show that truncation effects are effectively controlled. Remarkably, the industry dummies do not have significant effects on patent citations, implying that our sample shows consistent patenting behavior regardless of industry type.
Models 2 through 7 test the effects of the independent variables individually. First, the coefficients on the clustering coefficient and average degree are both negative, but the effect of density is not significant. However, in Models 8, 9, and 10, in which the effects of other variables are controlled for, the two cohesion variables show negative and significant coefficients. Specifically, clustering and density, which are associated with network cohesion, have a negative effect on creating inventions with future impact. This offers support for H1, implying that a firm's R and D performance is negatively associated with the extent to which collaboration among inventors forms a dense or cohesive network structure. Consequently, the presence of structural holes in a collaboration network is associated with a much higher innovation performance than a dense collaboration structure.
Second, Model 4 shows that strength of ties has a positive and significant effect on forward citation frequency. The effect of tie strength is relatively strong and consistent throughout all the models in which it is included (Models 8 through 10). This offers support for H2, suggesting that frequent and repetitive collaboration between previous partners can significantly improve the likelihood of inventing patents with many citations. Specifically, once a collaboration relationship is established between a particular pair of inventors, it needs to be sustained in subsequent projects instead of exploring and establishing new partnerships with other partners.
The findings so far support both the structural hole and closure perspectives, as they are operationalized in our terms and methods. The findings clearly show that tie strength and cohesive connectivity have distinct effects (e.g., [28]). Cohesiveness represents redundancy and inefficiency of knowledge acquisition, which has a negative performance implication, while tie strength reduces coordination costs and facilitates the transfer of complex knowledge.
Finally, Models 5 through 10 show the effects of the three variables representing network centralization, which are WDC, the number of components, and component concentration. They have an insignificant or marginally significant effect on forward citations in the models in which they are considered individually. However, when other variables are included together, as in Models 8 through 10 (that is, when the effects of structural holes and tie strength are controlled), the coefficients of all three centralization variables become significant. The sign of each variable offers consistent support for H3, suggesting that a centralized structure of R and D collaboration has a negative effect on the impact of inventions. Both WDC and component concentration, which directly measure the extent of network centralization, have significant and negative coefficients. By contrast, the number of components, which is a measure associated with decentralization, has a positive and significant coefficient. Specifically, an R and D output has a weaker impact when there are highly centralized groups of collaborating inventors, whereas the impact of the output increases with the number of isolated groups. Supporting this interpretation, the coefficient of component concentration is significant and negative, suggesting that when inventors are distributed evenly in many sub-networks of a similar size, the overall performance of collaboration becomes much greater. Such a decentralized organization of R and D collaboration indicates that there are no leading groups that manage and control the overall inventive processes, and that inventors do not rely on particular groups of inventors. Rather, in decentralized organizations, inventive activities are performed by various independent groups of inventors, and those independent groups are likely to have distinct expertise and to proceed with their own agenda, independent of interventions from central inventor groups. In sum, the findings so far consistently support H1, H2, and H3, which suggest that while a cohesive and centralized collaboration structure is not desirable, frequent and repetitive collaboration between existing co-workers can enhance patent quality.

Interaction Effects
Table 5 displays the results of the negative binomial regression models containing interaction terms between each pair of network variables. In Model 11, WDC and the clustering coefficient are found to have a negative interaction effect, as expected. Since both variables have negative effects on performance, it can naturally be expected that each will reinforce the effect of the other. In Model 12, the interaction effect between WDC and connection strength is tested. The coefficient is significant and positive, suggesting that connection strength, which has a positive effect on performance, also alleviates the negative effect of a centralized collaboration structure. Finally, Model 13 tests the interaction effect between the clustering coefficient and connection strength, and shows that connection strength reinforces the negative effects of a cohesive structure. Although connection strength itself is positively associated with performance, when both the clustering coefficient and connection strength are high, the collaboration network becomes exceedingly cohesive, which has negative effects on performance. Consequently, the positive effects of connection strength may not materialize, depending on the topological structure of the collaboration network. The previous findings show that in centralized collaboration networks, strong connections can enhance performance. However, if a collaboration network is already cohesive, the existence of many strong ties in the network may have an adverse effect on performance.
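The way an interaction coefficient "alleviates" or "reinforces" a main effect can be made explicit with a generic log-link specification, which is the usual form of a negative binomial regression. The symbols below are illustrative placeholders, not the estimates reported in Table 5: $x_1$ stands for one network variable (say, WDC), $x_2$ for another (say, connection strength), and the dots for the remaining controls.

```latex
\log \mathbb{E}[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 + \dots
\quad\Longrightarrow\quad
\frac{\partial \log \mathbb{E}[Y]}{\partial x_1} = \beta_1 + \beta_{12}\, x_2
```

With $\beta_1 < 0$ and $\beta_{12} > 0$, the marginal effect of $x_1$ becomes less negative as $x_2$ grows, which is the sense in which connection strength alleviates the negative effect of centralization. With $\beta_1 < 0$ and $\beta_{12} < 0$, the marginal effect becomes more negative as $x_2$ grows, which corresponds to the reinforcing pattern described for the cohesion variables.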

Conclusions
This study offers empirical evidence to show that each firm has a distinctive R and D collaboration structure, which affects the firm's R and D performance and output quality. The findings are in line with the extant literature on the structural hole perspective and, at the same time, provide support for the network closure perspective, by showing the positive impact of information brokerage and efficiency, as well as of recurring and intense collaboration. More importantly, our analysis consistently reports the benefits of decentralized collaboration regardless of how centralized structures are operationalized.
Our study makes several contributions. First, this is the first attempt to examine collaboration structure by employing a large-scale sample of uniquely identified inventors and their patent data covering more than 20 years. Second, this study clearly separates two structural concepts, tie strength and connectivity, within the traditional closure perspective, in which a clear distinction between them has rarely been made. The results show that these two structural concepts capture different aspects of network mechanisms in R and D collaboration and, thus, that their effects on firms' organizational performance differ. As claimed in Reagans and McEvily [28], our findings suggest that structural holes are the source of value added, and that strength of ties is essential to realizing the value buried in the holes. Finally, our study employs centralization and component structures, which are rarely found in empirical studies based on social network theories. Taking component structures into account was required from a methodological perspective, because each observation has many isolates or components. Component structures also help to avoid the traditional dichotomous view of social capital, which opposes the structural hole and closure perspectives, by complementing the two typical network mechanisms in relation to output quality.
The findings provide some implications for the management of R and D organizations. From the perspective of individual R and D personnel, it is more effective to continue and strengthen collaboration with current partners than to find new ones. If they want to find new collaboration partners, it would be beneficial to find inventors who have not been cooperating much with others. At the organization level, managers need to identify and empower distinct groups of collaborators to maintain the decentralization of inventive activities. They need to understand that excessive reliance on "superstars" may inhibit the capability to create new inventions and to diversify the knowledge base. More problematically, such reliance may reduce the incentive to focus on new ideas and cause inventors to maintain their status by relying on the ideas of a few key players or on the organizational status quo.
There are also some limitations worth noting. First, our findings cannot be generalized to the context of inter-firm R and D collaboration. To extend our research framework to inter-firm R and D alliances, it is necessary to examine inventor-level collaboration structures in which inventors belong to different organizations. This is hardly considered in current studies on R and D alliances, mostly due to the lack of available data. In addition to disambiguating inventors' names in the patent database, we need information on their affiliations. Second, like many other studies on network structures, this study did not take into account the demographic features of individual inventors. Detailed information about inventors is typically hard to obtain [8]. If demographic information on inventors becomes available, research incorporating both network structures and demographic information will provide richer implications on the relationship among collaboration structure, individual characteristics, and organizational performance. Finally, it is also worth noting that although the measures of centralization we used are popular, they are not perfect measures for clearly distinguishing centralized and decentralized R and D organizations. For instance, centralization tells us only whether a network is organized around its most central points, but it does not tell us whether these central points comprise a distinct set of points that cluster together in a particular part of the network. The individual central points, for example, may be distributed widely throughout the network, and in such cases a measure of centralization might not be especially informative. Although overcoming this limitation may require new methods and wide empirical tests, reexamining the proposed hypotheses with more sophisticated measures of centralization will be a worthwhile extension of the present study.

Appendix B. Negative Binomial Regression Results for Different Time Windows
We conducted additional tests to see whether changing the time window affects the regression results. In addition to the four-year time window, two additional time windows, three- and five-year, are considered, and the regression results are provided in the following tables. The results are not significantly different from the results in the text, and they still offer support for our hypotheses.

Table 1 compares two co-invention networks with different WDC indices. At a glance, Table 1b looks more centralized around a few inventors. However, the actual WDC index of Table 1a is greater than that of Table 1b by 0.34. This is because WDC reflects the weights of links.

Table 1. Two co-invention networks with different weighted degree centralization.

Table 2. Two co-invention networks with different component concentrations.


Table 3. Descriptive statistics and correlation matrix.

Table 4. Negative binomial regression results (a sensitivity test for different time windows is provided in Appendix B).

Table B1. Regression Results for Three-Year Time Window.

Table B2. Regression Results for Three-Year Time Window (interaction effects).

Table B3. Regression Results for Five-Year Time Window.

Table B4. Regression Results for Five-Year Time Window (interaction effects).