# Intensity of Bilateral Contacts in Social Network Analysis

## Abstract

## 1. Introduction

- explores the application of recent advances in SNA methods that allow the consideration of weights and direction in the calculation of clustering coefficients;
- proposes an approach to extract indicators that describe the behavioral aspects of the social network members; and
- develops a model that can explain the actors’ behavior through SNA indicators.

## 2. Related Work

- type of network analyzed;
- indicator type: Structural (conventional graph theory indicators), Activity (accounting for traffic, intensity or frequency of connections), or Clustering (local, “small world” indicators);
- use of direction in connections; and
- weights used (in the case of weighted indicators).

## 3. Email Network Data and Main Indicators

## 4. Graph Theory Indicators

## 5. Extending Clustering Approach

- Cycle: a triangle where every arc has the same direction (j→i, i→k, k→j or vice versa) (Figure 4c); and
- Middleman: a triangle where one arc of i is incoming and the other is outgoing, and the arc between j and k does not close a cycle, so that two arcs are incoming to either k or j (j→i, i→k, j→k or vice versa) (Figure 4d).
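The triangle types can be made concrete with a small counting routine. The sketch below uses the unweighted (binary) coefficients of Fagiolo (2007) [15], which is an assumption: the weighted variant used in this work is not reproduced here, and the function name `directed_clustering` is hypothetical.

```python
def directed_clustering(edges, i):
    """Unweighted directed clustering coefficients of node i.

    edges: iterable of (source, target) arcs. Self-loops are ignored.
    Follows the binary formulas of Fagiolo (2007); this is an
    illustrative assumption, not the weighted variant.
    """
    arcs = {(s, t) for s, t in edges if s != t}
    succ = {t for s, t in arcs if s == i}   # nodes that i points to
    pred = {s for s, t in arcs if t == i}   # nodes that point to i
    d_out, d_in, d_bi = len(succ), len(pred), len(succ & pred)

    # Count closed triangles of each pattern around i.
    cycle = sum((j, h) in arcs for j in succ for h in pred)      # i->j, j->h, h->i
    middleman = sum((h, j) in arcs for j in succ for h in pred)  # i->j, h->i, h->j
    tri_in = sum((j, h) in arcs for j in pred for h in pred)     # j->i, h->i, j->h
    tri_out = sum((j, h) in arcs for j in succ for h in succ)    # i->j, i->h, j->h

    def ratio(closed, possible):
        # A node with no possible triangle of a type gets coefficient 0.
        return closed / possible if possible else 0.0

    return {
        "cycle": ratio(cycle, d_in * d_out - d_bi),
        "middleman": ratio(middleman, d_in * d_out - d_bi),
        "in": ratio(tri_in, d_in * (d_in - 1)),
        "out": ratio(tri_out, d_out * (d_out - 1)),
    }


# The Middleman example from the definition: j->i, i->k, j->k.
print(directed_clustering({("j", "i"), ("i", "k"), ("j", "k")}, "i"))
```

For the arcs j→i, i→k, j→k the only non-zero coefficient of i is the Middleman one, while replacing j→k with k→j turns the same triangle into a Cycle.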

## 6. Symmetrical and A-Symmetrical Models

- Conventional model: independent variables include main statistics on individual email activity (number of emails sent and share of own replies within the threshold period) and standard symmetric centrality indicators;
- Clustering model: independent variables include main statistics on individual email activity and directed clustering indicators; and
- Extended Directional model: independent variables combine main statistics on individual email activity, directed centrality indicators, and directed clustering indicators.
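As a rough illustration of how the three specifications differ, the sketch below lists each model's predictors and fits an ordinary least squares regression to obtain the adjusted R-squared. The column names and the helper `adjusted_r2` are hypothetical, and the plain normal-equations solver is a stand-in for whatever statistical package was actually used.

```python
# Hypothetical column names: the text names the variable groups, not the columns.
ACTIVITY = ["emails_sent", "own_reply_share"]
CLUSTERING = ["clust_in", "clust_out", "clust_middleman"]
MODELS = {
    "conventional": ACTIVITY + ["closeness"],
    "clustering": ACTIVITY + CLUSTERING,
    "extended_directional": ACTIVITY + ["closeness_in", "closeness_out"] + CLUSTERING,
}


def adjusted_r2(rows, target, predictors):
    """Fit OLS with an intercept and return the adjusted R-squared.

    rows: list of dicts, one per individual. Assumes the predictors are
    not collinear and that there are more rows than coefficients.
    """
    X = [[1.0] + [float(row[p]) for p in predictors] for row in rows]
    y = [float(row[target]) for row in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(x[a] * x[b] for x in X) for b in range(k)]
        + [sum(x[a] * yi for x, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [v - factor * w for v, w in zip(A[r], A[c])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n, p = len(y), len(predictors)
    # Adjusted R-squared penalizes the number of predictors p.
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)
```

Calling `adjusted_r2(rows, "responded_share", MODELS[name])` for each model name (with `responded_share` as a hypothetical target column) gives the kind of comparison the models are evaluated on. Because the adjustment penalizes extra predictors, the Extended Directional model only scores higher if its additional variables explain enough variance.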

#### 6.1. Share of Outgoing Emails Responded Within 7 Days

The adjusted R^{2} coefficient of the Extended Directional model is 0.8596, compared to 0.8468 and 0.8428 for the Conventional and Clustering models, respectively. Closeness enters in its directional version, which splits its weight in the model in two. The two directions are not symmetrical, though: in-closeness centrality has a negative estimate that roughly counterbalances the positive impact of out-closeness. The three clustering coefficients remain significant in the Extended Directional model and keep the direction of their influence, with small changes in the estimates. The estimates of the In and Out clustering coefficients converge to comparable levels, while the estimate for the Middleman coefficient decreases further.

#### 6.2. Share of Outgoing Emails Responded Within 24 Hours

#### 6.3. Validation of Methodology on Alternative Datasets

R^{2} but also demonstrates a visible improvement when the Extended Directional model is applied. It is also notable that the 1-day timescale gives a better fit than the 7-day timescale in the case of Enron. This comparison confirms that the explanatory power of the model increases in all cases when directional coefficients are used. The overall accuracy, though, depends on the specificities of each network. Different parameters, especially the timescale used, would probably improve the results of the comparison.

## 7. Conclusions

## Appendix A. Results for Validation Networks

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | −1.113 × 10^{−4} | 0.00601 ** | 1.403 × 10^{−4} | 0.000995 *** | −8.578 × 10^{−6} | 0.87386 |
Share of own replies within period | 9.635 × 10^{−2} | 0.04899 * | 9.553 × 10^{−1} | <2 × 10^{−16} *** | 9.887 × 10^{−2} | 0.04459 * |
Closeness | 2.949 | <2 × 10^{−16} *** | | | | |
Closeness (in) | | | | | 4.887 × 10^{−2} | 0.21796 |
Closeness (out) | | | | | −2.439 | 0.20323 |
“In” clustering coefficient | | | 1.245 × 10^{−1} | 0.004622 ** | 4.887 × 10^{−2} | 0.21796 |
“Out” clustering coefficient | | | 4.702 × 10^{−3} | 0.878011 | −2.164 × 10^{−2} | 0.43262 |
“Middleman” clustering coefficient | | | −4.740 × 10^{−2} | 0.272049 | −1.180 × 10^{−2} | 0.76112 |
Adjusted R-squared | 0.9823 | | 0.9781 | | 0.9823 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |
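The star markers next to the p-values in these tables appear to follow the common convention used in regression summaries (for example, as printed by R): `***` below 0.001, `**` below 0.01, `*` below 0.05 and `.` below 0.1. The helper below is a minimal sketch of that mapping; the function name is hypothetical and the treatment of values exactly at a cutoff is an assumption.

```python
def significance_code(p_value):
    """Map a p-value to its conventional significance marker."""
    cutoffs = ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, "."))
    for cutoff, marker in cutoffs:
        if p_value < cutoff:
            return marker
    return ""  # not significant at the 10% level


# p-values taken from the "Number of emails sent" row of the table above.
for p in (0.00601, 0.000995, 0.87386):
    print(p, repr(significance_code(p)))
```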

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | −9.079 × 10^{−5} | 0.085 . | 2.369 × 10^{−4} | 0.000206 *** | 7.396 × 10^{−5} | 0.30228 |
Share of own replies within period | 1.813 × 10^{−1} | 3.69 × 10^{−10} *** | 9.111 × 10^{−1} | <2 × 10^{−16} *** | 1.866 × 10^{−1} | 1.04 × 10^{−10} *** |
Closeness | 2.396 | <2 × 10^{−16} *** | | | | |
Closeness (in) | | | | | 8.495 | 0.00103 ** |
Closeness (out) | | | | | −6.139 | 0.01784 * |
“In” clustering coefficient | | | 1.529 × 10^{−1} | 0.020532 * | 4.493 × 10^{−2} | 0.40060 |
“Out” clustering coefficient | | | −1.161 × 10^{−1} | 0.011989 * | −9.589 × 10^{−2} | 0.01030 * |
“Middleman” clustering coefficient | | | 8.458 × 10^{−2} | 0.192056 | 4.440 × 10^{−2} | 0.39701 |
Adjusted R-squared | 0.9607 | | 0.9404 | | 0.9611 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | 2.161 × 10^{−5} | 0.462 | 1.245 × 10^{−4} | 6.72 × 10^{−5} *** | 1.413 × 10^{−5} | 0.742 |
Share of own replies within period | 4.322 × 10^{−2} | 4.481 × 10^{−2} | 5.021 × 10^{−1} | <2 × 10^{−16} *** | 3.915 × 10^{−2} | 0.384 |
Closeness | 1.482 | 1.201 × 10^{−1} | | | | |
Closeness (in) | | | | | −1.107 × 10^{1} | 0.881 |
Closeness (out) | | | | | 1.262 × 10^{1} | 0.865 |
“In” clustering coefficient | | | 3.834 × 10^{−2} | 0.521 | −1.115 × 10^{−1} | 0.046 * |
“Out” clustering coefficient | | | 1.761 × 10^{−1} | 0.145 | 8.092 × 10^{−2} | 0.461 |
“Middleman” clustering coefficient | | | 7.172 × 10^{−2} | 0.644 | 3.361 × 10^{−2} | 0.739 |
Adjusted R-squared | 0.6082 | | 0.5239 | | 0.6091 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | 1.154 × 10^{−5} | 8.94 × 10^{−6} *** | 1.208 × 10^{−5} | 1.69 × 10^{−6} *** | 3.523 × 10^{−6} | 0.35463 |
Share of own replies within period | 6.126 × 10^{−1} | <2 × 10^{−16} *** | 6.149 × 10^{−1} | <2 × 10^{−16} *** | 6.110 × 10^{−1} | <2 × 10^{−16} *** |
Closeness | 3.245 × 10^{−3} | 0.492 | | | | |
Closeness (in) | | | | | −1.847 × 10^{1} | 0.00524 ** |
Closeness (out) | | | | | 1.847 × 10^{1} | 0.00522 ** |
“In” clustering coefficient | | | −2.139 × 10^{−4} | 0.964 | −3.092 × 10^{−3} | 0.53343 |
“Out” clustering coefficient | | | 1.449 × 10^{−3} | 0.883 | 1.779 × 10^{−3} | 0.85563 |
“Middleman” clustering coefficient | | | −1.584 × 10^{−3} | 0.860 | −2.073 × 10^{−3} | 0.81785 |
Adjusted R-squared | 0.6791 | | 0.678 | | 0.6814 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |

**Figure 1.** Number of emails sent and received by each individual; color represents the department the individual belongs to (logarithmic scale).

**Figure 2.** Average number of emails sent and received per person for each department (bubble size is proportional to the number of department members).

**Figure 3.** Email traffic between departments (top 10% of bilateral traffic); each color represents a different sender department.

**Figure 4.** Possible triangles in directed clustering coefficient analysis: (**a**) In, (**b**) Out, (**c**) Cycle, (**d**) Middleman.

Authors | Network | Indicators | Directed | Weights |
---|---|---|---|---|
Adamic and Adar (2001) [12] | Web page network | Structural | Yes | Structural (number of links) |
Watts and Strogatz (1998) [14] | Biological network; Collaboration network | Clustering | No | No |
Barrat et al. (2004) [13] | Aviation network | Structural; Activity | Yes | Activity (available seats per year) |
Fagiolo (2007) [15] | International trade network | Clustering | Yes | Clustering (unweighted local coefficients) |
Fagiolo et al. (2008) [16] | International trade network | Structural; Activity; Clustering | Yes | Clustering (weighted local coefficients) |
Hangal et al. (2010) | Bibliography network; Twitter retweet network | Structural; Activity | Yes | Activity (directed influence between nodes) |
Traud et al. (2012) [17] | Facebook contacts network | Structural; Clustering | No | No |
Chen et al. (2013) [18] | Online community network; Collaboration network | Clustering | Yes | Clustering (in- and out-degree) |
Myers et al. (2014) [19] | Twitter follow graph | Structural; Clustering | Yes | No |
Clemente and Grassi (2018) [11] | Theoretical graphs | Clustering | Yes | Clustering (weighted local coefficients) |
Portela et al. (2016) [21] | Email network | Structural; Clustering | Yes | No |
Chen et al. (2019) [22] | Email network | Structural | Yes | No |
This work | Email network | Structural; Activity; Clustering | Yes | Clustering (weighted local coefficients, based on References [21,22]) |

Indicator | Median | Mean | Minimum | Maximum | Standard Deviation | Skewness |
---|---|---|---|---|---|---|
Number of sent emails | 77 | 330.7 | 0 | 9782 | 740.4 | 5.43 |
Number of received emails | 130 | 330.7 | 0 | 4710 | 483.9 | 2.88 |
Ratio of number of sent to number of received emails | 0.767 | 1.553 | 0.002 | 303.6 | 11.43 | 25.11 |

Indicator | Median | Mean | Minimum | Maximum | Standard Deviation | Skewness |
---|---|---|---|---|---|---|
Number of sent emails per person | 378.75 | 370.67 | 43.33 | 970.47 | 197.69 | 0.85 |
Number of received emails per person | 423 | 392.63 | 52.67 | 640 | 146.89 | −0.61 |
Ratio of number of sent to number of received emails | 0.90 | 1.02 | 0.24 | 2.88 | 0.552 | 1.8 |

Indicator | Median | Mean | Minimum | Maximum | Standard Deviation | Skewness |
---|---|---|---|---|---|---|
Degree centrality | 0.0478 | 0.0657 | 0.0020 | 0.5483 | 0.0623 | 2.34 |
Degree centrality (in) | 0.0244 | 0.0322 | 0.0010 | 0.2116 | 0.0287 | 1.98 |
Degree centrality (out) | 0.0234 | 0.0336 | 0.0010 | 0.3367 | 0.0349 | 2.72 |
Closeness centrality | 0.3759 | 0.3787 | 0.3699 | 0.4692 | 0.0091 | 3.18 |
Closeness centrality (in) | 0.3752 | 0.3779 | 0.3699 | 0.4692 | 0.0073 | 2.19 |
Closeness centrality (out) | 0.3755 | 0.3775 | 0.3699 | 0.4265 | 0.0091 | 3.30 |
Betweenness centrality | 958 | 2453.9 | 0 | 42,250 | 4449.6 | 4.40 |
Betweenness centrality (directional) | 956 | 2446.6 | 0 | 42,225 | 4437.5 | 4.41 |

Clustering Coefficient | Median | Mean | Minimum | Maximum | Standard Deviation | Skewness |
---|---|---|---|---|---|---|
In | 0.4738 | 0.4828 | 0 | 1 | 0.2026 | 0.1466 |
Out | 0.4167 | 0.4413 | 0 | 1 | 0.2141 | 0.4148 |
Middleman | 0.4689 | 0.4845 | 0 | 1 | 0.1988 | 0.2346 |
Cycle | 0.4282 | 0.4444 | 0 | 1 | 0.1963 | 0.4019 |

n | k | From | To | Timestamp $t_{n}$ | $t_{ij}$ | $t_{ji}$ |
---|---|---|---|---|---|---|
1 | 1 | i | j | t_{1} | t_{2}–t_{1} | |
2 | | j | i | t_{2} | | |
3 | 2 | i | j | t_{3} | t_{3}–t_{2} | |
4 | 3 | i | j | t_{4} | t_{5}–t_{4} | |
5 | | j | i | t_{5} | | |
6 | 4 | i | j | t_{6} | t_{6}–t_{5} | |
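The bookkeeping in this table can be sketched as a single chronological pass over the email log. The matching rule used here is an assumption: a message from j to i counts as a reply to the most recent email from i to j that is still unanswered, and it is counted only if it arrives within the threshold. The function name `reply_share` and the numeric timestamps are hypothetical.

```python
from collections import defaultdict


def reply_share(messages, threshold):
    """Share of each sender's emails that were answered within `threshold`.

    messages: iterable of (sender, recipient, timestamp) tuples, with
    timestamps and threshold in the same unit (e.g. days).
    """
    pending = {}               # (sender, recipient) -> time of latest unanswered email
    sent = defaultdict(int)    # emails sent per person
    answered = defaultdict(int)  # emails answered in time per original sender

    for sender, recipient, t in sorted(messages, key=lambda m: m[2]):
        sent[sender] += 1
        # Does this message answer an earlier email in the opposite direction?
        earlier = pending.pop((recipient, sender), None)
        if earlier is not None and t - earlier <= threshold:
            answered[recipient] += 1
        # This email now awaits a reply; it supersedes any older unanswered one.
        pending[(sender, recipient)] = t

    return {person: answered[person] / sent[person] for person in sent}


# Same i/j sequence as in the table, with made-up timestamps in days.
log = [("i", "j", 0), ("j", "i", 1), ("i", "j", 3),
       ("i", "j", 4), ("j", "i", 5), ("i", "j", 20)]
print(reply_share(log, threshold=7))
```

With a 7-day threshold, two of the four emails sent by i are answered in time, and one of the two emails sent by j; the last message from i arrives 15 days after the email it follows and is therefore not counted as a timely reply.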

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | 1.294 × 10^{−4} | <2 × 10^{−16} *** | 1.396 × 10^{−4} | <2 × 10^{−16} *** | 1.069 × 10^{−16} | <2 × 10^{−16} *** |
Share of own replies within period | 4.174 × 10^{−1} | <2 × 10^{−16} *** | 5.713 × 10^{−1} | <2 × 10^{−16} *** | 4.676 × 10^{−1} | <2 × 10^{−16} *** |
Closeness | 4.201 × 10^{−1} | 3.51 × 10^{−16} *** | | | | |
Closeness (in) | | | | | −1.223 × 10^{1} | 6.44 × 10^{−10} *** |
Closeness (out) | | | | | 1.259 × 10^{1} | 1.51 × 10^{−10} *** |
“In” clustering coefficient | | | 3.953 × 10^{−1} | 1.55 × 10^{−7} *** | 2.757 × 10^{−1} | 0.000130 *** |
“Out” clustering coefficient | | | 2.208 × 10^{−1} | 0.00121 ** | 2.519 × 10^{−1} | 0.000102 *** |
“Middleman” clustering coefficient | | | −4.599 × 10^{−1} | 2.14 × 10^{−5} *** | −5.004 × 10^{−1} | 1.43 × 10^{−6} *** |
Adjusted R-squared | 0.8468 | | 0.8428 | | 0.8596 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |

Variable | Conventional: Estimate | Conventional: Pr(>\|t\|) | Clustering: Estimate | Clustering: Pr(>\|t\|) | Extended Directional: Estimate | Extended Directional: Pr(>\|t\|) |
---|---|---|---|---|---|---|
Number of emails sent | 9.420 × 10^{−5} | <2 × 10^{−16} *** | 1.065 × 10^{−4} | <2 × 10^{−16} *** | 7.913 × 10^{−5} | <2 × 10^{−16} *** |
Share of own replies within period | 3.163 × 10^{−1} | <2 × 10^{−16} *** | 4.499 × 10^{−1} | <2 × 10^{−16} *** | 3.492 × 10^{−1} | <2 × 10^{−16} *** |
Closeness | 3.001 × 10^{−1} | <2 × 10^{−16} *** | | | | |
Closeness (in) | | | | | −7.363 | 3.01 × 10^{−6} *** |
Closeness (out) | | | | | 7.687 | 9.58 × 10^{−7} *** |
“In” clustering coefficient | | | 2.782 × 10^{−1} | 3.94 × 10^{−6} *** | 1.769 × 10^{−1} | 0.002354 ** |
“Out” clustering coefficient | | | 1.598 × 10^{−1} | 0.003591 ** | 1.726 × 10^{−1} | 0.000991 *** |
“Middleman” clustering coefficient | | | −3.137 × 10^{−1} | 0.000312 *** | −3.662 × 10^{−1} | 1.28 × 10^{−5} *** |
Adjusted R-squared | 0.7675 | | 0.7571 | | 0.7806 | |
p-value | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | | <2.2 × 10^{−16} | |

Network | Conventional (7 D) | Conventional (1 D) | Clustering (7 D) | Clustering (1 D) | Extended Directional (7 D) | Extended Directional (1 D) |
---|---|---|---|---|---|---|
email-Eu-core-temporal | 0.8468 | 0.7675 | 0.8428 | 0.7571 | 0.8596 | 0.7806 |
CollegeMsg | 0.9823 | 0.9607 | 0.9781 | 0.9404 | 0.9823 | 0.9611 |
Enron | 0.6082 | 0.6791 | 0.5239 | 0.678 | 0.6091 | 0.6818 |

Variable | email-Eu-core-temporal (7 D) | email-Eu-core-temporal (1 D) | CollegeMsg (7 D) | CollegeMsg (1 D) | Enron (7 D) | Enron (1 D) |
---|---|---|---|---|---|---|
Number of emails sent | + * | + * | - | + | + | + |
Share of own replies within period | + * | + * | + * | + * | + | + * |
Closeness (in) | - * | - * | + | + * | - | - * |
Closeness (out) | + * | + * | - | - * | + | + * |
“In” clustering | + * | + * | + | + | - | - |
“Out” clustering | + * | + * | - | - * | + | + |
“Middleman” clustering | - * | - * | - | + | + | - |

**+/-** indicates a positive or negative estimate; **\*** indicates that the estimate is statistically significant.

