Harnessing network science to reveal our digital footprints
Jukka-Pekka “JP” Onnela
Harvard University
University of Waterloo; January 26, 2011
Analysis and modeling of social networks
Metrics and methods for network analysis
Network theory
Online social systems and social media
2
Introduction Part I: Social network structure and function Part II: Network community structure Part III: Online social networks Conclusions
3
Phone calls and texts in a European network
Animation by Mikko Kivelä, Helsinki University of Technology, Finland. 4
Tie strengths in social networks
The weak ties hypothesis
• The stronger the tie between A and B, the higher the fraction of common friends they have
Mark Granovetter, The strength of weak ties, American Journal of Sociology 78, 1360, 1973
5
Tie strengths in social networks
Revisiting the hypothesis with cell phone data
• Tie strength?
• Fraction of friends in common?
[Figure: example ties labeled by call duration: 7 min, 15 min (3 calls), 5 min, 3 min]
Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007 6
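As a minimal sketch of the two questions on this slide, the code below aggregates call records into tie weights (number of calls w_N and total duration w_D) and computes the overlap O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), which is how the cited PNAS paper quantifies the fraction of friends in common (the slide itself does not show the formula). The call records, node names, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, minutes). Not real data.
CALLS = [("A", "B", 7), ("A", "B", 5), ("A", "B", 3),
         ("A", "C", 4), ("B", "C", 2), ("C", "D", 1)]


def aggregate_ties(calls):
    """Return (w_N, w_D): number of calls and total minutes per undirected tie."""
    w_n, w_d = defaultdict(int), defaultdict(int)
    for u, v, minutes in calls:
        tie = frozenset((u, v))
        w_n[tie] += 1
        w_d[tie] += minutes
    return dict(w_n), dict(w_d)


def adjacency(ties):
    """Build neighbor sets from an iterable of two-node frozensets."""
    adj = defaultdict(set)
    for tie in ties:
        u, v = tuple(tie)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def overlap(adj, u, v):
    """O_uv = n_uv / ((k_u - 1) + (k_v - 1) - n_uv) for a tie (u, v)."""
    common = len(adj[u] & adj[v])
    denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - common
    return common / denom if denom else 0.0


w_n, w_d = aggregate_ties(CALLS)
adj = adjacency(w_n)
for tie in w_d:
    u, v = sorted(tie)
    print(u, v, "calls:", w_n[tie], "minutes:", w_d[tie],
          "overlap:", overlap(adj, u, v))
```

In this toy example the strongest tie (A, B: three calls, 15 minutes) also has the highest overlap, which is the pattern the weak ties hypothesis predicts.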
Tie strengths in social networks
7
Tie strengths in social networks
               mean       std        max
degree k       3.3        2.5        144
weight w_N     15.4       37.3       3,610
weight w_D     41 min     206 min    663 h
strength s_N   51         75         3,644
strength s_D   135 min    386 min    690 h
8
Tie strengths in social networks
Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007 9
Tie strengths in social networks Initial connected network
10
Tie strengths in social networks 80% of the strongest links removed
11
Tie strengths in social networks Initial connected network
12
Tie strengths in social networks 80% of the weakest links removed
13
[Figure labels: Connected phase; Connected phase; Disconnected phase]
14
Tie strengths in social networks
Qualitative difference at the global level
• Phase transition when weak ties (red) are removed first
• No phase transition when strong ties (black) are removed first
• Quantitative division between weak and strong:
  Order parameter R_LCC (fraction of nodes in LCC)
  Susceptibility S (average cluster size excl. LCC)
  Phase 1: Connected phase
  Phase 2: Disconnected phase
15
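The link-removal experiment can be sketched on a toy network as follows. This is an illustration under stated assumptions, not the analysis behind the slides: the seven weighted edges are hypothetical, and the susceptibility is taken as S = sum of s^2 over all clusters except the largest, divided by N, which is one common percolation convention (the slide only says "average cluster size excl. LCC").

```python
def component_sizes(n, edges):
    """Sizes of connected components (largest first), via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def phase_indicators(n, edges):
    """Return (R_LCC, S) for the network that remains."""
    sizes = component_sizes(n, edges)
    r_lcc = sizes[0] / n
    susceptibility = sum(s * s for s in sizes[1:]) / n  # assumed convention
    return r_lcc, susceptibility


def remove_ties(edges, fraction, weakest_first):
    """Drop the given fraction of ties, weakest or strongest first."""
    ordered = sorted(edges, key=lambda e: e[2], reverse=not weakest_first)
    return ordered[int(fraction * len(edges)):]


# Hypothetical toy network: two strongly tied triangles joined by one weak tie.
EDGES = [(0, 1, 10), (1, 2, 10), (0, 2, 10),
         (3, 4, 10), (4, 5, 10), (3, 5, 10),
         (2, 3, 1)]

print("weakest first:  ", phase_indicators(6, remove_ties(EDGES, 0.15, True)))
print("strongest first:", phase_indicators(6, remove_ties(EDGES, 0.15, False)))
```

Removing the single weak bridge splits the toy network in two, while removing one strong tie leaves it connected. Sweeping `fraction` from 0 to 1 on a real network would trace out the curves of R_LCC and S described on the slide.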
Implications for spreading processes on networks?
16
Diffusion in social networks
Most studies of diffusion are based on small, binary networks
• Use the observed network as a platform to study weighted diffusion (SI)
• Start with one infected node; infect neighbors with given probabilities
• Weighted model
• Unweighted model
[Figure: two panels comparing the weighted and unweighted models; axes: number of infected nodes vs. time and fraction of infected nodes vs. time]
17
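A minimal sketch of the SI process described above, assuming a synchronous update in which each infected node infects each susceptible neighbor with probability x * w_ij (weighted) or x * mean(w) (unweighted control). The parameter x, the update scheme, and the toy network are assumptions made for illustration; the slide does not specify them.

```python
import random


def si_curve(adj_w, seed_node, steps, x, weighted, rng):
    """Number of infected nodes after each step of an SI process.

    adj_w[i][j] is the weight of tie (i, j); x scales weights to probabilities.
    """
    all_w = [w for nbrs in adj_w.values() for w in nbrs.values()]
    mean_w = sum(all_w) / len(all_w)
    infected = {seed_node}
    curve = [len(infected)]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, w in adj_w[i].items():
                if j in infected:
                    continue
                p = min(1.0, x * (w if weighted else mean_w))
                if rng.random() < p:
                    newly.add(j)
        infected |= newly  # synchronous update: new cases spread next step
        curve.append(len(infected))
    return curve


def to_adjacency(edges):
    adj_w = {}
    for u, v, w in edges:
        adj_w.setdefault(u, {})[v] = w
        adj_w.setdefault(v, {})[u] = w
    return adj_w


# Hypothetical toy network: two strong triangles joined by one weak tie.
TOY = to_adjacency([(0, 1, 10), (1, 2, 10), (0, 2, 10),
                    (3, 4, 10), (4, 5, 10), (3, 5, 10), (2, 3, 1)])

weighted_curve = si_curve(TOY, 0, 30, 0.05, True, random.Random(1))
unweighted_curve = si_curve(TOY, 0, 30, 0.05, False, random.Random(1))
print("weighted:  ", weighted_curve)
print("unweighted:", unweighted_curve)
```

In the weighted run the weak bridge (2, 3) transmits with probability 0.05 per step, so the infection tends to stay inside the first triangle for a while. In the unweighted control every tie transmits with the same probability, so the bridge is no bottleneck.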
Diffusion in social networks
Where do individuals get their information from?
• Unweighted: Infections via “weak” ties
• Weighted: Infections via “intermediate” ties
• Weak ties (WTs) have access to new info
• WTs have low transmission rates
• Strong ties (STs) have high transmission rates
• STs rarely have access to new info
[Figure panels: Unweighted; Weighted]
18
Introduction Part I: Social network structure and function Part II: Network community structure Part III: Online social networks Conclusions
19
Network community • “A group of nodes that are relatively densely connected to each other but relatively sparsely connected to other nodes in the network” • Communities are thought to have a strong bearing on functional units in networks (e.g. social) • Community detection is one of the most active areas of research in network science
Porter, Onnela, Mucha, Communities in networks, Notices of the American Mathematical Society 56, 1082, 2009 20
Example community • Zachary Karate Club network describes the friendships between 34 members of a karate club at a U.S. university in the 1970s • After an internal dispute, the club split in two, and members chose preferentially to be with their friends • Node color indicates post-split club affiliation • Community detection: What’s the “best” way to split the group?
21
Modularity maximization • Modularity maximization is the most commonly used method
• Assign nodes to communities to maximize modularity (algorithmic definition) • More within-community edges than one would expect at random
Newman, Modularity and community structure in networks, PNAS 103, 8577, 2006 22
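For reference, the standard Newman-Girvan modularity is Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j), where m is the number of edges, k_i the degree of node i, and c_i its community. The sketch below evaluates Q for a given partition; it does not perform the maximization itself. The six-node network is a hypothetical example.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected, unweighted network."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def modularity(A, communities):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Hypothetical example: two triangles joined by a single edge.
A = adjacency_matrix(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Splitting the network along the bridge gives Q = 5/14, while putting all nodes in one community gives Q = 0, so the split has more within-community edges than expected at random.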
Multislice community detection Goal: Extend modularity maximization to deal with • Time-dependent networks: Nodes and ties may change in time • Multiscale networks: Structure simultaneously present at multiple scales • Multiplex networks: Multiple types of ties
Mucha, Richardson, Macon, Porter, Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328, 876, 2010 23
Multislice community detection
• Introduce multiple slices
• Connect slices by connecting nodes across slices
• Null model?
ORDERED: neighbors
CATEGORICAL: all to all
24
Modularity from a dynamical process
• Quality of a partition in terms of its “stability”, which is an autocovariance function of an ergodic Markov process on the network:
  R_M(t) = \sum_C [P(C, t) - P(C, \infty)]
  \dot{p}_i = \sum_j \frac{A_{ij}}{k_j} p_j - p_i
• Expansion of the matrix exponential to first order in time recovers Newman-Girvan modularity with a “resolution parameter”
Lambiotte, Delvenne, Barahona, arxiv:0812.1770
25
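Multislice formulation
• Undirected network slices: A_{ijs} = A_{jis}
• Undirected couplings: C_{jrs} = C_{jsr}
• Define multislice strength: \kappa_{js} = k_{js} + c_{js}
• Density of random walkers in node i at slice s:
  \dot{p}_{is} = \sum_{jr} (A_{ijs} \delta_{sr} + \delta_{ij} C_{jsr}) p_{jr} / \kappa_{jr} - p_{is}
  (A_{ijs} \delta_{sr}: within slice; \delta_{ij} C_{jsr}: between slices)
• Steady state: p^*_{jr} = \kappa_{jr} / (2\mu), with 2\mu = \sum_{jr} \kappa_{jr}
26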
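Multislice formulation
• Null model: Probability of sampling node-slice is conditional on whether the multislice structure allows one to step from node j at slice r to node i at slice s:
  \rho_{is|jr} p^*_{jr} = \left[ \frac{k_{is}}{2 m_s} \frac{k_{jr}}{\kappa_{jr}} \delta_{sr} + \frac{C_{jsr}}{c_{jr}} \frac{c_{jr}}{\kappa_{jr}} \delta_{ij} \right] \frac{\kappa_{jr}}{2\mu}
• Subtracting this conditional joint probability from the linear-in-time approximation of the exponential describing the Laplacian dynamics gives
  Q_{multislice} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(c_{is}, c_{jr})
  (inside \delta(c_{is}, c_{jr}), c_{is} denotes the community of node i in slice s, not the coupling strength)
• Each slice has its own resolution parameter \gamma_s
• Inter-slice couplings C_{jsr} \in \{0, \omega\}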
Mucha, Richardson, Macon, Porter, Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328, 876, 2010 27
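The multislice quality function can be evaluated directly from its definition. The sketch below is a plain-Python illustration under assumptions, not the authors' implementation: the data layout (`A[s][i][j]`, `C[j][s][r]`, `gamma[s]`, `g[s][i]`), the function names, and the two-slice toy network are all hypothetical. Community labels are written `g` here to avoid the clash with the coupling strength c.

```python
def multislice_modularity(A, C, gamma, g):
    """Evaluate Q_multislice for community labels g[s][i].

    A[s][i][j]: adjacency of slice s; C[j][s][r]: coupling of node j
    between slices s and r; gamma[s]: resolution parameter of slice s.
    """
    n_slices, n = len(A), len(A[0])
    k = [[sum(A[s][i]) for i in range(n)] for s in range(n_slices)]
    two_m = [sum(k[s]) for s in range(n_slices)]
    # 2*mu is the sum of multislice strengths kappa_jr = k_jr + c_jr.
    two_mu = sum(two_m) + sum(C[j][s][r] for j in range(n)
                              for s in range(n_slices)
                              for r in range(n_slices))
    q = 0.0
    for s in range(n_slices):  # intra-slice terms (delta_sr = 1)
        for i in range(n):
            for j in range(n):
                if g[s][i] == g[s][j]:
                    q += A[s][i][j] - gamma[s] * k[s][i] * k[s][j] / two_m[s]
    for j in range(n):         # inter-slice terms (delta_ij = 1)
        for s in range(n_slices):
            for r in range(n_slices):
                if g[s][j] == g[r][j]:
                    q += C[j][s][r]
    return q / two_mu


def coupling(n, n_slices, omega):
    """All-to-all coupling of each node to itself in every other slice."""
    return [[[0.0 if s == r else omega for r in range(n_slices)]
             for s in range(n_slices)] for _ in range(n)]


# Hypothetical example: the same two-triangle network in two slices.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
slice_adj = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    slice_adj[u][v] = slice_adj[v][u] = 1
A = [slice_adj, slice_adj]
g = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]]

q_coupled = multislice_modularity(A, coupling(6, 2, 0.5), [1.0, 1.0], g)
q_uncoupled = multislice_modularity(A, coupling(6, 2, 0.0), [1.0, 1.0], g)
print(q_coupled, q_uncoupled)
```

With omega = 0 the slices decouple and the value reduces to the single-slice modularity of the same partition (5/14 here); with omega = 0.5 the coupling terms reward keeping each node in the same community across slices and the value rises to 8/17.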
Application I: College students (multiplex)
• “Tastes, ties, and time” multiplex network of 1640 college students
• Examine the following symmetrized ties from one wave of data:
  1. Facebook friendships
  2. Facebook picture friendships (upload & tag a photo)
  3. Roommates (share dormitory room, creating clusters of 1-6 students)
  4. Housing group (preference to be placed in same upper-class residence)
• Slices are categorical, hence inter-slice coupling from all slices to all slices
28
Application I: College students (multiplex)
• When omega = 0, individuals (must be) placed in four separate communities
• Increasing omega causes communities to merge across slices, especially if the patterns of connection are similar between slices (tie types)
• For intermediate omega, most individuals are placed in 1 or 2 communities, indicating their networks maintain group-level similarities across tie types
• Small minority maintain 4 separate assignments => different positions in slices

ω      # communities    # communities per individual
                        1         2         3         4
0      1036             0         0         0         100%
0.1    122              14%       40.5%     37.3%     8.2%
0.2    66               19.9%     49.1%     25.3%     5.7%
0.3    49               26.2%     48.3%     21.6%     3.9%
0.4    36               31.8%     47%       18.4%     2.8%
0.5    31               39.3%     42.4%     16.8%     1.5%
1      16               100%      0         0         0
29
Application II: Karate club (multiscale) • Zachary Karate Club consists of 34 members of a 1970s university club • An internal dispute led to the schism of the karate club into two smaller clubs • Sociologist Wayne Zachary studied club’s friendships when schism occurred • Realized he might have been able to predict the split in advance • Classic small-scale social network and typical small-scale benchmark • Color = actual post-split affiliation • Dashed lines = divisions
30
Application II: Karate club (multiscale)
• Keep the same unweighted 34 x 34 adjacency matrix across all 16 ordered slices
• Resolution dictated by a specified sequence of resolution parameters gamma = {0.25, 0.5, ..., 4}
• Communities shown for inter-slice coupling omega = 0 (top) and omega = 0.1 (bottom)
• Colors correspond to communities (repeat colors in the top panel across uncoupled slices)
• Dashed lines partition the network into four communities at the default resolution of modularity (gamma = 1)
31
Application III: US Senate (longitudinal)
• 100 Senators serving staggered six-year terms
• Study Congresses 1 - 110, covering 1789-2008, with 1884 individual Senators
• Define weighted connections between each pair of Senators in terms of similarity of their voting dynamics (independently for each two-year Congress)
• Define adjacency matrices based on roll-call votes:
  A_{ij} = (1 / b_{ij}) \sum_k \alpha_{ijk}
  where \alpha_{ijk} equals unity if and only if i and j voted the same on bill k, and b_{ij} is the total number of bills on which both legislators voted
• Ordered inter-slice coupling from each Senator to himself only when in consecutive Congresses
• Note that link strengths and nodes change from one slice to another
32
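The roll-call similarity A_ij = (1/b_ij) * sum_k alpha_ijk can be computed as in the following sketch. The vote records, senator names, and the choice to set A_ij = 0 when two legislators share no votes are assumptions made for illustration.

```python
def voting_similarity(votes):
    """A[i, j] = fraction of shared roll calls on which i and j voted alike.

    votes[name][bill] holds the vote cast; a missing bill means no vote.
    """
    names = sorted(votes)
    A = {}
    for i in names:
        for j in names:
            if i == j:
                continue
            shared = votes[i].keys() & votes[j].keys()   # bills counted in b_ij
            agree = sum(votes[i][k] == votes[j][k] for k in shared)
            A[i, j] = agree / len(shared) if shared else 0.0  # assumed fallback
    return A


# Hypothetical votes for one Congress (one slice).
VOTES = {
    "S1": {1: "yea", 2: "nay", 3: "yea"},
    "S2": {1: "yea", 2: "yea", 3: "yea"},
    "S3": {2: "nay"},
}

A = voting_similarity(VOTES)
for pair, weight in sorted(A.items()):
    print(pair, round(weight, 3))
```

Here S1 and S2 agree on two of their three shared bills, so A = 2/3. Repeating the computation per Congress gives one weighted slice per two-year term.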
Application III: US Senate (longitudinal) • Obtain 9 communities (color coded) using inter-slice coupling omega = 0.5 • Dark blue and red correspond to modern Democratic and Republican parties • Vertical gray bars indicate Congresses in which three communities appeared
Nominal party affiliations:
• Pro-Administration (PA) • Anti-Administration (AA) • Federalist (F) • Democratic-Republican (DR) • Whig (W) • Anti-Jackson (AJ) • Adams (A) • Jackson (J) • Democratic (D) • Republican (R) 33
Application III: US Senate (longitudinal)
Gray areas:
• 4th and 5th: First with political parties
• 10th and 11th: Vice President Aaron Burr's indictment for treason
• 14th and 15th: Changing structures in Democratic-Republican party
• 31st: Compromise of 1850
• 37th: Beginning of the American Civil War
• 73rd and 74th: Landslide 1932 election amidst the Great Depression
• 85th to 88th: Brought the major American civil rights acts
34
But these are proofs of concept. What can we do with this for real?
35
Application: Health care
Onnela et al, Impact of physician communities for healthcare costs, working paper, 2011 36
Introduction Part I: Social network structure and function Part II: Network community structure Part III: Online social networks Conclusions
37
Social influence • Ways in which people affect each others’ beliefs, feelings, and behaviors • Traditionally the domain of social psychology • Prominent in contagion in sociology, herding behavior in economics, speculative bubbles in financial markets, public health, etc. • Online social systems provide a complementary perspective
• Closed and data rich systems • Access to complete populations of agents without sampling • Platform? • Behavior?
38
Social influence and Facebook • Facebook has free applications or “apps” • Focus on a simple, observable behavior: Facebook “app” installation • Installation is not use! • Since apps are free, why would influence matter? • Popular applications: • Readily discoverable (low search cost) • High quality (exhaustively tested) • High functionality (superior features)
39
Social influence and Facebook
[Figure: local information (names shown: John Doe I, John Doe II, Jane Doe I, Jane Doe II, Jane Doe III, Jane Doe IV) and global information]
40
Social influence and Facebook
• Each installation contributes to both local and global information
• Each installation is a microscopic social stimulus
• Superposition of 104 million application installations
• Possibility of cascades, or adoption ripples, in the network
41
Data • Time period June 25, 2007 - August 14, 2007 • Hourly data for 2,705 applications, T=1,208 time steps • Number of application i users at time t denoted by ni(t)
42
Time            n_i(t)
7/4/07 0:01     1820
7/4/07 1:01     1836
7/4/07 2:01     1839
7/4/07 3:01     1847
7/4/07 4:01     1852
7/4/07 5:01     1860
7/4/07 6:01     1867
7/4/07 7:01     1874
7/4/07 8:01     1880
7/4/07 9:03     1889
7/4/07 10:01    1899
7/4/07 11:02    1908
7/4/07 12:02    1921
7/4/07 13:02    1931
7/4/07 14:01    1949
7/4/07 15:01    1964
7/4/07 16:01    1987
7/4/07 17:03    2000
7/4/07 18:03    2014
7/4/07 19:03    2025
7/4/07 20:03    2036
7/4/07 21:03    2048
7/4/07 22:03    2060
7/4/07 23:03    2071
We wanted to learn about individual behavior, but these are aggregate data?
43
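One way to work with such aggregate counts is to turn the cumulative number of users n_i(t) into hourly increments f_i(t) = n_i(t) - n_i(t-1) and summarize their mean and standard deviation. The sketch below does this for the 24 values listed on the data slide; treating the increments as the quantity of interest is an assumption made here for illustration, and the variable names are hypothetical.

```python
import statistics

# Hourly user counts n_i(t) for one application on 7/4/07, from the data slide.
USERS = [1820, 1836, 1839, 1847, 1852, 1860, 1867, 1874, 1880, 1889, 1899,
         1908, 1921, 1931, 1949, 1964, 1987, 2000, 2014, 2025, 2036, 2048,
         2060, 2071]

# Hourly increments f_i(t) = n_i(t) - n_i(t-1): new installations per hour.
increments = [later - earlier for earlier, later in zip(USERS, USERS[1:])]

mu = statistics.fmean(increments)      # mean hourly activity
sigma = statistics.pstdev(increments)  # fluctuation around the mean

print("increments:", increments)
print("total new users:", sum(increments))
print("mean per hour:", round(mu, 2), "std:", round(sigma, 2))
```

Each application then contributes one (mean, standard deviation) pair, and it is the relationship between these pairs across applications that carries information about collective behavior.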
Fluctuation scaling
CORRELATED: \sigma_i \sim \mu_i
INDEPENDENT: \sigma_i \sim \mu_i^{1/2}
• Fluctuation scaling can be used to study collective behavior • Facebook “coins”, one per app per user, are coupled via local and global signals • What is the slope, i.e. extent of social influence, on Facebook? 44
Fluctuation scaling
\sigma_i \sim \mu_i^{\alpha}, \alpha \in [0.5, 1]
Individual regime: \alpha_I \approx 0.55
Collective regime: \alpha_C \approx 0.85
0.36 = 55 installations a day
Onnela, Reed-Tsochas, The spontaneous emergence of social influence in online systems, PNAS 107, 18375, 2010 45
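The two limiting exponents can be reproduced with synthetic "coins". In the sketch below, each hypothetical application has a fixed number of users who each install with probability p per time step, either independently (expected alpha near 0.5) or all together on one shared coin (expected alpha near 1). The exponent is the least-squares slope of log sigma against log mu. The sizes, p, number of steps, and seeds are arbitrary assumptions; this is not the estimation procedure of the cited paper.

```python
import math
import random
import statistics


def simulate(sizes, steps, p, correlated, rng):
    """Activity series per application: installations in each time step."""
    all_series = []
    for n_users in sizes:
        activity = []
        for _ in range(steps):
            if correlated:
                # One shared coin: every user acts, or nobody does.
                activity.append(n_users if rng.random() < p else 0)
            else:
                # One independent coin per user.
                activity.append(sum(rng.random() < p for _ in range(n_users)))
        all_series.append(activity)
    return all_series


def scaling_exponent(all_series):
    """Least-squares slope of log10(sigma_i) against log10(mu_i)."""
    xs, ys = [], []
    for activity in all_series:
        mu = statistics.fmean(activity)
        sigma = statistics.pstdev(activity)
        if mu > 0 and sigma > 0:
            xs.append(math.log10(mu))
            ys.append(math.log10(sigma))
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


SIZES = [10, 30, 100, 300, 1000]  # hypothetical user counts per application
alpha_independent = scaling_exponent(
    simulate(SIZES, 400, 0.2, False, random.Random(7)))
alpha_correlated = scaling_exponent(
    simulate(SIZES, 400, 0.2, True, random.Random(7)))
print("independent:", round(alpha_independent, 2))
print("correlated: ", round(alpha_correlated, 2))
```

The values reported on the slide, about 0.55 in the individual regime and about 0.85 in the collective regime, lie between these two limits.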
And then to something different...
46
Mislove, Lehmann, Ahn, Onnela, Rosenquist, Understanding the demographics of Twitter users, submitted, 2010.
47
Introduction Part I: Social network structure and function Part II: Network community structure Part III: Online social networks Conclusions
48
Conclusions and Outlook • Structure of large-scale human social networks • Local, global, diffusion
• Community detection in multislice network • Multiscale, multiplex, time-dependent
• Online social networks • Social influence from aggregate data; Content
49
Conclusions and Outlook • Structure of large-scale human social networks • Local, global, diffusion • Cell phones as diagnostic tools • Community detection in multislice network • Multiscale, multiplex, time-dependent • Communication within evolving organizations • Online social networks • Social influence from aggregate data; Content • Public health applications & consumer confidence
50
Thank you