On Application-Aware Data Extraction for Big Data in Social Networks Ming-Syan Chen Research Center for Information Tech. Innovation, Academia Sinica EE Department, National Taiwan Univ.
Fast Growth of Social Network Activities • Example social networks: – Twitter – Facebook – Flickr – MSN – Wikipedia – Amazon.com
• Such a network – Huge in size! – Cannot easily be analyzed M.-S. Chen
The Amount of Information is Huge! • Twitter – 150+ million members – 50 million tweets per day
From twitter.com
• Facebook – 800+ million users
• Amazon Co-purchasing Network – half a million product nodes – several million recommendation links
• Web Pages – Yahoo!: over one billion Web pages
Example of Big Data and Social Network Volume: thousands of people! Velocity: fast accumulation!! Variety: eating different foods!!!
Example of Big Data and Social Network For some gossip in this occasion, Veracity is an issue and the information Value could be low. Mr. Lin won the lottery!
Mrs. Chang just did a face lift!
Information Extraction for Big Data in Social Networks • Extracting important information from large social network graphs – To allow data analysts to mine the information in large social networks, to enable scalable storage and querying, and to facilitate the development of real-world applications
Outline • Graph reduction – Summarization, sampling, and extraction
• Information Extraction on Social Network Graphs – Capturing key parameters (parameter extraction) – Guide query (information extraction) – Decomposing SN graphs (structure extraction)
Graph Reduction Graph summarization (going through all data) e.g., NTU has 32K students; 20% are sushi lovers, 25% prefer steak; also 15% are artists, 20% are engineers, etc.
Graph sampling (going through a subset) Getting a small representative set of NTU students (which preferably fits the statistics)
Graph extraction Application/goal-oriented data extraction, e.g., picking only good eaters for a feast contest.
Graph Extraction 執簡御繁 To handle complicated things with simple skills.
Application/goal-oriented data extraction Three levels of information extraction from SN graphs
• Parameter extraction (e.g., company stat.) – Fast calculation of closeness centrality
• Information extraction (e.g., company biz.) – Guide query
• Structure extraction (e.g., company org.) – Decomposing SN graphs
[Figure: analogy of the three levels, illustrated with a weapon – parameter extraction, information extraction (regarding capability), and structure extraction.]
Outline • Graph reduction • Information Extraction on Social Network Graphs – Capturing key parameters (parameter extraction) – Guide query (information extraction) – Decomposing SN graphs (structure extraction)
Closeness Centrality • Several interesting quantities arise in SN graphs, including closeness centrality, network diameter, and degree distribution. • Closeness centrality of node v, Cc(v): the inverse of the average shortest-path distance from v to any other node in a network. – If Cc(v) is large, v is near the center, as it requires only a few hops to reach the others.
Response to Dynamic Changes • Edge insertions and deletions are frequent in a social network – It is desirable to quickly update the closeness centrality of every node in response to an edge insertion/deletion.
• Example use: pick a number of people (the nodes with high CCs) who can maximize advertisement effectiveness.
Example of Closeness Centrality Cc(v): the inverse of the average shortest-path distance from v to all other nodes:

Cc(v) = (|V| − 1) / Σ_{u∈V} |p(v,u)|

For an unweighted and undirected graph G with 14 nodes and 18 edges:

Cc(v) = (14 − 1) / (1·4 + 2·2 + 3·1 + 4·1 + 5·2 + 6·2 + 7·1) = 13/44
Cc(w) = (14 − 1) / (1·3 + 2·4 + 3·4 + 4·2) = 13/31

Thus, node w is closer to all other nodes than node v is.
Calculating Closeness Centrality • One can calculate the closeness centralities of all vertices by solving the All-Pairs Shortest Paths (APSP) problem. – O(n(m+n)) based on the breadth-first search (BFS) method for an undirected graph, where n and m are the numbers of nodes and edges in the graph. – In a dynamic graph, re-solving the APSP problem after each edge insertion or deletion is not efficient.
• Note that only some pairs of shortest paths will be affected by certain edge changes. – Identify them (unstable node pairs) for fast calculation of CC
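The baseline (full recomputation) is simple to sketch. Below is a minimal, hedged version assuming a connected, unweighted graph stored as an adjacency dict; all names are hypothetical, not CENDY's actual implementation:

```python
from collections import deque

def closeness(adj, v):
    """Closeness centrality of v in an unweighted graph given as an
    adjacency dict {node: set(neighbors)}: (n - 1) / sum of BFS distances."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0
```

Running this once per node is exactly the O(n(m+n)) APSP-by-BFS cost noted above, which motivates the incremental approach.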
Example For example, with the addition of edge (a,b): Unchanged shortest paths ◦ p(b,v), p(c,t), p(r,h), etc.
Changed shortest paths ◦ Before edge insertion: p(a,b)={a,d,w,b}, p(a,c)={a,d,w,r,c}, p(u,v)={u,l,o,d,w,r,s,v}, etc.
◦ After edge insertion (we then call these nodes unstable): p(a,b)={a,b}, p(a,c)={a,b,c}, p(u,v)={u,x,a,b,c,v}, etc. (a): the original unweighted and undirected graph G. (b): G’ = G ∪ e(a,b).
Illustration of Unstable Node Pairs • To find V’u, the u-unstable node set, whose shortest paths to u changed after the edge addition. • If we perform BFS at node u in G and G’ to obtain Gu and G’u, we find that only the shortest paths p(u,b), p(u,c), p(u,h), p(u,v) and p(u,t) changed.
– unstable node pairs: (u,b), (u,c), (u,h), (u,v) and (u,t). – V’u = {b, c, h, v, t}
(Main Theorem) After the addition of edge (a,b), every unstable node pair {v,u} (i.e., whose shortest path changed) satisfies v ∈ V’a and u ∈ V’b.
Only these shortest paths can change after the edge addition (and need to be re-calculated)
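An unstable set such as V’a can be obtained by running BFS from a before and after the insertion and keeping the nodes whose distance changed. A hedged sketch (adjacency-dict representation and names are assumptions):

```python
from collections import deque

def bfs_dist(adj, src):
    """BFS distances from src in an unweighted graph (adjacency dict)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def unstable_set(adj_old, adj_new, a):
    """Nodes whose shortest-path distance to a changed after an edge update."""
    d_old, d_new = bfs_dist(adj_old, a), bfs_dist(adj_new, a)
    return {v for v in adj_old if d_old.get(v) != d_new.get(v)}
```

By the main theorem, only pairs drawn from unstable_set(..., a) × unstable_set(..., b) need their paths re-calculated.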
Concurrent Calculation of CC in SN • In parallel, perform BFS at nodes a and b in G and G’ to obtain Ga, G’a, Gb, and G’b, and hence V’a = {b,c,h,v,t} and V’b = {a,x,l,u}.
• Then, in parallel, perform BFS starting at each node of V’b (i.e., a, x, l, and u).
• Inform the nodes in these unstable pairs to re-calculate their shortest paths to others and their CC.
Experiments • To evaluate CENDY, we conducted experiments on six real unit-weighted graph datasets of different types. • The case of edge deletion can be done similarly (in light of a companion theorem proposed)
Experiments Evaluation on Edge Insertion From this table, we can see that the closeness centralities of all vertices and the APL can be updated with only a few BFS processes. e.g., DBLP contains 460,413 nodes. The naïve way requires performing 460K BFS processes to update closeness centrality and APL; CENDY requires only 4K BFS processes to finish the task.
Remark • In response to fast changes in an SN, CENDY is devised to efficiently update the closeness centrality of each node in the social network. • New algorithms are called for to efficiently calculate other key parameters in fast-changing social networks.
Outline • Graph reduction • Information Extraction on Social Network Graphs – Capturing key parameters (parameter extraction) – Guide query (information extraction) – Decomposing SN graphs (structure extraction)
Motivation of Guide Query Several works on information finding in social networks • Expert finding [Deng’08][Lappas’09] – To find the experts based on some given requirement
• Gateway finding [Koren’06][Wang’10] – To find the gateways between the source group and the target group • Active Friending [Wu’13] – To explore social networks to improve friend finding • Guide query [Lin’13] – To find a querier’s informative friends for gathering information
[Deng’08] ICDM 2008. [Lappas’09] KDD 2009. [Koren’06] KDD 2006. [Wang’10] KDD 2010. [Wu’13] KDD 2013. [Lin’13] WAIM 2013.
Motivation of Guide Query (Cont’d) • With expert finding, the answer is a list of experts ranked by their expertise. • With the guide query, the answer is a list of the querier’s informative friends ranked by their ability to gather information from experts – Exploring social relationships – Taking the probabilities of getting help into consideration
Guide Query: Graph Extraction based on Your Friends This friend is also who I should ask since she can collect information from her friends.
These two friends are who I should ask for information.
I want to know information about Company A or B.
[Figure: example social graph – the querier’s friends and friends-of-friends are labeled with the company attributes (A, B, C, D, E) they hold or can reach.]
Guide Query • Guide query [Lin’13] – For a user initiating the query, the answer is the user’s neighbors that are informative about user-assigned attributes. – An informative neighbor should either have the attributes itself or know some other friends that have the attributes.
[Lin’13] Y.-C. Lin, P. S. Yu, M.-S. Chen, “Guide Query in Social Networks,” WAIM 2013.
Problem Definition Given a query node q and a set of keywords W = {w1, w2, …, w|W|}, the guide query is to find the top-k informative neighbors of q considering W.
Example: q = N0, W = {A, B}.
[Figure: example graph – N0’s neighbors N1–N4 are the candidates; nodes carrying a queried attribute (e.g., {A}, {B}, {A,B}) are the targets.]
Problem (Cont’d) In the model, an edge is labeled with the probability that a node successfully spreads the request to the linked node. We rank the candidates based on how informative they are, which is evaluated by the proposed InfScore and DivScore.
[Figure: the example graph annotated with edge probabilities (values such as 0.2–0.8).]
InfScore InfScore: the informative level of a candidate node (i.e., its ability to spread the request to targets), modeled by the expected number of targets the candidate is able to spread the request to.
[Figure: the example graph with every edge probability set to 0.5.]
InfScore InfRatio is defined as the probability that a specific candidate successfully spreads the request to a certain target.
e.g., the InfRatio from N1 to N13 is 0.25 (the product of the 0.5 edge probabilities along the path).
InfScore (intensity) The InfScore is the weighted sum of InfRatios:
InfScore(N1) = 0.5 (N11) + 0.5 (N12) + 0.25 × 2 (N13) = 1.5
InfScore(N4) = 1.0 (N4) + 0.5 (N41) = 1.5
N : InfScore
N1 : 1.5
N2 : 0.5
N3 : 1.5
N4 : 1.5
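On one reading of the example (all edge probabilities 0.5; target N13 holds both queried keywords and is therefore weighted 2), the InfScore arithmetic can be sketched as follows. Here `paths` maps each reachable target to the edge probabilities along its path, a hypothetical representation rather than the paper's actual algorithm:

```python
def inf_score(paths, weights):
    """InfScore of a candidate: weighted sum of InfRatios, where each
    InfRatio is the product of edge success probabilities along the
    path from the candidate to one target."""
    score = 0.0
    for target, edge_probs in paths.items():
        ratio = 1.0
        for p in edge_probs:
            ratio *= p  # probability the request survives every hop
        score += weights.get(target, 1) * ratio
    return score
```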
DivScore (Diversity) The DivScore is an entropy-like measure of the diversity of the possibly accessible target nodes. For each candidate, the target vector XT is formed by normalizing the per-target InfScore contributions; each item is thus a probability over the different targets. The DivScore is the Shannon entropy of that distribution:

DivScore = − Σ_t x_t · log2(x_t)

Example: the distribution of N3 is [0.5/1.5, 0.5/1.5, 0.25/1.5, 0.25/1.5] = [1/3, 1/3, 1/6, 1/6], so
DivScore(N3) = [−(1/3)·log2(1/3)]·2 + [−(1/6)·log2(1/6)]·2 ≈ 1.918
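The entropy computation above is a few lines; a minimal sketch (function name is hypothetical):

```python
import math

def div_score(contributions):
    """DivScore: Shannon entropy (in bits) of a candidate's normalized
    per-target InfScore contributions."""
    total = sum(contributions)
    probs = [c / total for c in contributions if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A candidate reaching a single target gets DivScore 0, matching N2 in the table below.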
N : DivScore
N1 : 1.585
N2 : 0.000
N3 : 1.918
N4 : 0.918
Experimental Setup • DBLP dataset [DBLP] – Co-authorship network – Edge probability • Based on the WC (weighted cascade) model • p(Ni -> Nj) = 1 / d(Nj) • d(Nj) is the in-degree of Nj
– Node attribute • Conference names of an author’s publications
[DBLP] http://www.informatik.uni-trier.de/~ley/db/ [Chen’10] W. Chen, et al., “Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks,” KDD 2010.
Experimental Results Suppose Ming-Syan Chen wants to discuss with people who have published papers on KDD, SDM, CIKM, ICDM, PKDD, which coauthors should he first connect to? (i.e., Either coauthors who have these conf. papers or coauthors who coauthored with people who have these conf. papers.) Query input: • q = ‘Ming-Syan Chen’ • k = 10 • W = [KDD, SDM, CIKM, ICDM, PKDD]
Remark • The key notion is to guide the query to the right candidates in the social network. – For each candidate, a combination of the expertise and the social relationship with the person initiating the query is considered
• Just like group formation (KDD 2012) and this expert finding problem (WAIM 2013), more applications/tools can be enhanced when social relationships (SR) are considered
Outline • Graph reduction • Information Extraction on Social Network Graphs – Capturing key parameters (parameter extraction) – Guide query (information extraction) – Decomposing SN graphs (structure extraction)
Diffusion Analysis in Social Networks • Diffusion of information can be used to model the interaction among nodes in a network, e.g., – Viruses spread over the Internet. – Diseases spread in the community. – Rumors/news spread among humans.
Example Diffusion • Information diffusion can happen in social networks, such as Facebook and Twitter.
[Figure: an underlying network over nodes n1–n9 with a path of infection; the numbers 0–3 are infection times.]
The Network is Hidden • In some situations, the underlying network is not known (due to cost or privacy issues). • The network inference problem (NIP) is studied to discover the underlying network – to infer the network from what happened.
[Figure: the same nodes n1–n9 with infection times but no edges; the edges must be inferred.]
Network Inference Problem • Assume there is an underlying information network. • NIP is to infer the information network given a set of cascades. • A cascade t^s = [t^s_1, …, t^s_N] is the time record of information s spreading over the network (N is #nodes), i.e., node n_i gets s (is infected) at time t^s_i.
• If a node i is never infected with s, set t^s_i = ∞.
• Ex: t^s = [∞, ∞, 2, ∞, 0, 1]
[Figure: a 6-node example (n1–n6) – n5 is infected at time 0, n6 at time 1, and n3 at time 2.]
Clustering Cascades • Traditionally, NIP assumes there is one underlying network, which may not always be true in reality – e.g., Sports news, political news, and entertainment news are likely to spread in different ways
• Hence, we would like to cluster cascades so that the cascades in each cluster spread in the same pattern – An SN graph is hence decomposed into application-specific ones
Example Cascades
[Figure: six cascades over nodes n1–n6 with infection times – a (Lakers news), b (49ers news), c (Redskins news), d (Heat news), e (Jets news), f (Celtics news).]
To Model the Inference Network • Modeling method: – If two nodes are always infected within a short time of each other, the weight should be large.

w_ij = (1 / |{s : t^s_i < t^s_j}|) · Σ_{s : t^s_i < t^s_j} 1 / (t^s_j − t^s_i)

– Consider w_12 as an example:
{s : t^s_1 < t^s_2} = {b, c, e}
w_12 = (1/3) · (1/(∞ − 0) + 1/(1 − 0) + 1/(2 − 0)) = (1/3)(0 + 1 + 1/2) = 1/2
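The weight formula translates directly into code. A sketch, assuming each cascade is a tuple of infection times indexed by node (with math.inf for never-infected nodes); the function name is hypothetical:

```python
import math

def edge_weight(cascades, i, j):
    """w_ij: over cascades where node i is infected before node j,
    the average of 1/(t_j - t_i); an infinite t_j contributes 0."""
    terms = []
    for t in cascades:
        if t[i] < t[j]:
            gap = t[j] - t[i]
            terms.append(0.0 if math.isinf(gap) else 1.0 / gap)
    return sum(terms) / len(terms) if terms else 0.0
```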
Example Inference Network
[Figure: the inferred weighted network over n1–n6; edge weights include values such as 0.17, 0.25, 0.5, and 0.67.]
To Cluster Cascades by K-Means • Transform each cascade t into an N-dim indicator vector based on whether nodes are infected or not. • Ex:
– t_a = [∞, ∞, ∞, ∞, 0, 1] → [0, 0, 0, 0, 1, 1]
– t_b = [0, ∞, ∞, 1, ∞, ∞] → [1, 0, 0, 1, 0, 0]
– t_c = [0, 1, 2, ∞, ∞, ∞] → [1, 1, 1, 0, 0, 0]
• Run K-means to get the clustering result: (a, d, f) and (b, c, e).
Graph Decomposition • By considering cascades {a, d, f} and cascades {b, c, e} independently (based on which nodes are infected), the original SN graph is decomposed in accordance with the information carried.
[Figure: two decomposed weighted networks over n1–n6 – one for cascades {a, d, f} (NBA) and one for cascades {b, c, e} (NFL).]
Remark • Traditionally, NIP results in a dense and complex network from which it is difficult to extract knowledge. • By properly clustering cascades, we obtain a few concise networks that carry clearer information – These resulting networks match the corresponding cascades better than a single dense network does.
Conclusion • Information extraction is an application/goal-oriented process to capture the key ingredients (parameters, information, structure, etc.) in a huge SN • The procedure of information extraction can be integrated into related processes for better efficiency in practice
Thank you!
Graph Summarization Condense the original graph to a more compact form – Lossless and lossy methods – Required to examine the entire network
[Figure: graph G with nodes 1–10 summarized into supernodes Sa={2,3}, Sb={1,9}, Sc={7,8,10}, Sd={4,5,6}, with edge corrections − {5,10} and − {6,10}.]
A revised example from S. Navlakha et al., “Graph Summarization with Bounded Error,” SIGMOD 2008.
Graph Sampling • Graph Sampling – Selecting a subset of the original data – Characteristics of the original graph are preserved – Only a proportion of nodes in the network are visited
[Figure: a network before and after sampling; plotted by NodeXL, an Excel template created by the NodeXL team at Microsoft Research.]
A Running Example of CENDY Originally, we have the closeness centralities of all nodes and the average path length (APL) of the graph – an unweighted and undirected graph G with 14 nodes and 18 edges.

Cc(x) = (14 − 1) / (1·3 + 2·2 + 3·1 + 4·2 + 5·2 + 6·2 + 7·1) = 13/47

Cc(·) = 13/D, with distance sums D per node:
a:40, b:35, c:37, d:33, h:46, l:47, o:40, r:33, s:40, t:56, u:57, v:44, w:31, x:47

L_G = (40 + 35 + 37 + … + 47) / (14 × (14 − 1)) = 586/182 ≈ 3.22
Example (Cont’d) For the insertion of the edge e(a,b). • We perform BFS at node a in G and G’ to obtain Ga and G’a, and then have V’a={b,c,h,v,t}.
Example (Cont’d) • Also, we perform BFS at node b in G and G’ to obtain Gb and G’b, and then have V’b={a,x,l,u}.
Example (Cont’d) • Then, in light of the main theorem, we re-calculate the paths between V’a and V’b. • For example, for node x ∈ V’b, we calculate:
(1): ||p(x,t)| − |p’(x,t)|| = 7 − (1+1+3) = 2
(2): ||p(x,h)| − |p’(x,h)|| = 6 − 4 = 2
(3): ||p(x,v)| − |p’(x,v)|| = 6 − 4 = 2
(4): ||p(x,c)| − |p’(x,c)|| = 5 − 3 = 2
(5): ||p(x,b)| − |p’(x,b)|| = 4 − 2 = 2
• and then update its new closeness centrality:

Cc(x) = 13 / (47 − (1) − (2) − (3) − (4) − (5)) = 13 / (47 − 2·5) = 13/37
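The per-node update amounts to subtracting each shortest-path distance reduction from the node's old distance sum. A small sketch of that bookkeeping (helper name is hypothetical):

```python
def updated_closeness(n_nodes, old_dist_sum, reductions):
    """New closeness after an edge insertion: subtract each shortest-path
    distance reduction from the node's total distance to all others."""
    return (n_nodes - 1) / (old_dist_sum - sum(reductions))

# Node x in the example: 14 nodes, old distance sum 47,
# and five shortest paths each shortened by 2.
```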
Example (Cont’d) • Finally, we update the closeness centralities of the referenced nodes and recalculate the APL.

Updated distance sums D per node (Cc(·) = 13/D):
a:30, b:28, c:30, d:33, h:39, l:42, o:40, r:33, s:40, t:49, u:47, v:37, w:31, x:37

L_G’ = (30 + 28 + 30 + … + 37) / (14 × (14 − 1)) = 516/182 ≈ 2.84
Example Scenario N0 is initiating a query to find a job in company A or company B. Which friends should N0 ask for information?
[Figure: the example graph – N0’s friends and their friends, labeled with company attributes.]
New Contributions • Compared with M. Gomez-Rodriguez, J. Leskovec, and A. Krause, “Inferring Networks of Diffusion and Influence,” KDD 2010, our work is unique in that: 1. We assume there can be many underlying networks (rather than only one). 2. We model and learn a weighted graph (rather than an unweighted one).