Przemyslaw fast viz nov21


Fast visualization of relevant portions of large dynamic networks

Przemyslaw A. Grabowicz, Luca Maria Aiello, Filippo Menczer


Networks


Networks are everywhere
•  Information & knowledge networks
   o  e.g., Wikipedia, WWW, IMDb
•  Social networks
   o  e.g., Facebook, Twitter
•  Commodity networks
   o  e.g., Internet, transportation networks
•  Biological networks
   o  e.g., neural networks, gene expression networks


Networks are large
Examples:
•  Wikipedia ~ $10^{6}$ articles
•  Online social networks ~ $10^{9}$ users
•  Transportation networks ~ $10^{4}$ airports worldwide
•  Neural networks ~ $10^{11}$ neurons

Internet


Networks are dynamic
Examples:
•  Wikipedia – half a million new articles per year
•  Online social networks – temporal user interactions
•  Transportation networks – people traveling
•  Neural networks – traveling action potentials

The Egyptian Revolution on Twitter Visualization by André Panisson (http://youtu.be/2guKJfvq4uI)


How do we describe networks?
•  Structural properties
   o  Clustering coefficient
   o  Modularity
   o  Assortativity coefficient
   o  … mostly designed for static networks
•  Visualizations
   o  Graph layouts
   o  Network filtering
   o  … mostly designed for static networks


Network animation

Outline
I.  Existing software for network visualization
II.  Filtering of large dynamic graphs
III.  Experimental datasets
IV.  Source-code release and summary


I. Existing software for network visualization



Network visualization tools

•  can handle dynamic graphs
•  interactive
•  static filtering
•  slower than other tools (written in Java, GUI-based)

The Egyptian Revolution on Twitter


Network visualization tools

•  can easily plot large networks
•  fast (written in C++, with interfaces in Python and R)

Clusters and activity of Twitter users


Network visualization tools

•  can easily plot large networks
•  fast (C++ core of BGL, with an interface in Python)

Price network


Network visualization tools

D3.JS
•  a tool for data visualization, not just networks
•  highly interactive
•  easy to embed in webpages
•  slow (a JavaScript library)

Songs similar to “Poker Face” of Lady Gaga, based on Last.fm

Character co-occurrence in Les Misérables


Visualizing large dynamic networks? (with these or other tools)
Challenges:
•  Hard to distinguish important nodes and edges
•  Hard to follow the evolution of nodes and edges
•  Computationally expensive

A technique for filtering large dynamic graphs is needed.


II. Filtering of large dynamic graphs
   o  Our solution
   o  Problem formulation
   o  Which nodes are important to keep?
   o  Key concepts of the algorithm
   o  The algorithm
   o  Computational complexity
   o  Output and visualization


Our solution

Filtering
1.  Processes a chronological stream of interactions between the nodes of a large network
2.  Dynamically filters the most relevant parts of the network, emphasizing old nodes that show fresh activity

Animation
3.  Produces differential updates representing the network evolution
4.  Can feed these updates directly to visualization tools, potentially to any of the aforementioned tools

Stream of interactions:
t1, n1, n2
t2, n1, n3
t3, n1, n2, n3, n4
t4, n3, n5
t5, n1, …, nm


Problem formulation
Imagine that we have a live stream of interactions between nodes that we want to visualize.
Stream of interactions:
t1, n1, n2
t2, n1, n3
t3, n1, n2, n3, n4
t4, n3, n5
t5, n1, …, nm
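The stream format above can be read with a few lines of Python. This is only a sketch: it assumes each interaction arrives as one comma-separated text line with a numeric timestamp first, and the name `parse_stream` is hypothetical, not part of the released code.

```python
from typing import Iterable, Iterator, List, Tuple

# One interaction: a timestamp followed by the nodes that took part in it.
Interaction = Tuple[float, List[str]]


def parse_stream(lines: Iterable[str]) -> Iterator[Interaction]:
    """Yield (timestamp, nodes) for every line of the form 't, n1, n2, ...'.

    Assumption: fields are comma-separated and the timestamp is numeric.
    """
    for line in lines:
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if len(fields) < 3:
            continue  # need a timestamp and at least two interacting nodes
        yield float(fields[0]), fields[1:]


sample = ["1, n1, n2", "2, n1, n3", "3, n1, n2, n3, n4", "4, n3, n5"]
interactions = list(parse_stream(sample))
```

Because `parse_stream` is a generator, it can consume a live stream line by line without holding millions of interactions in memory.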


Stream of interactions
t1, n1, n2
t2, n1, n3
t3, n1, n2, n3, n4
t4, n3, n5
t5, n1, …, nm
(…)
Millions of such interactions


Filtering
Stream of interactions:
t1, n1, n2
t2, n1, n3
t3, n1, n2, n3, n4
t4, n3, n5
t5, n1, …, nm

We filter the network of interactions on the fly.


Filtering
We aim to pick the most important nodes and visualize them.
Which are the most important nodes?
1.  With highest degree
2.  Exhibiting highest activity/node strength
3.  Most central




Filtering
Key factors that we address here:
•  The importance of nodes changes over time
   o  We update the scores of nodes on the fly
•  Importance builds up over time due to repeated activity
   o  We remember the past score and increase it whenever a node shows new activity
•  Sometimes importance diminishes due to inactivity
   o  We gradually decrease the score to forget the oldest activity


Filtering
To sum up:
•  We introduce scores (changing in time)
   o  Nodes: $S_i(t)$
   o  Edges: $s_{ij}(t)$
•  Increasing due to activity (on each arrival of an interaction)
   o  Nodes: by $\Delta_i$
   o  Edges: by $\delta_{ij}$
•  Decreasing due to forgetting (every time period)
   o  Nodes and edges: multiplied by $C_{forget}$


Filtering
To sum up:
•  We introduce scores (changing in time)
   o  Nodes: $S_i(0) = 0$
   o  Edges: $s_{ij}(0) = 0$
•  Increasing due to activity (on each arrival of an interaction)
   o  Nodes: $\Delta_i = 1$
   o  Edges: $\delta_{ij} = \frac{1}{m-1}$, where $m$ is the number of nodes in the interaction
•  Decreasing due to forgetting (every time period)
   o  $C_{forget} = 0.9$

We keep the nodes and edges with the highest scores.
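The score updates can be illustrated with a small Python sketch. It uses the values from the slide (scores start at 0, node increment 1, edge increment 1/(m−1), forgetting multiplier 0.9). Treating edges as undirected pairs and the function names `add_interaction` and `forget` are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

DELTA_NODE = 1.0  # node increment per interaction (Delta_i = 1)
C_FORGET = 0.9    # forgetting multiplier (C_forget = 0.9)

node_score = defaultdict(float)  # S_i, starts at 0
edge_score = defaultdict(float)  # s_ij, starts at 0


def add_interaction(nodes):
    """Raise the scores of all nodes and node pairs in one interaction."""
    m = len(nodes)  # assumes m >= 2 distinct nodes
    for node in nodes:
        node_score[node] += DELTA_NODE
    # Assumption: edges are undirected, so each pair is stored once, sorted.
    for a, b in combinations(sorted(nodes), 2):
        edge_score[(a, b)] += 1.0 / (m - 1)


def forget():
    """Apply one forgetting event: multiply every score by C_FORGET."""
    for key in node_score:
        node_score[key] *= C_FORGET
    for key in edge_score:
        edge_score[key] *= C_FORGET


add_interaction(["n1", "n2"])
add_interaction(["n1", "n3"])
add_interaction(["n1", "n2", "n3", "n4"])
forget()
```

After these three interactions and one forgetting event, n1 has a score of 3 × 0.9 = 2.7, and the edge between n1 and n2 has (1 + 1/3) × 0.9 = 1.2.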


Degree/strength filtering?

Activity stream (input)

Visualized network (output)


A filtering buffer

Activity stream (input)

Buffered network (memory)

Visualized network (output)


A filtering buffer

Stage 1 (filtering)

Stage 2 (update-generation)

Why buffer?
•  Remembers the scores of the network
•  Computationally inexpensive in comparison with the full network
•  Smoothens the animations
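A rough Python sketch of the two-stage idea follows. The slides do not spell out the eviction policy, so dropping the lowest-scoring nodes when the buffer overflows is an assumption, and the names `trim_buffer` and `select_visible` are hypothetical.

```python
import heapq


def trim_buffer(scores, n_buffer):
    """Stage 1 (assumed policy): keep only the n_buffer highest-scoring nodes."""
    if len(scores) <= n_buffer:
        return dict(scores)
    kept = heapq.nlargest(n_buffer, scores.items(), key=lambda kv: kv[1])
    return dict(kept)


def select_visible(scores, n_visible):
    """Stage 2 input: the n_visible highest-scoring buffered nodes."""
    top = heapq.nlargest(n_visible, scores.items(), key=lambda kv: kv[1])
    return {node for node, _ in top}


scores = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 0.5}
buffered = trim_buffer(scores, n_buffer=3)
visible = select_visible(buffered, n_visible=2)
```

Nodes that fall out of the visualized set can stay in the buffer with their scores, so they can return later without starting from zero.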


The algorithm

Buffered network: $N_b \approx 10^{4}$ nodes

Visualized network: $N_v \approx 10^{1}$ to $10^{2}$ nodes


Computational complexity
The algorithm is fast:
•  Stage 1
•  Stage 2
Where:
$E$ – the total number of pairwise interactions read
$N_b$ – the number of buffered nodes
$N_v$ – the number of visualized nodes
$F$ – the number of frames produced


Output of the filtering algorithm
The output of the filtering step is formatted as JSON files with differential updates to the visualized network, following the JSON format of the Gephi Streaming API:
an: Add node
cn: Change node
dn: Delete node
ae: Add edge
ce: Change edge
de: Delete edge
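To make the event types concrete, here is a hedged Python sketch that derives node events from two consecutive frames. It assumes the common Gephi Streaming layout of one JSON object per event, keyed by event type and then by object id. The `size` attribute and the helper `diff_nodes` are illustrative choices, not taken from the released tool.

```python
import json


def diff_nodes(prev, curr):
    """Return node events that turn frame `prev` into frame `curr`.

    Both frames map node id -> score. Edge events (ae, ce, de) would be
    produced the same way from edge scores.
    """
    events = []
    for node in sorted(prev.keys() - curr.keys()):
        events.append({"dn": {node: {}}})                    # delete node
    for node in sorted(curr.keys() - prev.keys()):
        events.append({"an": {node: {"size": curr[node]}}})  # add node
    for node in sorted(prev.keys() & curr.keys()):
        if prev[node] != curr[node]:
            events.append({"cn": {node: {"size": curr[node]}}})  # change node
    return events


prev_frame = {"n1": 2.0, "n2": 1.0}
curr_frame = {"n1": 2.7, "n3": 1.0}
updates = [json.dumps(event) for event in diff_nodes(prev_frame, curr_frame)]
```

Each string in `updates` is one differential update, so a consumer only has to process what changed between frames rather than the whole network.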


Output of the filtering algorithm
The output of the filtering step is formatted as JSON files with differential updates to the visualized network.
One can feed it directly to:
•  Our video-generating module
   o  uses igraph for graph plotting and mencoder for video encoding
•  Other tools visualizing dynamic networks
   o  Gephi Streaming API
   o  more platforms?


Our video-generating module
What does it do?
1.  Parses JSON differential updates
2.  Creates/updates a network using igraph
3.  Plots the network using pycairo
   o  the Fruchterman-Reingold layout
   o  frame stabilization
   o  extra effects: node-popping and node-soaking animations
4.  Encodes a video by combining the frames of the plotted network using mencoder

Let’s see how it works!
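Steps 1 and 2 can be sketched in Python without any plotting library. The example below keeps the network in plain dictionaries as a stand-in for an igraph graph, which is an assumption made to keep the sketch self-contained. The helper `apply_update` and the edge id `n1-n2` are hypothetical.

```python
import json


def apply_update(network, line):
    """Apply one JSON differential update (an/cn/dn/ae/ce/de) in place."""
    event = json.loads(line)
    for kind, items in event.items():
        # Event types ending in 'n' touch nodes, those ending in 'e' touch edges.
        target = network["nodes"] if kind.endswith("n") else network["edges"]
        for obj_id, attrs in items.items():
            if kind[0] == "a":    # add
                target[obj_id] = dict(attrs)
            elif kind[0] == "c":  # change
                target.setdefault(obj_id, {}).update(attrs)
            elif kind[0] == "d":  # delete
                target.pop(obj_id, None)
    return network


network = {"nodes": {}, "edges": {}}
lines = [
    '{"an": {"n1": {"size": 1}}}',
    '{"an": {"n2": {"size": 1}}}',
    '{"ae": {"n1-n2": {"source": "n1", "target": "n2"}}}',
    '{"cn": {"n1": {"size": 2}}}',
    '{"de": {"n1-n2": {}}}',
    '{"dn": {"n2": {}}}',
]
for line in lines:
    apply_update(network, line)
```

The sketch relies on the stream deleting an edge before deleting its endpoints. A real consumer would probably also drop any edges left attached to a deleted node.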


III. Experimental datasets
   o  Characteristics
   o  Animations
   o  Parameters of the filtering algorithm


Experimental datasets
1.  The announcement of Bin Laden’s death on Twitter (2011)
    Nodes: @users and #hashtags
    Edges: co-appearances in tweets related to Bin Laden’s death
2.  Plot keywords from movies (1912-2018)
    Nodes: keywords
    Edges: co-appearances of keywords in the user-generated descriptions of movies
3.  Words co-appearing in US patents (1976-2010)
    Nodes: words appearing in the titles of patents (no stopwords)
    Edges: co-appearances in the titles

Characteristics:
•  Periods of time from 2 hours to 106 years
•  From tens of thousands to half a million nodes
•  We visualize at most hundreds of the most important nodes


Algorithm’s parameters

$T_{contr}$ – time contraction, i.e., how much shorter the visualization is than the real evolution of the network
$N_b$ – the number of buffered nodes
$N_v$ – the number of visualized nodes
$s_{min}$ – the minimal score of edges that will be visualized
$C_{forget}$ – the forgetting multiplier
$F_{forget}$ – the number of frames passing between consecutive forgetting events
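One way to read these parameters together is sketched below in Python. The interpretation of the time contraction as a plain ratio between real time and video time, the frame rate `fps`, and all default values other than the forgetting multiplier 0.9 are assumptions for this illustration. `FilterParams` and `frame_index` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FilterParams:
    t_contr: float          # T_contr: real seconds per second of video (assumed reading)
    n_buffer: int = 10_000  # N_b: buffered nodes (order of magnitude from the slides)
    n_visible: int = 100    # N_v: visualized nodes
    s_min: float = 1.0      # s_min: minimal edge score to draw (assumed value)
    c_forget: float = 0.9   # C_forget: forgetting multiplier
    f_forget: int = 10      # F_forget: frames between forgetting events (assumed value)


def frame_index(t, t_start, params, fps=25):
    """Map a real timestamp (in seconds) to the video frame it falls into."""
    video_seconds = (t - t_start) / params.t_contr
    return int(video_seconds * fps)


params = FilterParams(t_contr=3600.0)  # one hour of data per second of video
frame = frame_index(7200.0, 0.0, params, fps=25)
is_forgetting_frame = frame % params.f_forget == 0
```

With these numbers, an interaction two hours after the start lands in frame 50, which is also a forgetting frame because 50 is a multiple of `f_forget`.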


IV. Source-code release and summary


Open-source

On GitHub:
•  the filtering algorithm (C++)
•  the video-generating tool (Python)
•  preprocessed datasets
•  documentation


More resources

A whitepaper is available on arXiv

More videos: http://www.youtube.com/user/truthyatindiana/videos


Summary
•  A filtering algorithm for large streams of interactions that produces differential network updates
   o  Computationally inexpensive
•  A tool that generates network animations from the network updates
   o  Feel free to produce your own animation tools making use of the differential network updates!
•  Everything is open-sourced and a whitepaper is released

Thanks for listening!
Thanks to the organizers of the data challenge!

