Przemyslaw fast viz nov21 by Waterloo Institute for Complexity and Innovation

Fast visualization of relevant portions of large dynamic networks Przemyslaw A. Grabowicz Luca Maria Aiello Filippo Menczer

Networks…

Networks are everywhere •  Information & knowledge networks •  e.g., Wikipedia, WWW, IMDb

•  Social networks •  e.g., Facebook, Twitter

•  Commodity networks •  e.g., Internet, transportation networks

•  Biological networks •  e.g., neural network, gene expression networks

Networks are large Examples: •  Wikipedia ~ $10^6$ articles •  Online social networks ~ $10^9$ users •  Transportation networks ~ $10^4$ airports worldwide •  Neural networks ~ $10^{11}$ neurons

Internet

Networks are dynamic Examples: •  Wikipedia – half million new articles per year •  Online social networks – temporal user interactions •  Transportation networks – people traveling •  Neural networks – traveling action potentials

The Egyptian Revolution on Twitter Visualization by André Panisson (http://youtu.be/2guKJfvq4uI)

How do we describe networks? •  Structural properties o  Clustering coefficient o  Modularity o  Assortativity coefficient o  … mostly designed for static networks

•  Visualizations o  Graph layouts o  Network filtering o  … mostly designed for static networks

Network animation Outline I.  Existing software for network visualization II.  Filtering of large dynamic graphs III.  Experimental datasets IV.  Source-code release and summary

I. Existing software for network visualization

D3.JS

Network visualization tools

•  can handle dynamic graphs •  interactive •  static filtering •  slower than other tools (written in Java, GUI-based) The Egyptian Revolution on Twitter

Network visualization tools

•  can easily plot large networks •  fast (written in C++, with interfaces in Python and R) Clusters and activity of Twitter users

Network visualization tools

•  can easily plot large networks •  fast (C++ core of BGL, with an interface in Python)

Price network

Network visualization tools D3.JS •  a tool for data visualization, not just networks •  highly interactive •  easy to embed in webpages •  slow (a JavaScript library)

Songs similar to “Poker Face” of Lady Gaga, based on Last.fm

Character co-occurrence in Les Misérables

Visualizing large dynamic networks? (with these or other tools) Challenges: •  Hard to distinguish important nodes and edges •  Hard to follow the evolution of nodes and edges •  Computationally expensive

A technique for filtering large dynamic graphs is needed.

II. Filtering of large dynamic graphs o  Our solution o  Problem formulation o  Which nodes are important to keep? o  Key concepts of the algorithm o  The algorithm o  Computational complexity o  Output and visualization

Our solution Filtering

1.  Processes a chronological stream of interactions between the nodes of a large network 2.  Dynamically filters the most relevant parts of the network, emphasizing old nodes that show fresh activity

Animation

3.  Produces differential updates representing the network evolution 4.  Can feed these updates directly to visualization tools, potentially to any of the aforementioned tools

t1, n1, n2 t2, n1, n3 t3, n1, n2, n3, n4 t4, n3, n5 t5, n1, …, nm

Problem formulation Imagine that we have a live stream of interactions between nodes that we want to visualize. Stream of interactions: t1, n1, n2 t2, n1, n3 t3, n1, n2, n3, n4 t4, n3, n5 t5, n1, …, nm


Stream of interactions Stream of interactions: t1, n1, n2 t2, n1, n3 t3, n1, n2, n3, n4 t4, n3, n5 t5, n1, …, nm (…) Millions of such interactions

Filtering Stream of interactions: t1, n1, n2 t2, n1, n3 t3, n1, n2, n3, n4 t4, n3, n5 t5, n1, …, nm

We filter the network of interactions on-the-fly.
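As a minimal sketch of reading such a stream, assume each record is one comma-separated line of the form `t, n1, n2, …, nm`. The on-disk format and the names `parse_interaction` and `pairwise` are assumptions made for illustration; they are not taken from the released code.

```python
from itertools import combinations


def parse_interaction(line):
    """Split 't, n1, ..., nm' into a numeric timestamp and a list of node ids."""
    fields = [field.strip() for field in line.split(",")]
    return float(fields[0]), fields[1:]


def pairwise(nodes):
    """Expand one m-node interaction into its m*(m-1)/2 undirected pairs."""
    return list(combinations(sorted(set(nodes)), 2))


# Hypothetical three-record stream in the assumed format.
stream = ["1, a, b", "2, a, c", "3, a, b, c, d"]
for line in stream:
    timestamp, nodes = parse_interaction(line)
    print(timestamp, pairwise(nodes))
```

Processing one record at a time keeps memory use independent of the stream length, which is the point of filtering on the fly.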

Filtering We aim to pick the most important nodes and visualize them. Which are the most important nodes? 1.  With highest degree 2.  Exhibiting highest activity/node strength 3.  Most central

Filtering Key factors that we address here: •  The importance of nodes changes in time o  We update the score of nodes on the fly

•  It builds up in time due to repeated activity o  We remember the past score and increase it whenever nodes show new activity

•  Sometimes it diminishes due to inactivity o  We gradually decrease the score to forget the oldest activity

Filtering To sum up: •  We introduce scores o  (changeable in time) o  Nodes: $S_i(t)$, edges: $s_{ij}(t)$

•  Increasing due to the activity o  (per each arrival of interaction) o  Nodes: by $\Delta_i$, edges: by $\delta_{ij}$

•  Decreasing due to the forgetting o  (every time period) o  $C_{forget}$

Filtering To sum up: •  We introduce scores o  (changeable in time) o  Nodes: $S_i(0) = 0$, edges: $s_{ij}(0) = 0$

•  Increasing due to the activity o  (per each arrival of interaction) o  Nodes: $\Delta_i = 1$, edges: $\delta_{ij} = \frac{1}{m-1}$

•  Decreasing due to the forgetting o  (every time period) o  $C_{forget} = 0.9$

We keep the nodes and edges with the highest scores
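The scoring rules can be sketched as follows. This is an illustrative reading, not the released C++ implementation. It assumes that m is the number of distinct nodes in one interaction (as in the stream record `t5, n1, …, nm`) and that forgetting multiplies every score by the forgetting multiplier. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

C_FORGET = 0.9  # forgetting multiplier quoted on the slide

node_score = defaultdict(float)  # S_i, implicitly 0 at the start
edge_score = defaultdict(float)  # s_ij, implicitly 0 at the start


def add_interaction(nodes):
    """Raise scores for one interaction among m distinct nodes."""
    nodes = sorted(set(nodes))
    m = len(nodes)
    if m < 2:
        return  # a single node forms no edge; ignored in this sketch
    for i in nodes:
        node_score[i] += 1.0  # Delta_i = 1
    for i, j in combinations(nodes, 2):
        edge_score[(i, j)] += 1.0 / (m - 1)  # delta_ij = 1 / (m - 1)


def forget():
    """Decay every score once per forgetting period."""
    for key in node_score:
        node_score[key] *= C_FORGET
    for key in edge_score:
        edge_score[key] *= C_FORGET


add_interaction(["n1", "n2"])
add_interaction(["n1", "n2", "n3"])
forget()
print(dict(node_score))
print(dict(edge_score))
```

With these values, the increments of the m − 1 edges incident to a node in one interaction sum to 1, the same amount by which the node's own score grows.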

Degree/strength filtering? [Diagram: activity stream (input) → visualized network (output)]

A filtering buffer [Diagram: activity stream (input) → Stage 1 (filtering) → buffered network (memory) → Stage 2 (update-generation) → visualized network (output)]

Why buffer? •  Remembers the scores of the network •  Computationally inexpensive in comparison with the full network •  Smoothens the animations

The algorithm [Diagram: buffered network, $N_b \approx 10^4$ nodes; visualized network, $N_v \approx 10^1$ to $10^2$ nodes]
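The two sizes can be illustrated with a small sketch: the buffer retains the highest-scoring nodes, and the visualized network is the top slice of that buffer. Selecting by a plain top-k over scores is an assumption made here for illustration; the released code may evict and select nodes differently. The function names are hypothetical.

```python
import heapq


def trim_buffer(scores, n_b):
    """Keep only the n_b highest-scoring nodes in the buffer."""
    keep = heapq.nlargest(n_b, scores, key=scores.get)
    return {node: scores[node] for node in keep}


def select_visible(buffer_scores, n_v):
    """Pick the n_v highest-scoring buffered nodes for display."""
    return set(heapq.nlargest(n_v, buffer_scores, key=buffer_scores.get))


# Toy scores; real runs would use N_b around 10^4 and N_v around 10^1 to 10^2.
scores = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 0.5}
buffered = trim_buffer(scores, n_b=3)
visible = select_visible(buffered, n_v=2)
print(sorted(buffered), sorted(visible))
```

Because the visualized nodes are drawn from the buffer rather than from the full network, a node that drops out of view can keep its score and return later without starting from zero.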

Computational complexity The algorithm is fast: •  Stage 1 •  Stage 2 Where: E – the total number of pairwise interactions read Nb – the number of buffered nodes Nv – the number of visualized nodes F – the number of frames produced

Output of the filtering algorithm The output of the filtering step is formatted as JSON files with differential updates to the visualized network. an: Add node cn: Change node dn: Delete node ae: Add edge ce: Change edge de: Delete edge

The updates follow the JSON format of the Gephi Streaming API. JSON icon created by http://dryicons.com
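A differential update can be sketched as the difference between two consecutive frames. The event keys `an`, `cn` and `dn` come from the slide. The nesting `{event: {id: attributes}}` follows the Gephi Streaming API as commonly documented, while the `size` attribute and the function name `diff_events` are hypothetical. Edge events (`ae`, `ce`, `de`) would be produced the same way.

```python
import json


def diff_events(previous, current):
    """Return node events that turn the previous frame into the current one.

    Both arguments map a node id to a dict of attributes.
    """
    events = []
    for node in sorted(current.keys() - previous.keys()):
        events.append({"an": {node: current[node]}})  # add node
    for node in sorted(current.keys() & previous.keys()):
        if current[node] != previous[node]:
            events.append({"cn": {node: current[node]}})  # change node
    for node in sorted(previous.keys() - current.keys()):
        events.append({"dn": {node: {}}})  # delete node
    return events


previous = {"a": {"size": 1.0}, "b": {"size": 2.0}}
current = {"b": {"size": 3.0}, "c": {"size": 1.0}}
events = diff_events(previous, current)
for event in events:
    print(json.dumps(event))
```

Emitting only the differences keeps each frame's update small even when the visualized network has hundreds of nodes.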

Output of the filtering algorithm The output of the filtering step is formatted as JSON files with differential updates to the visualized network. One can feed it directly to: •  Our video-generating module o  uses igraph for graph plotting and mencoder for video encoding

•  Other tools visualizing dynamic networks o  Gephi Streaming API o  more platforms?

Our video-generating module What does it do? 1.  Parses JSON differential updates 2.  Creates/updates a network using igraph 3.  Plots the network using pycairo o  the Fruchterman-Reingold layout o  frame stabilization o  extra effects: node-popping and node-soaking animations

4.  Encodes a video by combining the frames of the plotted network using mencoder Let’s see how it works!
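Steps 1 and 2 can be sketched without any plotting library. The example below keeps the network in plain dictionaries instead of an igraph object, which is a simplification made for illustration. The one-event-per-line input, the `source`/`target` attribute names, and the function name `apply_update` are assumptions.

```python
import json


def apply_update(graph, event):
    """Apply one differential update to an in-memory graph, in place."""
    nodes, edges = graph["nodes"], graph["edges"]
    for kind, payload in event.items():
        for key, attrs in payload.items():
            if kind in ("an", "cn"):  # add or change node
                nodes.setdefault(key, {}).update(attrs)
            elif kind == "dn":  # delete node and the edges touching it
                nodes.pop(key, None)
                dangling = [
                    edge_id
                    for edge_id, edge in edges.items()
                    if key in (edge.get("source"), edge.get("target"))
                ]
                for edge_id in dangling:
                    del edges[edge_id]
            elif kind in ("ae", "ce"):  # add or change edge
                edges.setdefault(key, {}).update(attrs)
            elif kind == "de":  # delete edge
                edges.pop(key, None)


lines = [
    '{"an": {"a": {"size": 1}}}',
    '{"an": {"b": {"size": 2}}}',
    '{"ae": {"a-b": {"source": "a", "target": "b"}}}',
    '{"dn": {"a": {}}}',
]
graph = {"nodes": {}, "edges": {}}
for line in lines:
    apply_update(graph, json.loads(line))
print(graph)
```

After each batch of updates, the current state of `graph` would be handed to the plotting step to render one frame.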

III. Experimental datasets o  Characteristics o  Animations o  Parameters of the filtering algorithm

Experimental datasets 1.  The announcement of Bin Laden’s death on Twitter (2011) Nodes: @users and #hashtags Edges: co-appearances in tweets related to Bin Laden’s death

2.  Plot keywords from movies (1912-2018)

Nodes: keywords Edges: co-appearances of keywords in the descriptions of movies

3.  Words co-appearing in US patents (1976-2010) Nodes: words appearing in the titles of patents (no stopwords) Edges: co-appearances in the titles

Experimental datasets Characteristics: •  Periods of time from 2 hours to 106 years •  From tens of thousands to half a million nodes •  We visualize at most a few hundred of the most important nodes

1.  The announcement of Bin Laden’s death on Twitter (2011) Nodes: @users and #hashtags Edges: co-appearances in tweets related to Bin Laden’s death

2.  Plot keywords from movies (1912-2018)

Nodes: keywords Edges: co-appearances in the user-generated descriptions of movies

3.  Words co-appearing in US patents (1976-2010) Nodes: words appearing in the titles of patents (no stopwords) Edges: co-appearances in the titles

Algorithm’s parameters

•  $T_{contr}$ – time contraction, i.e., how much shorter the visualization is than the real evolution of the network
•  $N_b$ – the number of buffered nodes
•  $N_v$ – the number of visualized nodes
•  $s_{min}$ – the minimal score of edges that will be visualized
•  $C_{forget}$ – the forgetting multiplier
•  $F_{forget}$ – the number of frames passing between consecutive forgetting events
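To make the roles of these parameters concrete, here is a hedged sketch that groups them in one structure and maps an interaction timestamp to a video frame. It reads the time contraction as the ratio of real time to video time, which is an interpretation of the definition above. The frame rate `fps`, the default values, and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FilterParams:
    t_contr: float         # real seconds represented by one second of video
    n_b: int = 10_000      # buffered nodes
    n_v: int = 100         # visualized nodes
    s_min: float = 0.0     # minimal edge score to draw (placeholder value)
    c_forget: float = 0.9  # forgetting multiplier
    f_forget: int = 1      # frames between forgetting events (placeholder)
    fps: int = 25          # assumed frame rate, not one of the listed parameters


def frame_index(params, t, t0):
    """Map a real timestamp t (seconds) to a frame number, counting from t0."""
    video_seconds = (t - t0) / params.t_contr
    return int(video_seconds * params.fps)


def is_forgetting_frame(params, frame):
    """True when scores should be decayed before rendering this frame."""
    return frame > 0 and frame % params.f_forget == 0


# Two hours of activity compressed into two minutes of video: contraction of 60.
params = FilterParams(t_contr=60.0, f_forget=10)
frame = frame_index(params, t=120.0, t0=0.0)
print(frame, is_forgetting_frame(params, frame))
```

Under these assumptions, an interaction arriving 120 seconds into the stream lands 2 seconds into the video, and scores are decayed once every 10 frames.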

IV. Source-code release and summary

Open-source

On GitHub: •  the filtering algorithm (C++) •  the video-generating tool (Python) •  preprocessed datasets •  documentation

More resources

A whitepaper is available on arXiv

More videos: http://www.youtube.com/user/truthyatindiana/videos

Summary •  A filtering algorithm for large streams of interactions producing differential network updates o  Computationally inexpensive

•  A tool that generates network animations from the network updates o  Feel free to build your own animation tools that make use of the differential network updates!

•  Everything is open source, and a whitepaper has been released

Thanks for listening! Thanks to the organizers of the data challenge!