MACH: Fast Randomized Tensor Decompositions
Charalampos (Babis) E. Tsourakakis
SIAM Data Mining Conference (SDM 2010), April 30th, 2010
Outline
- Introduction
  - Why Tensors?
  - Tensor Decompositions
- Our Motivation
- Proposed Method
- Experimental Results
  - Case study I: Intemon
  - Case study II: Intel Berkeley Lab
- Conclusion
Intel Berkeley lab
[Figure: four sensor time series, Temperature, Light, Humidity, and Voltage, each plotted as value vs. time (min).]
The data are modeled as a tensor, i.e., a multidimensional matrix, of size T x (#sensors) x (#types of measurement), with three modes: the time mode, the sensor mode, and the measurement-type mode.
Observation: multi-aspect data can be modeled in this way.
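As a minimal illustration of this modeling step (a numpy sketch with hypothetical values; the array name X and the sizes below are assumptions, not from the paper):

import numpy as np

# Hypothetical sizes: T timeticks, S sensors, M measurement types.
T, S, M = 10080, 54, 4

# X[t, s, m] = value of measurement type m at sensor s, timetick t.
X = np.zeros((T, S, M))

# Example: store a temperature reading (type 0) of 23.5 degrees
# from sensor 7 at timetick 42.
X[42, 7, 0] = 23.5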
Functional Magnetic Resonance Imaging (fMRI): a 5-mode tensor, voxels x subjects x trials x task conditions x timeticks.
Tensors naturally model numerous real-world datasets. And now what?
Tensor Decompositions
Singular value decomposition (SVD), the "Swiss army knife" of matrix decompositions (O'Leary):

$A_{m \times n} = \sigma_1 \, u_1 \circ v_1 + \sigma_2 \, u_2 \circ v_2 + \sigma_3 \, u_3 \circ v_3 + \dots$
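A minimal numpy sketch of this rank-one expansion (the random matrix is a stand-in, not data from the paper):

import numpy as np

A = np.random.rand(100, 50)                      # stand-in for an m x n matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation: keep the k largest singular triplets,
# A_k = sigma_1 u_1 v_1' + ... + sigma_k u_k v_k'.
k = 3
A_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.linalg.norm(A - A_k))                   # Frobenius-norm error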
Example (latent semantic indexing via SVD): a document-to-term matrix, with documents from CS and MD and terms such as data, graph, java, brain, lung, factors into three pieces: a documents-to-concepts matrix, a diagonal matrix holding the strength of each concept, and a terms-to-concepts matrix.
Two families of algorithms extend the SVD to the multilinear setting: PARAFAC/CANDECOMP decompositions and the Tucker decomposition. See "Tensor Decompositions and Applications", SIAM Review, by Kolda and Bader.
Tucker is an SVD-like decomposition of a tensor, with one projection matrix per mode and a core tensor:

$\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$
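A compact HOSVD-style sketch of this decomposition in numpy (a sketch, not the implementation used in the paper; unfold and hosvd are helper names introduced here):

import numpy as np

def unfold(X, mode):
    # Mode-k unfolding: the mode-k fibers of X become matrix columns.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    # One projection matrix per mode: the leading left singular
    # vectors of each mode-k unfolding.
    Us = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :r]
          for k, r in enumerate(ranks)]
    # Core tensor: contract every mode of X with the matching U.
    G = X
    for k, U in enumerate(Us):
        G = np.moveaxis(np.tensordot(G, U, axes=(k, 0)), -1, k)
    return G, Us   # X is approximately G x_1 U_1 x_2 U_2 x_3 U_3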
J. Sun showed that Tucker decompositions can be used to extract useful knowledge from monitoring systems.
Our Motivation
Most real-world processes result in sparse tensors. However, there exist important processes that result in dense tensors:

Physical process                                             Percentage of non-zero entries
Sensor network (sensor x measurement type x timeticks)       85%
Computer network (machine x measurement type x timeticks)    81%
It can be either very slow or outright impossible to perform a Tucker decomposition of a dense tensor, due to memory constraints.
Given that (low-rank) Tucker decompositions are valuable in practice, can we "trade" a little bit of accuracy for efficiency?
Proposed Method
MACH extends the work of Achlioptas and McSherry on fast low-rank matrix approximation (STOC 2001) to the multilinear setting.
The MACH algorithm (a sketch follows below):
- Toss a coin for each non-zero entry, keeping it with probability p.
- If the entry "survives", reweight it by 1/p. If not, make it zero!
- Perform Tucker on the sparsified tensor!
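A minimal numpy sketch of these three steps (a sketch based on the description above, not the authors' code; mach_sparsify is a name introduced here):

import numpy as np

def mach_sparsify(X, p, seed=0):
    # Keep each entry independently with probability p and reweight
    # survivors by 1/p, so the expectation of the result equals X.
    rng = np.random.default_rng(seed)
    keep = rng.random(X.shape) < p
    return np.where(keep, X / p, 0.0)

X = np.random.rand(100, 12, 1008)     # stand-in dense tensor
X_sparse = mach_sparsify(X, p=0.1)    # ~90% of the entries become zero
# Now run any off-the-shelf Tucker routine (e.g., HOOI) on X_sparse.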
For the theoretical results and more details, see the MACH paper.
Case study I: Intemon
Intemon: a prototype monitoring and mining system for data centers, developed at Carnegie Mellon University.
Tensor X: 100 machines x 12 types of measurement x 10080 timeticks.
For p = 0.1, the Pearson correlation coefficient between the MACH components and the exact components is 0.99 (the ideal value is ρ = 1).
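For reference, one way this agreement could be scored in numpy (the vectors below are hypothetical stand-ins; the real comparison uses the exact and MACH principal components):

import numpy as np

# Stand-ins for corresponding columns of the exact and the MACH
# projection matrices for one mode.
u_exact = np.random.rand(10080)
u_mach = u_exact + 0.01 * np.random.randn(10080)

rho = np.corrcoef(u_exact, u_mach)[0, 1]
print(rho)   # close to 1 when the component is well preserved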
Find the differences! Exact vs. MACH: the qualitative analysis, which is what matters for our goals, remains the same!
Case study II: Intel Berkeley Lab
Berkeley Lab
Tensor: 54 sensors x 4 types of measurement x 5385 timeticks.
The qualitative analysis, which is what matters for our goals, remains the same!
Exact vs. MACH: the spatial principal mode is also preserved, and Pearson's correlation coefficient is again almost 1!
Remarks: 1) The daily periodicity is apparent. 2) The Pearson correlation coefficient with the exact component is 0.99.
Conclusion
Future directions: randomized algorithms for tensors, and finding the smallest sparsification probability p* for the HOOI algorithm.
Randomized algorithms work very well in practice (e.g., sublinear-time algorithms), but they are typically hard to analyze.
Remark: even though our theoretical results refer to HOSVD, MACH also works with HOOI.
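For context, a compact sketch of HOOI (higher-order orthogonal iteration) in numpy, i.e., the alternating scheme that MACH can feed its sparsified tensor into (a sketch with helper names introduced here, not the paper's code):

import numpy as np

def unfold(X, mode):
    # Mode-k unfolding: the mode-k fibers of X become matrix columns.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hooi(X, ranks, iters=10):
    # Initialize each U_k from the mode-k unfolding (HOSVD init),
    # then alternate: project X on every mode except k, and refresh
    # U_k from the leading left singular vectors of the result.
    Us = [np.linalg.svd(unfold(X, k), full_matrices=False)[0][:, :r]
          for k, r in enumerate(ranks)]
    for _ in range(iters):
        for k in range(X.ndim):
            Y = X
            for m, U in enumerate(Us):
                if m != k:
                    Y = np.moveaxis(np.tensordot(Y, U, axes=(m, 0)), -1, m)
            Us[k] = np.linalg.svd(unfold(Y, k),
                                  full_matrices=False)[0][:, :ranks[k]]
    return Us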
Appendix: the Canonical Decomposition (CANDECOMP/PARAFAC) and the Tucker Decomposition.
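For reference, the standard third-order forms of the two decompositions (standard notation; the slide's original figures do not survive in the text):

$$\mathcal{X} \;\approx\; \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r \qquad \text{(CANDECOMP/PARAFAC)}$$

$$\mathcal{X} \;\approx\; \mathcal{G} \times_1 A \times_2 B \times_3 C \qquad \text{(Tucker)}$$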