Biclustering Expression Data Based on Expanding Localized Substructures
Cesim Erten, Melih Sözdinler
Presentation Overview
Biclustering Definition
Types of Biclusters
Previous work
Our Method
  Graph Preliminaries
  Biclustering method: Localize&Extract
  Experimental Results
Future Work
Thursday, March 26, 2009
Melih Sözdinler Işık University
Biclustering Definition
Clustering: groups of "similar" items.
Biclustering: simultaneously cluster two dimensions (the rows and the columns of the data matrix).
The problem was introduced by [Hartigan 72].
The problem is known to be NP-hard.
[Figure: three panels titled "Clustering Rows", "Clustering Columns" and "Biclustering"]
Types of Biclusters

All Constant
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1

Constant Row Additive
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4

Constant Row Multiplicative
1 1 1 1
2 2 2 2
4 4 4 4
8 8 8 8

Constant Column Additive
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

Constant Column Multiplicative
1 2 4 8
1 2 4 8
1 2 4 8
1 2 4 8
Types of Biclusters (cont.)

Coherent Additive Model
1.0 2.0 4.0 5.0
2.0 3.0 5.0 6.0
4.0 5.0 7.0 8.0
5.0 6.0 8.0 9.0

Coherent Multiplicative Model
1.0 2.0 4.0 5.0
2.0 4.0 8.0 10.0
0.4 0.8 1.6 2.0
0.8 1.6 3.2 4.0
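The two coherent models can be checked mechanically. The sketch below is an illustration only, not part of the presented method: it assumes a bicluster is additive-coherent when every row differs from the first row by a constant offset, and multiplicative-coherent when every row is a constant multiple of the first row. The function names and the tolerance `tol` are hypothetical, and the multiplicative check assumes non-zero entries.

```python
def is_additive_coherent(block, tol=1e-9):
    """True if every row equals the first row plus a constant offset."""
    base = block[0]
    for row in block:
        diffs = [x - b for x, b in zip(row, base)]
        if max(diffs) - min(diffs) > tol:
            return False
    return True


def is_multiplicative_coherent(block, tol=1e-9):
    """True if every row equals the first row times a constant factor.

    Assumes the first row has no zero entries.
    """
    base = block[0]
    for row in block:
        ratios = [x / b for x, b in zip(row, base)]
        if max(ratios) - min(ratios) > tol:
            return False
    return True


# The two example matrices from the slide.
ADDITIVE = [
    [1.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 5.0, 6.0],
    [4.0, 5.0, 7.0, 8.0],
    [5.0, 6.0, 8.0, 9.0],
]
MULTIPLICATIVE = [
    [1.0, 2.0, 4.0, 5.0],
    [2.0, 4.0, 8.0, 10.0],
    [0.4, 0.8, 1.6, 2.0],
    [0.8, 1.6, 3.2, 4.0],
]

print(is_additive_coherent(ADDITIVE), is_multiplicative_coherent(MULTIPLICATIVE))
```

Each example matrix satisfies its own model and fails the other one, which is why the two models are listed separately.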
Previous work
Proposed Algorithms
Cheng and Church's Algorithm (CC) [Cheng et al'00]
Order-Preserving Submatrix (OPSM) [Ben-Dor et al'02]
Conserved gene expression motifs (xMOTIFs) [Murali et al'03]
Iterative Signature Algorithm (ISA) [Bergmann et al'03]
Statistical-Algorithmic Method for Bicluster Analysis (SAMBA) [Tanay et al'02, Sharan et al'03]
Bimax [Prelic et al'06]
Previous work(cont.)
Proposed Tools
Biclustering Analysis Toolbox (BicAT) [Barkow et al'06]
Click and Expander [Sharan et al'03]
BicOverlapper [Santamaria et al'08]
Our Method
Graph Preliminaries
Biclustering method: Localize&Extract
Experimental Results
Graph Preliminaries
Gene Expression Matrix & Bipartite Graph
Biclustering & Biclique
Bicliques & Crossing Minimization
CM generalized with weights: WOLF [Çakıroğlu et al]

[Figure: 8 x 8 binary matrix with axes labeled Genes and Conditions]
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
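The link between bicliques and crossing minimization can be illustrated with a small sketch. It does not implement WOLF. It uses the plain, unweighted barycenter heuristic as a stand-in, applied alternately to the row layer and the column layer of the bipartite graph given by a binary matrix. All names (`barycenter`, `reorder`, `localize`, `SCRAMBLED`) and the number of rounds are assumptions made for the example.

```python
def barycenter(positions):
    """Mean position of a node's neighbours (0.0 for an isolated node)."""
    return sum(positions) / len(positions) if positions else 0.0


def reorder(matrix, movable, fixed, by_row):
    """Sort the movable layer by the barycenter of its neighbours in the fixed layer."""
    fixed_pos = {node: p for p, node in enumerate(fixed)}

    def key(node):
        if by_row:
            nbrs = [c for c in fixed if matrix[node][c]]
        else:
            nbrs = [r for r in fixed if matrix[r][node]]
        return barycenter([fixed_pos[n] for n in nbrs])

    return sorted(movable, key=key)  # sorted() is stable, so ties keep their order


def localize(matrix, rounds=3):
    """Alternate barycenter passes over the row layer and the column layer."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(rounds):
        rows = reorder(matrix, rows, cols, by_row=True)
        cols = reorder(matrix, cols, rows, by_row=False)
    return rows, cols


# Two 4 x 4 bicliques whose rows and columns have been shuffled.
A_ROWS, A_COLS = {0, 2, 5, 7}, {0, 1, 4, 6}
SCRAMBLED = [[1 if (r in A_ROWS) == (c in A_COLS) else 0 for c in range(8)]
             for r in range(8)]

row_order, col_order = localize(SCRAMBLED)
localized = [[SCRAMBLED[r][c] for c in col_order] for r in row_order]
for row in localized:
    print(" ".join(str(x) for x in row))
```

For this input the reordered matrix shows the two 4 x 4 blocks on the diagonal, which is the layout shown in the figure: placing adjacent nodes close together moves each biclique into a contiguous block.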
Biclustering method: Localize&Extract
Phase-1: Localize
Phase 1.1 Initial Placement: run WOLF on alternating layers.
Phase 1.2 Adaptive Noise Hiding: iterate, running WOLF on alternating layers.
A gene-condition pair that could be noise should be hidden to help localization.
A gene-condition pair that should not be noise should not be hidden.
Like Conway's Game of Life: "whoever is alone should be hidden".
[Figure: bipartite layout with Layer A and Layer B]
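The "whoever is alone should be hidden" idea can be sketched as follows. The slides do not state the exact hiding rule, so this is an assumption: an entry of the current layout is treated as isolated when fewer than `min_neighbors` of its eight surrounding cells are non-zero. The function name `hide_isolated` and the parameter `min_neighbors` are hypothetical.

```python
def hide_isolated(matrix, min_neighbors=2):
    """Return a copy of the matrix with isolated non-zero entries set to 0.

    An entry counts as isolated when fewer than `min_neighbors` of its
    eight surrounding cells are non-zero (a Game-of-Life style rule).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    hidden = [row[:] for row in matrix]
    for i in range(n_rows):
        for j in range(n_cols):
            if not matrix[i][j]:
                continue
            neighbours = sum(
                1
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di or dj)
                and 0 <= i + di < n_rows
                and 0 <= j + dj < n_cols
                and matrix[i + di][j + dj]
            )
            if neighbours < min_neighbors:
                hidden[i][j] = 0
    return hidden


# A 3 x 3 block of ones plus one stray entry far from it.
NOISY = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
cleaned = hide_isolated(NOISY)
```

In this example the stray entry in the bottom-right corner is hidden while the dense block is left intact. The neighbourhood is taken from the current row and column ordering, so the result depends on the layout produced by the preceding localization step.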
Biclustering method: Localize&Extract(cont.)
Phase-2: Extraction
It is generic and adaptable.
Constant: All Constant, Constant Rows, Constant Columns
Coherent
Biclustering method: Localize&Extract(cont.)
Phase-2: Extraction
Constant
For all-constant biclusters: collect the entries with the same weight.
For constant-row or constant-column biclusters: collect the entries with the same weight on each row or column.
A small error rate is tolerated, controlled by a threshold.
The entries with the same weight represent a constant bicluster.
[Figure: localized matrix with x and y axes]
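A minimal sketch of the "same weight within a threshold" test for a candidate block is given below. It only classifies a block that has already been selected. How the blocks are collected from the localized matrix is not specified on the slide and is not modeled here. The names `spread` and `constant_type` and the tolerance `tol` are assumptions for the example.

```python
def spread(values):
    """Difference between the largest and the smallest value."""
    values = list(values)
    return max(values) - min(values)


def constant_type(block, tol=0.0):
    """Classify a block as a constant bicluster, allowing an error of `tol`.

    Returns "all constant", "constant rows", "constant columns" or None.
    """
    if spread(x for row in block for x in row) <= tol:
        return "all constant"
    if all(spread(row) <= tol for row in block):
        return "constant rows"
    if all(spread(col) <= tol for col in zip(*block)):
        return "constant columns"
    return None


print(constant_type([[1, 1], [1, 1]]))
print(constant_type([[1, 1], [2, 2]]))
print(constant_type([[1, 2], [1, 2]]))
print(constant_type([[1.0, 1.05], [1.0, 1.0]], tol=0.1))
```

Raising `tol` makes the test more permissive, which plays the role of the error threshold mentioned above.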
Our Method(cont.)
Phase-2: Extraction
Coherent
H-value: calculate the H-value for each submatrix S_i.
For each S_i:
Mark S_i.
Collect the entries on the same x- or y-alignment that have a similar H-value.
Expand S_i if the H-value difference is small.
[Figure: submatrix S_i in the localized matrix, with x and y axes]
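Assuming that "H-value" refers to the mean squared residue score of Cheng and Church, it can be computed as shown below. The expansion test is only a sketch of the "expand if the H-value difference is small" rule: the function `can_expand` and its parameter `max_increase` are hypothetical, and the slides do not say how the threshold is chosen.

```python
def h_value(block):
    """Mean squared residue of a block (0 for a perfectly additive block)."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(col) / n_rows for col in zip(*block)]
    total_mean = sum(row_means) / n_rows
    residue = sum(
        (block[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return residue / (n_rows * n_cols)


def can_expand(block, new_row, max_increase):
    """True if adding `new_row` raises the H-value by at most `max_increase`."""
    return h_value(block + [new_row]) - h_value(block) <= max_increase


# Additive-coherent example matrix from the "Types of Biclusters" slide.
S = [
    [1.0, 2.0, 4.0, 5.0],
    [2.0, 3.0, 5.0, 6.0],
    [4.0, 5.0, 7.0, 8.0],
    [5.0, 6.0, 8.0, 9.0],
]

print(h_value(S))
print(can_expand(S, [3.0, 4.0, 6.0, 7.0], max_increase=1e-9))
print(can_expand(S, [9.0, 1.0, 9.0, 1.0], max_increase=0.1))
```

A row that follows the same additive pattern leaves the H-value unchanged and is accepted, while a row that breaks the pattern increases it and is rejected.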
Experimental Results
Experiments on Two Real Datasets
Yeast Cell Cycle (Saccharomyces cerevisiae): 2884 genes, 17 conditions [Cheng et al'00]
Arabidopsis thaliana: 734 genes, 69 conditions [http://arabidopsis.info/]
LEB parameters: α = 2 and η = 10 for the Arabidopsis thaliana dataset; α = 4 and η = 100 for the Yeast dataset.
Experimental Results(cont.)
Thaliana Experiment
H-Value Experiment: LEB is the second best in terms of minimum average H-value.
Experimental Results(cont.)
Yeast Experiments
Enrichments for each functional category.
Proportions of biclusters enriched according to each GO Biological category, using FuncAssociate [Berriz et al'03].
Protein-Protein interactions test.
Experiment 1: LEB performs better in 8 functional categories.
Experiment 2: Biclusters of LEB are enriched with better proportions.
[Figure: sample yeast Protein-Protein Interactions (PPI) network, from http://www.bordalierinstitute.com/images/yeastProteinInteractionNetwork.jpg]
Experiment 3: The best hit ratio for the PPI network is given by the biclustering results of LEB.
Future Work
Experiments on other datasets.
Evaluating biclusters using biological metrics, as in identifying cancer-related genes in sample human cancer data.
Formulating the mathematical relation between "weighted crossing minimization" and various "bicluster scoring functions".
Acknowledgements
Thanks to TÜBİTAK-BIDEB:
Monthly Payments
Funding for visit
Thank you! Any questions?