DynamicFlow A Client-side Workflow Management System Pedro Lopes pedrolopes@ua.pt Joel Arrais jpa@ua.pt José Luís Oliveira jlo@ua.pt
3rd International Workshop on Practical Applications of Computational Biology & Bioinformatics University of Salamanca - Spain June 10 - 12th, 2009
Acknowledgement: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement nº 200754, the GEN2PHEN project.
Outline ‣ Introduction - Problem Definition - Related Work
‣ Information Integration - Content Description - Information Sharing - Information Publishing
‣ DynamicFlow - Biology Context - Workflow Description - User Interface
‣ Conclusions
Introduction ‣ Bioinformatics is evolving exponentially - HGP » HVP » GEN2PHEN - More computational requirements
‣ Web technologies are also evolving rapidly - More online content - More available services - Paradigm-shift: the web as a platform
‣ New level of interaction - Data and service integration - Autonomous, dynamic and real-time service orchestration ‣ Workflow ‣ Mashups
Related Work
Content Description ‣ Describe the content we want to share ‣ Semantic Web - RDF - OWL - SPARQL
‣ Notable challenge in bioinformatics - Large amount of content and services
‣ Bioinformatics developments - Bio2RDF - RDFScape
Information Sharing ‣ Web2.0 - Learn from other success cases ‣ Social networks ‣ User-driven content
‣ Share information with the community - Promote cooperation instead of competition
‣ Bioinformatics developments - myExperiment - GEN2PHEN Knowledge Centre
Information Publishing ‣ Make information available to others - Create a comprehensive API - Web Services
‣ Promote a new level of interaction among applications - Workflows - Mashups
‣ Bioinformatics developments - Biomart - Biomoby - BioDAS
A simple biological problem ‣ A human geneticist has localized an obesity factor to a 5-Mb region of human chromosome 1. Are there any genes in this region that have homologues that are involved in the regulation of lipid metabolism in Saccharomyces cerevisiae?
‣ How to address this issue? - Gather distinct and heterogeneous information - Query multiple services and applications
‣ Solution - Multiple copy-paste between several applications and web pages - Program an application for this specific problem - Ask a graduate student
Bioinformatics Workflow ‣ Divide and Conquer 1. Get all genes of the human species 2. Restrict the genes to the 5-Mb region of chromosome 1 3. Find all the homologues of the filtered genes 4. Limit the homologue list to those related to lipid metabolism 5. Relate the list of homologues to the genes of Saccharomyces cerevisiae
1. GetAllGenesFromOrg("hsa")
2. FilterByLocus(1, 0, 5M)
3. GetKoByGenes()
4. FilterByKo("Lipid Metabolism")
5. GetGenesByKo("sce")
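The five steps above form a linear pipeline in which each step consumes the output of the previous one. The following Python sketch illustrates only that chaining idea; it is not the DynamicFlow implementation. The step names mirror the slide, but the in-memory records, the `Gene` type, the `KO_PATHWAY` table and the `run_workflow` helper are all hypothetical placeholders, and a real system would call remote services instead.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Gene:
    gene_id: str
    organism: str
    chromosome: str
    position: int  # start position in base pairs
    ko: str        # orthology group identifier


# Placeholder records, NOT real gene or orthology identifiers.
GENES = [
    Gene("hsa:g1", "hsa", "1", 1_200_000, "KO_A"),
    Gene("hsa:g2", "hsa", "1", 7_500_000, "KO_B"),
    Gene("hsa:g3", "hsa", "2", 900_000, "KO_A"),
    Gene("hsa:g4", "hsa", "1", 3_000_000, "KO_C"),
    Gene("sce:g1", "sce", "IV", 40_000, "KO_A"),
    Gene("sce:g2", "sce", "II", 15_000, "KO_C"),
]
KO_PATHWAY = {
    "KO_A": "Lipid Metabolism",
    "KO_B": "Lipid Metabolism",
    "KO_C": "Energy Metabolism",
}


# Each factory returns a step: a function from the previous output to the next.
def get_all_genes_from_org(org):
    return lambda _: [g for g in GENES if g.organism == org]


def filter_by_locus(chromosome, start, end):
    return lambda genes: [
        g for g in genes
        if g.chromosome == chromosome and start <= g.position <= end
    ]


def get_ko_by_genes():
    return lambda genes: sorted({g.ko for g in genes})


def filter_by_ko(pathway):
    return lambda kos: [k for k in kos if KO_PATHWAY.get(k) == pathway]


def get_genes_by_ko(org):
    return lambda kos: [
        g.gene_id for g in GENES if g.organism == org and g.ko in kos
    ]


def run_workflow(steps, initial=None):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, initial)


workflow = [
    get_all_genes_from_org("hsa"),
    filter_by_locus("1", 0, 5_000_000),
    get_ko_by_genes(),
    filter_by_ko("Lipid Metabolism"),
    get_genes_by_ko("sce"),
]
result = run_workflow(workflow)
print(result)
```

Representing the workflow as an ordered list of steps keeps each service independent: a step can be swapped or reordered without touching the others, as long as its input type matches the previous step's output.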
Implementation ‣ Create service wrappers - Define an ontology - Specify XML structure
‣ Components - Display Name - Description - Input / Output - Specie - XmlString
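As a rough illustration of what such a wrapper description might look like, the sketch below parses a small XML descriptor carrying the components listed above into a Python object. The element names (`displayName`, `input`, `output`, `species`), the `ServiceWrapper` class and the `can_chain` check are assumptions made for this example; the slides do not specify the actual XML structure or ontology.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical descriptor; the real DynamicFlow XML layout is not shown here.
DESCRIPTOR = """
<service>
  <displayName>FilterByLocus</displayName>
  <description>Keep genes located inside a chromosome region</description>
  <input>GeneList</input>
  <output>GeneList</output>
  <species>hsa</species>
</service>
"""


@dataclass
class ServiceWrapper:
    display_name: str
    description: str
    input_type: str
    output_type: str
    species: str
    xml_string: str  # the raw descriptor, kept for later reuse


def parse_wrapper(xml_string):
    """Build a ServiceWrapper from its XML descriptor."""
    text = xml_string.strip()
    root = ET.fromstring(text)
    return ServiceWrapper(
        display_name=root.findtext("displayName"),
        description=root.findtext("description"),
        input_type=root.findtext("input"),
        output_type=root.findtext("output"),
        species=root.findtext("species"),
        xml_string=text,
    )


def can_chain(producer, consumer):
    """Two services can be connected when the types line up."""
    return producer.output_type == consumer.input_type


wrapper = parse_wrapper(DESCRIPTOR)
print(wrapper.display_name, can_chain(wrapper, wrapper))
```

Declaring input and output types in the descriptor is what would let a workflow editor decide which services may follow one another.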
Conclusion ‣ DynamicFlow is based on - Predefined ontologies - Service description - Services as wrappers - Workflow metaphor
‣ Resulting in a system that allows - Heterogeneous data integration - Agile client-side service integration
Questions?
Thank You!