Service Composition in Biomedical Applications
PhD Thesis Proposal
Pedro Lopes, pedrolopes@ua.pt
Programa Doutoral em Engenharia Informática
December 17th, 2009
Research Supervisor: José Luís Oliveira, jlo@ua.pt
Outline ‣ Introduction ‣ Bioinformatics ‣ Objectives ‣ Problems & Requirements ‣ Technologies ‣ Strategies ‣ Workplan ‣ What’s Next?
Introduction ‣ The Internet (and computer science) is undergoing a (r)evolution! • New application paradigms ‣ Web access anywhere, anytime and for everyone • Static • Mobile
‣ The platform for everything • New opportunities • New challenges
Data → Information → Knowledge
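[Motivation] Bioinformatics
‣ Binary alphabet: 1, 0 • 8 bits - 1 byte • Example: 10110011 • 256 combinations
‣ DNA alphabet: A, T, C, G • 8 nucleotides • Example: ACCGTTAG • 65536 combinations
Wonderful Complexity!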
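The two counts are plain combinatorics: an alphabet of k symbols gives k^n sequences of length n. A minimal sketch to check the numbers (the function name `count_sequences` is ours, for illustration only):

```python
def count_sequences(alphabet: str, length: int) -> int:
    """Number of distinct sequences of `length` symbols drawn from `alphabet`."""
    return len(set(alphabet)) ** length


binary = count_sequences("10", 8)   # one byte: 2 symbols, 8 positions
dna = count_sequences("ATCG", 8)    # eight nucleotides: 4 symbols, 8 positions
print(binary, dna)
```

Doubling the alphabet size squares the count at a fixed length, which is why sequence data grows so much faster than its binary counterpart.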
[Contextualization] Bioinformatics
‣ It all started with the Human Genome Project... • Immense amount of data ‣ New technologies to deal with the “Book of Life”
‣ New projects were born • More data! ‣ Need for improved, next-generation applications
Data → Information → Knowledge
[Landscape] Bioinformatics
‣ Databases (150 in GeNS) • KEGG, UniProt, EBI, NCBI, LOVD, UMD...
‣ Service protocols • DAS, BioMart, EMBOSS, Soaplab, WABI, BioMOBY
‣ Integration applications • DiseaseCard, GeneBrowser, GeNS, ... • Taverna, Bioclipse... • Biozon, Bioconductor, Entrez, Ensembl, ... • Bio2RDF, RDF Scape, ...
‣ Previous research • DynamicFlow
Objectives ‣ Dig deep in the life sciences research field • Understand the problems • Study state-of-the-art
‣ Propose solutions • Analyze the requirements • Develop framework ‣ Internal and external usage
‣ Publish
Promote research and development of novel, next-generation frameworks and strategies to enhance life sciences web applications and systems
Roadmap
Problems & Requirements ‣ Heterogeneity ‣ Integration ‣ Interoperability ‣ Description
Technologies ‣ Web-based access ‣ Web Services ‣ GRID ‣ Semantic Web
Strategies ‣ Static Apps ‣ Dynamic Apps ‣ Meta Apps
[Problems & Requirements] Heterogeneity
‣ Subject of many research projects
‣ Occurs at various levels
• Physical: Web Server, FTP Server, File Server, Backup Tape
• Logical: Relational Database, OO Database, Text File, Binary File
• Format: HTML, CSV, XML, TXT, Excel
• Model: Structure, Ontology, Semantics
• Access: Local, Remote APIs, Web Services
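Format-level heterogeneity can be illustrated with two sources that describe the same kind of record in CSV and in XML. The sketch below is only an illustration: the sample data, field names and the `from_csv` / `from_xml` helpers are made up for this example and do not come from any of the databases named in this proposal.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample data: the same kind of record in two formats.
CSV_SOURCE = "symbol,organism\nBRCA2,Homo sapiens\n"
XML_SOURCE = (
    "<genes><gene symbol='TP53'>"
    "<organism>Homo sapiens</organism>"
    "</gene></genes>"
)


def from_csv(text):
    """Read CSV rows into plain dictionaries keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_xml(text):
    """Read <gene> elements into dictionaries with the same keys as the CSV."""
    root = ET.fromstring(text)
    return [
        {"symbol": gene.get("symbol"), "organism": gene.findtext("organism")}
        for gene in root.iter("gene")
    ]


# Once both sources share one structure they can be handled together.
records = from_csv(CSV_SOURCE) + from_xml(XML_SOURCE)
print(records)
```

Each new format needs its own reader, which is the hand-written effort that integration strategies try to reduce.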
[Problems & Requirements] Integration
‣ To deal with resource heterogeneity
‣ Various solutions • Centralized (...) to Distributed (...)
• Warehouse • Mediator • Link
[Diagram: Warehouse, Mediator and Link approaches, each with an App]
HybridMediator framework!
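As a rough illustration of the mediator idea, the sketch below queries every source at request time through a uniform wrapper and merges the answers; a warehouse would instead copy the data into one store ahead of time. This is a toy under stated assumptions, not the framework proposed here: the `Mediator` class, the two wrapper functions and their in-memory data are hypothetical.

```python
from typing import Callable, Dict, Iterable

Wrapper = Callable[[str], Dict[str, str]]


# Hypothetical wrappers: each hides one source behind the same call signature.
def location_source(gene: str) -> Dict[str, str]:
    data = {"BRCA2": "chromosome 13"}
    return {"location": data[gene]} if gene in data else {}


def disease_source(gene: str) -> Dict[str, str]:
    data = {"BRCA2": "breast cancer"}
    return {"disease": data[gene]} if gene in data else {}


class Mediator:
    """Forwards one query to every wrapped source and merges the answers."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self.wrappers = list(wrappers)

    def query(self, gene: str) -> Dict[str, str]:
        merged = {"gene": gene}
        for wrapper in self.wrappers:
            merged.update(wrapper(gene))  # data stays at the source until asked for
        return merged


mediator = Mediator([location_source, disease_source])
result = mediator.query("BRCA2")
print(result)
```

The application sees a single entry point, while each source keeps its own data and format.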
[Problems & Requirements] Interoperability
‣ Facilitate integration and communication between applications
‣ Levels, from highest to lowest capability for interoperation
• Conceptual interoperability
• Dynamic interoperability
• Pragmatic interoperability
• Semantic interoperability
• Syntactic interoperability
• Technical interoperability
• No interoperability
[Problems & Requirements] Description
‣ Resource description is the key to integration and interoperability • Provide meaning to content
‣ Apply area-specific terminology • Ontology
• An extra effort for resource publishers • Will be very important in the future Internet
[Technologies] Web services
‣ Applications need to communicate with each other through the web
‣ Most widely used technology for the development of distributed web applications • SOAP • REST • XMPP
[Diagram: Service Broker (UDDI), Service Requester, Service Provider; labels: WSDL, WSDL, SOAP]
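The three roles in the diagram follow a publish / find / bind pattern, which can be sketched with an in-memory registry. This is a toy stand-in, not a UDDI, WSDL or SOAP client: the `ServiceBroker` class and the `reverse_complement` service are hypothetical and exist only for this example.

```python
from typing import Callable, Dict


class ServiceBroker:
    """Toy registry playing the broker role (not a UDDI client)."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[str], str]] = {}

    def publish(self, name: str, endpoint: Callable[[str], str]) -> None:
        self._registry[name] = endpoint

    def find(self, name: str) -> Callable[[str], str]:
        return self._registry[name]


def reverse_complement(sequence: str) -> str:
    """Example service offered by the provider."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(sequence))


broker = ServiceBroker()
broker.publish("reverse_complement", reverse_complement)  # provider publishes
service = broker.find("reverse_complement")               # requester finds
result = service("ACCGTTAG")                              # requester binds and invokes
print(result)
```

The requester only needs the service name; where and how the service runs stays on the provider side.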
[Technologies] GRID and Semantic Web
‣ GRID • Combination of software and hardware infrastructures ‣ Pervasive, Consistent, Low-cost • Various GRID types (Computing, Data, Knowledge)
‣ Semantic Web • Resource Description ‣ Complete framework • OWL + RDF + SPARQL, Microformats
• Link available resources in a meaningful way for both Humans and Machines
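The core of resource description in RDF is the subject / predicate / object triple, and a query is a triple pattern with some positions left open. The sketch below imitates that idea in plain Python; it uses no RDF library, the `match` helper is ours, and the `ex:` vocabulary and the resources are invented for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Invented example statements: (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("gene:BRCA2", "rdf:type", "ex:Gene"),
    ("gene:BRCA2", "ex:associatedWith", "disease:BreastCancer"),
    ("disease:BreastCancer", "rdf:type", "ex:Disease"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


genes = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="ex:Gene")]
linked = [o for _, _, o in match(TRIPLES, s="gene:BRCA2", p="ex:associatedWith")]
print(genes, linked)
```

Because every statement has the same shape, data from different publishers can be merged by simply concatenating their triples.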
[Strategies] Static or Dynamic Applications
‣ Static ‣ Solve all the problems... “by hand”! • Hard-coded integration, interoperability and description ‣ Not a very clever solution • Adequate for (very) small projects
‣ Dynamic ‣ Take advantage of novel concepts • Description + Composition • Intelligent mechanisms for input/output combinations ‣ Generic • Suitable for the majority of scenarios
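One possible reading of "input/output combinations" is that services are chained automatically whenever the output type of one matches the input type of another. The sketch below does this with a breadth-first search over described services. It is an assumption-laden illustration, not the mechanism proposed here: the service names, the type labels and the `compose` function are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical service descriptions: name -> (input type, output type).
SERVICES: Dict[str, Tuple[str, str]] = {
    "gene_to_protein": ("GeneID", "ProteinID"),
    "protein_to_pathway": ("ProteinID", "PathwayID"),
    "gene_to_disease": ("GeneID", "DiseaseID"),
}


def compose(
    services: Dict[str, Tuple[str, str]], have: str, want: str
) -> Optional[List[str]]:
    """Shortest chain of services turning `have` into `want`, or None."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        current, chain = queue.popleft()
        if current == want:
            return chain
        for name, (needs, gives) in services.items():
            if needs == current and gives not in seen:
                seen.add(gives)
                queue.append((gives, chain + [name]))
    return None


chain = compose(SERVICES, "GeneID", "PathwayID")
print(chain)
```

Adding a service only requires describing its input and output; no existing chain has to be rewritten, which is what separates this from the static, hard-coded approach.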
[Strategies] Meta Applications
‣ Applications running applications • Just as metadata is data about data
‣ Software-as-a-service • Service Oriented Architectures
‣ Mashups • Workflows
‣ Advanced usage
[Workflow diagram]
Activity 1a In: A - Out: B
Activity 1b In: A - Out: X
Activity 2a In: B & Z - Out: C
Activity 2b In: X - Out: Y
Activity 3 In: Y - Out: Z
Activity 4 In: C - Out: D
Activity 5 In: D - Out: Final
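The activities in the diagram can be ordered from their inputs and outputs alone: an activity may run as soon as all of its inputs exist. The sketch below encodes the seven activities as listed and groups them into waves that could run in parallel. The `schedule` function is ours and only illustrates the idea; it is not the DynamicFlow implementation.

```python
from typing import Dict, List, Set, Tuple

# Activities from the diagram: name -> (required inputs, produced output).
ACTIVITIES: Dict[str, Tuple[Set[str], str]] = {
    "1a": ({"A"}, "B"),
    "1b": ({"A"}, "X"),
    "2a": ({"B", "Z"}, "C"),
    "2b": ({"X"}, "Y"),
    "3": ({"Y"}, "Z"),
    "4": ({"C"}, "D"),
    "5": ({"D"}, "Final"),
}


def schedule(
    activities: Dict[str, Tuple[Set[str], str]], initial: Set[str]
) -> List[List[str]]:
    """Group activities into waves; each wave only needs data already produced."""
    available = set(initial)
    pending = dict(activities)
    waves: List[List[str]] = []
    while pending:
        ready = sorted(
            name for name, (inputs, _) in pending.items() if inputs <= available
        )
        if not ready:
            raise ValueError(f"blocked activities: {sorted(pending)}")
        for name in ready:
            available.add(pending.pop(name)[1])
        waves.append(ready)
    return waves


waves = schedule(ACTIVITIES, {"A"})
print(waves)
```

Activity 2a has to wait for Activity 3 because it needs Z, so the branch through 1b, 2b and 3 completes before the branch through 1a can continue.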
[Calendar] Workplan
[Gantt chart: Year 1 to Year 4, quarters Q1 to Q4]
‣ Thesis • State of the Art • Domain Analysis • Proposal • Main corpus • Delivery
‣ Software • Preliminary Research • System Analysis • Modelling • Active Development • Deliveries
‣ Publications • High Impact Factor • Medium Impact Factor
[Publications] Workplan
‣ Medium impact factor • International Conferences & Workshops
‣ High impact factor • Science, BMC Bioinformatics, Hindawi, Oxford Journals
‣ Published work
• Dynamic Service Integration using Web-based Workflows ‣ 10th International Conference on Information Integration and Web Applications and Services; Linz, Austria; November 2008
• DynamicFlow: A Client-side Workflow Management System ‣ 3rd International Workshop on Practical Applications of Computational Biology; Salamanca, Spain; June 2009
• Arabella: A Directed Web Crawler ‣ International Conference on Knowledge Discovery and Information Retrieval; Madeira, Portugal; October 2009
• Link Integrator: A Link-based Data Integration Architecture ‣ International Conference on Knowledge Discovery and Information Retrieval; Madeira, Portugal; October 2009
• Integration of Variome Data using a Link Discovery Strategy ‣ Iberian Bioinformatics Conference 2009; Lisbon, Portugal; November 2009
What’s Next?
Promote research and development of novel, next-generation frameworks and strategies to enhance life sciences web applications and systems
‣ Research and Development • Enabling knowledge ‣ Semantic Web as a technology to ease integration and interoperability • Well-defined competition • Ongoing “hands-on” work • Promote internal and external usage
‣ One framework, multiple projects • EU-ADR, GEN2PHEN, DiseaseCard, OralCard, VarCard
‣ Publish
Thank You!
Questions?