SERVICE COMPOSITION FOR BIOMEDICAL APPLICATIONS
Pedro Lopes
pedrolopes@ua.pt
supervisor: José Luís Guimarães Oliveira, universidade de aveiro
doctoral programme in informatics engineering
october 1st, 2012
jury
Artur Manuel Soares da Silva, universidade de aveiro
Víctor Maojo García, universidade politécnica de madrid
Rui Pedro Sanches de Castro Lopes, escola superior de tecnologia e gestão do instituto politécnico de bragança
Francisco José Moreira Couto, universidade de lisboa
Carlos Manuel Azevedo Costa, universidade de aveiro
SERVICE COMPOSITION FOR BIOMEDICAL APPLICATIONS
software engineering
bioinformatics & computational biology
DATA EVOLUTION
[Figure: NAR database list evolution, from 2004 to 2012. Bar chart with two series per year: new databases and total databases.]
a self-reinforcing cycle: more data -> bioinformatics requirements -> more tools -> computer science developments -> innovation -> new software and hardware -> more data
in short: more data, more tools, innovation
NEW STRATEGIES TO IMPROVE AND ADD VALUE TO SERVICE COMPOSITION SCENARIOS
workflow-based strategies for service composition
resource integration approaches for service composition
enhancing service composition with the semantic web and rad
SCIENTIFIC WORKFLOWS
data in: integration perspective; distributed composition control
data out: result analysis; knowledge exchange
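A scientific workflow of this kind can be pictured as a set of activities with data dependencies, run in dependency order from "data in" to "data out". The sketch below is a minimal illustration only: the activity names, the `run_workflow` helper and the toy data are assumptions made for the example and are not taken from Taverna or from any system named in these slides.

```python
from graphlib import TopologicalSorter

def run_workflow(activities, dependencies, data_in):
    """Run each activity once all the activities it depends on have finished.

    activities: name -> function(list of upstream results) -> result
    dependencies: name -> set of upstream activity names
    data_in: value handed to activities that have no upstream activity
    """
    graph = {name: dependencies.get(name, set()) for name in activities}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = sorted(graph[name])
        inputs = [results[u] for u in upstream] if upstream else [data_in]
        results[name] = activities[name](inputs)
    return results

# Hypothetical workflow: two independent lookups, then a combination step.
activities = {
    "lookup_genes": lambda inputs: [f"gene:{inputs[0]}"],
    "lookup_drugs": lambda inputs: [f"drug:{inputs[0]}"],
    "combine": lambda inputs: sorted(item for part in inputs for item in part),
}
dependencies = {"combine": {"lookup_genes", "lookup_drugs"}}

data_out = run_workflow(activities, dependencies, "BRCA2")["combine"]
```

The two lookups have no dependency on each other, which is what would allow a real engine to run them in parallel or on different providers.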
INTEROPERABILITY CHALLENGE
scientific workflows: workflow & service interoperability; parallelization, security, integration
activity execution: combine and evaluate multiple independent activities; deliver service & workflow execution in a real-time web-based environment
data sharing: knowledge exchange semantics; collaborative and reproducible research
previous work: dynamicflow; taverna is the de facto workbench for scientific workflows
deploy a scalable architecture to enable service composition between multiple data & service providers
WORKFLOW COMPOSITION ARCHITECTURE
5 CLIENT APPLICATIONS
4 APPLICATION ENGINE
3 DATA MANAGEMENT
2 WORKFLOW ENGINE
1 WORKFLOWS
components shown: API, Hibernate, Data Combination Engine, Workflow Execution Engine, Taverna, web services
workflow engine: new java-based taverna workflow wrapper; enable distributed service orchestration; open knowledge provider hub
interaction: xsd xml schema for service composition; normalize service input and output; enable autonomous data exchanges
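To illustrate what normalizing service input and output can mean in practice, the sketch below wraps the native output of each service in one shared XML envelope, so the next service only ever reads that envelope. The element names (`message`, `item`), the two toy services and their data are assumptions for the example; they are not the schema used by the platform described here.

```python
import xml.etree.ElementTree as ET

def to_envelope(service, values):
    """Wrap a service's native output in a common XML envelope (hypothetical layout)."""
    root = ET.Element("message", {"service": service})
    for value in values:
        ET.SubElement(root, "item").text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_envelope(xml_text):
    """Read the item values back out of an envelope."""
    return [item.text for item in ET.fromstring(xml_text).findall("item")]

# Two hypothetical services with different native formats (toy data).
def drug_targets(drug):
    return {"aspirin": "PTGS1;PTGS2"}.get(drug, "")   # semicolon-separated string

def annotate_length(genes):
    return [(gene, len(gene)) for gene in genes]       # list of tuples

# Composition: each step only sees the normalized envelope of the previous one.
step1 = to_envelope("drug_targets", drug_targets("aspirin").split(";"))
step2 = to_envelope(
    "annotate_length",
    (f"{gene}={size}" for gene, size in annotate_length(from_envelope(step1))),
)
```

Because every exchange goes through `to_envelope` and `from_envelope`, a new provider only needs one adapter to join an existing composition.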
HIGHLIGHTS
workflow execution engine: wrap taverna execution; online real-time activity processing
new interoperability standard: communication language to enable automated data exchanges from multiple providers; ease the creation of distributed service composition workflows
distributed scientific workflows: complex interactions between heterogeneous services; distributed signal substantiation tasks
web-based workspace: always-available collaborative environment; custom on-demand data analysis
a new strategy for advanced service composition that enables collaborative & distributed research
RESULTS
EU-ADR WEB PLATFORM http://bioinformatics.ua.pt/euadr
a bioinformatics hub for pharmacovigilance knowledge providers
Pedro Lopes, David Campos, Tiago Nunes, José Luís Oliveira
acm transactions on management information systems [ongoing], november 2012

the eu-adr web platform: delivering advanced pharmacovigilance tools
José Luís Oliveira, Pedro Lopes, Tiago Nunes, David Campos, S Boyer, E Ahlberg, EM Van Mulligen, JA Kors, B Singh, LI Furlong, F Sanz, A Bauer-Mehren, MC Carrascosa, J Mestres, P Avillach, G Diallo, C Diaz, J Van der Lei
pharmacoepidemiology and drug safety [second revision], october 2012

automatic filtering and substantiation of drug safety signals
A Bauer-Mehren, EM van Mulligen, P Avillach, MC Carrascosa, B Singh, R Garcia-Serna, Pedro Lopes, José Luís Oliveira, G Diallo, J Mestres, E Ahlberg Helgee, S Boyer, F Sanz, JA Kors, LI Furlong
plos computational biology, march 2012
INTEGRATION CHALLENGE
UNDERSTAND CHANGES IN OUR GENETIC SEQUENCE
genetic dataset aggregation: extract data from distributed lsdbs
genotype-to-phenotype integration: enrich human variome data with connections to multiple data types
content accreditation: promote correct authorship, ownership and attribution
previous work: scientific workflows ideal for service-service interactions; service composition for resource integration
design a service composition strategy to enable agile integration of enriched human variome knowledge
INTEGRATION STRATEGY
deliver knowledge: web application; variome api
gene list and lsdb list: LOVD, UMD, IDB, other
external data: miscellaneous heterogeneous sources, from gene ontology to protein databases
distributed lsdbs: custom variant readers and web crawling; multiple non-standardized formats
components shown: API, application engine, feed reader, ARABELLA
INTEGRATION ARCHITECTURE
5 CLIENT APPLICATIONS: api; external rest api, enables service composition scenarios
4 APPLICATION ENGINE: extensible data model; core (gene + variant) plus extensions; lightweight link-based connections
3 BUILD ENGINE: advanced integration engine; service composition for intelligent lsdb data extraction; variation dataset enrichment
2 INTEGRATION MIDDLEWARE: data gathering wrappers for CSV, XML, SQL and REST; configurable resources; load data from csv, xml, sql and rest sources
1 CONFIGURATION: flexible configuration; single resource setup file; innovative service composition description schema
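The layering above, one configuration that selects a data gathering wrapper per resource, can be sketched as follows. This is a hypothetical illustration covering only csv and xml: the configuration keys, the wrapper functions, the resource names and the sample payloads are all assumptions, and real LSDB formats are far less regular.

```python
import csv
import io
import xml.etree.ElementTree as ET

def read_csv(text, options):
    """Yield (gene, variant) pairs from CSV text using configured column names."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row[options["gene"]], row[options["variant"]]

def read_xml(text, options):
    """Yield (gene, variant) pairs from XML text using configured element names."""
    for record in ET.fromstring(text).iter(options["record"]):
        yield record.findtext(options["gene"]), record.findtext(options["variant"])

WRAPPERS = {"csv": read_csv, "xml": read_xml}

def integrate(config, sources):
    """Group variants by gene across every resource listed in the configuration."""
    variome = {}
    for resource in config:
        wrapper = WRAPPERS[resource["type"]]
        for gene, variant in wrapper(sources[resource["name"]], resource["options"]):
            variome.setdefault(gene, []).append((resource["name"], variant))
    return variome

# Hypothetical configuration and payloads (made-up gene and variants).
config = [
    {"name": "lsdb_a", "type": "csv", "options": {"gene": "gene", "variant": "change"}},
    {"name": "lsdb_b", "type": "xml",
     "options": {"record": "entry", "gene": "symbol", "variant": "hgvs"}},
]
sources = {
    "lsdb_a": "gene,change\nGENE1,c.1A>G\n",
    "lsdb_b": "<db><entry><symbol>GENE1</symbol><hgvs>c.2T&gt;C</hgvs></entry></db>",
}
variome = integrate(config, sources)
```

Adding a new resource in this sketch means adding one entry to `config`; adding a new format means adding one function to `WRAPPERS`.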
HIGHLIGHTS
variation integration: service composition approach to gather genetic datasets from multiple external resources
extensible model: simplified description and addition of new external service composition actors
interoperability: unique service composition variome api; access to curated collection of genetic variants
innovative ui: gene mesh and liveview; content accreditation
new service composition methods allow understanding, exploring and connecting human variome knowledge
RESULTS
WAVE: WEB ANALYSIS OF THE VARIOME http://bioinformatics.ua.pt/wave
wave: web analysis of the variome
Pedro Lopes, Raymond Dalgleish and José Luís Oliveira
human mutation, march 2011

an extensible platform for variome data integration
Pedro Lopes and José Luís Oliveira
10th ieee international conference on information technology and applications in biomedicine, corfu, greece, november 2010

a holistic approach for integrating genomic variation information
Pedro Lopes and José Luís Oliveira
10th spanish symposium on bioinformatics, malaga, spain, october 2010
NEXT-GENERATION BIOMEDICAL APPLICATIONS
life sciences: complex mesh of data and services; modern demands keep pushing computer science forward
previous work: new service composition strategies for interoperability using scientific workflows; new service composition strategies for resource integration
rapid application development: easily-configurable service composition implementation environment; build the next generation of service composition applications faster
the semantic web paradigm: the perfect solution for the natural complexity of the life sciences
enhanced rapid application development: straightforward service composition setup process
advanced data integration engine: rich service composition description; flexible integration from heterogeneous resources
semantic knowledge management: involve semantic web technologies at all service composition layers
state-of-the-art interoperability: apis enable service composition for everything and everyone
knowledge federation: enable distributed access to knowledge
empower ecosystems: delivery network for custom cross-platform and cross-device applications
design a new semantic web framework to streamline the creation of next generation biomedical applications
FRAMEWORK MODEL: SINGLE INSTANCE
integration (data in): data integration connectors for CSV, XML, SQL and SPARQL
knowledge base: built on existing vocabularies (foaf:name, dc:title, rdfs:label, owl:imports)
interoperability (data out): API with REST, Java, LinkedData and SPARQL interfaces
FRAMEWORK MODEL: FEDERATION
several instances, each with its own data integration connectors (CSV, XML, SQL, SPARQL), knowledge base (foaf:name, dc:title, rdfs:label, owl:imports) and API (REST, Java, LinkedData, SPARQL)
knowledge federation layer: connects the instances and exposes a shared API (REST, Java, LinkedData, SPARQL)
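The federation idea, one query answered by several knowledge bases, can be approximated in a few lines. The sketch below stands in plain Python lists of triples for real triplestores and a pattern with wildcards for a real SPARQL query; the instance names, identifiers and predicates are assumptions chosen for the example.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def federated_match(knowledge_bases, s=None, p=None, o=None):
    """Send one pattern to every knowledge base and tag results with their origin."""
    results = []
    for name, triples in knowledge_bases.items():
        results.extend((name, t) for t in match(triples, s, p, o))
    return results

# Two hypothetical instances holding complementary knowledge.
knowledge_bases = {
    "instance_a": [("ex:gene1", "dc:title", "Gene one"),
                   ("ex:gene1", "ex:isAssociatedTo", "ex:protein1")],
    "instance_b": [("ex:gene1", "ex:isAssociatedTo", "ex:drug7")],
}
hits = federated_match(knowledge_bases, s="ex:gene1", p="ex:isAssociatedTo")
```

Keeping the origin next to each result lets a client tell which instance contributed which statement, which matters once instances are maintained by different groups.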
FRAMEWORK ARCHITECTURE
6 CLIENT APPLICATIONS: modern client-side development; ready for any ui framework
5 API (REST, Java, pubby, Joseki, LinkedData, SPARQL): future-proof interoperability; rest services, sparql endpoint + linkeddata interfaces available by default; java + javascript libraries
4 APPLICATION ENGINE: streamlined application engine; “semantic web in a box”; straightforward backend deployment with tomcat
3 KNOWLEDGE BASE: semantic knowledge management; mysql-based triplestore; jena-supported methods
2 INTEGRATION ENGINE (ABSTRACTION ENGINE): advanced integration engine; semantic web translation; new extract-transform-load strategy
1 EXTERNAL SOURCES (CSV, XML, SQL, SPARQL): flexible service composition configuration; comprehensive connectors & selectors; load data from csv, xml, sql and sparql sources
HIGHLIGHTS
rapid application development: quickly build a new service composition application ecosystem
semantic data integration platform: flexible acquisition and translation of data from heterogeneous resources
semantic web and linkeddata interoperability: future-proof interoperability with the most innovative application paradigm; semantic reasoning and inference
federation: rest, sparql and linkeddata apis; one query, multiple knowledge bases
a framework to empower the creation of next-generation service composition-based semantic software
RESULTS COEUS
http://bioinformatics.ua.pt/coeus
coeus: “semantic web in a box” for biomedical applications
Pedro Lopes and José Luís Oliveira
journal of biomedical semantics [second revision], october 2012

coeus: a semantic web application framework
Pedro Lopes and José Luís Oliveira
4th international semantic web applications & tools for life sciences workshop, london, united kingdom, december 2011

a semantic web application framework for health systems interoperability
Pedro Lopes and José Luís Oliveira
international workshop on managing interoperability and complexity in health systems, glasgow, scotland, november 2011

towards knowledge federation in biomedical applications
Pedro Lopes and José Luís Oliveira
7th international conference on semantic systems, graz, austria, october 2011
coeus evaluation
LEGACY DISEASECARD
rare diseases research portal: collection of pointers to disease pages; used for academia and clinical research
primitive engineering: static integration/navigation protocol; constrained data model
evolution: imperative to update scientific dataset; new features for users and developers
evaluation: perfect benchmark for a new coeus instance
CREATING A NEW COEUS INSTANCE
setup new application model: re-use existing ontologies; improve omim basic model
configure service composition: define resource descriptions for connectors; specify data selectors
build triplestore: autonomous process; pull data from configured resources into semantic knowledge base
deliver knowledge: use apis to create advanced user interfaces; use apis to access data

# locus entity
:entity_Locus
    :isEntityOf :concept_Ensembl, :concept_EntrezGene, :concept_GeneCards, :concept_HGNC, :concept_MapView, :concept_UCSC ;
    :isIncludedIn :seed_Diseasecard4 ;
    dc:description "Collects Locus entity knowledge."^^xsd:string ;
    dc:title "Locus"^^xsd:string ;
    a :Entity, owl:NamedIndividual ;
    rdfs:label "entity_Locus"^^xsd:string .

# hgnc concept
:concept_HGNC
    :hasEntity :entity_Locus ;
    :hasResource :resource_HGNC ;
    :isExtendedBy :resource_ClinicalTrials, :resource_ENZYME, :resource_Ensembl, :resource_HGNC, :resource_KEGG, :resource_MedlinePlus, :resource_UniProt ;
    dc:description "Concept relating HGNC data."^^xsd:string ;
    dc:title "HGNC"^^xsd:string ;
    a :Concept, owl:NamedIndividual ;
    rdfs:label "concept_hgnc"^^xsd:string .
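The entity, concept and resource descriptions form a small hierarchy: an entity groups concepts, and a concept points at the resources that feed it. The sketch below mirrors that shape with Python data classes to make the navigation explicit. The class layout and the `resources_of` helper are assumptions for illustration; only the instance names are borrowed from the configuration.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str

@dataclass
class Concept:
    name: str
    resources: list = field(default_factory=list)   # mirrors :hasResource

@dataclass
class Entity:
    name: str
    concepts: list = field(default_factory=list)    # mirrors :isEntityOf

def resources_of(entity):
    """List every resource name reachable from an entity through its concepts."""
    return [r.name for c in entity.concepts for r in c.resources]

hgnc = Concept("concept_HGNC", [Resource("resource_HGNC")])
ensembl = Concept("concept_Ensembl")                 # no resource attached in this sketch
locus = Entity("entity_Locus", [ensembl, hgnc])
locus_resources = resources_of(locus)
```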
# hgnc connector
:resource_HGNC
    :endpoint "http://www.genenames.org/cgi-bin/hgnc_downlo +data&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_pub_chrom_m %27#replace# %27&order_by=gd_app_sym_sort&limit=&format=text&submit=submit&.cgifiel dbtag"^^xsd:string ;
    :extends :concept_HGNC ;
    :extension "rdfs:label"^^xsd:string ;
    :hasKey :csv_HGNC_id ;
    :isResourceOf :concept_HGNC ;
    :loadsFrom :csv_HGNC_id, :csv_HGNC_name ;
    :method "complete"^^xsd:string ;
    :order 3 ;
    dc:description "Resource connecting gene HGNC informati…
    dc:publisher "csv"^^xsd:string ;
    dc:title "HGNC"^^xsd:string ;
    a :Resource, owl:NamedIndividual ;
    rdfs:label "resource_hgnc"^^xsd:string .

# hgnc identifier selector
:csv_HGNC_id
    :isKeyOf :resource_HGNC ;
    :loadsFor :resource_HGNC ;
    :property "dc:source|dc:identifier"^^xsd:string ;
    :query "0"^^xsd:string ;
    dc:description "Information for HGNC CSV resource loading…
    dc:title "HGNC_id"^^xsd:string ;
    a :CSV, owl:NamedIndividual ;
    rdfs:label "csv_hgnc_id"^^xsd:string .

# hgnc name selector
:csv_HGNC_name
    :loadsFor :resource_HGNC ;
    :property "rdfs:comment|dc:description"^^xsd:string ;
    :query "2"^^xsd:string ;
    dc:description "Information for HGNC CSV resource loadi…
    dc:title "HGNC_name"^^xsd:string ;
    a :CSV, owl:NamedIndividual ;
    rdfs:label "csv_hgnc_name"^^xsd:string .
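One plausible reading of the selectors above is that `:query` holds a zero-based column index and `:property` holds a "|"-separated list of predicates that receive that column's value. The sketch below applies that reading to a delimited payload. It is an interpretation, not the framework's code: the `load_csv` function, the subject naming scheme, the tab delimiter and the sample row are all assumptions.

```python
import csv
import io

def load_csv(text, concept, key_selector, selectors, delimiter="\t"):
    """Translate delimited rows into (subject, predicate, object) triples.

    Assumed reading of the configuration: "query" is a zero-based column index
    and "property" is a "|"-separated list of predicates for that column.
    """
    triples = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        subject = f"coeus:{concept}_{row[int(key_selector['query'])]}"
        for selector in [key_selector, *selectors]:
            value = row[int(selector["query"])]
            for predicate in selector["property"].split("|"):
                triples.append((subject, predicate, value))
    return triples

id_selector = {"query": "0", "property": "dc:source|dc:identifier"}
name_selector = {"query": "2", "property": "rdfs:comment|dc:description"}

# Hypothetical one-row payload: identifier, symbol, name.
sample = "ID1\tSYM1\tname one\n"
triples = load_csv(sample, "hgnc", id_selector, [name_selector])
```

Under this reading a single row produces four triples, two per selector, all attached to the subject derived from the key column.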
[Figure: application model for the new coeus instance, drawn as a network expanding over levels 0 to 3. Entities: disease, gene, protein, drug, ontology, literature. Resources: OMIM, HGNC, UniProt, Entrez, HPO, InterPro, DrugBank, MeSH, PDB, Prosite, Ensembl, UMLS, PharmGKB, PubMed.]
# java
pt.ua.bioinformatics.API.getTriple("coeus:hgnc_BRCA2", "p", "o", "xml")

# rest
http://bioinformatics.ua.pt/coeus/api/triple/coeus:hgnc_BRCA2/p/o/csv

# sparql federation
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX diseasome: <http://www4.wiwiss.fu-berlin.de/diseasome/resource/d…
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX coeus: <http://bioinformatics.ua.pt/coeus/>
SELECT ?pdb ?mesh WHERE {
  {
    SERVICE <http://www4.wiwiss.fu-berlin.de/diseasome/sparql> {
      <http://www4.wiwiss.fu-berlin.de/diseasome/resource/gen…
    }
  }
  {
    SERVICE <http://bioinformatics.ua.pt/coeus/sparql> {
      _:gene dc:title ?label .
      _:gene coeus:isAssociatedTo ?uniprot
    }
  }
  {
    SERVICE <http://bioinformatics.ua.pt/coeus/sparql> {
      ?uniprot coeus:isAssociatedTo ?pdb .
      ?pdb coeus:hasConcept coeus:concept_PDB
    }
  }
  {
    SERVICE <http://bioinformatics.ua.pt/coeus/sparql> {
      ?uniprot coeus:isAssociatedTo ?mesh .
      ?mesh coeus:hasConcept coeus:concept_MeSH
    }
  }
}
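The rest example follows a subject/predicate/object/format layout after `/api/triple/`. A client could assemble and take apart such addresses as sketched below. The helper names are hypothetical, and treating the literal "p" and "o" as unbound predicate and object is an assumption read off the example rather than documented behaviour.

```python
def triple_url(base, subject, predicate="p", obj="o", fmt="csv"):
    """Build a triple-pattern URL following the layout of the rest example.

    Assumption: the literal "p" and "o" stand for unbound predicate and object.
    """
    return "/".join([base.rstrip("/"), "api", "triple", subject, predicate, obj, fmt])

def parse_triple_url(url):
    """Split a triple-pattern URL back into (subject, predicate, object, format).

    Only handles terms without "/" in them, which is enough for prefixed names.
    """
    subject, predicate, obj, fmt = url.split("/api/triple/", 1)[1].split("/")
    return subject, predicate, obj, fmt

url = triple_url("http://bioinformatics.ua.pt/coeus", "coeus:hgnc_BRCA2")
parts = parse_triple_url(url)
```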
RESULTS
THE NEW DISEASECARD http://bioinformatics.ua.pt/dc4
easy setup: simplified resource integration; straightforward client-side application creation
efficient development: rapid application development at its best; reduced implementation effort compared to similar systems
improved availability: available to researchers through web application; available to developers through default apis
CONCLUSIONS
1 interoperability: new strategies for workflow-based service composition; advanced methods to deliver knowledge
2 integration: innovative integrative approach to describe service composition; flexible integration engine to compose heterogeneous resources
3 service composition: pioneering framework for enhanced semantic web-based service composition; next-generation strategies for integration and interoperability
FUTURE PERSPECTIVES
beyond service composition: software-as-a-service use is increasing; streamlined and lightweight interactions are everywhere
linkeddata and the semantic web: semantic web as the foundation for new software engineering strategies; linkeddata is a growing knowledge network
worldwide knowledge networks: more sophisticated knowledge expression technologies; richer, meaningful data are more connected
modern software platforms: growing relevance of efficient content delivery; one knowledge base, multiple cross-platform & cross-device clients
business value: research and enterprise are intertwined; coeus use goes beyond science
THANK YOU