WAVe Semantic Web The Meaning of Life
No promises on this one!!
PEDRO LOPES pedrolopes@ua.pt University of Leicester October 8th, 2010
PAST
PRESENT
FUTURE
SemanticWeb
It’s a “flux capacitor”!
Yes!
DYNAMIC, WEB-BASED, WORKFLOW SERVICE COMPOSITION
http://bioinformatics.ua.pt/diseasecard
‣ FUTURE • Enhance User Experience • Improve Performance • Improve Structure & Organization • Add Semantic Layer
BRIDGING THE GAP BETWEEN GENOMICS AND MEDICINE
WHAT IS WAVe?
WEB ANALYSIS OF THE VARIOME
? Enable agile access to integrated & enriched human variome research datasets
! Genes * [LSDBs + Variants + Original Resources]
☺ An extensible lightweight integration & enrichment platform for genomic variation datasets
http://bioinformatics.ua.pt/WAVe
HIGHLIGHT | RESOURCES ‣ LSDB • LOVD + MUTbase + UMD + misc legacy
‣ GENE • GeneCards + GeneNames + Entrez
‣ PHARMACOGENOMICS • PharmGKB
‣ LOCUS • MapViewer + Ensembl
‣ PUBLICATION • QuExT
‣ PATHWAY • KEGG + Reactome
‣ DISEASE • OMIM
‣ PROTEIN • UniProt + PDB + ExPASy + InterPro
‣ GENE ONTOLOGY • AmiGO
~ 1350 Genes, 1550 LSDBs, 80k Variants, 100k Links!
HIGHLIGHT | FEATURES ‣ GENE SEARCH • Direct access to genes ‣ Auto-suggest engine
‣ GENE ANALYSIS WORKSPACE • Navigation tree ‣ Holistic perspective on all data • “Live view” mode ‣ Shows original applications/content
‣ API • RSS/XML access to data ‣ Usable in any framework • Genes ‣ Access navigation tree data ‣ Google Chrome Extension • Variants ‣ Only platform that aggregates variants from multiple sources
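A minimal sketch of what "RSS/XML access to data, usable in any framework" could look like from a client. The layout of the real WAVe feed is not shown in the slides, so the sample feed, its item titles and its example.org links are all assumptions; only a standard RSS 2.0 channel with items is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content: the real WAVe feed layout is not shown in the
# slides, so this sample only assumes a standard RSS 2.0 channel with items.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example gene feed</title>
    <item>
      <title>Example LSDB</title>
      <link>http://example.org/lsdb/1</link>
    </item>
    <item>
      <title>Example protein record</title>
      <link>http://example.org/protein/1</link>
    </item>
  </channel>
</rss>
""".strip()


def read_items(feed_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(f"{title}: {link}")
```

Because the feed is plain XML, the same few lines would carry over to any language with an XML parser, which is presumably the point of "usable in any framework".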
FUTURE PERSPECTIVES ‣ USER INTERACTION • Global search + augmented browsing + tabbed browsing + custom profiles
‣ RESOURCES • All genes • Café RouGE + LOVD-worldwide • dbSNP + HapMap + 1000 Genomes • LRG...
‣ SEMANTIC WEB • Meaningful relationships ‣ Gene to Protein ≠ Gene to Variant ≠ Gene to Pathway ≠ ...
HYPE
WHAT? ‣ Semantic Web is the Web of Knowledge ‣ It is about standards for publishing, sharing and querying knowledge drawn from distributed and heterogeneous resources ‣ It enables the answering of sophisticated questions
OK... BUT WHAT DO WE NEED TO DO?
FREE TEXT
The Eiffel Tower (French: La Tour Eiffel, [tuʁ ɛfɛl], nickname La dame de fer, the iron lady) is an 1889 iron lattice tower located on the Champ de Mars in Paris that has become both a global icon of France and one of the most recognizable structures in the world. The tallest building in Paris,[10] it is the most-visited paid monument in the world; millions of people ascend it every year. Named for its designer, engineer Gustave Eiffel, the tower was built as the entrance arch to the 1889 World's Fair. The tower stands 324 metres (1,063 ft) tall, about the same height as an 81-storey building. It was the tallest man-made structure in the world from its completion until the Chrysler Building in New York City was built in 1930. Not including broadcast antennas, it is the second-tallest structure in France after the 2004 Millau Viaduct. The tower has three levels for visitors. Tickets can be purchased to ascend, by stairs or lift, to the first and second levels. The walk to the first level is over 300 steps, as is the walk from the first to the second level. The third and highest level is accessible only by elevator. Both the first and second levels feature restaurants.
STRUCTURED TEXT
Name: Eiffel Tower, La Tour Eiffel
Location: Paris, France
Architect: Stephen Sauvestre
Height: 324m
...
RELATIONAL MODEL
NAME          LOCATION        HEIGHT
Eiffel Tower  Paris, France   324m
...           ...             ...
SEMANTIC WEB
Eiffel Tower sameAs La Tour Eiffel
Eiffel Tower hasHeight 324 m
Eiffel Tower isLocatedAt Paris, France
Eiffel Tower hasArchitect Stephen Sauvestre
EXPRESSING MEANING ‣ TRIPLES • Everything (really everything!) can be described as a statement based on a triple (or combination of statements)
‣ EXAMPLES • Liverpool is a sport club • James Cameron directed Avatar • Protein P05067 is located in Membrane
‣ SUBJECT PREDICATE OBJECT • Building and connecting statements creates knowledge
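A minimal sketch of the subject, predicate, object idea in Python, using the three example statements from this slide. The predicate names (`isA`, `directed`, `isLocatedIn`) and the `match` helper are illustrative choices, not part of any standard vocabulary or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three example statements from the slide, written as triples.
# Predicate names are illustrative, not taken from a real ontology.
STATEMENTS = [
    Triple("Liverpool", "isA", "SportClub"),
    Triple("James Cameron", "directed", "Avatar"),
    Triple("uniprot:P05067", "isLocatedIn", "Membrane"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    for t in match(STATEMENTS, predicate="directed"):
        print(f"{t.subject} {t.predicate} {t.obj}")
```

Every statement has the same three-part shape, so new facts can be appended without changing any schema. That uniform shape is what lets separate statements be connected into knowledge.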
ENABLING KNOWLEDGE
uniprot:P05067 label Amyloid precursor protein
uniprot:P05067 is a Protein
uniprot:P05067 involved omim:104300
omim:104300 label Alzheimer
omim:104300 is a Disease
OWL ‣ DEFINE RELATIONS ‣ Web Ontology Language • Define complex concept environments • Individual + Property assertion = Axiom • “Object-Oriented” ‣ Classes ‣ Properties ‣ Instances
FOAF Friend-Of-A-Friend
RDF
‣ STATEMENT STORAGE ‣ Resource Description Framework • Store data as triples ‣ File formats • RDF/XML • N3 • Turtle
‣ Relational database • Quite heavy and not easy to deal with ‣ Text files must be read (and parsed) (and cached)
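To make "store data as triples" in a text file concrete, here is a small sketch that writes triples one statement per line in N-Triples style. N-Triples is not listed on the slide; it is used here because it is the simplest line-based syntax and every N-Triples line is also valid Turtle. The example.org IRIs are hypothetical, and the helper assumes subjects and predicates are always IRIs.

```python
def format_term(value):
    """Write IRIs in angle brackets and everything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} ." for s, p, o in triples
    )


# Hypothetical IRIs under example.org; the slides only give plain labels.
EX = "http://example.org/"
TOWER = [
    (EX + "EiffelTower", EX + "isLocatedAt", EX + "Paris"),
    (EX + "EiffelTower", EX + "hasHeight", "324 m"),
]

if __name__ == "__main__":
    print(to_ntriples(TOWER))
```

A file produced this way still has to be read and parsed before it can be queried, which is the cost the slide points at for text-based storage.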
SPARQL ‣ ASK QUESTIONS ‣ SPARQL Protocol and RDF Query Language • Query data stored in RDF • SQL’s “younger brother” • Features ‣ Ambiguous ‣ Multiple variables
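The "multiple variables" feature can be illustrated without a real SPARQL engine. The sketch below is a toy pattern matcher in plain Python, not SPARQL itself: terms starting with `?` are variables, and a query is a list of triple patterns that must all hold. The data reuses the protein and disease statements from the ENABLING KNOWLEDGE slide; the direction of the `involved` edge and the predicate spellings are assumptions.

```python
# Statements from the ENABLING KNOWLEDGE slide (predicate spellings assumed).
GRAPH = [
    ("uniprot:P05067", "label", "Amyloid precursor protein"),
    ("uniprot:P05067", "isA", "Protein"),
    ("uniprot:P05067", "involved", "omim:104300"),
    ("omim:104300", "label", "Alzheimer"),
    ("omim:104300", "isA", "Disease"),
]


def unify(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := unify(pattern, triple, bindings)) is not None
        ]
    return solutions


# "Which proteins are involved in which named diseases?"
PATTERNS = [
    ("?protein", "isA", "Protein"),
    ("?protein", "involved", "?disease"),
    ("?disease", "label", "?name"),
]

if __name__ == "__main__":
    for row in query(GRAPH, PATTERNS):
        print(row["?protein"], "->", row["?name"])
```

The shared variables `?protein` and `?disease` do the work that join conditions do in SQL: the patterns are connected simply by reusing a name.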
SEMANTIC WEB RICHNESS ‣ CLIENT SIDE • User Interfaces ‣ Semantically rich applications ‣ Meaningful results ‣ Context ‣ Enrich text • Information Visualization • Augmented browsing
‣ SERVER SIDE • Ontology • Semantically rich resources • Meaningful relationships • Reasoning • Context-aware • Artificial Intelligence • Linked Data • Intelligent resource networks
From server side semantic richness to client side interfaces
?
CLIENT SIDE Cardiac ... ECG
From simple result listings to semantically rich interfaces
!
DEMO SEMANTICALLY RICH INTERFACE
SERVER SIDE
[Word cloud: Composition, Tim Berners-Lee, DBPedia, RDF, OWL, Federation, Endpoint, SPARQL, Query, Knowledge, FOAF, SADI, Integration, Identity, Triplestore, Ontology, Mashup, Linked Data, Mapping, XML, D2R, Text, Network, Bio2RDF]
FEDERATED QUERYING ‣ ONE QUERY, MULTIPLE INSTANCES • Connect distinct resources ‣ Cross information ‣ Merge datasets
‣ CHALLENGES • How to query so many distinct resources? • How to map results?
‣ SOLUTIONS • SPARQL querying • Ontology mapping
[Diagram: one query sent to instances 1, 2, 3, ..., n]
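The "one query, multiple instances" idea, together with ontology mapping as the answer to "how to map results?", can be sketched with in-memory stand-ins for the remote instances. Everything here is hypothetical: the instance names, the gene and variant identifiers, and the `shared:hasVariant` predicate. A real deployment would send SPARQL to remote endpoints instead of filtering Python lists.

```python
# Each source keeps its own triples and its own predicate names.
# All identifiers below are made up for illustration.
SOURCES = {
    "instance_1": [("gene:A", "hasVariant", "variant:1")],
    "instance_2": [
        ("gene:A", "variant", "variant:2"),
        ("gene:B", "variant", "variant:3"),
    ],
}

# Ontology mapping: local predicate name -> shared predicate name.
MAPPINGS = {
    "instance_1": {"hasVariant": "shared:hasVariant"},
    "instance_2": {"variant": "shared:hasVariant"},
}


def query_source(triples, mapping, subject, predicate):
    """Answer one (subject, predicate, ?object) question on one source."""
    return [
        o for s, p, o in triples
        if s == subject and mapping.get(p, p) == predicate
    ]


def federated_query(sources, mappings, subject, predicate):
    """Send the same question to every source and merge the answers."""
    merged = []
    for name in sorted(sources):
        answers = query_source(sources[name], mappings.get(name, {}), subject, predicate)
        merged.extend((name, obj) for obj in answers)
    return merged


if __name__ == "__main__":
    for source, variant in federated_query(SOURCES, MAPPINGS, "gene:A", "shared:hasVariant"):
        print(source, variant)
```

Keeping the source name next to each answer preserves provenance after the merge, which matters when the same gene is described by several instances.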
FEDERATED QUERYING IN GEN2PHEN ‣ MULTIPLE LSDBs • Get data from distinct LOVD instances
‣ MULTIPLE MOLGENIS • Connect data models distributed in multiple MOLGENIS instances
[Diagram labels: CHINA, AUSTRALIA, FRANCE, UK, ...; PHENO, VARIO, PAGE, HGVbaseG2P, ...]
ADVANTAGES ‣ DATA ACCESS • Direct ‣ No need for wrappers or mediators ‣ No need for data mappings or transformations • Homogeneous ‣ Results are retrieved as XML/JSON • Coherent • Easy to parse/browse • Client-side ready
SWAT4LS?
‣ DATA MODELS • Semantic, not relational ‣ Ontology ‣ No need for direct connections • INNER JOIN
• Reasoning ‣ Ask questions ‣ Process answers
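As an illustration of "results are retrieved as XML/JSON" and "easy to parse", the sketch below reads a response in the W3C SPARQL Query Results JSON layout (`head.vars` plus `results.bindings`). It assumes the endpoint follows that standard layout; the sample values and example.org IRIs are invented.

```python
import json

# Invented sample in the standard SPARQL JSON results layout.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["gene", "variant"]},
  "results": {
    "bindings": [
      {"gene": {"type": "uri", "value": "http://example.org/gene/A"},
       "variant": {"type": "literal", "value": "c.1A>G"}},
      {"gene": {"type": "uri", "value": "http://example.org/gene/B"}}
    ]
  }
}
"""


def rows(response_text):
    """Flatten the bindings into one plain dict per result row."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    for row in rows(SAMPLE_RESPONSE):
        print(row)
```

Since every endpoint answers in the same shape, one small parser like this can serve all of them, and the same JSON can be consumed directly by client-side JavaScript.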
DEMO FEDERATED QUERIES
THE MEANING OF LIFE
‣ COEUS • One of the 12 Titans ‣ Greek deities • Titan of Wisdom, Intelligence, Knowledge
THE CORE OF MY PHD WORK
SEMANTIC CONCEPT MANAGEMENT FRAMEWORK
[Diagram labels: Oral ..., DC4, Marker, WAVe, COEUS]
http://bioinformatics.ua.pt
ME pedrolopes@ua.pt (or pl97@le.ac.uk)
http://pedrolopes.net
/pedrolopes
/pdrlps
QUESTIONS? THANK YOU!