Watson Under the Hood: Natural Language Processing
Content
• Why do you need NLP to answer modern business questions?
• What types of words and terms does Watson understand?
• What do you mean when you say, “Watson Understands Natural Language?”
• How does an annotator work?
• What is Natural Language Processing (NLP) and how does it work?
– Entity Recognition
– Dictionaries
– Thesauri
– Annotators
– Ontologies
• How does Watson’s NLP work?
• What is an ontology?
• How is Watson differentiated from other forms of NLP?
• Give me an example of Watson understanding language nuance.
• Does Watson actually read the entire corpus each time I ask a question?
© 2015 International Business Machines Corporation
Meaningful insights are only gained when data reveals a universe of relationships
80% of the world’s information is buried in unstructured text.
• Adverse Event Data
• Internal Experiment Notes and Results
• Clinical Trial Reports
• Pharmacology
• Medical Text Books and Guidelines
• Toxicology Report
Without a way to read and understand this data in a way that answers questions and identifies new possibilities, many opportunities are lost.
What do you mean when you say “Watson understands natural language”?
Watson understands language the way we speak it. More than term search, Watson understands all the parts of a sentence so it can find evidence related to our question instead of just looking for word matches.
“Jack Welch was a master painter in the corporate world. Under his brilliant leadership, General Electric enjoyed some of its most profitable periods.”
Who Was Jack Welch?
A. A master painter
B. A brilliant leader
C. CEO of GE
Why does a cognitive system know this is the right answer?
How does NLP really work?
Natural language processing is a series of technologies that together enable a solution like Watson to read unstructured data in the form of free text and understand not only the nouns and relevant entities but also the interrelating verbs and adjectives, which allow it to comprehend contextual meaning. Natural Language Processing is a complex, multi-technology process that requires:
1. Entity recognition: Dictionaries
2. Synonyms: Thesauri
3. Ontologies: Entity relationships
4. Annotators: Extracting the relevant terms and their relating words
To make connections one must unlock the meaning of language
Unlocking the language: Dictionaries/Thesauri
Crystal Methamphetamine
• Diagram (chemical structure)
• Formula: C10H15N
• Street Names (50+): Crystal, blue meth, ice, Tina, glass, Poor Man’s cocaine, etc.
• FDA UNII
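The dictionary/thesaurus idea above can be sketched as a simple alias lookup. This is a minimal illustration, not Watson's implementation: the `THESAURUS` structure and the `normalize` helper are hypothetical names, and the aliases are only the handful listed on the slide.

```python
from typing import Dict, List, Optional

# Hypothetical mini-thesaurus: canonical name -> aliases listed on the slide.
THESAURUS: Dict[str, List[str]] = {
    "crystal methamphetamine": [
        "crystal", "blue meth", "ice", "tina", "glass", "poor man's cocaine",
    ],
}


def build_alias_index(thesaurus: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the thesaurus so every alias points at its canonical name."""
    index: Dict[str, str] = {}
    for canonical, aliases in thesaurus.items():
        index[canonical.lower()] = canonical
        for alias in aliases:
            index[alias.lower()] = canonical
    return index


ALIAS_INDEX = build_alias_index(THESAURUS)


def normalize(term: str) -> Optional[str]:
    """Return the canonical entity for a term, or None if it is unknown."""
    return ALIAS_INDEX.get(term.strip().lower())


print(normalize("Tina"))
print(normalize("aspirin"))
```

Everyday words such as "ice" or "glass" are ambiguous on their own, which is one reason a lookup like this would need the surrounding context that annotators provide.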
Unlocking the language: Annotators
Crystal Methamphetamine
“Witnesses reported seeing Walter White crossing Jones Avenue with a second male called Johnny asking to buy some Crank.”
• Walter White = Person (recognizes proper nouns)
• Jones Avenue = Location (recognizes proper nouns)
• With = Triggers relationship between entities (recognizes prepositions)
• Johnny = Person (understands proper nouns)
• Buy = Verb, triggers recognition of subject and object (understands verbs)
• Crank = Noun, drug, alias for crystal methamphetamine (understands proper nouns and all synonyms)
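A rough sketch of what an entity annotator might produce for this sentence is shown below. It assumes a small hand-built gazetteer (`GAZETTEER`, a hypothetical name) and plain pattern matching; a production annotator would rely on trained models and much larger dictionaries.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, canonical name).
GAZETTEER = {
    "Walter White": ("Person", "Walter White"),
    "Johnny": ("Person", "Johnny"),
    "Jones Avenue": ("Location", "Jones Avenue"),
    "Crank": ("Drug", "crystal methamphetamine"),
}

SENTENCE = (
    "Witnesses reported seeing Walter White crossing Jones Avenue "
    "with a second male called Johnny asking to buy some Crank."
)


def annotate(text):
    """Return one annotation per gazetteer match, ordered by position."""
    annotations = []
    for surface, (entity_type, canonical) in GAZETTEER.items():
        # Word boundaries keep "Crank" from matching inside longer words.
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            annotations.append({
                "start": match.start(),
                "end": match.end(),
                "text": surface,
                "type": entity_type,
                "canonical": canonical,
            })
    return sorted(annotations, key=lambda a: a["start"])


for annotation in annotate(SENTENCE):
    print(annotation["text"], "=", annotation["type"], "->", annotation["canonical"])
```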
Unlocking the language: Knowledge Graph
[Figure: knowledge graph. Node and edge labels: Walter White, Johnny, Jones Ave, Crank, Crystal Methamphetamine, Associate, Location, Drugs]
How does Watson understand Natural Language?
Real language understanding is a competency IBM has been building for 50 years. Using a combination of annotators, Watson breaks down natural language to understand what it means, using the following methods:
• Words: Their definition (dictionaries)
• Synonyms: All the alternative ways something can be named
• Grammar and sentence structure: Understanding the components of a sentence (TOKENIZATION, PARSING)
– Nouns
– Verbs
– Objects
– Pronouns
– Adjectives
– Adverbs
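Tokenization and a very simplified stand-in for parsing can be sketched as follows. The `LEXICON` is a hypothetical toy word list; real parsers use trained models rather than a fixed lookup, so this only illustrates the two stages named above.

```python
import re

# Hypothetical toy lexicon mapping lowercase words to a part of speech.
LEXICON = {
    "watson": "NOUN",
    "language": "NOUN",
    "understands": "VERB",
    "it": "PRONOUN",
    "natural": "ADJECTIVE",
    "quickly": "ADVERB",
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Attach a part-of-speech label to each token using the toy lexicon."""
    tagged = []
    for token in tokens:
        if not token.isalnum():
            tagged.append((token, "PUNCT"))
        else:
            tagged.append((token, LEXICON.get(token.lower(), "UNKNOWN")))
    return tagged


print(tag(tokenize("Watson understands natural language.")))
```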
What types of terms and words does Watson understand?
Beyond words, Watson understands chemical diagrams and formulas.
• Diagram (chemical structure)
• Formula: C16H13ClN2O
• Names (149): Valium, Diazepam, Alboral, Aliseum, Alupram, Amiprol, Asiolin, Ansiolisina, Apaurin, Apoepam, etc.
• Chemical ID: CAS# 439-14-5
Watson is supplied with domain-specific dictionaries so it understands the language of each industry.
How does an annotator work?
Beyond understanding words, their meanings, their synonyms, and the various types of words, Watson annotation also covers sentence structure, that is, the way the components of grammar are put together.
“…doxorubicin results in extracellular signal-regulated kinase (ERK)2 activation, which in turn phosphorylates p53 on a previously uncharacterized site, Thr55…”
• ERK2, p53, Thr55: Extracts Entities (ERK2 = Protein, p53 = Protein, Thr55 = Amino Acid)
• phosphorylates: Extracts Verb; maps to domain of Post Translational Modification; recognizes subject/object relationships
• on: Extracts Preposition; recognizes preposition location on Thr55
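The annotation steps above (entities, verb, preposition) can be combined into a single relation, roughly as sketched here. The lookup tables and the "nearest entity before and after the verb" heuristic are assumptions made for illustration; they are not taken from Watson.

```python
import re

# Hypothetical lookup tables based on the labels shown on the slide.
ENTITY_TYPES = {"ERK2": "Protein", "p53": "Protein", "Thr55": "Amino Acid"}
VERB_DOMAINS = {"phosphorylates": "Post Translational Modification"}

TEXT = (
    "doxorubicin results in extracellular signal-regulated kinase (ERK)2 "
    "activation, which in turn phosphorylates p53 on a previously "
    "uncharacterized site, Thr55"
)


def extract_relation(text):
    """Return a subject/verb/object/site record, or None if nothing matches."""
    normalized = text.replace("(ERK)2", "ERK2")
    # Entity mentions as (position, name), ordered by position in the text.
    found = sorted(
        (normalized.find(name), name) for name in ENTITY_TYPES if name in normalized
    )
    verb_match = re.search("|".join(re.escape(v) for v in VERB_DOMAINS), normalized)
    if verb_match is None:
        return None
    verb = verb_match.group(0)
    before = [name for pos, name in found if pos < verb_match.start()]
    after = [name for pos, name in found if pos > verb_match.end()]
    if not before or not after:
        return None
    # Heuristic: a second entity after the verb, following " on ", is the site.
    has_on = " on " in normalized[verb_match.end():]
    site = after[1] if len(after) > 1 and has_on else None
    return {
        "subject": before[-1],
        "verb": verb,
        "domain": VERB_DOMAINS[verb],
        "object": after[0],
        "site": site,
    }


print(extract_relation(TEXT))
```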
What are ontologies and what role do they play in answering my questions?
Ontologies: The relationship between any entity and other scientific domains
Illustrative Example: Aspirin
• Symptoms: Fever, Headache, Chronic pain, Arthritis pain
• Drug class: Anti-inflammatory, Antiplatelet, NSAID, Analgesic
• Adverse Effects: GI pain, GI bleeding, Nausea, Gastritis
• Indications: Reduce MI, Reduce stroke, Reduce fever, Reduce pain
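One simple way to picture the Aspirin example is as a nested mapping from an entity to its typed relationships. The structure and helper names below (`ONTOLOGY`, `related`, `entities_with`) are hypothetical; real ontologies are far larger and usually stored in dedicated formats.

```python
# Hypothetical in-memory ontology holding only the illustrative Aspirin entry.
ONTOLOGY = {
    "Aspirin": {
        "symptom": ["Fever", "Headache", "Chronic pain", "Arthritis pain"],
        "drug_class": ["Anti-inflammatory", "Antiplatelet", "NSAID", "Analgesic"],
        "adverse_effect": ["GI pain", "GI bleeding", "Nausea", "Gastritis"],
        "indication": ["Reduce MI", "Reduce stroke", "Reduce fever", "Reduce pain"],
    },
}


def related(entity, relation):
    """List everything linked to an entity through one relation type."""
    return ONTOLOGY.get(entity, {}).get(relation, [])


def entities_with(relation, value):
    """Find every entity linked to a value through one relation type."""
    return [
        entity
        for entity, relations in ONTOLOGY.items()
        if value in relations.get(relation, [])
    ]


print(related("Aspirin", "drug_class"))
print(entities_with("adverse_effect", "GI bleeding"))
```

Being able to walk the relationships in both directions is what lets a question about an adverse effect lead back to the drugs associated with it.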
How does Cognitive Computing answer this question with data from a newspaper?
Question: “Where was Al-XYZ born?”
News Article: “Al-XYZ was born in a tent outside of JHI city on June 7, 1942.”
Possible answers:
• a tent
• June 7, 1942
• Sirte
“Al-XYZ was born in a tent outside of JHI city on June 7, 1942.”
[Figure: the sentence annotated with the labels Subject, Verb, Preposition, and Location]
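The role of the preposition in picking the right kind of answer can be sketched like this. The mapping from question word to prepositions is a hypothetical simplification: "where" is assumed to favor phrases introduced by "in" or "outside of", and "when" phrases introduced by "on".

```python
import re

SENTENCE = "Al-XYZ was born in a tent outside of JHI city on June 7, 1942."

# Hypothetical mapping: question word -> prepositions introducing that answer type.
ANSWER_PREPOSITIONS = {"where": ["in", "outside of"], "when": ["on"]}
ALL_PREPOSITIONS = ["outside of", "in", "on"]


def split_phrases(sentence):
    """Return (preposition, phrase) pairs in the order they appear."""
    pattern = r"\b(" + "|".join(ALL_PREPOSITIONS) + r")\b"
    # re.split keeps the captured prepositions between the text pieces.
    parts = re.split(pattern, sentence.rstrip("."))
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def candidate_answers(question, sentence):
    """Keep only the phrases whose preposition fits the question word."""
    question_word = question.strip().split()[0].lower()
    wanted = ANSWER_PREPOSITIONS.get(question_word, [])
    return [phrase for prep, phrase in split_phrases(sentence) if prep in wanted]


print(candidate_answers("Where was Al-XYZ born?", SENTENCE))
print(candidate_answers("When was Al-XYZ born?", SENTENCE))
```

In this sketch the date is filtered out of the "where" candidates because it follows "on", which mirrors why the preposition and location labels matter.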
Cognitive computing delivers different results vs. today’s applications
Question: Who discovered black holes?
Search Engine*
• “In 1915, Einstein's theory of general relativity predicted the existence of black holes” (Hubblesite.org): Q&A about the history of scientific theories
• “Are black holes real?” (Skyandtelescope.com): Story about whether black holes exist
• “Black Holes History – Amazing Space” (amazing-space.stsci.edu): Story about all the steps to discovering black holes
• “Black Hole – Wikipedia, the free encyclopedia” (Wikipedia.org): Encyclopedic definition of black holes
Cognitive System
• Reads 100,000 newspapers
• Reads all of Wikipedia
• Reads 10,000 pages of analyst notes
• Reads 1,000 pages of witness interviews
*Search on 1/16/2015
NLP – How does Watson’s rich vocabulary deliver superior results?
Term Recognition enables Watson to recognize all forms of an entity. In this Google search, only documents containing the word valium are retrieved. To get all the literature on valium, you would have to look up all 149 names, the chemical ID and the chemical diagram (which can’t be done here) and then compile all those searches.
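The difference between a literal keyword search and a search that recognizes all forms of a term can be sketched as below. The name group holds only a few of the names from the earlier slide, and the documents are made-up examples used purely for illustration.

```python
# Hypothetical name group: a few of the 149 names plus the CAS number.
NAME_GROUPS = [
    {"valium", "diazepam", "alboral", "aliseum", "alupram", "439-14-5"},
]

# Made-up documents for illustration only.
DOCS = [
    "Valium was prescribed for anxiety.",
    "Patients received diazepam 5 mg daily.",
    "Compound CAS# 439-14-5 was assayed.",
    "Aspirin reduced fever.",
]


def expand(term):
    """Return every known name for a term, or just the term itself."""
    lowered = term.lower()
    for group in NAME_GROUPS:
        if lowered in group:
            return sorted(group)
    return [lowered]


def search(term, docs, use_expansion=True):
    """Return the documents mentioning the term (or any of its known names)."""
    terms = expand(term) if use_expansion else [term.lower()]
    return [doc for doc in docs if any(t in doc.lower() for t in terms)]


print(len(search("valium", DOCS, use_expansion=False)))
print(len(search("valium", DOCS)))
```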
Does Watson actually read millions of pages of information each time I ask a question?
No. Metadata and indexing allow Watson to pull all the relevant evidence for a query into analysis. A Knowledge Graph (KG) maps every relationship to the identified topic, so that evidence can be thoroughly processed and accurately analyzed.
1. Watson reads a page
2. Makes an annotation
3. Indexes information
Metadata and indexing enable Watson to quickly pull the most relevant content for an inquiry; which items rank as most relevant is dictated by what the KG reveals is related.
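The read, annotate, index sequence can be sketched with a small inverted index, so that a question only touches the pages already linked to an entity. The gazetteer, page texts, and function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical gazetteer: lowercase surface form -> canonical entity.
GAZETTEER = {
    "crank": "crystal methamphetamine",
    "blue meth": "crystal methamphetamine",
    "jones avenue": "Jones Avenue",
}

# Made-up pages for illustration only.
PAGES = {
    "p1": "Johnny asked to buy some Crank.",
    "p2": "Traffic was heavy on Jones Avenue.",
    "p3": "Blue meth was seized on Jones Avenue.",
}


def annotate(page_text, gazetteer):
    """Step 2: return the set of canonical entities mentioned on a page."""
    lowered = page_text.lower()
    return {canonical for surface, canonical in gazetteer.items() if surface in lowered}


def build_index(pages, gazetteer):
    """Steps 1 to 3: read each page once, annotate it, and index the result."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for entity in annotate(text, gazetteer):
            index[entity].add(page_id)
    return index


def lookup(index, entity):
    """At question time, fetch page ids from the index instead of rereading."""
    return sorted(index.get(entity, set()))


INDEX = build_index(PAGES, GAZETTEER)
print(lookup(INDEX, "crystal methamphetamine"))
```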
Extracted entities populate the knowledge graph
Think of the nodes as entities and the connections as weighted, directional, and contextual relationships between entities. Establishing these connections requires Watson to understand all relational and topic oriented language in a corpus and to reason between relationships to deliver accurate representative answers.
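A graph with weighted, directional, and contextual edges can be sketched as follows. The `KnowledgeGraph` class, the relation names, and the weights are all hypothetical values chosen for illustration, reusing the entities from the earlier witness-report example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal directed graph whose edges carry a weight and a context."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_relation(self, source, relation, target, weight, context):
        """Add one directed edge from source to target."""
        self._edges[source].append({
            "relation": relation,
            "target": target,
            "weight": weight,
            "context": context,
        })

    def neighbors(self, source, min_weight=0.0):
        """Return outgoing edges at or above min_weight, strongest first."""
        edges = [e for e in self._edges.get(source, []) if e["weight"] >= min_weight]
        return sorted(edges, key=lambda e: e["weight"], reverse=True)


# Hypothetical relations and weights, for illustration only.
kg = KnowledgeGraph()
kg.add_relation("Walter White", "seen_at", "Jones Ave", 0.6, "witness report")
kg.add_relation("Walter White", "associate_of", "Johnny", 0.8, "witness report")
kg.add_relation("Johnny", "asked_to_buy", "Crank", 0.9, "witness report")
kg.add_relation("Crank", "alias_of", "Crystal Methamphetamine", 1.0, "dictionary")

for edge in kg.neighbors("Walter White"):
    print(edge["relation"], "->", edge["target"], edge["weight"])
```

Because the edges are directional, asking for the neighbors of "Jones Ave" returns nothing here unless a reverse relation is added explicitly.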
One example of how Watson answers a question
Watson leverages multiple algorithms to perform deeper analysis.
Stronger evidence can be much harder to find and score…
• Search far and wide
• Explore many hypotheses
• Find and judge evidence
• Many inference algorithms
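The idea of exploring several hypotheses and judging the evidence for each can be sketched with two toy scorers and a weighted sum. The scorers, weights, candidates, and passages are hypothetical; the actual system is described only as using many inference algorithms.

```python
# Hypothetical weights for combining the two toy evidence scorers.
WEIGHTS = {"term_overlap": 0.4, "mentions_candidate": 0.6}


def term_overlap(question, passage):
    """Fraction of the question's words that also appear in the passage."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & passage_words) / len(question_words)


def mentions_candidate(candidate, passage):
    """1.0 if the passage names the candidate answer, otherwise 0.0."""
    return 1.0 if candidate.lower() in passage.lower() else 0.0


def score(question, candidate, passages):
    """Score a hypothesis by its single best supporting passage."""
    best = 0.0
    for passage in passages:
        combined = (
            WEIGHTS["term_overlap"] * term_overlap(question, passage)
            + WEIGHTS["mentions_candidate"] * mentions_candidate(candidate, passage)
        )
        best = max(best, combined)
    return best


def rank(question, candidates, passages):
    """Order the hypotheses from best supported to least supported."""
    return sorted(candidates, key=lambda c: score(question, c, passages), reverse=True)


# Made-up question, candidates, and passages for illustration only.
QUESTION = "who led General Electric"
PASSAGES = [
    "Jack Welch led General Electric as CEO",
    "Picasso was a master painter",
]
print(rank(QUESTION, ["Picasso", "Jack Welch"], PASSAGES))
```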