IBM Watson: Under the Hood Module


Watson Under the Hood: Natural Language Processing


Contents
•  Why do you need NLP to answer modern business questions?
•  What types of words and terms does Watson understand?
•  What do you mean when you say “Watson understands natural language”?
•  How does an annotator work?
•  What is Natural Language Processing (NLP) and how does it work?
   –  Entity Recognition
   –  Dictionaries
   –  Thesauri
   –  Annotators
   –  Ontologies
•  How does Watson’s NLP work?
•  What is an ontology?
•  How is Watson differentiated from other forms of NLP?
•  Give me an example of Watson understanding language nuance.
•  Does Watson actually read the entire corpus each time I ask a question?

© 2015 International Business Machines Corporation


Meaningful insights are only gained when data reveals a universe of relationships

80% of the world’s information is buried in unstructured text, such as:
•  Adverse event data
•  Internal experiment notes and results
•  Clinical trial reports
•  Pharmacology
•  Medical textbooks and guidelines
•  Toxicology reports

Without a way to read and understand this data, answer questions with it, and identify new possibilities, many opportunities are lost.


What do you mean when you say “Watson understands natural language”?

Watson understands language the way we speak it. Rather than relying on term search alone, Watson analyzes all the parts of a sentence, so it can find evidence related to our question instead of just looking for word matches.

“Jack Welch was a master painter in the corporate world. Under his brilliant leadership, General Electric enjoyed some of its most profitable periods.”

Who was Jack Welch?
A. A master painter
B. A brilliant leader
C. CEO of GE

Why does a cognitive system know this is the right answer?
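To make the contrast concrete, here is a minimal sketch of the pure term-matching approach the slide argues against (it is not Watson’s method). It scores each candidate answer by the fraction of its content words that literally appear in the passage. The stopword list and the `overlap_score` function are illustrative assumptions.

```python
import re

# Illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "of", "the", "in", "was"}

PASSAGE = (
    "Jack Welch was a master painter in the corporate world. Under his "
    "brilliant leadership, General Electric enjoyed some of its most "
    "profitable periods."
)
CANDIDATES = ["A master painter", "A brilliant leader", "CEO of GE"]


def words(text):
    """Lowercased alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(candidate, passage):
    """Fraction of the candidate's content words that literally appear in the passage."""
    content = words(candidate) - STOPWORDS
    if not content:
        return 0.0
    return len(content & words(passage)) / len(content)


scores = {c: overlap_score(c, PASSAGE) for c in CANDIDATES}
best = max(scores, key=scores.get)
print(scores)  # {'A master painter': 1.0, 'A brilliant leader': 0.5, 'CEO of GE': 0.0}
print(best)    # A master painter
```

Word overlap ranks the metaphor “A master painter” first and gives “CEO of GE” a score of zero, because the passage never literally says “CEO” or “GE”. Reaching answer C requires knowing that GE is General Electric and what leading it implies.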


How does NLP really work?

Natural language processing is a set of technologies that together enable a solution like Watson to read unstructured data in the form of free text and understand not only the nouns and relevant entities but also the interrelated verbs and adjectives that convey contextual meaning. NLP is a complex, multi-technology process that requires:

1. Entity recognition: dictionaries
2. Synonyms: thesauri
3. Ontologies: entity relationships
4. Annotators: extracting the relevant terms and their related words


To make connections, one must unlock the meaning of language



Unlocking the language: Dictionaries/Thesauri

Crystal Methamphetamine
•  Diagram (chemical structure)
•  Formula: C10H15N
•  FDA UNII
•  Street names (50+): Crystal, blue meth, ice, Tina, glass, Poor Man’s cocaine, etc.
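A thesaurus lets every surface form of an entity resolve to one concept. The sketch below is one simple way to do that (longest-match dictionary lookup), assuming a hand-built table; the names `THESAURUS`, `ALIAS_TO_CONCEPT`, and `find_concepts` are hypothetical. The aliases come from the slide, “crank” comes from the annotator example in this module, and the formula is treated as one more name.

```python
import re

# Illustrative thesaurus entry: canonical concept -> other names.
THESAURUS = {
    "crystal methamphetamine": [
        "crystal", "blue meth", "ice", "tina", "glass",
        "poor man's cocaine", "crank", "c10h15n",
    ],
}

# Invert it so any surface form can be looked up directly.
ALIAS_TO_CONCEPT = {
    alias: concept
    for concept, aliases in THESAURUS.items()
    for alias in [concept, *aliases]
}
MAX_WORDS = max(len(alias.split()) for alias in ALIAS_TO_CONCEPT)


def find_concepts(text):
    """Scan text left to right, preferring the longest alias that matches at each position."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ALIAS_TO_CONCEPT:
                found.append((phrase, ALIAS_TO_CONCEPT[phrase]))
                i += n
                break
        else:
            i += 1  # no alias starts at this word
    return found


print(find_concepts("He asked to buy some Crank, then a bag of ice."))
```

Note that “ice” and “glass” are also ordinary words; a real system needs context to decide when they refer to the drug, which this lookup does not attempt.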


Unlocking the language: Annotators

Crystal Methamphetamine

“Witnesses reported seeing Walter White crossing Jones Avenue with a second male called Johnny asking to buy some Crank.”

•  Recognizes proper nouns: Walter White = Person
•  Recognizes proper nouns: Jones Avenue = Location
•  Recognizes prepositions: with = triggers a relationship between entities
•  Understands proper nouns: Johnny = Person
•  Understands verbs: buy = verb that triggers recognition of subject and object
•  Understands proper nouns and all synonyms: Crank = noun, drug, alias for crystal methamphetamine
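The annotations on this slide can be approximated with a dictionary-based annotator: match known phrases, attach a type, and flag trigger words that signal relationships. This is a minimal sketch under that assumption, not Watson’s annotator; the dictionaries, labels, and the `annotate` function are hypothetical and built only for this sentence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str
    label: str


# Hypothetical dictionaries, hand-built for this one sentence.
ENTITY_DICTIONARY = {
    "Walter White": "Person",
    "Johnny": "Person",
    "Jones Avenue": "Location",
    "Crank": "Drug (alias for crystal methamphetamine)",
}
# Words that signal a relationship rather than an entity.
RELATION_TRIGGERS = {
    "with": "Preposition: links the entities around it",
    "buy": "Verb: look for its subject and object",
}


def annotate(sentence):
    """Return entity and trigger annotations in reading order."""
    found = []
    for phrase, label in {**ENTITY_DICTIONARY, **RELATION_TRIGGERS}.items():
        # \b keeps short triggers such as "with" from matching inside longer words.
        for m in re.finditer(rf"\b{re.escape(phrase)}\b", sentence):
            found.append(Annotation(m.start(), m.end(), m.group(), label))
    return sorted(found, key=lambda a: a.start)


SENTENCE = ("Witnesses reported seeing Walter White crossing Jones Avenue with a "
            "second male called Johnny asking to buy some Crank.")
for a in annotate(SENTENCE):
    print(f"{a.text!r:16} {a.label}")
```

Keeping character offsets on each annotation lets later stages (relation extraction, the knowledge graph) know which words sit between two entities.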


Unlocking the language: Knowledge Graph

Figure: knowledge graph connecting Walter White, Johnny, Jones Ave, Crank, and Crystal Methamphetamine (labels include Associate, Location, and Drugs).


How does Watson understand natural language?

Real language understanding is a competency IBM has been building for 50 years. Watson uses a combination of annotators to break down natural language and understand what it means, using the following methods:

•  Words: their definitions (dictionaries)
•  Synonyms: all the alternative ways something can be named
•  Grammar and sentence structure: understanding the components of a sentence
   –  Nouns
   –  Verbs
   –  Objects
   –  Pronouns
   –  Adjectives
   –  Adverbs

These methods are grouped into two processing stages: tokenization and parsing.
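Tokenization is the first of those stages: splitting raw text into word and punctuation tokens that later steps can label. The sketch below shows a regex tokenizer plus a toy part-of-speech lookup; the pattern and the `LEXICON` table are assumptions for illustration, and real taggers decide parts of speech from context rather than a fixed table.

```python
import re

# Words (allowing internal hyphens or apostrophes) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Tiny hypothetical part-of-speech lexicon.
LEXICON = {
    "watson": "NOUN", "reads": "VERB", "the": "DET", "report": "NOUN",
    "quickly": "ADV", "it": "PRON", "new": "ADJ", ".": "PUNCT",
}


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def tag(tokens):
    """Attach a part-of-speech label to each token, or UNKNOWN if it is not in the lexicon."""
    return [(tok, LEXICON.get(tok.lower(), "UNKNOWN")) for tok in tokens]


print(tag(tokenize("Watson reads the new report quickly.")))
```

Keeping hyphenated and apostrophized words whole matters for domain text: names such as “ERK2-mediated” or “Man’s” should reach the dictionaries as single tokens.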


What types of terms and words does Watson understand?

Beyond words, Watson understands chemical diagrams and formulas.

•  Diagram (chemical structure)
•  Formula: C16H13ClN2O
•  Names (149): Valium, Diazepam, Alboral, Aliseum, Alupram, Amiprol, Asiolin, Ansiolisina, Apaurin, Apoepam, etc.
•  Chemical ID: CAS# 439-14-5

Watson is supplied with domain-specific dictionaries so it understands the language of each industry.
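Chemical IDs such as CAS Registry Numbers carry a check digit, so a recognizer can confirm that a string like “439-14-5” is a well-formed CAS number before treating it as a chemical ID. This is a minimal sketch of that check (Python 3.9+ for `str.removeprefix`); the function name `is_valid_cas` is hypothetical, and this is not necessarily how Watson handles chemical IDs.

```python
import re

# CAS Registry Number layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(text):
    """Return True if text is a well-formed CAS number whose check digit is correct."""
    candidate = text.strip().removeprefix("CAS#").strip()
    match = CAS_PATTERN.fullmatch(candidate)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    expected = int(match.group(3))
    # Multiply the digits by 1, 2, 3, ... starting from the rightmost one; sum mod 10.
    checksum = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return checksum % 10 == expected


print(is_valid_cas("CAS# 439-14-5"))  # True: 4*1 + 1*2 + 9*3 + 3*4 + 4*5 = 65, and 65 % 10 == 5
print(is_valid_cas("439-14-6"))       # False: wrong check digit
```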


How does an annotator work?

Beyond understanding words, their meanings, their synonyms, and the various types of words, Watson annotation also covers sentence structure: the way the components of grammar are put together.

“…doxorubicin results in extracellular signal-regulated kinase (ERK)2 activation, which in turn phosphorylates p53 on a previously uncharacterized site, Thr55…”

•  ERK2, p53, Thr55: extracts entities (ERK2 = Protein, p53 = Protein, Thr55 = Amino Acid)
•  phosphorylates: extracts the verb, maps it to the domain of post-translational modification, and recognizes subject/object relationships
•  on: extracts the preposition and recognizes the location it points to (on Thr55)
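The three annotation steps on this slide (entities, a domain verb, a preposition that locates the site) can be chained into one extracted event. The sketch below does this with hypothetical dictionaries and a simplified sentence, “ERK2 phosphorylates p53 on Thr55.” The original sentence reaches ERK2 only through “which”, and resolving that requires coreference resolution, which this sketch does not attempt. `ENTITY_TYPES`, `VERB_DOMAINS`, and `extract_event` are illustrative names.

```python
import re

# Hypothetical mini-dictionaries standing in for domain vocabularies.
ENTITY_TYPES = {"erk2": "Protein", "p53": "Protein", "thr55": "Amino Acid"}
VERB_DOMAINS = {"phosphorylates": "Post-Translational Modification"}


def extract_event(sentence):
    """Find a known verb, then its subject, object, and 'on <site>' using the dictionaries."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    for v, token in enumerate(tokens):
        domain = VERB_DOMAINS.get(token.lower())
        if domain is None:
            continue
        before, after = tokens[:v], tokens[v + 1:]
        # Subject: the closest known entity before the verb.
        subject = next((t for t in reversed(before) if t.lower() in ENTITY_TYPES), None)
        # Object: the first known entity after the verb.
        obj = next((t for t in after if t.lower() in ENTITY_TYPES), None)
        # Site: a known entity immediately after the preposition "on".
        site = next(
            (nxt for cur, nxt in zip(after, after[1:])
             if cur.lower() == "on" and nxt.lower() in ENTITY_TYPES),
            None,
        )
        return {"subject": subject, "verb": token, "domain": domain,
                "object": obj, "site": site}
    return None  # no known verb in the sentence


print(extract_event("ERK2 phosphorylates p53 on Thr55."))
```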


What are ontologies and what role do they play in answering my questions?

Ontologies capture the relationships between an entity and other scientific domains.

Illustrative example: Aspirin
•  Drug class: NSAID, Analgesic, Antiplatelet, Anti-inflammatory
•  Indications: Reduce MI (myocardial infarction), Reduce stroke, Reduce fever, Reduce pain
•  Adverse effects: GI pain, GI bleeding, Nausea, Gastritis
•  Symptoms: Fever, Headache, Chronic pain, Arthritis pain
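One simple way to store an ontology fragment like the aspirin example is as subject-relation-object triples, indexed in both directions so that questions can start from either end. This is a minimal sketch; the relation names (`drug_class`, `indication`, `adverse_effect`) and the helper functions are hypothetical, and the symptom grouping is left out because its relation to aspirin is not spelled out.

```python
from collections import defaultdict

# Illustrative ontology fragment from the aspirin example (relation names are hypothetical).
TRIPLES = [
    ("Aspirin", "drug_class", "NSAID"),
    ("Aspirin", "drug_class", "Analgesic"),
    ("Aspirin", "drug_class", "Antiplatelet"),
    ("Aspirin", "drug_class", "Anti-inflammatory"),
    ("Aspirin", "indication", "Reduce MI"),
    ("Aspirin", "indication", "Reduce stroke"),
    ("Aspirin", "indication", "Reduce fever"),
    ("Aspirin", "indication", "Reduce pain"),
    ("Aspirin", "adverse_effect", "GI pain"),
    ("Aspirin", "adverse_effect", "GI bleeding"),
    ("Aspirin", "adverse_effect", "Nausea"),
    ("Aspirin", "adverse_effect", "Gastritis"),
]

# Index both directions: (subject, relation) -> objects and (relation, object) -> subjects.
forward = defaultdict(list)
backward = defaultdict(list)
for s, r, o in TRIPLES:
    forward[(s, r)].append(o)
    backward[(r, o)].append(s)


def objects(subject, relation):
    """E.g. what are aspirin's adverse effects?"""
    return forward.get((subject, relation), [])


def subjects(relation, obj):
    """E.g. which drugs in the ontology belong to the antiplatelet class?"""
    return backward.get((relation, obj), [])


print(objects("Aspirin", "adverse_effect"))
print(subjects("drug_class", "Antiplatelet"))
```

With more drugs loaded, intersecting backward lookups answers questions such as “which antiplatelet drugs can cause GI bleeding?” without reading any source text.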


How does Cognitive Computing answer this question with data from a newspaper?

Question: “Where was Al-XYZ born?”

News article: “Al-XYZ was born in a tent outside of JHI city on June 7, 1942.”

Candidate answers: a tent; June 7, 1942; Sirte


“Al-XYZ was born in a tent outside of JHI city on June 7, 1942.”
•  Subject: Al-XYZ
•  Verb: was born
•  Preposition and location: in a tent outside of JHI city
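The breakdown above is what separates the location from the date: the question word decides which part of the parsed sentence is the answer. The sketch below shows that idea with a single hand-written pattern for “X was born in … on …” sentences; `BORN_PATTERN`, `QUESTION_PATTERN`, and `answer` are hypothetical, and a real system would rely on a general parser rather than one regex.

```python
import re

# Hypothetical pattern: subject, verb "was born", then optional "in <location>" and "on <date>".
BORN_PATTERN = re.compile(
    r"(?P<subject>[A-Z][\w-]*) was born"
    r"(?: in (?P<location>.+?))?"
    r"(?: on (?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}))?\.?$"
)
QUESTION_PATTERN = re.compile(r"(?P<wh>Where|When) was (?P<subject>[\w-]+) born\?")


def answer(question, sentence):
    """Map the question word to the matching sentence part: Where -> location, When -> date."""
    q = QUESTION_PATTERN.fullmatch(question.strip())
    s = BORN_PATTERN.search(sentence)
    if not q or not s or q["subject"] != s["subject"]:
        return None
    return s["location"] if q["wh"] == "Where" else s["date"]


SENTENCE = "Al-XYZ was born in a tent outside of JHI city on June 7, 1942."
print(answer("Where was Al-XYZ born?", SENTENCE))
print(answer("When was Al-XYZ born?", SENTENCE))
```

The sketch returns the whole location phrase. Deciding whether “a tent” or the city is the better answer is an evidence-scoring problem, not a parsing one.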


Cognitive computing delivers different results vs. today’s applications

Search engine*: “Who discovered black holes?”
•  “In 1915, Einstein's theory of general relativity predicted the existence of black holes” (Hubblesite.org): Q&A about the history of scientific theories
•  “Are black holes real?” (Skyandtelescope.com): story about whether black holes exist
•  “Black Holes History – Amazing Space” (amazing-space.stsci.edu): story about all the steps to discovering black holes
•  “Black Hole – Wikipedia, the free encyclopedia” (Wikipedia.org): encyclopedic definition of black holes

Cognitive system: “Who discovered black holes?”
•  Reads 100,000 newspapers
•  Reads all of Wikipedia
•  Reads 10,000 pages of analyst notes
•  Reads 1,000 pages of witness interviews

*Search on 1/16/2015


NLP – How does Watson’s rich vocabulary deliver superior results?

Term recognition enables Watson to recognize all forms of an entity. In this Google search, only documents containing the word “valium” are retrieved. To get all the literature on Valium, you would have to look up all 149 names, the chemical ID, and the chemical diagram (which can’t be done here), and then compile the results of all those searches.
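The difference described here is query expansion: searching for every known form of an entity at once instead of a single keyword. The sketch below shows it over a hypothetical five-document collection; the document titles, `VALIUM_FORMS` (a few of the 149 names plus the CAS number), and `search` are all illustrative assumptions.

```python
import re

# Hypothetical mini-corpus of document titles.
DOCUMENTS = {
    "doc1": "Valium dosing in elderly patients",
    "doc2": "Diazepam withdrawal symptoms",
    "doc3": "Apaurin versus placebo for anxiety",
    "doc4": "Solubility of CAS 439-14-5 in water",
    "doc5": "Aspirin and GI bleeding",
}
# A few of the entity's names plus its chemical ID; a full entry would list all 149 names.
VALIUM_FORMS = {"valium", "diazepam", "alupram", "apaurin", "439-14-5"}


def terms(text):
    """Lowercased terms; hyphens are kept so CAS numbers stay whole."""
    return set(re.findall(r"[\w-]+", text.lower()))


def search(query_terms):
    """Return ids of documents containing at least one query term."""
    return sorted(d for d, text in DOCUMENTS.items() if terms(text) & query_terms)


print(search({"valium"}))    # ['doc1']
print(search(VALIUM_FORMS))  # ['doc1', 'doc2', 'doc3', 'doc4']
```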


Does Watson actually read millions of pages of information each time I ask a question?

No. Metadata and indexing allow Watson to pull all the evidence relevant to a query into its analysis. A knowledge graph maps every relationship to the identified topic, so that evidence can be thoroughly processed and accurately analyzed.

1. Watson reads a page.
2. It makes an annotation.
3. It indexes the information.

Metadata and indexing enable Watson to quickly pull the content most relevant to an inquiry, with relevance guided by what the knowledge graph (KG) reveals is related.
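The read, annotate, index sequence is what makes question time cheap: annotation happens once at ingestion, and a question only consults the index. The sketch below shows that split with an inverted index from concepts to pages; the page ids, their concept lists, and the functions `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical ingestion output: page id -> concepts the annotators found on that page.
ANNOTATED_PAGES = {
    "interview-07": ["Walter White", "Jones Avenue"],
    "interview-12": ["Walter White", "Johnny", "crystal methamphetamine"],
    "analyst-note-3": ["Johnny", "crystal methamphetamine"],
}


def build_index(annotated_pages):
    """Done once, at ingestion time: map each concept to the pages that mention it."""
    index = defaultdict(set)
    for page_id, concepts in annotated_pages.items():
        for concept in concepts:
            index[concept].add(page_id)
    return index


def lookup(index, concepts):
    """At question time: intersect posting sets instead of re-reading any page text."""
    postings = [index.get(c, set()) for c in concepts]
    return set.intersection(*postings) if postings else set()


INDEX = build_index(ANNOTATED_PAGES)
print(sorted(lookup(INDEX, ["Walter White", "crystal methamphetamine"])))  # ['interview-12']
print(sorted(lookup(INDEX, ["Johnny"])))  # ['analyst-note-3', 'interview-12']
```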


Extracted entities populate the knowledge graph

Think of the nodes as entities and the connections as weighted, directional, and contextual relationships between entities. Establishing these connections requires Watson to understand all relational and topic-oriented language in a corpus and to reason across relationships to deliver accurate, representative answers.
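Each property named above maps onto a field of an edge: direction (source to target), context (a relation label), and weight (strength of evidence). The sketch below is a minimal in-memory graph under those assumptions, populated with entities from the Walter White example earlier in this module; the class, relation names, and weights are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    target: str
    relation: str   # the context of the link, e.g. "associate"
    weight: float   # how strongly the evidence supports the link


class KnowledgeGraph:
    """Directed graph: an edge from A to B implies nothing about an edge from B to A."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, source, relation, target, weight):
        self._out[source].append(Edge(target, relation, weight))

    def neighbors(self, node, min_weight=0.0):
        """Outgoing edges at or above min_weight, strongest first."""
        edges = [e for e in self._out.get(node, []) if e.weight >= min_weight]
        return sorted(edges, key=lambda e: e.weight, reverse=True)

    def paths(self, start, goal, max_hops=3):
        """All simple directed paths from start to goal with at most max_hops edges."""
        found, stack = [], [[start]]
        while stack:
            path = stack.pop()
            if path[-1] == goal and len(path) > 1:
                found.append(path)
                continue
            if len(path) > max_hops:  # path already uses max_hops edges
                continue
            for e in self._out.get(path[-1], []):
                if e.target not in path:
                    stack.append(path + [e.target])
        return found


kg = KnowledgeGraph()
kg.add("Walter White", "associate", "Johnny", 0.9)
kg.add("Walter White", "location", "Jones Ave", 0.8)
kg.add("Johnny", "asked_to_buy", "Crank", 0.7)
kg.add("Crank", "alias_of", "Crystal Methamphetamine", 1.0)
print([e.target for e in kg.neighbors("Walter White")])
print(kg.paths("Walter White", "Crystal Methamphetamine"))
```

Multi-hop paths are how a graph connects Walter White to crystal methamphetamine even though no single sentence links them directly.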


One example of how Watson answers a question

Watson leverages multiple algorithms to perform deeper analysis. Stronger evidence can be much harder to find and score:
•  Search far and wide
•  Explore many hypotheses
•  Find and judge evidence
•  Apply many inference algorithms
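The slide lists the ingredients (many hypotheses, many scorers, judging evidence) but not how the scores become one answer. A logistic combination of per-scorer evidence is one simple way to do that; it is an assumption here, not Watson’s documented model. The sketch reuses the Jack Welch candidates from earlier in this module; the scorer names, scores, weights, and bias are all hypothetical, and a trained system would learn the weights from answered questions.

```python
import math

# Hypothetical evidence scores in [0, 1] from three independent scorers.
CANDIDATES = {
    "A master painter":   {"term_overlap": 1.0, "answer_type": 0.1, "source_agreement": 0.2},
    "A brilliant leader": {"term_overlap": 0.5, "answer_type": 0.4, "source_agreement": 0.5},
    "CEO of GE":          {"term_overlap": 0.0, "answer_type": 0.9, "source_agreement": 0.9},
}
# Hand-picked weights for illustration only.
WEIGHTS = {"term_overlap": 0.5, "answer_type": 2.0, "source_agreement": 2.5}
BIAS = -2.0


def confidence(features):
    """Logistic combination of evidence scores into a single confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


ranked = sorted(CANDIDATES, key=lambda c: confidence(CANDIDATES[c]), reverse=True)
for c in ranked:
    print(f"{c:20} {confidence(CANDIDATES[c]):.2f}")
```

No single scorer decides the outcome: the literal-match scorer favors the painter metaphor, but the answer-type and source-agreement evidence outweigh it.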



