Jadt seth grimes textanalyticsapplied 2014june5


Text Analytics Past, Present & Future: An Industry View
Seth Grimes
Alta Plana Corporation
@sethgrimes
June 5, 2014


Text Analytics: An Industry View


JADT – June 5, 2014



Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.


Text analytics past: Pioneers…



Document input and processing

Knowledge handling is key

Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958

Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy) and television network librarian Bunny Watson (Katharine Hepburn) and the "electronic brain" EMERAC.



“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
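
The idea in the quotation can be illustrated with a minimal sketch. It is a simplification, not Luhn's actual procedure: word significance is taken to be raw document frequency, a sentence's score is the average significance of its words, and the stop list, the sentence splitter, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
              "of", "on", "or", "the", "then", "to", "with"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, in original document order."""
    sentences = split_sentences(text)
    # Word significance: frequency of the word across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average significance, so length alone does not favor a sentence.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]
```

Calling `auto_abstract(document_text, n_sentences=3)` would return three extracted sentences in their original order. Averaging rather than summing is a design choice of this sketch, made so that long sentences do not win on length alone.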






Pipelines and patterns
IBM’s MedTAKMI, 1997‐

http://www.research.ibm.com/trl/projects/textmining/index_e.htm



Exhaustive extraction
An (old) Attensity example – NLP to identify roles and relationships, for a law‐enforcement application.


Language engineering
GATE: General Architecture for Text Engineering.

http://gate.ac.uk/



Text analytics present: Business, technology, applications, and solutions…


“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” ‐‐ Philip Russom, the Data Warehousing Institute, 2007 http://tdwi.org/articles/2007/05/09‐what‐works/bi‐search‐and‐text‐analytics.aspx


Linguistics, statistics, and semantics
Text analytics (typically) involves linguistic modelling, statistical characterization, learned patterns, and semantic understanding of text‐derived features –
• Named entities: people, companies, places, etc.
• Pattern‐based features: e‐mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure‐value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.

– applied to business ends.
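
Of the feature types listed above, pattern‐based features are the simplest to illustrate. The following is a minimal sketch; the two regular expressions and the function name are assumptions made for illustration, and real e‐mail and phone formats vary far more than these patterns allow.

```python
import re

# Illustrative patterns only (assumptions, not from the talk).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches ten-digit numbers written as 3-3-4 groups, e.g. 301-555-0100.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract_pattern_features(text):
    """Return pattern-based features found in text, keyed by feature type."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }
```

Named entities, concepts, and sentiment generally need linguistic or statistical models rather than fixed patterns, which is why they are listed separately.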



Sources
It’s a truism that 80% of enterprise‐relevant information originates in “unstructured” form:
• E‐mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact‐center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents.
• ...

Non‐text “unstructured” content?
• Images
• Audio including speech
• Video

Value derives from patterns.



Value
What do we do with text, whether online, on‐social, or in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract information and Analyze.


Semantics, analytics, and IR
Text analytics generates semantics to bridge search, BI, and applications, enabling next‐generation information systems.
[Diagram: Search, BI/Big Data, and Applications, with text analytics as the inner circle.]
• Semantic search (search + text)
• Search based applications (search + text + apps)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• NextGen CRM, EFM, MR, marketing, apps…



Content, composites, connections 1


Content, composites, connections 2


Applications
Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social‐media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.


Opinion, sentiment & emotion


Sentiment analysis
A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM = Enterprise Feedback Management).
• Market research.
• Product design/quality.
• Trend spotting.


Data exploration via dashboards and workbenches.


Text analytics present: The market…


http://altaplana.com/TA2014



What are your primary applications where text comes into play?
• Voice of the Customer / Customer Experience Management: 39%
• Research (not listed): 38%
• Brand/product/reputation management: 38%
• Competitive intelligence: 33%
• Search, information access, or Question Answering: 29%
• Customer/CRM: 27%
• Content management or publishing: 25%
• Online commerce including shopping, price intelligence,…: 16%
• Life sciences or clinical medicine: 15%
• E-discovery: 14%
• Insurance, risk management, or fraud: 13%
• Other: 11%
• Product/service design, quality assurance, or warranty claims: 10%
• Financial services/capital markets: 9%
• Intellectual property/patent analysis: 8%
• Law enforcement: 6%
• Military/national security/intelligence: 5%



Voice of the Customer
Text analytics is applied to improve customer service and boost satisfaction and loyalty.
Analyze customer interactions and opinions –
• E‐mail, contact‐center notes, survey responses.
• Forum & blog posting and other social media.

– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.

Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.



The commercial scene


Online commerce
Text analytics is applied for marketing, search optimization, competitive intelligence.
Analyze social media and enterprise feedback to understand the Voice of the Market:
• Opportunities
• Threats
• Trends

Categorize product and service offerings for on‐site search and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web‐search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.


E‐Discovery and compliance
Text analytics is applied for compliance, fraud and risk, and e‐discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronic stored information for production in event of litigation

Sources include e‐mail (!!), news, social media.
Risk avoidance and fraud detection are key to effective decision making
• Text analytics mines critical data from unstructured sources
• Integrated text‐transactional analytics provides rich insights


What textual information are you analyzing or do you plan to analyze?
[Bar chart, series: 2014, 2011, 2009. Labeled values:]
• blogs (long form+micro): 61%
• news articles: 42%
• comments on blogs and articles: 38%
• customer/market surveys: 37%
• on‐line forums: 36%
• Facebook postings: 32%
• scientific or technical literature: 31%
• online reviews: 31%
• e‐mail and correspondence: 26%
• contact‐center notes or transcripts: 22%
• employee surveys: 20%
• chat: 20%
• social media not listed above: 19%
• Web‐site feedback: 16%



What textual information are you analyzing or do you plan to analyze?
• Twitter, Sina Weibo, or other microblogs: 46%
• blogs (long form) including Tumblr: 43%
• news articles: 42%
• comments on blogs and articles: 38%
• customer/market surveys: 37%
• on‐line forums: 36%
• Facebook postings: 32%
• scientific or technical literature: 31%
• online reviews: 31%
• e‐mail and correspondence: 26%
• contact‐center notes or transcripts: 22%
• employee surveys: 20%
• chat: 20%
• social media not listed above: 19%
• Web‐site feedback: 16%
• medical records: 13%
• text messages/instant messages/SMS: 12%
• other: 12%
• patent/IP filings: 12%
• speech or other audio: 11%
• field/intelligence reports: 11%
• crime, legal, or judicial reports or evidentiary materials: 9%
• photographs or other graphical images: 7%
• warranty claims/documentation: 5%
• video or animated images: 5%
• point‐of‐service notes or transcripts: 5%
• insurance claims or underwriting notes: 5%



Do you currently need (or expect to need) to extract or analyze...
[Bar chart, series: Current and Expect; horizontal axis 0% to 100%.]
• Topics and themes
• Sentiment, opinions, attitudes, emotions, …
• Relationships and/or facts
• Named entities – people, companies, …
• Concepts, that is, abstract groups of entities
• Metadata such as document author, …
• Other entities – phone numbers, part/product …
• Semantic annotations
• Events



“The share rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the‐languages‐of‐twitter‐users/


Non-English language support?
[Bar chart, series: Current and Within 2 years; horizontal axis 0% to 60%.]
Languages charted: Arabic; Bahasa Indonesia or Malay; Chinese; Dutch; French; German; Greek; Hindi, Urdu, Bengali, Punjabi, or…; Italian; Japanese; Korean; Polish; Portuguese; Russian; Scandinavian or Baltic; Spanish; Turkish or Turkic; Other African; Other Arabic script (including Urdu,…); Other East Asian; Other European or Slavic/Cyrillic; Other.



Software & platform options
Text‐analytics options may be grouped in general classes.
• Installed text‐analysis application, whether desktop or server or deployed in‐database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As‐a‐service, via an application programming interface (API).
• Code library or component of a business/vertical application, for instance for CRM, e‐discovery, search.

Text analytics is frequently embedded in search or other end‐user applications.
The slides that follow next will present leading options in each category except Hosted…



What is important in a solution?
[Bar chart, series: 2014 (n=139), 2011 (n=136), 2009 (n=78). Labeled values:]
• ability to generate categories or taxonomies: 64%
• ability to use specialized dictionaries, taxonomies, ontologies, or …: 54%
• broad information extraction capability: 53%
• document classification: 53%
• deep sentiment/emotion/opinion/intent extraction: 45%
• low cost: 44%
• "real time" capabilities: 43%
• sentiment scoring: 41%
• support for multiple languages: 40%
• open source: 37%
• predictive‐analytics integration: 36%
• big data capabilities, e.g., via Hadoop/MapReduce: 33%
• ability to create custom workflows or to create or change …: 33%
• BI (business intelligence) integration: 32%
• sector adaptation (e.g., hospitality, insurance, retail, health care, …: 30%
• supports data fusion / unified analytics: 28%
• hosted or Web service (on‐demand "API") option: 25%
• media monitoring/analysis interface: 22%



User decision criteria
Primary considerations include –
• Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e‐mail, online news).
• By‐user customization possibilities: For instance, via custom taxonomies, rules, lexicons.
• Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?) What sentiment? Valence & what else? Emotion? Intent?
• Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.
• Usage mode: As‐a‐service (API), installed, or hosted/cloud.
• Capacity: Volume, performance, throughput, latency.
• Cost.
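
The difference between message‐level and aggregate‐level sentiment resolution can be sketched with a toy lexicon scorer. Everything here is an assumption made for illustration: the six‐word lexicon, the function names, and the choice of a simple sum and mean. Feature‐level resolution would also require identifying what each opinion is about, which this sketch does not attempt.

```python
import re

# Hypothetical toy lexicon mapping words to valence; real lexicons are far larger.
LEXICON = {"great": 1, "love": 1, "comfortable": 1,
           "poor": -1, "expensive": -1, "broken": -1}


def message_valence(message):
    """Message-level score: sum of lexicon valences over the message's words."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def aggregate_valence(messages):
    """Aggregate-level score: mean of the message-level scores."""
    scores = [message_valence(m) for m in messages]
    return sum(scores) / len(scores) if scores else 0.0
```

An aggregate score near zero can hide strongly positive and strongly negative individual messages, which is one reason the resolution level matters when comparing solutions.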



A few French companies


Academic spin‐offs

People Pattern



Text analytics future: Synthesis and sensemaking.



New York Times, September 8, 1957



Emotion in text


Emotion and outcomes


Beyond Text
• Audio including speech.
• Images.
• Video.

http://www.geekosystem.com/facebook‐face‐recognition/

http://flylib.com/books/en/2.495.1.54/1/

http://www.sciencedirect.com/science/article/pii/S0167639312000118


The world of big data
• Machine data (e.g., logs, sensor outputs, clickstreams).
• Actions, interactions, and transactions: geolocation and time.
• Profiles: individual, demographic & behavioral.
• Text, audio, images, and video.
• Facts and feelings.


(Accessible) data everywhere


A big data analytics architecture (example)

http://www.geeklawblog.com/2011/12/lexis‐advance‐platform‐launch‐two.html



Sensemaking
“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from of a large volume of information.”
– Marti Hearst, 2009

http://searchuserinterfaces.com/


En route

http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm



