Ida Schulin-Zeuthen (Infomedia) presentation


Infomedia A/S
How can a Media Intelligence Company gain from Language Technology?
A presentation by Ida Schulin‐Zeuthen, Product Manager


Facts about Infomedia

1. Established in November 2002 as a result of a merger of Politiken’s Polinfo and Berlingske Media’s Avisdata. Still owned by publicists – 50/50 JP/Politikens Hus and Berlingske

2. Market leader within media analyses, communications management and media monitoring with the broadest coverage and the most specialized solutions

3. We have a strong position in the Danish market and excellent insight in our clients’ processes and business areas

4. We are 180 employees in Denmark and have been through three acquisitions: A’jour Presseklip (2006), Cision Danmark (2009) and Infopaq (2012)

5. Infomedia monitors printed as well as the digital press, radio, TV and the Internet, including social media – nationally and internationally

6. More than 2,500 public and private corporate clients in Denmark with 80% of the market share

Market share

Infomedia

Member of:


We aggregate news content world wide

Retriever

Credit Info

Weekly Press Survey

Cision Finland

J&A Media

Opoint (web) Mediaskopas NLI Media Market

Durrants PMG NLA Lexis Ausschnitt Mediargus EEC Kantar Media Eco de la Stampa

BurellesLuce

InNews

Interpress

Media Watch Middle East

China Clipping Kantar Media Indian Press Clearing

Mediabanc

Global News Group Media Monitors Australia


We aggregate (news) content from all media sources


The building blocks…


Products across
› Analysis: creates understanding
› Monitoring: ongoing overview
› Search: self‐service


Infomedia Media Search

With Media Search you get:
› Denmark’s largest online article database
› The option of searching in millions of complete articles from the country’s most important media
› The broadest source coverage in the market
› Overview of a certain topic or area

50 million articles, growing by 6 million articles a year

Contains articles from:
› National newspapers
› Regional and local daily newspapers
› Local weekly newspapers
› Magazines, journals and trade journals
› Online media
› Summaries of radio and TV features

Different search options for different needs:
› Simple search
› Advanced search
› Expert search


Infomedia Media Monitoring

Media: digital articles, analogue articles, radio and TV programs, social media, blogs and forums

Selection, sorting, structuring, categorizing

Online sources

Delivery: daily news e‐mail, portal, feed, intranet, Internet


Infomedia Analysis

Stakeholder Analysis (EXPERTS)
› 15‐25 experts are interviewed about the client’s communication
› 30‐40 page written report

Journalist Audit (JOURNALISTS)
› About 10 journalists are interviewed about the client’s communication
› 20 page written report

Reputation Analysis YouGov’s BrandIndex – 250 brands – daily measurements

PEOPLE EXPLORED

THE PUBLIC

Combined with a qualitative media analysis

Focus Groups RELEVANT TARGET GROUPS

Agenda Analysis CASE – DEBATE ‐ AGENDA

About 20 respondents are interviewed about the client’s communication

Target Group Analysis NEWS READERS

Qualitative Media Analysis CLIENT– COMPETITORS ‐ PRODUCTS

Quantitative Media Analysis CLIENT – COMPETITORS ‐ PRODUCTS

Analysis Light CLIENT – COMPETITORS ‐ PRODUCTS

MEDIA EXPLORED


Processing & production Media content

Processing

24/7 monitoring of media inflow

Delivery

Sorting and selection

Summaries

Sentiment

Summarizing radio/TV content



Ontological Representation ‐ Categorisation in Infomedia’s internal systems and products

› We established an ontology team in September 2012

› Rule‐based automatic categorisation
• Technology: Verity (2003‐2007), Smartlogic (since 2012)
• Tags: organisations, people names, locations, topics (an exceptionally wide domain)


Content

› Infomedia handles more than 30,000 incoming articles every day on every conceivable subject

› The articles are varied and messy

› The articles are short: they range from 15 to 10,000 words, but the average is under 200 words

› From these articles we want to retrieve and extract information


Search Engines

› Articles are stored as XML files

› At present we are running three search engines
• Verity K2 (in production since 2003)
• IDOL 7 (in production since 2011)
• Solr (in production since 2012 – to supersede Verity and IDOL)


Search

› We have no:
• lemmatisation
• part‐of‐speech tagging
• sentence boundary detection
• language detection

› We do have:
• stemming – customised and word list‐based for a more “lemma‐like” performance

› Search logs
• User queries are stored (for documentation and billing), but they are not analysed in any way
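The customised, word list‐based stemming mentioned above could work roughly as in the following sketch. The exception list, the suffix list and the function name are all hypothetical; the slides do not describe how Infomedia's stemmer is actually built.

```python
# Hypothetical exception list: irregular forms mapped straight to a lemma-like stem.
EXCEPTIONS = {
    "mænd": "mand",  # Danish plural of "mand" (man)
    "børn": "barn",  # Danish plural of "barn" (child)
}

# Hypothetical suffix list, longest first, so "-erne" is tried before "-er" or "-e".
SUFFIXES = ["erne", "ene", "en", "er", "et", "e"]

# Never strip a suffix if fewer than this many characters would remain.
MIN_STEM_LENGTH = 3


def stem(word):
    """Return a lemma-like stem: word list first, suffix stripping as fallback."""
    token = word.lower()
    if token in EXCEPTIONS:
        return EXCEPTIONS[token]
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LENGTH:
            return token[: -len(suffix)]
    return token
```

With these assumed lists, "avisen" and "aviserne" both reduce to "avis", so a query for one form would match articles containing the other.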


Entity Official Company Name

Alternative Name 1

50 % owned Subsidiary

Alternative Name 2

100 % owned Subsidiary

Brand 1

Brand 3

Brand 2


Entities and topics related

People
› Barack Obama
› David Bowie
› Helle Thorning‐Schmidt
› Herman Van Rompuy
› Lance Armstrong
› Octavio Paz
› Pat McQuaid
› Sergei Eisenstein
› Steffi Graf
› Walter Cronkite

Topics
› Business
› Environment › Pollution
› Politics › Elections
› Sports › Cycling, Golf, Doping, Tennis

Organisations
› Carlsberg
› CERN
› European Commission
› Folketinget
› Greenpeace
› Labour Party (UK)
› Republican Party (US)
› Systembolaget
› WADA
› WTA


Benefits from ontological representation

› An entity includes all aspects

› Explorative search

› Related articles

› Ambiguity: Jaguar and Golf – cars or something else?
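One common way to handle ambiguous labels such as Golf or Jaguar in a rule‐based setup is to look at the words around them. The sketch below illustrates the idea only; the sense names, the cue words and the function name are invented and do not come from Infomedia's ontology.

```python
import re
from typing import Optional

# Hypothetical context vocabularies for each sense of an ambiguous label.
SENSES = {
    "golf": {
        "car": {"volkswagen", "vw", "engine", "diesel", "model"},
        "sport": {"tournament", "hole", "putt", "course", "handicap"},
    },
    "jaguar": {
        "car": {"land", "rover", "engine", "sedan", "model"},
        "animal": {"zoo", "species", "rainforest", "predator"},
    },
}


def disambiguate(label: str, text: str) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the article text."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {
        sense: len(words & cues) for sense, cues in SENSES[label.lower()].items()
    }
    best = max(scores, key=scores.get)
    # No cue word at all: leave the label unresolved instead of guessing.
    return best if scores[best] > 0 else None
```

Returning `None` when no cue word is present keeps an unresolved mention out of both categories, which favours precision over recall.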


Maintenance of ontology

› We have to keep up with changes in language

› New phenomena

› With a statistical NER system we will discover new entities with little manual effort

› Text mining


Language Technology Potential Gains

› A query in one language should match articles in several languages

› An executive summary delivered in a company’s corporate language – but inferred from media coverage in different languages

› Automatic summarisation?

› Machine translation?

› Analysis of user queries?

› Speech recognition?

› Auto sentiment scoring?
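Of the potential gains listed above, analysis of user queries is perhaps the simplest to start on, since the queries are already stored. A first step could be counting the most frequent query terms, as sketched here. The log format (one query string per entry) and the function name are assumptions, not a description of Infomedia's logs.

```python
from collections import Counter
from typing import List, Tuple


def top_query_terms(queries: List[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count how often each lower-cased term occurs across all logged queries."""
    counts = Counter(
        term for query in queries for term in query.lower().split()
    )
    return counts.most_common(n)
```

Frequent terms that have no matching entity or topic in the ontology would be natural candidates for new rules.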




A.P. Møller‐Mærsk APM

Damco

Odense Staalskibsværft

Maersk

Mærsk

Danbo

D/S Svendborg

D/S 1912

Lindøværftet

Svitser

Safmarine
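The labels above all point to one entity. A minimal sketch of how such labels might be folded into a single entity when tagging an article follows. The dictionary layout and function names are assumptions, and because the slide does not say which labels are alternative names, subsidiaries or brands, they are kept in one flat list.

```python
import re

# Labels copied from the slide (hyphen normalised to ASCII). Kept as a flat list,
# since the slide does not say which are alternative names, subsidiaries or brands.
ENTITY_LABELS = {
    "A.P. Møller-Mærsk": [
        "A.P. Møller-Mærsk", "APM", "Damco", "Odense Staalskibsværft",
        "Maersk", "Mærsk", "Danbo", "D/S Svendborg", "D/S 1912",
        "Lindøværftet", "Svitser", "Safmarine",
    ],
}


def _compile(labels):
    # Longest labels first so "A.P. Møller-Mærsk" is tried before the shorter "Mærsk".
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in ordered)
    # Lookarounds require that the label is not embedded in a longer word.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")


PATTERNS = {entity: _compile(labels) for entity, labels in ENTITY_LABELS.items()}


def tag_entities(text):
    """Return the set of entities with at least one label occurring in the text."""
    return {entity for entity, pattern in PATTERNS.items() if pattern.search(text)}
```

Because the match is exact, an inflected form such as the Danish genitive "Mærsks" is not tagged in this sketch; covering such forms would need either extra labels or stemming.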


TO BUILD A RULE FOR A TOPIC THOROUGHLY OR ’QUICK AND DIRTY’


Atomic Power (topic) Atomic Power

Atomic Weapon

Atomic Number

A‐Power

Areva

OOA

Nuclear Power

Barsebäck

Reactor

Atomic Activities

Euratom

Forsøgsanlæg Risø

Atomic Facility

Fukushima

Sellafield

Atomic Fuel

Ignalina

Heavy‐water Reactor

Atomic Energy

INES

Tjernobyl

Atomic Programme

Kärnkraft

Uranium Enrichment

Atomic Reactor

Nuclear Reactor

Light Water Reactor


Atomic Power (topic)
› Atomic Power
› A‐Power
› Nuclear Power

Diagram labels: All articles, Topic 1, Relevant articles


Atomic Power (topic)
› Atomic Power
› Atomic Weapon
› Atomic Number
› A‐Power
› Nuclear Power
› Atomic Activities
› Atomic Facility
› Atomic Fuel
› Atomic Energy
› Atomic Programme
› Atomic Reactor

Diagram labels: All articles, Topic 2, Relevant articles


Diagram labels: All articles, Topic 3, Relevant articles
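The choice between a thorough rule and a "quick and dirty" one can be made concrete by measuring precision and recall against a hand‐labelled set of articles. The sketch below does this for two invented rules. The articles, the rule contents and the function names are made up for illustration and are not Infomedia data or the rules shown in the slides.

```python
def matches(rule_terms, text):
    """A rule fires when any of its terms occurs in the lower-cased text."""
    lowered = text.lower()
    return any(term in lowered for term in rule_terms)


def precision_recall(rule_terms, articles):
    """articles: list of (text, is_relevant) pairs labelled by hand."""
    hits = [relevant for text, relevant in articles if matches(rule_terms, text)]
    total_relevant = sum(1 for _, relevant in articles if relevant)
    precision = sum(hits) / len(hits) if hits else 0.0
    recall = sum(hits) / total_relevant if total_relevant else 0.0
    return precision, recall


# Toy, invented articles; not Infomedia data.
ARTICLES = [
    ("Nuclear power debate in parliament", True),
    ("Fukushima reactor leak reported", True),
    ("Barsebäck decommissioning continues", True),
    ("Atomic number of uranium explained", False),
    ("Football results from Sunday", False),
]

# Two hypothetical rules for the same topic.
QUICK_RULE = {"atomic", "nuclear power"}
THOROUGH_RULE = {"atomic power", "nuclear power", "fukushima", "barsebäck", "reactor"}

quick = precision_recall(QUICK_RULE, ARTICLES)
thorough = precision_recall(THOROUGH_RULE, ARTICLES)
```

On this toy set the quick rule also fires on the "Atomic number" article and misses the two articles that only mention a plant name, while the thorough rule finds all three relevant articles and nothing else.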


Media Search

>>> Search live

