Infomedia A/S
How can a Media Intelligence Company gain from Language Technology?
A presentation by Ida Schulin‐Zeuthen, Product Manager
Facts about Infomedia
1. Established in November 2002 as a result of a merger of Politiken’s Polinfo and Berlingske Media’s Avisdata. Still owned by publishers: 50/50 JP/Politikens Hus and Berlingske
2. Market leader within media analyses, communications management and media monitoring with the broadest coverage and the most specialized solutions
3. We have a strong position in the Danish market and excellent insight into our clients’ processes and business areas
4. We are 180 employees in Denmark and have been through three acquisitions: A’jour Presseklip (2006), Cision Danmark (2009) and Infopaq (2012)
5. Infomedia monitors the printed as well as the digital press, radio, TV and the Internet, including social media, nationally and internationally
6. More than 2,500 public and private corporate clients in Denmark with 80% of the market share
Market share
Infomedia
Member of:
We aggregate news content world wide
Retriever
Credit Info
Weekly Press Survey
Cision Finland
J&A Media
Opoint (web) Mediaskopas NLI Media Market
Durrants PMG NLA Lexis Ausschnitt Mediargus EEC Kantar Media Eco de la Stampa
BurellesLuce
InNews
Interpress
Media Watch Middle East
China Clipping Kantar Media Indian Press Clearing
Mediabanc
Global News Group Media Monitors Australia
We aggregate (news) content from all media sources
The building blocks…
Products across:
› Analysis: creates understanding
› Monitoring: ongoing overview
› Search: self‐service
Infomedia Media Search
With Media Search you get:
› Denmark’s largest online article database
› The option of searching in millions of complete articles from the country’s most important media
› The broadest source coverage in the market
› Overview of a certain topic or area
50 million articles, growing by about 6 million articles a year
Contains articles from:
› National newspapers
› Regional and local daily newspapers
› Local weekly newspapers
› Magazines, journals and trade journals
› Online media
› Summaries of radio and TV features
Different search options for different needs:
› Simple search
› Advanced search
› Expert search
Infomedia Media Monitoring
Media:
› Digital articles
› Analogue articles
› Radio and TV programs
› Social media, blogs and forums
› Online sources
Selection, sorting, structuring, categorizing
Delivery:
› Daily news e‐mail
› Portal
› Feed
› Intranet
› Internet
Infomedia Analysis
PEOPLE EXPLORED
› Stakeholder Analysis (EXPERTS): 15‐25 experts are interviewed about clients’ communication; written report of 30‐40 pages
› Journalist Audit (JOURNALISTS): about 10 journalists are interviewed about clients’ communication; written report of 20 pages
› Reputation Analysis (THE PUBLIC): YouGov’s BrandIndex, 250 brands, daily measurements
› Combined with a qualitative media analysis
› Focus Groups (RELEVANT TARGET GROUPS)
› About 20 respondents are interviewed about clients’ communication
› Target Group Analysis (NEWS READERS)
MEDIA EXPLORED
› Agenda Analysis (CASE, DEBATE, AGENDA)
› Qualitative Media Analysis (CLIENT, COMPETITORS, PRODUCTS)
› Quantitative Media Analysis (CLIENT, COMPETITORS, PRODUCTS)
› Analysis Light (CLIENT, COMPETITORS, PRODUCTS)
Processing & production
Media content, Processing, Delivery
› 24/7 monitoring of media inflow
› Sorting and selection
› Summaries
› Sentiment
› Summarizing radio/TV content
Ontological Representation ‐ Categorisation in Infomedia’s internal systems and products
› We established an ontology team in September 2012
› Rule‐based automatic categorisation
• Technology
  • Verity 2003‐2007
  • Smartlogic 2012‐
• Tags
  • Organisations
  • People names
  • Locations
  • Topics (exceptionally wide domain)
Content
› Infomedia handles more than 30,000 incoming articles every day on every conceivable subject
› The articles are varied and messy
› The articles are mostly short: lengths range from 15 to 10,000 words, but the average is under 200 words
› From these articles we want to retrieve and extract information
Search Engines
› Articles are stored as XML files
› At present we are running three search engines
• Verity K2 (in production since 2003)
• IDOL 7 (in production since 2011)
• Solr (in production since 2012, to supersede Verity and IDOL)
Search
› We have no:
• lemmatisation
• part‐of‐speech tagging
• sentence boundary detection
• language detection
› We do have:
• stemming: customised, word list‐based for a more “lemma‐like” performance
› Search logs
• User queries are stored (for documentation and billing), but are not analysed in any way
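A customised, word list‐based stemmer of the kind described above could be sketched roughly as follows. This is only an illustration: the word list, the suffixes and the function name are assumptions made for the example, not Infomedia's actual implementation. Known word forms are looked up first, and a crude suffix stripper is used only as a fallback.

```python
# Hypothetical word list mapping inflected Danish forms to a lemma-like base.
WORD_LIST = {
    "aviser": "avis",
    "avisen": "avis",
    "aviserne": "avis",
    "børn": "barn",
    "børnene": "barn",
}

# Illustrative suffixes only, tried longest first.
SUFFIXES = ("erne", "ene", "er", "en", "et", "e")


def stem(word: str) -> str:
    """Return a lemma-like stem: word list first, suffix stripping as fallback."""
    token = word.lower()
    if token in WORD_LIST:
        return WORD_LIST[token]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not over-stemmed.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token
```

The word list handles irregular forms such as "børn" that no suffix rule would map to "barn", which is what gives the more "lemma‐like" behaviour.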
Entity
› Official Company Name
› Alternative Name 1
› Alternative Name 2
› 50 % owned Subsidiary
› 100 % owned Subsidiary
› Brand 1
› Brand 2
› Brand 3
Entities and topics related
People
› Barack Obama
› David Bowie
› Helle Thorning‐Schmidt
› Herman Van Rompuy
› Lance Armstrong
› Octavio Paz
› Pat McQuaid
› Sergei Eisenstein
› Steffi Graf
› Walter Cronkite
Topics
› Business
› Environment
• Pollution
› Politics
• Elections
› Sports
• Cycling
• Golf
• Doping
• Tennis
Organisations
› Carlsberg
› CERN
› European Commission
› Folketinget
› Greenpeace
› Labour Party (UK)
› Republican Party (US)
› Systembolaget
› WADA
› WTA
Benefits from ontological representation
› An entity includes all aspects
› Explorative search
› Related articles
› Ambiguity: Jaguar and Golf, cars or something else?
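One simple way to approach the Jaguar/Golf ambiguity is to compare the words around the term with a small vocabulary per sense. The sketch below is a minimal illustration under that assumption; the sense vocabularies and the function name are invented for the example and say nothing about how Infomedia's rules actually resolve ambiguity.

```python
import re

# Hypothetical context vocabularies for each sense of an ambiguous term.
SENSES = {
    "jaguar": {
        "car": {"car", "engine", "dealer", "model", "drive"},
        "animal": {"zoo", "jungle", "predator", "species", "cat"},
    },
    "golf": {
        "car": {"volkswagen", "vw", "engine", "dealer", "drive"},
        "sport": {"course", "tournament", "hole", "club", "player"},
    },
}


def disambiguate(term, text):
    """Return the sense whose context words overlap most with the text, or None."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {
        sense: len(words & vocab)
        for sense, vocab in SENSES[term.lower()].items()
    }
    best = max(scores, key=scores.get)
    # No overlapping context words at all: leave the term unresolved.
    return best if scores[best] > 0 else None
```

For example, `disambiguate("Golf", "He won the golf tournament on the final hole")` picks the sport sense because "tournament" and "hole" appear in the text.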
Maintenance of ontology
› We have to keep up with changes in language
› New phenomena
› With a statistical NER system we can discover new entities with little manual effort
› Text mining
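The workflow around discovering new entities can be sketched as: collect candidate names from incoming articles, drop those the ontology already covers, and rank the rest by frequency for manual review. Note that the sketch below does not implement statistical NER; a plain capitalisation heuristic stands in for the NER step, and the known‐entity set and function name are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical set of names already covered by the ontology.
KNOWN_ENTITIES = {"Barack Obama", "Lance Armstrong", "Greenpeace"}

# Stand-in for an NER system: two or more consecutive capitalised words.
CANDIDATE = re.compile(r"\b[A-ZÆØÅ][\w-]+(?: [A-ZÆØÅ][\w-]+)+")


def new_entity_candidates(articles, min_count=2):
    """Return (name, count) pairs for frequent names missing from the ontology."""
    counts = Counter()
    for text in articles:
        counts.update(CANDIDATE.findall(text))
    return [
        (name, n)
        for name, n in counts.most_common()
        if n >= min_count and name not in KNOWN_ENTITIES
    ]
```

A real NER model would replace the regular expression; the filtering and ranking steps would stay the same.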
Language Technology Potential Gains
› A query in one language should match articles in several languages
› An executive summary delivered in a company’s corporate language, but inferred from media coverage in different languages
› Automatic summarisation?
› Machine translation?
› Analysis of user queries?
› Speech recognition?
› Auto sentiment scoring?
A.P. Møller‐Mærsk APM
Damco
Odense Staalskibsværft
Maersk
Mærsk
Danbo
D/S Svendborg
D/S 1912
Lindøværftet
Svitser
Safmarine
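The names above illustrate how one entity can gather many surface forms. A minimal sketch of such an entity, and of expanding it into a single boolean query, might look like this. The class, its fields, the grouping of names into aliases and subsidiaries, and the generic quoted‐phrase OR syntax are all assumptions for illustration; they are not tied to Infomedia's ontology format or to a particular search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    official_name: str
    aliases: list = field(default_factory=list)
    subsidiaries: list = field(default_factory=list)

    def all_names(self):
        """All names for the entity, in order, without case-insensitive repeats."""
        seen, names = set(), []
        for name in [self.official_name, *self.aliases, *self.subsidiaries]:
            if name.lower() not in seen:
                seen.add(name.lower())
                names.append(name)
        return names

    def to_query(self):
        """Join every name as a quoted phrase in one OR query (generic syntax)."""
        return " OR ".join(f'"{name}"' for name in self.all_names())


# Illustrative grouping only; the slide does not say which names are which.
maersk = Entity(
    official_name="A.P. Møller-Mærsk",
    aliases=["APM", "Maersk", "Mærsk"],
    subsidiaries=["Damco", "Safmarine"],
)
```

Calling `maersk.to_query()` yields one query string covering every listed name, so a user searching for the entity does not have to know each spelling or subsidiary.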
TO BUILD A RULE FOR A TOPIC THOROUGHLY OR ’QUICK AND DIRTY’
Atomic Power (topic) Atomic Power
Atomic Weapon
Atomic Number
A‐Power
Areva
OOA
Nuclear Power
Barsebäck
Reactor
Atomic Activities
Euratom
Forsøgsanlæg Risø
Atomic Facility
Fukushima
Sellafield
Atomic Fuel
Ignalina
Heavy‐water Reactor
Atomic Energy
INES
Tjernobyl
Atomic Programme
Kärnkraft
Uranium Enrichment
Atomic Reactor
Nuclear Reactor
Light Water Reactor
Atomic Power (topic), Topic 1
› Rule terms: Atomic Power, A‐Power, Nuclear Power
› [Diagram: All articles, Topic 1, Relevant articles]
Atomic Power (topic), Topic 2
› Rule terms: Atomic Power, Atomic Weapon, Atomic Number, A‐Power, Nuclear Power, Atomic Activities, Atomic Facility, Atomic Fuel, Atomic Energy, Atomic Programme, Atomic Reactor
› [Diagram: All articles, Topic 2, Relevant articles]
Topic 3
› [Diagram: All articles, Topic 3, Relevant articles]
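The trade‐off between rule variants can be made concrete by measuring how a rule's matches overlap with the relevant articles, i.e. precision and recall. The sketch below is a toy illustration: the matching is plain substring search, and the labelled articles and both rules are invented for the example rather than taken from Infomedia's data or rule language.

```python
def matches(rule_terms, text):
    """A rule fires if any of its terms occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in rule_terms)


def precision_recall(rule_terms, labelled_articles):
    """Return (precision, recall) of a rule against (text, is_relevant) pairs."""
    hits = [relevant for text, relevant in labelled_articles if matches(rule_terms, text)]
    total_relevant = sum(1 for _, relevant in labelled_articles if relevant)
    true_positives = sum(hits)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Toy labelled articles, invented for this example.
ARTICLES = [
    ("Debate on nuclear power in Sweden", True),
    ("Barsebäck reactor to be dismantled", True),
    ("Atomic power plants and energy prices", True),
    ("Talks on atomic weapon treaty stall", False),
    ("Local football results", False),
]

BROAD_RULE = ["atomic"]  # hypothetical single-term rule
CURATED_RULE = ["atomic power", "nuclear power", "Barsebäck"]  # hypothetical curated rule
```

On this toy set the single‐term rule also fires on the weapons article and misses two relevant ones, while the curated rule matches exactly the three relevant articles.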
Media Search
>>> Search live