Rees Craig - Search, APIs, Solr and the Sensis journey


Search, APIs, Capability Management and the Sensis Journey
Craig Rees


Today’s menu

•  Project background
•  Platform selection
•  Search capability
•  Relevance
•  Architecture
•  Quality management
•  Hurdles
•  What’s next


Sensis

•  Sensis helps Australians find, buy and sell
•  From print directories to a cross-platform lead generator
•  Sensis publishes over 1.8 million business listings
•  Two of the top 10 most visited online sites in Australia (WhitePages.com.au and YellowPages.com.au)


Project background

Business objectives
•  Drive presence in the local search marketplace
•  Open up the largest database of business listings in Australia
•  Reduce the effort required from local search developers
•  Free to use; we are after the reporting

Technology objectives
•  Develop a total search platform
•  Relevancy testing as part of the development lifecycle
•  A framework to identify problem spaces
•  Manageable platform
•  Continuous deployments


Developer portal


Platform selection

•  Support for the search capability team
•  Structured vs non-structured data
•  Deterministic vs black box
•  Non-proprietary code base
•  Community backing


The Sensis Search capability maturity model
*Courtesy of Pete Crawford & Craig Lonsdale

Lvl 5: Optimized
•  A/B testing
•  Machine learning
•  External collaboration
•  Multiple contexts

Lvl 4: Managed
•  Online dashboards
•  Test environments
•  Dynamic search refinements
•  Targets and metrics

Lvl 3: Monitored
•  Defined team
•  Regular monitoring
•  Static autosuggest
•  Basic linguistics

Lvl 2: Adhoc
•  Adhoc processes
•  Part time team
•  Static dictionaries
•  Individual led innovation

Lvl 1: Unmanaged
•  No resources
•  No reporting
•  Out of the box features


Context is key

•  Location
•  Intent
   •  Name
   •  Type
   •  Product
   •  Spatial
•  Chronology
•  Social Graph
•  Device
•  Individual


Our architecture

Diagram components:
•  Business Data
•  Geo Service
•  Solr Business Data
•  Mashery
•  Name Query Handler
•  Type Query Handler
•  MongoDB
•  API
•  Index
•  Ontologies
•  Search Service
•  Publisher Reporting Service
•  Historical search Data Reporting Events


Data staging

(Architecture diagram repeated from "Our architecture".)


Search

(Architecture diagram repeated from "Our architecture".)


API

(Architecture diagram repeated from "Our architecture".)


API proxy

(Architecture diagram repeated from "Our architecture".)


•  Moved from a black box solution to a manageable platform

Yesterday

•  Deliver search improvements without major code changes
•  Understand how results were calculated
•  Identify problems scientifically
•  Continuously tune and test relevance

Evolution of search management

Today

Tomorrow
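
One way to "understand how results were calculated" in Solr is its built-in debug output. The sketch below only builds the request URL. The host, the core name `listings` and the example query are assumptions for illustration and are not taken from the Sensis deployment; `q`, `rows`, `fl`, `wt` and `debugQuery` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def explain_url(base_url, core, query, rows=5):
    """Build a Solr select URL that asks for score explanations."""
    params = {
        "q": query,            # the user's search terms
        "rows": rows,          # how many results to return
        "fl": "id,score",      # return only the id and the relevance score
        "debugQuery": "true",  # ask Solr to include its debug section
        "wt": "json",          # JSON response format
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = explain_url("http://localhost:8983/solr", "listings", "plumber melbourne")
print(url)
```

Fetching such a URL from a running Solr instance returns the usual results plus a `debug` section whose `explain` entry breaks down how each document's score was computed.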


Problem spaces, quality management & tuning

•  Path analysis used to identify problem spaces
•  “Gold sets” used to define overall quality score (TREC)
•  Specific gold sets for each problem space:
   Ø  Intent
   Ø  Spelling & stemming
   Ø  Location
   Ø  Phrase parsing
•  Features signed off only when they make a positive impact on the quality score
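
The gold set idea can be sketched as follows. The slides do not say which measure Sensis used, so this sketch assumes mean average precision, a measure commonly reported in TREC-style evaluations. The queries, listing ids and the two `search` functions are hypothetical stand-ins for a baseline and a candidate build.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: precision at each relevant hit,
    averaged over all relevant documents (missed ones count as zero)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def quality_score(gold_set, search):
    """Mean average precision over every query in one gold set."""
    return mean(average_precision(search(q), rel) for q, rel in gold_set.items())


def sign_off(gold_sets, baseline_search, candidate_search):
    """Per problem space: (score before, score after, improved?)."""
    report = {}
    for space, gold_set in gold_sets.items():
        before = quality_score(gold_set, baseline_search)
        after = quality_score(gold_set, candidate_search)
        report[space] = (before, after, after > before)
    return report


# Hypothetical gold sets: query -> ids of listings judged relevant.
GOLD_SETS = {
    "spelling": {"plumer melbourne": {"b1", "b2"}},
    "location": {"cafe fitzroy": {"b7"}},
}


def baseline(query):
    # Hypothetical results from the current build.
    return {"plumer melbourne": ["b9", "b1"], "cafe fitzroy": ["b7"]}.get(query, [])


def candidate(query):
    # Hypothetical results from the build under test.
    return {"plumer melbourne": ["b1", "b2"], "cafe fitzroy": ["b7"]}.get(query, [])


report = sign_off(GOLD_SETS, baseline, candidate)
for space, (before, after, improved) in report.items():
    print(f"{space}: {before:.2f} -> {after:.2f} improved={improved}")
```

Keeping one gold set per problem space means a change that helps spelling but hurts location shows up as two separate numbers rather than one blended score.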


Search quality analysis and testing


Results examiner


Score analysis


Tuning


Lather, rinse, repeat


Hurdles along the way

•  Data redundancy and homogeneity
•  Solr ranking of rare terms
•  Intent differentiation
•  Contextual synonyms
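
The slides do not explain the rare-terms hurdle. One common cause in TF-IDF scoring, offered here only as an assumption, is that a term appearing in very few documents receives a much larger inverse document frequency (IDF) weight than a common one. The sketch uses the IDF formula from Lucene's older default TF-IDF similarity, `1 + ln(numDocs / (docFreq + 1))`. The index size echoes the 1.8 million listings mentioned earlier; the document frequencies are made up.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as defined by Lucene's older default TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1_800_000   # index size, echoing the listing count in the slides
COMMON_DF = 20_000     # hypothetical: listings containing a common term
RARE_DF = 3            # hypothetical: listings containing a rare term

common_idf = classic_idf(COMMON_DF, NUM_DOCS)
rare_idf = classic_idf(RARE_DF, NUM_DOCS)

print(f"common term idf: {common_idf:.2f}")
print(f"rare term idf:   {rare_idf:.2f}")
```

Under this formula a rare term, which could be an unusual word or a misspelling in a listing, can outweigh the term that carries the searcher's intent, which is one reason rare-term ranking may need tuning.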


Where next?

•  Query engine
•  Facets / autosuggest
•  Real time tuning
•  Machine learning
•  Multi term queries
•  Scoring thresholds
•  Content Value


Questions?

Email: craig.rees@sensis.com.au
www: developers.sensis.com.au
Twitter: @SensisAPI @ablebagel

