Search, APIs, Capability Management and the Sensis Journey
Craig Rees
• Project background
• Platform selection
• Search capability
• Relevance
• Architecture
• Quality management
• Hurdles
• What’s next
Today’s menu
• Sensis helps Australians find, buy and sell
• From print directories to a cross-platform lead generator
• Sensis publishes over 1.8 million business listings
• Two of the top 10 most visited websites in Australia (WhitePages.com.au and YellowPages.com.au)
Sensis
Business objectives
• Drive presence in the local search marketplace
• Open up the largest database of business listings in Australia
• Reduce the effort required from local search developers
• Free to use; what we are after is the reporting
Project background
Technology objectives
• Develop a total search platform
• Relevancy testing as part of the development lifecycle
• A framework to identify problem spaces
• Manageable platform
• Continuous deployments
• Developer portal
• Support for the search capability team
• Structured vs unstructured data
• Deterministic vs black box
• Non-proprietary code base
• Community backing
Platform selection
Lvl 5, Optimized: A/B testing • Machine learning • External collaboration • Multiple contexts
Lvl 4, Managed: Online dashboards • Test environments • Dynamic search refinements • Targets and metrics
Lvl 3, Monitored: Defined team • Regular monitoring • Static autosuggest • Basic linguistics
Lvl 2, Ad hoc: Ad hoc processes • Part-time team • Static dictionaries • Individual-led innovation
Lvl 1, Unmanaged: No resources • No reporting • Out-of-the-box features
The Sensis Search capability maturity model
*Courtesy of Pete Crawford & Craig Lonsdale
• Location
• Intent: Name, Type, Product, Spatial
• Chronology
• Social Graph
• Device
• Individual
Context is key
[Architecture diagram; component labels: Business Data, Geo Service, Solr Business Data, Index, Mashery, API, Name Query Handler, Type Query Handler, Search Service, Ontologies, MongoDB, Historical search Data Reporting Events, Publisher Reporting Service]
Our architecture
[Architecture diagram repeated: Data staging]
[Architecture diagram repeated: Search]
[Architecture diagram repeated: API]
[Architecture diagram repeated: API proxy]
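The diagram labels a Name Query Handler and a Type Query Handler next to an Ontologies component, which suggests that queries are dispatched by intent. The slides do not show how this is done, so the following is only a minimal sketch under stated assumptions: `BUSINESS_TYPES`, `SearchRequest`, `route` and the returned dictionaries are all hypothetical names, and the token lookup stands in for whatever the real ontology service does.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the "Ontologies" component: known business types.
BUSINESS_TYPES = {"plumber", "florist", "dentist"}


@dataclass
class SearchRequest:
    query: str
    location: str


def name_query_handler(req: SearchRequest) -> dict:
    # Assumed behaviour: match the query text against business names.
    return {"handler": "name", "field": "name", "terms": req.query, "near": req.location}


def type_query_handler(req: SearchRequest) -> dict:
    # Assumed behaviour: match the query text against business categories.
    return {"handler": "type", "field": "category", "terms": req.query, "near": req.location}


def route(req: SearchRequest) -> dict:
    """Send queries containing a known business type to the type handler,
    and everything else to the name handler."""
    tokens = req.query.lower().split()
    if any(token in BUSINESS_TYPES for token in tokens):
        return type_query_handler(req)
    return name_query_handler(req)


type_plan = route(SearchRequest("emergency plumber", "Melbourne"))
name_plan = route(SearchRequest("Joe's Hardware", "Melbourne"))
```

Keeping the handlers separate means each intent can be tuned on its own, which fits the deck's emphasis on a deterministic, manageable platform over a black box.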
• Moved from a black box solution to a manageable platform
Yesterday
• Deliver search improvements without major code changes
• Understand how results were calculated
• Identify problems scientifically
• Continuously tune and test relevance
Evolution of search management
Today
Tomorrow
• Path analysis used to identify problem spaces
• “Gold sets” used to define an overall quality score (TREC)
• Specific gold sets for each problem space:
  Ø Intent
  Ø Spelling & stemming
  Ø Location
  Ø Phrase parsing
• Features signed off only when they have a positive impact on the quality score
Problem spaces, quality management & tuning
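The sign-off rule on this slide can be sketched as a small scoring harness. The slides mention TREC but do not name the metric, so mean average precision is used here purely as an assumption; the function names, the toy gold sets and the `baseline` and `candidate` search functions are all hypothetical.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against its gold judgments."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0


def quality_score(gold_set, search):
    """Mean average precision over every query in one gold set."""
    return mean(average_precision(search(q), rel) for q, rel in gold_set.items())


def sign_off(gold_sets, baseline, candidate):
    """Approve a feature only if the overall score improves.
    Also returns (before, after) scores per problem space."""
    report = {
        space: (quality_score(gold, baseline), quality_score(gold, candidate))
        for space, gold in gold_sets.items()
    }
    before = mean(b for b, _ in report.values())
    after = mean(a for _, a in report.values())
    return after > before, report


# Toy gold sets: query -> set of listing ids judged relevant (made-up data).
gold_sets = {
    "spelling": {"plumer": {"b1"}},
    "intent": {"pizza": {"b2", "b3"}},
}
baseline_results = {"plumer": ["b9", "b1"], "pizza": ["b2", "b9", "b3"]}
candidate_results = {"plumer": ["b1", "b9"], "pizza": ["b2", "b9", "b3"]}

approved, report = sign_off(
    gold_sets,
    baseline=lambda q: baseline_results[q],
    candidate=lambda q: candidate_results[q],
)
```

Reporting the score per problem space alongside the overall verdict makes it visible when a change helps one space, such as spelling, while hurting another.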
Search quality analysis and testing
Results examiner
Score analysis
Tuning
Lather, rinse, repeat
• Data redundancy and homogeneity
• Solr ranking of rare terms
• Intent differentiation
• Contextual synonyms
Hurdles along the way
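One plausible reading of the rare-terms hurdle is that TF-IDF scoring gives very rare terms, including misspellings, a much larger weight than common ones. The slides do not spell this out, so the sketch below only illustrates the classic Lucene inverse document frequency formula, idf = 1 + ln(numDocs / (docFreq + 1)). The index size is borrowed from the 1.8 million listings figure earlier in the deck, and the document frequencies are made up.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 1_800_000  # assumed index size, taken from the listings figure

# Hypothetical document frequencies for a common term and a rare misspelling.
common_idf = classic_idf(doc_freq=20_000, num_docs=NUM_DOCS)
rare_idf = classic_idf(doc_freq=3, num_docs=NUM_DOCS)

print(f"common term idf: {common_idf:.2f}")
print(f"rare term idf:   {rare_idf:.2f}")
```

Under these assumptions the rare term's weight is well over twice that of the common term, so a handful of listings containing an unusual token can outrank better matches unless the scoring is tuned.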
• Query engine
• Facets / autosuggest
• Real-time tuning
• Machine learning
• Multi-term queries
• Scoring thresholds
• Content value
Where next?
Email: craig.rees@sensis.com.au
www: developers.sensis.com.au
Twitter: @SensisAPI @ablebagel
Questions?