Using Solr in Online Travel to Improve User Experience
Sudhakar Karegowdra, Esteban Donato
Travelocity, May 25th, 2011
{sudhakar.karegowdra, esteban.donato}@travelocity.com
What We Will Cover
§ Travelocity
§ Speakers Background
§ Merchandising & Solr
  • Challenges
  • Solution
  • Sizing and performance data
  • Take Away
§ Location Resolution & Solr
  • Challenges
  • Solution
  • Sizing and performance data
  • Take Away
§ Q&A
§ First Online Travel Agency (OTA), launched in 1996
§ Grown to 3,000 employees and is one of the largest travel agencies worldwide
§ Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
§ In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
§ Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others
Speakers Background
§ Sudhakar Karegowdra
  • Principal Architect, Travelocity.com
§ My experience
  – 13+ years
  – Solr/Lucene: 3 years
  – Implementing Hadoop, Pig, and Hive for the data warehouse
§ Topic: Merchandising
§ Esteban Donato
  • Lead Architect, Travelocity.com
§ My experience
  – 10+ years
  – Solr: 2 years
  – Analyzing Mahout and Carrot2 for a document clustering engine
§ Topic: Location Resolution
Merchandising By Sudhakar Karegowdra
The Challenge
§ Market Drivers
  • Build landing pages with faceted navigation
  • Enable content segmentation and delivery
  • Support rollout of promotions
  • Roll up data to a higher level
    – E.g., "All 5-star hotels in California" brings together all the 5-star hotels from SFO, LAX, SAN, etc.
  • Faster time to market for new ideas
  • Rapidly scale to accommodate global brands with disparate data sources
The Challenge
§ Traditional database approach
  • Longer time to market
  • Specialized skill set needed to design and optimize database structures and queries
  • Aggregating data and changing structures is quite complex
  • Building faceted navigation requires complex logic, leading to high maintenance cost
Solution - Overview
§ Data from various sources is aggregated and ingested into Solr
  • One core per locale and product type
§ Wrapper service to combine some data across product cores and manage configuration rules
§ Solr’s built-in search and faceting power the navigation
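As a rough illustration of the core-per-locale-and-product layout combined with faceting, the sketch below builds a faceted select URL. The host, the core naming scheme (`en_US_hotel`), and the field names (`state`, `star_rating`, `amenity`) are assumptions for illustration only; the slides do not give the real names. The `facet`, `facet.field`, `facet.mincount`, and `fq` parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr host; the real deployment details are not in the slides.
SOLR_BASE = "http://localhost:8983/solr"


def core_name(locale: str, product: str) -> str:
    """Assumed naming scheme: one core per locale and product type."""
    return f"{locale}_{product}"


def facet_query_url(locale, product, query, facet_fields, filters=()):
    """Build a select URL that returns matching documents plus facet counts."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with no matching documents
    ]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", clause) for clause in filters]  # each fq narrows the result set
    return f"{SOLR_BASE}/{core_name(locale, product)}/select?{urlencode(params)}"


# "All 5-star hotels in California", with counts for further navigation.
url = facet_query_url(
    "en_US",
    "hotel",
    "*:*",
    facet_fields=["star_rating", "amenity"],
    filters=['state:"California"', "star_rating:5"],
)
print(url)
```

Each facet value returned by Solr can then be rendered as a navigation link that adds one more `fq` clause to the same request.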
Solution – Architecture View
[Architecture diagram. Components: Widgets, UI, Mobile; Services/Business Logic; Solr Slaves (Multi Core); Solr Master (Multi Core); Offer Management Tool; ETL; Oracle, Deals, Products, ……]
Solution - Achievements
§ Millions of unique long-tail landing pages
  • E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegashotels_5-star_business-center_green
§ Faster search across products
  • E.g., Beach Deals under $500
§ Segmented content delivery through tagging
§ Scaled well to distribute the content to different brands, partners, and advertisers
§ Opened up other innovative applications
  • Deals on Map, Deals on Mobile, Wizards, etc.
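A cross-product search such as "Beach Deals under $500" could be served by sending the same filtered query to each product core and merging the results in the wrapper service. The sketch below shows that idea under stated assumptions: the `theme` and `price` field names and the sample hits are hypothetical, and the merge is a plain in-memory sort rather than anything the slides describe.

```python
from urllib.parse import urlencode, parse_qs


def deals_params(theme: str, max_price: int) -> str:
    """Query string sent unchanged to every product core (field names are assumed)."""
    return urlencode([
        ("q", f"theme:{theme}"),
        # Inclusive range, so this is "up to max_price" rather than strictly under it.
        ("fq", f"price:[* TO {max_price}]"),
        ("sort", "price asc"),
        ("wt", "json"),
    ])


def merge_by_price(*result_lists, limit=10):
    """Combine per-core result lists into one list ordered by price."""
    merged = [doc for docs in result_lists for doc in docs]
    return sorted(merged, key=lambda doc: doc["price"])[:limit]


# Hypothetical hits as they might come back from two product cores.
hotel_hits = [{"id": "hotel-1", "price": 320}, {"id": "hotel-2", "price": 480}]
package_hits = [{"id": "pkg-7", "price": 450}, {"id": "pkg-9", "price": 299}]

query_string = deals_params("beach", 500)
top_deals = merge_by_price(hotel_hits, package_hits, limit=3)
print([doc["id"] for doc in top_deals])
```

Keeping the price limit in `fq` rather than in `q` lets Solr cache the filter separately from the main query.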
Solution – Road Ahead
§ Migration to Solr 3.1
  • Geospatial search
  • CSV output format
§ Query boosting by search pattern
§ Near-real-time updates
§ Deal and user behavior mining in Hadoop MapReduce, with Solr serving the content
§ Move slaves to the cloud
Sizing & Performance
§ Index Stats
  • Number of cores: 25
  • Number of documents: ~1 million records
§ Response
  • Requests: 70 TPS
  • Average response time: 0.005 seconds (5 ms)
§ Software Versions
  • Solr 1.4.0
    – filterCache size: 30000
  • Tomcat 5.5.9
  • JDK 1.6
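One way to read the throughput and latency figures together is Little's law (average requests in flight = arrival rate × average time in system). The sketch below applies it to the numbers on this slide, assuming a steady load and that the 5 ms figure is the mean time a request spends in Solr; neither assumption is stated in the slides.

```python
def avg_in_flight(throughput_tps: float, avg_latency_s: float) -> float:
    """Little's law: L = lambda * W."""
    return throughput_tps * avg_latency_s


# Figures from the slide: 70 requests per second at 5 ms average response time.
merchandising = avg_in_flight(70, 0.005)
print(f"average requests in flight: {merchandising:.2f}")  # prints 0.35
```

A value well below 1 suggests that, on average, requests rarely overlap at this load.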
Take Away
§ Semi-structured storage in Solr helps aggregate disparate sources easily
  • Remember dynamic fields
§ Multiple cores to manage data for multiple locales
§ Solr is a great enabler of “Innovations”
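To illustrate why dynamic fields help with disparate sources, the sketch below maps arbitrary source attributes onto field names by value type, so a new attribute needs no schema change. It assumes the schema declares dynamic fields with the `*_s`, `*_i`, `*_f`, and `*_b` suffixes (the convention in Solr's example schema); the actual schema and the sample records are not from the slides.

```python
def to_solr_doc(record_id: str, source: str, attributes: dict) -> dict:
    """Map source attributes onto dynamic-field names chosen by value type."""
    doc = {"id": record_id, "source_s": source}
    for name, value in attributes.items():
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            suffix = "_b"
        elif isinstance(value, int):
            suffix = "_i"
        elif isinstance(value, float):
            suffix = "_f"
        else:
            suffix, value = "_s", str(value)
        doc[name + suffix] = value
    return doc


# Two hypothetical sources with different attribute sets end up in the same index.
hotel_doc = to_solr_doc("hotel-42", "products", {"star_rating": 5, "green": True})
deal_doc = to_solr_doc("deal-7", "deals", {"price": 499.0, "theme": "beach"})
print(hotel_doc)
print(deal_doc)
```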
Location Resolution By Esteban Donato
The Challenge
§ How to develop a global location resolution service?
§ Flexibility to change
§ General enough to cover everyone's needs
§ Multi-language
§ Performance and scalability
§ Configurable by site
Architecture of the solution
§ Master/Slave architecture
§ Multi-core: each core represents a language
§ SolrJ client
  • Binary format
§ Solr response cache
§ Remote Streaming indexing
  • CSV format
[Architecture diagram. Components: Auto-complete, Resolution, Management Tool, Location DB, Solr Slave, Solr Master, Batch Job]
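The remote-streaming CSV indexing step could look roughly like the sketch below, in which a batch job asks the master to fetch a CSV export itself and commit it into one language core. The host names, the core name `en`, and the export URL are hypothetical. `stream.url`, `separator`, `header`, and `commit` are parameters of Solr's CSV update handler, and remote streaming has to be enabled in `solrconfig.xml` for `stream.url` to work.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical master host; not taken from the slides.
MASTER = "http://solr-master.example.com:8983/solr"


def csv_update_url(language_core: str, csv_location: str) -> str:
    """URL asking the master to pull a CSV file (remote streaming) and commit it."""
    params = {
        "stream.url": csv_location,  # needs enableRemoteStreaming="true" in solrconfig.xml
        "separator": ",",
        "header": "true",            # first CSV line holds the field names
        "commit": "true",
    }
    return f"{MASTER}/{language_core}/update/csv?{urlencode(params)}"


# One request per language core; slaves would then replicate from the master.
url = csv_update_url("en", "http://batch.example.com/exports/locations_en.csv")
print(url)
```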
Auto-complete
§ System has to suggest options as users type their desired location
  • Examples: “san” => San Francisco, “veg” => Las Vegas
§ Relevancy: not all locations are equally important
  • “par” => “Paris, France”; “Parana, Argentina”
§ Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
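The behavior described above (match a typed prefix against any searchable field, then order by importance) can be sketched in memory as follows. This is only an illustration of the ranking idea, not how the Solr index performs the matching; the `rank` scale, the sample locations, and their token lists are made up.

```python
from dataclasses import dataclass


@dataclass
class Location:
    name: str
    rank: int      # higher means more important (hypothetical scale)
    terms: tuple   # lowercased tokens from codes, names, state, and country


LOCATIONS = [
    Location("Paris, France", 95, ("par", "paris", "france", "fr")),
    Location("Parana, Argentina", 40, ("parana", "argentina", "ar")),
    Location("San Francisco, United States", 90,
             ("sfo", "san", "francisco", "ca", "united", "states", "us")),
    Location("Las Vegas, United States", 85,
             ("las", "vegas", "nv", "united", "states", "us")),
]


def suggest(prefix: str, locations=LOCATIONS, limit: int = 5) -> list:
    """Return names whose tokens start with the prefix, most important first."""
    typed = prefix.strip().lower()
    hits = [loc for loc in locations if any(t.startswith(typed) for t in loc.terms)]
    hits.sort(key=lambda loc: (-loc.rank, loc.name))
    return [loc.name for loc in hits[:limit]]


print(suggest("par"))
print(suggest("veg"))
```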
Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
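To see what the `glsSearchField` analyzer does to a value, the sketch below approximates the chain in plain Python. It is an emulation for illustration, not Solr code: accent stripping uses Unicode decomposition, which only approximates `ISOLatin1AccentFilterFactory`; the duplicate-removal filter is omitted because it only drops identical tokens at the same position; and empty tokens are discarded for simplicity.

```python
import re
import unicodedata

SPLIT = re.compile(r"[/\-\t ]+")  # same pattern as the PatternTokenizerFactory
PUNCT = re.compile(r"[,.]")        # same pattern as the PatternReplaceFilterFactory


def strip_accents(text: str) -> str:
    """Drop combining marks after NFD decomposition (approximate accent folding)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value: str) -> list:
    """Split on separators, then lowercase, trim, fold accents, and strip , and ."""
    tokens = []
    for raw in SPLIT.split(value):
        token = PUNCT.sub("", strip_accents(raw.lower().strip()))
        if token:  # simplification: skip tokens that end up empty
            tokens.append(token)
    return tokens


print(analyze("Paraná, Argentina"))
print(analyze("Dallas/Fort-Worth"))
```

Because the same chain runs at index and query time, a user typing "parana" matches a location stored as "Paraná".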
Resolution
§ System has to resolve the location requested by the users
§ Handles aliases: Big Apple => New York
§ Handles ambiguities
§ Handles misspellings: Lomdon => London
§ NGramDistance algorithm
§ How to combine distance with relevancy
§ Errors suggesting the correct location when the input is a prefix: Lond => London
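The question of combining string distance with relevancy can be made concrete with a small sketch. The similarity below is a Dice coefficient over padded character bigrams, used here as a simple stand-in; it is not Lucene's `NGramDistance` implementation. The weighted blend, the `weight` value, and the candidate ranks are assumptions chosen for illustration, not the approach the speakers settled on.

```python
def bigrams(word: str) -> list:
    """Character bigrams of the lowercased word, padded with start/end markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def similarity(a: str, b: str) -> float:
    """Dice coefficient over bigram multisets, between 0.0 and 1.0."""
    grams_a, remaining = bigrams(a), bigrams(b)
    total = len(grams_a) + len(remaining)
    shared = 0
    for gram in grams_a:
        if gram in remaining:
            remaining.remove(gram)  # each bigram in b can be matched only once
            shared += 1
    return 2 * shared / total


def resolve(query: str, candidates: dict, weight: float = 0.8) -> str:
    """Pick the candidate with the best blend of similarity and rank (rank in 0..1)."""
    def score(name: str) -> float:
        return weight * similarity(query, name) + (1 - weight) * candidates[name]
    return max(candidates, key=score)


# Hypothetical candidates with normalized relevancy ranks.
CANDIDATES = {"London": 0.9, "Lome": 0.3, "Lisbon": 0.6}
print(resolve("Lomdon", CANDIDATES))
```

Raising `weight` favors the closest spelling, while lowering it favors the more important location; short prefix inputs such as "Lond" are where a pure distance score is most likely to need help from the rank.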
Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
Sizing & Performance
§ 4 cores with ~500,000 documents indexed each
§ Response times
  • Auto-complete: 15 ms, 20 TPS
  • Resolution: 10 ms, 2 TPS
§ Cache configuration
  • queryResultCache: maxSize=1024
  • documentCache: maxSize=1024
  • fieldValueCache & filterCache disabled
Wrap Up
§ Performance always as top priority
§ Develop simple but robust services
§ Provide a simple API
Q&A
Contact
§ Esteban Donato
  • Esteban.donato@travelocity.com
  • Twitter: @eddonato
§ Sudhakar Karegowdra
  • Sudhakar.karegowdra@travelocity.com
  • Twitter: @skaregowdra
https://www.facebook.com/travelocity
Twitter: @travelocity and @RoamingGnome