Using Solr in Online Travel Shopping to Improve User Experience


Using Solr in Online Travel to Improve User Experience
Sudhakar Karegowdra, Esteban Donato
Travelocity, May 25th, 2011
{sudhakar.karegowdra, esteban.donato}@travelocity.com


What We Will Cover
§  Travelocity
§  Speakers Background
§  Merchandising & Solr
   •  Challenges
   •  Solution
   •  Sizing and performance data
   •  Take Away
§  Location Resolution & Solr
   •  Challenges
   •  Solution
   •  Sizing and performance data
   •  Take Away
§  Q&A


§  First online travel agency (OTA), launched in 1996
§  Grown to 3,000 employees and is one of the largest travel agencies worldwide
§  Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore and Buenos Aires, to name a few
§  In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has since become an international pop icon
§  Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com and Zuji, among others



Speakers Background
§  Sudhakar Karegowdra
   •  Principal Architect, Travelocity.com
§  My experience
   –  13+ years
   –  Solr/Lucene: 3 years
   –  Implementing Hadoop, Pig and Hive for the data warehouse
§  Topic: Merchandising

§  Esteban Donato
   •  Lead Architect, Travelocity.com
§  My experience
   –  10+ years
   –  Solr: 2 years
   –  Analyzing Mahout and Carrot2 for a document clustering engine
§  Topic: Location Resolution



Merchandising By Sudhakar Karegowdra



The Challenge
§  Market Drivers
   •  Build landing pages with faceted navigation
   •  Enable content segmentation and delivery
   •  Support rollout of promotions
   •  Roll up data to a higher level
      §  E.g., "all 5-star hotels in California" should bring back the 5-star hotels from SFO, LAX, SAN, etc.
   •  Faster time to market for new ideas
   •  Rapidly scale to accommodate global brands with disparate data sources



The Challenge
§  Traditional Database approach
   •  Longer time to market
   •  Specialized skill set needed to design and optimize database structures and queries
   •  Aggregating data and changing structures is quite complex
   •  Building faceted navigation needs complex logic, leading to high maintenance cost



Solution - Overview
§  Data from various sources is aggregated and ingested into Solr
   •  One core per locale and product type
§  Wrapper service to combine some data across product cores and manage configuration rules
§  Solr's built-in search and faceting power the navigation
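As an illustration of the overview above, here is a minimal sketch of how a wrapper service might build a faceted Solr request against a per-locale, per-product core. The core naming scheme (`hotel_en_us`), the host, and the field names (`state`, `star_rating`, `city`, `amenity`) are assumptions made for this example and do not come from the slides; `q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def core_name(locale, product):
    # Hypothetical naming scheme for "one core per locale and product type".
    return f"{product}_{locale}".lower()


def facet_query_url(base_url, locale, product, filters, facet_fields, rows=10):
    """Build a select URL that filters with fq and asks Solr for facet counts."""
    params = [
        ("q", "*:*"),            # match everything, narrow down with filters
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with zero hits
    ]
    for field, value in filters.items():
        params.append(("fq", f'{field}:"{value}"'))
    for field in facet_fields:
        params.append(("facet.field", field))
    return f"{base_url}/{core_name(locale, product)}/select?{urlencode(params)}"


# Roll-up style request: all 5-star hotels in California, faceted by city and amenity.
url = facet_query_url(
    "http://localhost:8983/solr",
    "en_US",
    "hotel",
    {"state": "California", "star_rating": "5"},
    ["city", "amenity"],
)
print(url)
```

Each `fq` is sent as a separate filter, so Solr can cache the filters independently in the filterCache.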



Solution – Architecture View
[Architecture diagram. Components shown: Widgets, UI, Mobile; Services/Business Logic; Solr Slaves (Multi Core); Solr Master (Multi Core); Offer Management Tool; ETL; data sources: Oracle, Deals, Products, ……]


Solution - Achievements
§  Millions of unique long-tail landing pages
   •  E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegashotels_5-star_business-center_green
§  Faster search across products
   •  E.g., Beach Deals under $500
§  Segmented content delivery through tagging
§  Scaled well to distribute the content to different brands, partners and advertisers
§  Opened up for other innovative applications
   •  Deals on Map, Deals on Mobile, Wizards, etc.



Solution – Road Ahead
§  Migration to Solr 3.1
   •  Geospatial search
   •  CSV output format
§  Query boosting by search pattern
§  Near-real-time updates
§  Deal and user behavior mining in Hadoop (MapReduce), with Solr serving the content
§  Move slaves to the cloud



Sizing & Performance
§  Index Stats
   •  Number of cores: 25
   •  Number of documents: ~1 million records
§  Response
   •  Requests: 70 TPS
   •  Average response time: 0.005 seconds (5 ms)
§  Software Versions
   •  Solr 1.4.0
      –  filterCache size: 30000
   •  Tomcat 5.5.9
   •  JDK 1.6



Take Away
§  Semi-structured storage in Solr makes it easy to aggregate disparate sources
   •  Remember dynamic fields
§  Multiple cores to manage data for multiple locales
§  Solr is a great enabler of "Innovations"
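To make the dynamic-fields point concrete, the sketch below maps records from differently shaped sources onto flat Solr documents using suffix-typed field names. The suffixes (`_s`, `_i`, `_f`, `_b`) follow the convention of the Solr example schema and assume the target schema declares matching `dynamicField` patterns; the function name and sample records are hypothetical.

```python
def to_solr_doc(source, record):
    """Flatten a source record into a document whose field names carry a type suffix."""
    doc = {"id": f"{source}-{record['id']}", "source_s": source}
    for key, value in record.items():
        if key == "id":
            continue
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            suffix = "_b"
        elif isinstance(value, int):
            suffix = "_i"
        elif isinstance(value, float):
            suffix = "_f"
        else:
            suffix, value = "_s", str(value)
        doc[key + suffix] = value
    return doc


# Two sources with different attributes end up in the same index
# without a schema change for every new attribute.
deal = to_solr_doc("deals", {"id": 7, "price": 499.0, "theme": "beach"})
hotel = to_solr_doc("products", {"id": 4980, "stars": 5, "green": True})
print(deal)
print(hotel)
```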



Location Resolution By Esteban Donato



The Challenge
§  How to develop a global location resolution service?
   •  Flexible in the face of change
   •  General enough to cover everyone's needs
   •  Multi-language
   •  Performance and scalability
   •  Configurable by site



Architecture of the solution
§  Master/Slave architecture
§  Multi-core: each core represents a language
§  SolrJ client, binary format
§  Solr response cache
§  Remote Streaming indexing
§  CSV format
[Architecture diagram. Components shown: Auto-complete, Resolution; Management Tool; Location DB; Batch Job; Solr Master; Solr Slave]
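The CSV and remote-streaming items can be illustrated with a short sketch of how a batch job might ask the Solr master to index a CSV export. It assumes remote streaming is enabled in `solrconfig.xml` and that the master exposes the CSV update handler at `/update/csv`; the host, the core name `locations_en` and the file path are hypothetical.

```python
from urllib.parse import urlencode


def csv_update_url(master_url, core, csv_path, commit=True):
    """Build a request URL that tells Solr to read and index a CSV file itself."""
    params = {
        "stream.file": csv_path,  # file path as seen by the Solr master
        "stream.contentType": "text/plain;charset=utf-8",
        "commit": "true" if commit else "false",
    }
    return f"{master_url}/{core}/update/csv?{urlencode(params)}"


# One core per language, so the batch job would issue one request per core.
url = csv_update_url("http://solr-master:8983/solr", "locations_en", "/data/locations_en.csv")
print(url)
```

With `stream.file` the CSV body is not sent over HTTP; the master reads the file from its own file system, which keeps the batch job itself very small.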



Auto-complete
§  The system has to suggest options as users type their desired location
   •  Examples: "san" => San Francisco, "veg" => Las Vegas
§  Relevancy: not all locations are equally important
   •  "par" => "Paris, France"; "Parana, Argentina"
§  Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
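The auto-complete behavior described above can be sketched in memory as prefix matching over several searchable terms, ordered by a relevancy rank. This is only an illustration of the idea, not the Solr implementation from the talk; the sample locations, their terms and their rank values are made up.

```python
LOCATIONS = [
    # (display name, searchable terms, rank); all sample values are made up
    ("Paris, France", ["par", "paris", "france", "fr"], 100),
    ("Parana, Argentina", ["parana", "argentina", "ar"], 20),
    ("San Francisco, CA, United States", ["sfo", "san", "francisco", "ca"], 90),
    ("Las Vegas, NV, United States", ["las", "vegas", "nv"], 85),
]


def suggest(prefix, locations=LOCATIONS, limit=5):
    """Return display names whose terms start with the prefix, best rank first."""
    p = prefix.strip().lower()
    hits = [
        (rank, name)
        for name, terms, rank in locations
        if any(term.startswith(p) for term in terms)
    ]
    # Higher rank first; ties are broken alphabetically for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits[:limit]]


print(suggest("par"))
print(suggest("veg"))
```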


Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
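To show roughly what the `glsSearchField` analyzer does to a location string, here is a small Python approximation of the chain. It is a sketch, not Solr code: accent folding is approximated with Unicode decomposition rather than the exact ISOLatin1AccentFilter mapping, the duplicate-removal filter is omitted because it only drops identical tokens at the same position, and empty tokens are discarded for readability.

```python
import re
import unicodedata


def strip_accents(text):
    # Approximation of accent folding: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(value):
    """Approximate the token stream produced for one field value."""
    tokens = re.split(r"[/\-\t ]+", value)  # PatternTokenizerFactory
    out = []
    for tok in tokens:
        tok = tok.lower()                   # LowerCaseFilterFactory
        tok = tok.strip()                   # TrimFilterFactory
        tok = strip_accents(tok)            # ISOLatin1AccentFilterFactory (approximated)
        tok = re.sub(r"[,.]", "", tok)      # PatternReplaceFilterFactory
        if tok:
            out.append(tok)
    return out


print(analyze("San Francisco, CA/United-States"))
print(analyze("São Paulo"))
```

Because the same analysis runs at query time, input such as "sao" or "St." can match the indexed tokens regardless of case, accents or punctuation.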



Resolution
§  The system has to resolve the location requested by the user.
§  Handles aliases: Big Apple => New York
§  Handles ambiguities.
§  Handles misspellings: Lomdon => London
   •  NGramDistance algorithm.
   •  How to combine distance with relevancy?
   •  Errors suggesting the correct location when the input is a prefix: Lond => London
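One possible way to combine string distance with relevancy is sketched below. It does not reproduce Lucene's NGramDistance; a simple Dice coefficient over character bigrams stands in for it. The weighted sum, the threshold, the weight and the rank values are all assumptions chosen for illustration; only the alias and misspelling examples come from the slides.

```python
ALIASES = {"big apple": "new york"}  # alias example from the slides
RANKS = {"london": 95, "londrina": 30, "new york": 99, "lyon": 60}  # made-up ranks


def bigrams(s):
    # Pad with spaces so the first and last characters also form bigrams.
    padded = f" {s} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    ga, remaining = bigrams(a), bigrams(b)
    total = len(ga) + len(remaining)
    shared = 0
    for gram in ga:
        if gram in remaining:
            remaining.remove(gram)  # each bigram of b can be matched only once
            shared += 1
    return 2 * shared / total


def resolve(query, min_similarity=0.5, rank_weight=0.2):
    """Resolve a user string to a known location name, or None if nothing is close."""
    q = query.strip().lower()
    q = ALIASES.get(q, q)
    if q in RANKS:
        return q
    scored = []
    for name, rank in RANKS.items():
        sim = similarity(q, name)
        if sim >= min_similarity:
            # Similarity dominates; rank nudges the choice between close candidates.
            scored.append((sim + rank_weight * rank / 100, name))
    return max(scored)[1] if scored else None


print(resolve("Lomdon"))
print(resolve("Lond"))
print(resolve("Big Apple"))
```

In this sketch the prefix "Lond" is close to both "london" and "londrina", and the rank term helps the more important location win.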



Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>



Sizing & Performance
§  4 cores with ~500,000 documents indexed each
§  Response times
   •  Auto-complete: 15 ms, 20 TPS
   •  Resolution: 10 ms, 2 TPS
§  Cache configuration
   •  queryResultCache: maxSize=1024
   •  documentCache: maxSize=1024
   •  fieldValueCache & filterCache disabled



Wrap Up
§  Performance always as top priority
§  Develop simple but robust services
§  Provide a simple API



Q&A



Contact
§  Esteban Donato
   •  Esteban.donato@travelocity.com
   •  Twitter: @eddonato
§  Sudhakar Karegowdra
   •  Sudhakar.karegowdra@travelocity.com
   •  Twitter: @skaregowdra

https://www.facebook.com/travelocity
Twitter: @travelocity and @RoamingGnome

