Gregg Donovan @Lucenerevolution 2011

Solr & Lucene at Etsy
Gregg Donovan
Technical Lead, Search
gregg@etsy.com


1.5 years Solr & Lucene at Etsy.com
3 years Solr & Lucene at TheLadders.com



8+ million members


9.3 million items


800k+ active sellers


1+ billion pageviews / month








Maximize Solr out-of-the-box


Hack at a low-level


Know when to do each



Or



Don’t fear trunk


builds.apache.org/job/Solr-trunk/changes




http://localhost:8393/solr/placesuggest/select?
  q={!lucene}s*
  &sfield=latlong&pt=37.595804,-122.364521
  &sort=div(geodist(),sqrt(sum(population,50)))%20asc
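The sort expression above divides distance by a damped population term, so a big place far away can outrank a small place nearby. A minimal sketch of the same arithmetic in plain Java, assuming geodist() yields kilometers; the class name and the sample inputs are hypothetical and only illustrate the formula.

```java
/** Hypothetical stand-in for the function query div(geodist(), sqrt(sum(population, 50))). */
final class PlaceSuggestScore {

    /** Lower values sort first under "asc"; the constant 50 keeps the divisor above zero. */
    static double score(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50.0);
    }

    public static void main(String[] args) {
        // Made-up inputs: a small town 10 km away vs. a large city 100 km away.
        double town = score(10.0, 1_000.0);
        double city = score(100.0, 1_000_000.0);
        System.out.println("town=" + town + " city=" + city);
        System.out.println(city < town ? "city sorts first" : "town sorts first");
    }
}
```

With these inputs the city gets the smaller value and therefore sorts first, which is the intent of dividing by the square root of population.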


{!lucene} {!field} {!term} {!boost} {!func} {!dismax} {!edismax}


Cheap ranking awesomeness



ExternalFileField ftw!


schema.xml:
  <fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true"
             class="solr.ExternalFileField" valType="float"/>
  <field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
  1=2.3
  2=1.7
  3=1.1

Solr query:
  sort={!func}hotness+desc
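The external file is just one key=value line per document key, so it can be regenerated by any batch job. A minimal sketch of such a writer, assuming the hotness scores are computed elsewhere; the class and method names are hypothetical, and the external_hotness.<timestamp> file name simply follows the slide above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that renders and writes an ExternalFileField data file. */
final class ExternalHotnessWriter {

    /** Renders one "key=value" line per entry, sorted by key for stable output. */
    static List<String> render(Map<Long, Float> hotnessById) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, Float> e : new TreeMap<>(hotnessById).entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        return lines;
    }

    /** Writes the lines to dir/external_hotness.<timestampMillis> and returns the path. */
    static Path write(Path dir, long timestampMillis, Map<Long, Float> hotnessById)
            throws IOException {
        Path target = dir.resolve("external_hotness." + timestampMillis);
        Files.write(target, render(hotnessById), StandardCharsets.UTF_8);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Map<Long, Float> scores = new TreeMap<>();
        scores.put(1L, 2.3f);
        scores.put(2L, 1.7f);
        scores.put(3L, 1.1f);
        // A temp directory stands in for the real data directory.
        Path dir = Files.createTempDirectory("treasury");
        System.out.println(write(dir, System.currentTimeMillis(), scores));
    }
}
```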


ExternalFileField caveats


More relevance: boost query


http://localhost:8983/solr/listings/select?
  q={!boost b=$rel v=$qq}
  &rel=category:furniture^10+OR+((-material:acrylic)^5)
  &qq=desk
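The boost request above keeps the user's text (qq) and the relevance expression (rel) in separate parameters that the q parameter dereferences. A minimal sketch of assembling such a URL with proper escaping, assuming Java 10+ for the Charset overload of URLEncoder.encode; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds the {!boost} request shown on the slide. */
final class BoostQueryUrl {

    static String build(String baseUrl, String userQuery, String boostExpr) {
        // LinkedHashMap keeps the parameters in the order they are added.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}"); // dereferences the two params below
        params.put("rel", boostExpr);
        params.put("qq", userQuery);

        StringJoiner queryString = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            queryString.add(e.getKey() + "="
                + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + queryString;
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://localhost:8983/solr/listings/select",
            "desk",
            "category:furniture^10 OR ((-material:acrylic)^5)"));
    }
}
```

Keeping the user's text in its own parameter means it never has to be spliced into the boost expression by string concatenation.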


Impression tracking


etsy.com/search?q=desk&explain=1


Side-by-Side testing



Cheap performance wins


Put off sharding till you must


cat ${indexDir}/* > /dev/null


Return IDs, minimize stored fields


RAM: $10-20 / GB


SSD: 0.1ms vs 10ms seek


Custom?


solr-user


Tools for low-level hacking


Continuous deployment



One button. So easy a dog could do it.




MTTR > MTBF




github.com/etsy/logster


Tracking GC


export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
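Once the GC log exists, the pause lines can be turned into a metric. A minimal sketch, assuming the log contains the "Total time for which application threads were stopped" lines that HotSpot prints for -XX:+PrintGCApplicationStoppedTime; the class name is hypothetical and the sample lines in main are made up for illustration, not taken from a real log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner that extracts the longest stop-the-world pause from GC log lines. */
final class GcPauseScanner {

    // find() is used below, so any timestamp prefix on the line is ignored.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    /** Returns the longest pause in seconds, or 0.0 if no pause line is present. */
    static double maxPauseSeconds(Iterable<String> logLines) {
        double max = 0.0;
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                max = Math.max(max, Double.parseDouble(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Illustrative lines only; real values come from /var/log/search/gc.log.
        List<String> sample = Arrays.asList(
            "Total time for which application threads were stopped: 0.0001230 seconds",
            "Total time for which application threads were stopped: 0.2500000 seconds");
        System.out.println(maxPauseSeconds(sample));
    }
}
```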




Alerting


Testing



SaveAsFixture


Profiling


Java Primitive Library
fastutil
trove4j


Know the hooks
  SolrRequestHandler
  SearchComponent
  QParserPlugin
  SolrEventListener
  SolrCache
  ValueSourceParser


SolrIndexSearcher gotchas
  reference counting
  using it as a cache key: WeakHashMap<SolrIndexSearcher,MyValue> myCache...
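The cache-key idea can be sketched without Solr on the classpath. In the sketch below the type parameter S stands in for SolrIndexSearcher, and the class name and loader function are hypothetical. The point is that a weak key lets an entry disappear once the old searcher is no longer referenced elsewhere.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-searcher cache; S stands in for SolrIndexSearcher. */
final class PerSearcherCache<S, V> {

    // Weak keys: an entry becomes collectable once nothing else references the searcher.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this searcher, computing it on first use. */
    V get(S searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PerSearcherCache<Object, String> cache =
            new PerSearcherCache<>(searcher -> "value built for " + searcher.hashCode());
        Object searcher = new Object();
        System.out.println(cache.get(searcher));
        System.out.println(cache.size());
    }
}
```

One caveat that follows from how WeakHashMap works: the cached value must not hold a strong reference back to its key, or the entry can never be collected.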


Example: personalized collections



fq={!term f=id}123 OR {!term f=id}456


Need a map of PK to docId


Use custom SolrCache plus SolrEventListener to fill it


github.com/giokincade/FastTermFilter
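The idea on the preceding slides can be sketched in plain Java: with a primary-key-to-docId map rebuilt for each new searcher, a filter over a list of IDs becomes a series of hash lookups that set bits. This is not FastTermFilter's actual code; java.util.BitSet and a boxed HashMap stand in for the Lucene and Solr classes, and every name here is hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a PK -> docId map and a bitset filter built from it. */
final class PkFilterSketch {

    /** Builds PK -> docId from an array whose index is the docId and whose value is the PK. */
    static Map<Long, Integer> buildPkToDocId(long[] pkByDocId) {
        Map<Long, Integer> map = new HashMap<>(pkByDocId.length * 2);
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            map.put(pkByDocId[docId], docId);
        }
        return map;
    }

    /** Sets one bit per requested PK that exists in the index; unknown PKs are skipped. */
    static BitSet filterFor(Map<Long, Integer> pkToDocId, long[] wantedPks, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (long pk : wantedPks) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Made-up index of three documents: docId 0 has PK 456, and so on.
        Map<Long, Integer> pkToDocId = buildPkToDocId(new long[] {456L, 789L, 123L});
        System.out.println(filterFor(pkToDocId, new long[] {123L, 456L}, 3));
    }
}
```

A primitive-collections library such as fastutil or trove4j would avoid the boxing that HashMap<Long, Integer> incurs, which matters when the map covers every document in the index.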


i18n currency sorting and filtering



currency.xml:

<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>


price:[$10.00 TO $50.00]
price:[10.00USD TO 50.00USD]
price:20.00EUR


MoneyFieldType.java:

@Override
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                           final boolean minInclusive, final boolean maxInclusive) {
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2
                + ": range queries only supported when upper and lower bound have same currency."));
    }

    String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    return new SolrConstantScoreQuery(
        new ValueSourceRangeFilter(vs, p1.getAmount() + "", p2.getAmount() + "",
            minInclusive, maxInclusive));
}
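The range query above leans on MoneyValue.parse, whose body is not shown in the deck. A minimal sketch of what such a parser might look like, assuming inputs shaped like the query examples ($10.00, 10.00USD, 20.00EUR) and assuming a bare "$" falls back to the default currency, since several currencies in currency.xml share that symbol. MoneyValueSketch is a hypothetical name and not the class used at Etsy.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for MoneyValue: an amount plus an ISO-style currency code. */
final class MoneyValueSketch {

    // Optional "$", a decimal amount, then an optional three-letter currency code.
    private static final Pattern MONEY =
        Pattern.compile("\\$?([0-9]+(?:\\.[0-9]+)?)([A-Z]{3})?");

    private final BigDecimal amount;
    private final String currencyCode;

    private MoneyValueSketch(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    /** Parses the text; inputs without an explicit code get the default currency. */
    static MoneyValueSketch parse(String text, String defaultCurrency) {
        Matcher m = MONEY.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        String code = m.group(2) != null ? m.group(2) : defaultCurrency;
        return new MoneyValueSketch(new BigDecimal(m.group(1)), code);
    }

    BigDecimal getAmount() {
        return amount;
    }

    String getCurrencyCode() {
        return currencyCode;
    }

    public static void main(String[] args) {
        MoneyValueSketch v = parse("20.00EUR", "USD");
        System.out.println(v.getAmount() + " " + v.getCurrencyCode());
    }
}
```

BigDecimal is used for the amount so that values such as 10.00 keep their scale and avoid binary floating-point rounding.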


Replication gotcha


SOLR-2202


Related Searches


Autosuggest!


bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely


The TermDictionary is not a whitelist


