Guide: Lucene Revolution 2011, by Lucid Imagination

LUCENE REVOLUTION San Francisco 2011

Welcome to San Francisco! We are excited to bring you the second Lucene Revolution event, following quickly on the success of our 2010 conference in Boston. In addition to all the great feedback we received after Boston, many people asked about bringing the conference to the West Coast, and here we are. It’s great to host the community here in our home state of California.

There’s now no question: the revolution is in full swing, and Lucene and Solr are shaping the future of search. The diverse range of search technology and applications is without a doubt one of its greatest strengths. For the extended community and ecosystem of open source search, Lucene Revolution is an unmatched opportunity to learn, network, share experiences, and see how others have changed the world of search.

Speakers here at the conference hail from companies large and small, from innovative startups and established companies, as well as from government, academia and non-profits. Even better, the range of experience and application interests of your fellow attendees should inspire you to seek out new ways to put search technology to work. We’ve allotted ample time in breaks for formal and informal conversations. And be sure to join the Revolution social network at http://lucene.crowdvine.com/. Keep an eye out at the Registration Desk for agenda changes and updates.

One group you should definitely seek out here is the core group of developers and committers who are the heart and soul of the Apache Lucene/Solr project. You know them from the mailing lists; these are the people who do the hard work of making the code do its magic, resolving challenging technical and architectural issues to the benefit of us all. Don’t just attend their roadmap panel and technical sessions; take the opportunity to put faces to names, so that when you’re on the mailing lists, you’ll have more than a ‘to’ and a ‘from’ to go by.

As the commercial entity for Lucene/Solr, we at Lucid Imagination are always looking for new ways to help you make the most of open source search. Be sure to tell us what you like, what could be improved, and what topics should be covered in future events. Think about sharing your own successes with the community by speaking at the next Lucene Revolution. Let the conference staff, or anyone on the Lucid Imagination team, know if you have any questions, or if there’s anything you need.

Onward to the revolution!

Eric Gries, CEO
Lucid Imagination

Contents
Opening Letter
Timetable at a Glance
Agenda
About Lucid Imagination
About Our Sponsors
Training
Keynotes
Sessions, Day 1
Lightning Talks
Sessions, Day 2
Speaker Bios
Hotel, Maps & Transportation Info

Lucene, Apache Lucene, Solr, Apache Solr, Hadoop, Apache Hadoop and other Apache projects mentioned are trademarks of The Apache Software Foundation.

SUNDAY, MAY 22
16:00 – 18:00  REGISTRATION OPEN (Sandpebble Foyer, outside Grand Peninsula Ballroom)

MONDAY, MAY 23
8:00 – 9:00  TRAINING REGISTRATION OPEN
9:00 – 17:00  Training Workshops, Day 1
• Solr Application Development Workshop
• Developing Search Applications with LucidWorks Enterprise
• Lucene Application Development Workshop
• Scaling Search with Solr and Big Data
See the registration desk in the Sandpebble Foyer for room assignments.

TUESDAY, MAY 24
8:00 – 9:00  TRAINING REGISTRATION OPEN
9:00 – 17:00  Training Workshops, Day 2
• Solr Application Development Workshop
• Developing Search Applications with LucidWorks Enterprise
• Lucene Application Development Workshop
• Scaling Search with Solr and Big Data
16:00 – 18:00  Ticket pickup for Giants game (advance tickets required). Tickets may be picked up at the Conference Registration Desk in the Sandpebble Foyer.
18:00  Buses depart for Giants game from the front entrance of the Hyatt Hotel

WEDNESDAY, MAY 25
7:30 – 18:00  REGISTRATION OPEN
7:30 – 8:30  Light Breakfast Available
8:30 – 10:05  Welcome & Keynotes
    Welcome: Eric Gries, Lucid Imagination
    Keynotes: Marc Krellenstein, Lucid Imagination; Stephen Dunn, The Guardian News and Media
10:05 – 10:35  BREAK
10:35 – 11:25  Technical Track Sessions
11:25 – 11:35  BREAK
11:35 – 12:25  Technical Track Sessions
12:25 – 13:30  LUNCH AND SPONSOR EXHIBITS
13:30 – 14:20  Technical Track Sessions
14:20 – 14:30  BREAK
14:30 – 15:20  Technical Track Sessions
15:20 – 15:50  BREAK
15:50 – 16:40  Panel: “Stump the Chump”
16:40 – 17:00  BREAK
17:00 – 18:30  Lightning Talks
18:30  REVOLUTION PARTY

THURSDAY, MAY 26
7:45 – 8:45  Light Breakfast Available
8:45 – 10:15  Keynote: Stephen O’Grady, RedMonk
    Panel: Committers Q&A, Lucene/Solr Roadmap
10:15 – 10:45  BREAK
10:45 – 11:35  Technical Track Sessions
11:35 – 11:45  BREAK
11:45 – 12:35  Technical Track Sessions
12:35 – 13:45  LUNCH AND SPONSOR EXHIBITS
13:45 – 14:35  Technical Track Sessions
14:35 – 14:45  BREAK
14:45 – 15:35  Technical Track Sessions
15:35 – 15:45  BREAK
15:45 – 16:35  Technical Track Sessions
16:35 – 17:30  Panel: “Search for Tomorrow (RDBMS for Yesterday)”
17:30  CONFERENCE ENDS

LOGISTICS
• REGISTRATION is in the Grand Peninsula Foyer
• KEYNOTES and PANEL DISCUSSIONS are in Grand Peninsula Ballroom D
• TRACK 1 is in Grand Peninsula Ballroom A/B/C
• TRACK 2 is in Grand Peninsula Ballroom D
• TRACK 3 is in Grand Peninsula Ballroom E/F/G
• TRACK 4 is in Sandpebble A/B/C
• LUNCHES are in the Atrium (upstairs, above the Ballroom)
• THE REVOLUTION PARTY is in the Grand Peninsula Foyer
• TRAINING CLASSES will be held in the Sandpebble Conference Rooms
• TRAINING REGISTRATION is outside the Sandpebble Conference Rooms (please contact charelm@gmail.com if you are unsure which class you are in)

As the world’s leading source of expertise in open source search technology and the commercial company for Apache Solr/Lucene, Lucid Imagination offers the products and services you need for cost-effective development and production deployment of cutting-edge search applications that lower your cost of growth. Thousands of organizations around the world have turned to the power of Apache Solr/Lucene open source technology to drive their cutting-edge search applications.

LucidWorks: Enterprise Grade Solr/Lucene
LucidWorks Enterprise is a flexible, cost-effective, scalable platform that simplifies development, tuning, configuration and deployment of Solr/Lucene open source search technology. It features:

POWERFUL SEARCH
• Complete Apache Solr 4.x Release: integrated and tested with powerful enhancements
• Scalability: distributed search and indexing
• Cloud-Ready: centrally managed search replication and configuration
• REST API: simplifies integration

SIMPLIFIED ADMINISTRATION
• Easy-to-use Installer & Admin UI: streamlines startup and common configuration tasks
• Data Connectors: for databases, file systems, Web sites, SharePoint and more
• Multiple file types: MS Office, PDF, native XML format documents and more
• Security: LDAP-aware, document level, role-based, policy-driven

ADVANCED USER EXPERIENCE
• Enriched Query Parsing: more resilient interpretation of user input
• Click Scoring: boosts results based on user behavior
• User Alerts: automatic notification of new results
• Integrated auto-complete and spellchecking

Global Expertise: Training & 24x7 Services
Lucid Imagination offers a deep bench of resources in search and open source, backed by unmatched experience with thousands of diverse search applications at the world’s largest companies.

TRAINING
A comprehensive selection of courses and classes for developers, system administrators, managers, and search application users on LucidWorks Enterprise, Solr and Lucene; instruction is offered in a variety of formats around the world.

CONSULTING
Our unique ExpertLink Advisory Services provide consultative guidance on design and optimization for search applications during development and production to ensure your Lucene/Solr implementations meet the requirements of your business.

ENTERPRISE SUPPORT AND SUBSCRIPTIONS
Lucid Imagination offers attractively priced subscriptions that deliver Solr/Lucene technology in an integrated, well-packaged format. Subscriptions combine stability, security, robust interfaces, and predictable release schedules with unmatched support resources within reach 24x7x365 across the globe.

Platinum Sponsor: Basis Technology
Basis Technology provides software solutions for multilingual text analytics, information retrieval, and name resolution. Our Rosette® Linguistics Platform is the text analysis engine behind many commercial and government search-based applications, adding language support to Lucene and Solr for better search precision and recall in English or 27 other languages. Starting with language identification in 55 languages, our high-quality linguistic analysis seamlessly integrates into Lucene and Solr via a connector, enabling customizable tokenization and stemming/lemmatization for languages like Chinese, Japanese, Arabic, and Persian. Dictionary-based decompounding is available in German, Dutch, Danish, Swedish, Norwegian, and Korean. Entity extraction enriches search by adding auto-generated metadata and faceted navigation to results. Adding support for a new language to Solr takes less than a day’s work. The Rosette Platform powers search, business intelligence, e-discovery, and other enterprise and government applications for customers worldwide including: Microsoft/Bing, Cisco, EMC, Endeca, Oracle, and Yahoo!
www.basistech.com

Exhibitors

SALESFORCE.COM
Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company. We’re also one of the “Best Places to Work” (FORTUNE). Salesforce.com’s Search Team is strong and experienced, with deep architecture expertise. We’re dedicated to delivering the fastest, most reliable cloud-scale enterprise search in the industry. In addition to innovating around scalability and security, we strive to delight our end users with an original, intuitive user experience and relevancy that’s adaptive, robust, and deeply satisfying. If you share our passion for search and for solving tough problems, swing by our booth to chat.
www.salesforce.com

SEARCH TECHNOLOGIES
Search Technologies is the leading independent provider of search engine integration and support services. Operating internationally, we help clients to gain business advantage using search. Our technical team of more than 80 experts is the most experienced group of search implementation professionals globally, and this mitigates risk for our customers. In short, we are the experts at fine-tuning search applications to deliver business benefits.
www.searchtechnologies.com

DOCUMILL
Documill is an independent software vendor (ISV) enabling browser-based access to Microsoft Office and PDF documents and empowering high-volume server-side content processing solutions. Documill Visual Search dramatically improves search user experience and discoverability of multi-page documents. Instant document previews and page-level search results improve the document data mining experience and its accuracy. With page-level bookmarking features, Documill Visual Search enables collaborative search, allowing users to take actions based on their findings, share results and syndicate relevant pages into new documents.
www.documill.com

Community Sponsors

SEMATEXT
Sematext is a software products and services company focused on Search & Analytics using Lucene, Solr, Nutch, Hadoop, HBase, Flume, Mahout, and other open-source technologies. Sematext also offers Lucene & Solr technical support subscriptions, consulting packages, and training. The company also runs the popular search-hadoop.com and search-lucene.com sites. Founded in 2007 in New York, Sematext is privately held and self-funded with a presence in North America and Europe. Sematext’s customers include The Library of Congress, Lockheed Martin, Simon & Schuster, Salesforce, NAVTEQ, Comcast, Cox Communications, ProQuest, Citysearch, Gilt Groupe, Autodesk, and many others.
www.sematext.com

EMC CORPORATION
EMC Corporation is the world’s leading developer and provider of information infrastructure technology and solutions that enable organizations of all sizes to transform the way they compete and create value from their information. We can help you design, build, and manage flexible, scalable, and secure information infrastructures. And with these infrastructures, you’ll be able to intelligently and efficiently store, protect, and manage your information so that it can be made accessible, searchable, shareable, and, ultimately, actionable. In short, with an information infrastructure, you can avoid the potentially serious risks and reduce the significant costs associated with managing information, while fully exploiting its value for business advantage.
www.emc.com

SPRINGSOURCE, A DIVISION OF VMWARE, INC.
SpringSource, a division of VMware, Inc. (NYSE: VMW), employs the open source leaders who created and drive innovation for Spring, the de facto standard programming model for enterprise Java applications, as well as the Java and web thought leaders within the Apache Tomcat, Apache HTTP Server, RabbitMQ, Hyperic, Groovy and Grails open source communities. SpringSource forges open source innovations to create lean and powerful technology that people love to use. From high-productivity developer tools and frameworks to lightweight application server runtimes, including data management solutions for the hardest enterprise and cloud-scale problems, SpringSource provides solutions for tomorrow’s enterprise challenges.
www.springsource.com

MANNING PUBLICATIONS
Manning Publications offers computer books for professionals: programmers, system administrators, designers, architects, managers and others. Manning’s focus is on computing titles at professional levels. We care about the quality of our books. Our books are designed without gimmicks. Their main goal is elegance and readability; we feel the two are often the same. Our covers are understated, decorated with pictures of worldwide regional dress habits of two hundred years ago. Many of our books come with online reader support: authors answer the questions of their readers in our Web-based Author Online discussion forums.
www.manning.com

DZONE
DZone is a social linking and blogging network for the developer and IT communities. According to PC Magazine, “DZone is a developer’s dream: a vast network of user-submitted links to message boards, news, coding tricks, and more.” Launched in June 2006, DZone is in Alexa’s top 3000 sites, surpassing established leaders like DevX, Sys-con, FTP Online and TheServerSide.com. DZone is the only vertically focused site regularly listed among the web’s largest social bookmarking sites. In its first year of operation DZone sent over 5 million visitors to other developer websites. Today, DZone has curated topic pages for Java, Solr/Lucene, Cloud Computing, PHP, Agile, Mobile, and much more.
www.dzone.com

TNR GLOBAL
TNR Global is a systems design and integration company focused on enterprise search and cloud computing solutions. TNR develops scalable, fault-tolerant web-based search solutions built on the open source LAMP stack and utilizing Amazon Web Services and/or physical servers. TNR has over ten years of experience in web systems and enterprise search implementations, both proprietary and open source, and specializes in Lucene/Solr and FAST ESP search applications. TNR Global builds solutions for: Vertical Search Engines, Publishing, Web Directories, News Sites, Information Portals, Web Catalogs, Education. We also work with web-based startups to build scalable services.
www.tnrglobal.com

UCHIDA SPECTRUM
Uchida Spectrum, Inc. (USI) is a leader in the Japan search market. USI provides SMART/InSight, a search application that integrates and analyzes enterprise information. SMART/InSight is used by leading blue chips, like Canon and Moody’s. USI is working with Lucid Imagination as its Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services. In 2011, USI expanded its offerings to Enterprise Search and Web Services/Ecommerce companies across Asia. USI now serves clients and partners in Japan, India, China and Singapore.
www.spectrum.co.jp

Scaling Search with Big Data and Solr
Scaling Search with Big Data and Solr is a 2-day instructor-led, hands-on classroom training course delivered by instructors certified by Lucid in a shared classroom setting. The class is for Solr developers who want to know how to leverage the flexible search functionality of Apache Solr and the Big Data processing of Apache Hadoop to create indexes for both general search and augmented data analytics. Lab exercises and real-world examples will be used to reinforce content.

We’ll start with Hadoop from the ground up and cover MapReduce, HDFS (the Hadoop Distributed File System), cluster management, “the shuffle,” etc., before continuing on to connecting it to Solr. We’ll look at common use cases for generating search indexes from big data, typical patterns for the data processing workflow, and how to make it all work reliably at scale. We will explore in depth an example of processing 1 billion records to create a faceted Solr search solution. You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL projects such as Cassandra and HBase.

The class will continue with techniques for scaling your Solr installation, how to identify bottlenecks in it, how to monitor it, and how to determine resource usage. We’ll also cover various Solr architectures, their characteristics and use cases, and examine how to apply them to make appropriate tradeoffs and scale your Solr installation effectively.

THE COURSE COVERS
• An overview of Hadoop.
• Understanding MapReduce.
• Principles of Hadoop development, operations & eco-system.
• How to use Hadoop with Solr.
• How to index large volumes of data.
• How to effectively search large indexes.
• Understanding NoSQL.
• How to shard/federate/replicate your data for large indexes.
• Understanding resource costs & tradeoffs for Solr features.

PREREQUISITES

Prospective students should be familiar with Solr, either through work experience with Solr or by having completed the Lucid Imagination Solr training course. No prior Hadoop experience is assumed.

Developing Search Applications with LucidWorks Enterprise
Developing Search Applications with LucidWorks Enterprise is a 2-day instructor-led, hands-on classroom training course designed and developed by the engineers who developed LucidWorks Enterprise (LWE), and delivered by instructors certified by Lucid in a shared classroom setting. The objective of this course is to introduce LucidWorks Enterprise to users with no previous experience working with search applications.

Through a combination of lectures and hands-on lab exercises you will learn how to get up and running with LucidWorks Enterprise, what the components of a search application are, and how to make your content searchable and findable in a search application built on LucidWorks Enterprise. There will be time for questions and discussion to enhance your learning experience.

At the end of the course you will know what a search application is, and how to set up and use LucidWorks Enterprise to index and search your content. You will also learn about the features of LWE, such as highlighting, spell checking, and custom alerts, and how to use these features to build a satisfying search experience for the end users who will search your content.

THE COURSE COVERS
• What a search application is and how to build one with LucidWorks Enterprise.
• How to install and configure LWE.
• How to make your content searchable and findable.
• How to work with different data sources such as web pages, relational databases, and rich content files.
• How to build queries to search for content in LWE.
• Techniques and features in LWE that can be used to make results for end users more relevant.
• Different ways to process search results returned by LWE.

PREREQUISITES

No programming skills are necessary; however, some technical background and familiarity with application development will be helpful. There will be labs accompanying the lectures that will require basic computer skills, including how to run a simple command from the command line. No previous experience with search applications is necessary.

Solr Application Development Workshop
Solr Application Development Workshop is a two-day hands-on training course designed and developed by the engineers who helped write the Apache Lucene/Solr code, and delivered by instructors certified by Lucid in a shared classroom setting. The workshop is targeted at developers who want to build applications with Apache Solr, the Lucene Search Server. By the end of the course you will understand how to set up and use Solr to index and search, how to analyze and solve common problems, and how to use optional Solr modules such as facets, spell check, and highlighting. Lab exercises and real-world examples will be used to reinforce content. There will be time for questions and discussion to enhance your learning experience.

THE COURSE COVERS
• Principles of search application development
• Common search use cases and their application
• How to make content searchable
• Key Solr and Lucene concepts
• Basics of indexing and searching using Solr
• How to design and run a Solr application
• Best practices for indexing, searching and performance
• Techniques to analyze and resolve common search problems
• How to leverage Solr’s optional modules including spell checking, highlighting, Data Import Handler, Tika Integration and other popular capabilities
• Advanced topics in designing Solr apps and running a site
• Solr operations and deployment tools and strategies
• How to customize and extend Solr

PREREQUISITES

Some programming skill and experience with a modern programming language such as Java, PHP, Perl, Ruby, .NET, or any language that supports HTTP and/or XML.

Lucene Application Development Workshop
Lucene Application Development Workshop is a two-day instructor-led, hands-on training workshop, written and led by the engineers who helped write the Apache Lucene/Solr code. The objective of this course is to provide you with real-life use cases and teach you how to apply Lucene to real business requirements. During the course you will learn to apply best practices in developing scalable, highly available and high-performance search applications. There will be time for questions and discussion to enhance your learning experience.

THE COURSE COVERS
• Principles of search application development.
• Common search use cases and their application.
• How to make content searchable.
• Key Lucene concepts.
• Basics of indexing and searching with the Lucene APIs.
• Best practices for indexing, searching and performance.
• Analysis techniques for solving common search problems.
• Lucene internals.
• Lucene’s optional modules to enable spell checking, highlighting and other common search features.

PREREQUISITES

Basic Java programming skills

The Once and Future History of Enterprise Search and Open Source
MARC KRELLENSTEIN | LUCID IMAGINATION

While it remains challenging to build best practice search applications, core search technology has become commoditized. Open source Lucene/Solr represents the best form of that commodity, as good as or better than any commercial search technology while also providing the cost, control and flexibility advantages of open source. In this talk, we’ll look at how past challenges in search were met and new ones evolved, and the place of Lucene/Solr in that evolution.

From Publisher To Platform: How The Guardian Embraced the Internet using Content, Search, and Open Source
STEPHEN DUNN | GUARDIAN NEWS AND MEDIA UK

In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications with The Guardian’s rich content. The content API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back to 1999 (over 1 million articles) and is an increasingly complete representation of the output of the organization. The DataStore contains curated data sets for use in applications and visualizations. This talk will cover how The Guardian opened up its business, enriched it, and reached new markets with its Open Platform strategy. Stephen will cover the technical architecture, the implementation of Solr (the key technology powering the platform), and how The Guardian has used it to embrace disruption in the media space while finding new sources of revenue and innovation. Two years after the launch, Stephen will share some of the lessons learned and explain how The Guardian complements its use of Solr with other open-source non-relational technology as its platform evolves.

All Data Big and Small
STEPHEN O’GRADY | REDMONK

The last twenty-four months have seen a veritable explosion in discussion around what is commonly referred to as Big Data and the infrastructure technology employed to manage it. The wealth of available open source software means that businesses from any industry have easily accessible tools with which to tackle projects that would have been out of their reach just a few years prior. Less heralded, however, has been the fact that making data actually useful, whatever its size, remains a challenge. In this session we’ll explore the role of search in putting data, big and small, to work answering the important questions for businesses and society by reducing the friction between question and answer.

Integrating Advanced Text Analytics into Solr STEVE KEARNS | BASIS TECHNOLOGY

Text analytics provides a number of interesting analytic capabilities that can enhance enterprise search applications, though in practice it is not always obvious how these can be integrated effectively into Solr. This presentation will describe some of the practical ways that leading organizations are using text analytics by integrating them directly into Solr and their user interface to improve relevance, navigate results, and discover new information. The combination of Solr and quality text analytics can improve existing keyword search solutions, and enable new ways of discovering knowledge hidden in existing data.

Finite State Automata in Lucene: Internals and Applications DAWID WEISS | POZNAN UNIVERSITY OF TECHNOLOGY, POLAND

Finite state automata and transducers made it into Lucene fairly recently, but already show a very promising impact on search performance. This data structure is rarely exploited because it is commonly (and unfairly) associated with high complexity. During the talk, I will try to show that automata and transducers are in fact very simple, that their construction can be very efficient (memory- and time-wise), and that their field of applications is very broad. This will be backed by an introduction to how FSTs are implemented in Lucene (construction and traversals) and practical use cases where FSTs have been useful so far. If you’d like to see how to squeeze 150MB of text data into a 1.8MB compact data structure, this talk is for you.

Case Study - Panasonic Europe Powered by Apache Solr DANIEL POTZINGER | AOE MEDIA GMBH

In 2010 Panasonic made the decision to replace their legacy enterprise search tool and switched the search for all their European websites to an Apache Solr-based solution. Now their customers benefit from an incredibly fast and feature-rich solution that is much more than just a search and has become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation, autosuggest, and contextual filtering for properties like color or product category were implemented under less than ideal circumstances, chiefly the lack of access to structured data. The search has been rolled out in close to 30 countries so far, which has also put Solr’s multi-lingual handling to the test.

Real-time Search at Yammer BORIS ALEKSANDROVSKY | YAMMER, INC.

This talk will focus on the architecture, scalability concerns, performance bottlenecks, operational characteristics and lessons learned while designing and implementing Yammer’s distributed real-time search system. Yammer is an enterprise social network SaaS offering with over 100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system we developed scales well up to 1B messages and serves as a foundation for the knowledge base analysis services Yammer is developing.

Boosting Documents in Solr by Recency, Popularity and Personal Preferences TIMOTHY POTTER | NATIONAL RENEWABLE ENERGY LABORATORY (NREL)

Attendees will come away from this presentation with a good understanding of, and access to source code for, boosting and/or filtering documents by recency, popularity, and personal preferences. My solution improves upon the common “recipe” based solution for boosting by document age. The framework also supports boosting documents by a popularity score, which is calculated and managed outside the index. I will present a few different ways to calculate popularity in a scalable manner. Lastly, my solution supports the concept of a personal document collection, where each user is only interested in a subset of the total number of documents in the index. My presentation will provide a good example of how to filter and/or boost results based on user preferences, which is a very common requirement of many Web applications.
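As a rough illustration of boosting by document age, here is a minimal Python sketch of a reciprocal decay. It assumes the common approach mentioned above is the one built on a reciprocal function of the form a / (m*x + b), as in Solr's `recip` function query; the constant `M` and the helper name `recency_boost` are illustrative assumptions, not taken from the talk.

```python
from datetime import datetime, timedelta, timezone

# Roughly 1 / (milliseconds in a year). An illustrative constant chosen so
# that a one-year-old document gets about half the boost of a new one.
M = 3.16e-11

def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function a / (m * x + b)."""
    return a / (m * x + b)

def recency_boost(published: datetime, now: datetime) -> float:
    """Hypothetical helper: a boost factor that decays with document age."""
    age_ms = (now - published).total_seconds() * 1000.0
    return recip(age_ms, M, 1.0, 1.0)

now = datetime(2011, 5, 25, tzinfo=timezone.utc)
for days in (0, 365, 3650):
    # Older documents receive a smaller multiplier.
    print(days, round(recency_boost(now - timedelta(days=days), now), 3))
```

The boost is 1.0 for a brand-new document and falls off smoothly with age, so old documents are demoted rather than excluded.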

Jazzed about Solr: People as a Search Problem JOSHUA TUBERVILLE | EHARMONY

Search-oriented architectures are obvious approaches for web pages, emails, documents, and other text-based entities. Often with traditional structured data, text searching is “added on” to the traditional Boolean queries in relational stores. When Jazzed was initiated we wanted search to be front and center. When we evaluated Solr we realized we could take the opposite approach: “add on” Boolean components to textual searches. This hybrid query approach makes transitioning to flexible ranking easy and straightforward. In this talk we will cover:

• How we model semi-structured user data in Solr
• Indexing strategies and their tradeoffs
• Where in the Jazzed architecture Solr does and doesn’t fit
• What aspects of Solr we are using
• Future considerations
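To illustrate the hybrid approach, the sketch below builds a Solr-style request in which a free-text query (`q`) is combined with Boolean restrictions expressed as filter queries (`fq`). The `q`, `fq` and `rows` parameters are standard Solr request parameters; the field names, values and the helper `build_hybrid_query` are hypothetical and not taken from the Jazzed implementation.

```python
from typing import Mapping
from urllib.parse import urlencode

def build_hybrid_query(text: str, filters: Mapping[str, str], rows: int = 10) -> str:
    """Return a URL query string: ranked text search plus Boolean filters."""
    params = [("q", text), ("rows", str(rows))]
    # Each filter restricts the result set without affecting the ranking score.
    for field, value in sorted(filters.items()):
        params.append(("fq", f"{field}:{value}"))
    return urlencode(params)

# Hypothetical fields: "region" and "age" are illustrative only.
qs = build_hybrid_query("loves jazz and hiking", {"region": "CA", "age": "[25 TO 35]"})
print(qs)
```

Keeping the Boolean constraints in separate filters leaves the text query free to drive ranking, which is the point of treating the structured conditions as the “add on”.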

Heavy Committing: DocValues aka. Column Stride Fields in Lucene 4.0 SIMON WILLNAUER | APACHE LUCENE PMC

Lucene 4.0 is on its way to delivering a tremendous number of new features and improvements. Besides Real-Time Search and Flexible Indexing, DocValues (aka Column Stride Fields) is one of the “next generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe document and value pairs in a column-stride fashion, either entirely memory-resident with random access or disk-resident and iterator-based, without the need to un-invert fields. The final goal is to provide independently updatable per-document storage for scoring, sorting or even filtering. This talk will introduce the current state of development, implementation details, features, and how DocValues have been integrated into Lucene’s Codec API for full extensibility.

Search, APIs, capability management and the Sensis journey CRAIG REES | SENSIS

Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local search propositions powered by the two million business listings contained in the Australian Yellow Pages® and White Pages® directories. This case study will explore Sensis’ strategic direction for search and explain how the framework and metrics by which search is managed at Sensis were used to define our search roadmap. Key architectural decisions including our use of Solr and MongoDB will be discussed as well as our approach to real-time search tuning and quality management.

A Study of I/O and Virtualization Performance with a Search Engine based on an XML database and Lucene ED BUECHE | EMC

Documentum xPlore provides an integrated search facility for the Documentum Content Server. The standalone search engine is based on EMC’s xDB (a native XML database) and Lucene. In this talk we will introduce xPlore and some of its key components and capabilities. These include aspects of a tight integration of Lucene with the XML database: xQuery translation and optimization into Lucene queries/APIs, as well as transactional update of Lucene. In addition, xPlore is being deployed aggressively into virtualized environments (both disk I/O and VM). We will cover some performance results and tuning tips in these areas.

Four Pillars of Designing the Search Experience TYLER TATE | TWIGKIT

Lucene and Solr provide many excellent tools for presenting information to users, but what makes some search user interfaces better than others? Should you aim for a rich, advanced UI or should you “just make it look like Google”? Through his work at TwigKit with blue-chip corporations, scientific institutes, and governments, Tyler has identified four guiding pillars of the search experience:

• User Expertise - Novices orienteer, experts teleport
• User Behaviour - Lookup, learn, and investigate
• Information Diversity - homogeneous vs. heterogeneous data
• Situational Context - factors from the surrounding environment

We’ll delve deep into each dimension and discuss how to achieve useful, usable, and beautiful search interfaces using design patterns including: autocomplete, faceted navigation, breadcrumbs, best bets, related searches, spelling suggestions, clickable metadata, result clustering, saved searches, data visualisation, and more.

Using Solr in Online Travel Shopping to Improve User Experience ESTEBAN DONATO, SUDHAKARA KAREGOWDRA AND RAMON RESMA | TRAVELOCITY

In this talk we will present three different use cases of Solr in the travel industry. First, we will describe how we implemented faceted navigation for hotel shopping. Then, we will explain how we implemented destination search functionality such as auto-complete and misspelling correction. Lastly, we will show how we integrated Solr to provide better experiences to mobile users.

Solr @ eBay Kleinanzeigen OLAF ZSCHIEDRICH | EBAY.DE

Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized, and how Solr is configured and used in production. Recommended best practices will be profiled along with eBay Kleinanzeigen’s plans for future deployment of Solr.

Rapid Prototyping with Solr ERIK HATCHER | LUCID IMAGINATION

Got data? Let’s make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips on adjusting Solr’s schema to better match your needs, and finally will discuss how to showcase your data in a flexible search user interface. We’ll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

Search Analytics: What? Why? How? OTIS GOSPODNETIC | SEMATEXT

You’ve indexed your data and people are searching it. But how do you know if they are happy with the results? How do you know if they are finding what they need? With search increasingly becoming the primary information access mechanism, knowing how your search is doing is not just a matter of mere curiosity, but often has direct business impact. In this talk we’ll discuss Search Analytics and how it can be used to answer questions like:

• Are too many users getting the dreaded “no matches” results?
• How deep into search results do people dig?
• Which hits are they clicking on, or what percentage of them don’t click on any hits?
• How much do they use the Did You Mean or Auto-Complete suggestions?

We’ll explore what specific Search Analytics reports tell us and what specific actions you should take based on those reports.
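A few of the questions above can be answered with very simple arithmetic over a search log. The following Python sketch is one possible way to do so; the `SearchEvent` record, its field names and the sample log are hypothetical and do not describe any particular analytics product.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SearchEvent:
    """One logged search; the field names are hypothetical."""
    query: str
    num_hits: int                # number of results returned
    clicked_rank: Optional[int]  # 1-based rank of the clicked hit, None if no click

def zero_result_rate(events: List[SearchEvent]) -> float:
    """Share of searches that returned no matches."""
    if not events:
        return 0.0
    return sum(e.num_hits == 0 for e in events) / len(events)

def no_click_rate(events: List[SearchEvent]) -> float:
    """Share of searches with results where the user clicked nothing."""
    with_hits = [e for e in events if e.num_hits > 0]
    if not with_hits:
        return 0.0
    return sum(e.clicked_rank is None for e in with_hits) / len(with_hits)

def mean_click_rank(events: List[SearchEvent]) -> float:
    """Average rank of clicked hits: how deep users dig before clicking."""
    ranks = [e.clicked_rank for e in events if e.clicked_rank is not None]
    return sum(ranks) / len(ranks) if ranks else 0.0

log = [
    SearchEvent("solr faceting", 120, 1),
    SearchEvent("lucene fst", 35, 4),
    SearchEvent("slor tutorial", 0, None),
    SearchEvent("column stride fields", 8, None),
]
print(zero_result_rate(log), no_click_rate(log), mean_click_rank(log))
```

Tracking these ratios over time, rather than as one-off numbers, is what makes them actionable: a rising zero-result rate, for instance, points at missing content or missing spelling correction.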

“Stump The Chump”: Get On The Spot Solutions To Your Real Life Solr/Lucene Challenges GRANT INGERSOLL | LUCID IMAGINATION

Got a tough problem with your Solr or Lucene application? Facing challenges that you’d like some advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the results you expected? Don’t know where to get started? Then this session is for you. Now, you can get your questions answered live, in front of an audience of hundreds of Lucene Revolution attendees! Back again by popular demand, “Stump the Chump” at Lucene Revolution 2011 is hosted by PMC chairman and Lucid Imagination co-founder Grant Ingersoll. All you need to do is send in your questions to us here at info@lucenerevolution.org. You can ask anything you like, but consider topics in areas like:

• Data modelling
• Query parsing
• Tricky faceting
• Text analysis
• Scalability

Please describe in detail the challenge you have faced and any approach you have taken to solve the problem. Anything related to Solr/Lucene is fair game. Our MC will read the questions, and Grant will have to formulate a solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will be awarded by the panel for the best question, and for those deemed to have “stumped the chump”.

Improve Relevance by Using Morphology and Named Entity Recognition CHRISTOPH GOLLER, DIRECTOR, RESEARCH | INTRAFIND SOFTWARE AG

This talk will show how the relevance of search results can be improved by using morphology and named entity recognition. After briefly explaining the purpose of morphological analysis and of named entity recognition, we will analyze their potential advantages for search, faceting, and clustering of search results. Based on these ideas we will briefly sketch how to implement a morphological analyzer in Lucene and how to implement a natural language question answering system based on Lucene using named entity recognition. The talk will be accompanied by a live demo of these ideas. Bio: Christoph Goller has more than 10 years of experience in the search industry. He received a Ph.D. in computer science from the Technical University of Munich, where he worked on several research projects in artificial intelligence, machine learning and neural networks. Christoph started his career at Lernout & Hauspie. Since 2002 he has been Director of Research at Intrafind Software AG (www.intrafind.de), a German company specializing in full-text search and text mining based on Lucene/Solr. Christoph has been a Lucene committer since 2004. He has accompanied dozens of commercial projects using Lucene and Solr. Christoph is the author of more than 15 scientific papers, frequently gives presentations on search-related topics and is responsible for partner training at Intrafind.

Scientific Data Search in the Pharmaceutical Industry with Solr JEFFREY GUO, CEO | SEMTIFIC SOFTWARE, INC.

A tremendous amount of experimental information and scientific knowledge has been locked or lost in data silos in the form of semi-structured or unstructured data in today’s pharmaceutical industry. Out-of-the-box full-text search engines do not understand embedded scientific terms and objects and their relationships well enough to facilitate context-sensitive and relevant searches. This presentation will discuss a successful implementation at a major pharmaceutical company that utilizes Solr as an enterprise search platform and enhances it with chemistry (molecular entities and reactions) search capabilities. The scope of the document indexing process is expanded to cover embedded chemistry objects and terms of various types, such as common chemical names, corporate IDs, SMILES, and InChI from documents. Scientifically aware search based on query structure drawing or chemical terms is therefore enabled. Enterprise scientific search strategies and lessons learned will be discussed during the presentation. Bio: Jeffrey Guo is the founder of Semtific Software, Inc., a company that provides products and services that streamline drug discovery workflow and enterprise search of scientific research data.

Using Lucene’s Test Framework ROBERT MUIR | LUCID IMAGINATION

The Lucene/Solr community takes testing seriously: we have a suite of over 3500 tests to ensure software quality. Over time we accumulated some useful extensions to JUnit testing, and several people found themselves using our extensions for other projects. We released this “test framework” for the first time in Lucene 3.1, and this talk is a short summary of its feature list to hopefully encourage you to go check it out for yourself. Find out how you can:

• Improve test coverage for custom Lucene components.
• Speed up your unit test suite by running tests in parallel.
• Find resource leaks, localization or timezone-sensitive bugs in your application.
• Use our extensions to make unit tests easier to write.

Bio: Robert Muir, software engineer for Lucid Imagination, is a Lucene/Solr committer & PMC member.

Using Apache Solr and Active Directory to unify data access across Intranet, ERP and Filesystem Cluster ROBERT WEIßGRAEBER, PROJECT DIRECTOR | LIGHTWERK

Solr is tightly linked into all available data and business intelligence sources in the enterprise: indexing the TYPO3 CMS-based Intranet, downloads, forms, handbooks, an Oxaion-based ERP database, and the file system cluster running Microsoft Distributed File System, using Tika for full-text content extraction. All data is connected via Active Directory servers into user-based, fine-grained access control lists, which are evaluated in real-time and early-binding mode by Solr. A worldwide Solr cluster using different shards gives additional security for worldwide deployment, e.g. keeping confidential data inside the headquarters’ own data centers. Bio: Robert Weißgraeber is Project Director at Lightwerk, primarily specializing in designing, planning and executing corporate portals.

Thousands of Indexes in the Cloud SHANEAL MANEK, LEAD SEARCH ENGINEER | GREPLIN

Indexes at Greplin are strange - instead of having one giant index that is searched all the time and updated infrequently, there are thousands of relatively small indexes that are updated much more frequently than they are searched. These unorthodox requirements lead to an unorthodox architecture that uses techniques inspired by Zoie and Bobo. We will discuss techniques that allowed us to exploit the inherent shardability and access patterns of our data to build an extremely high throughput information retrieval architecture. We will also examine some of the challenges and opportunities presented by running Lucene on Amazon’s Elastic Compute Cloud. Bio: Shaneal Manek is the lead search engineer at Greplin. He was previously the founder and CTO of Signpost.com, which built a geospatial search and recommendation engine on top of Lucene and Lisp.

Intuit’s Live Community FLOYD MORGAN | INTUIT

TurboTax Live Community is a large-scale web application that uses user contributions and open source technology to help millions of TurboTax users complete their tax returns. Other benefits of Live Community include reduced support calls, highly effective advertising campaigns, usability engineering and, new this year, conversion prediction analytics. I will present how Solr/Lucene powers the many facets of TurboTax Live Community now and in the future.

Highly Relevant Search Result Ranking for Large Law Enforcement Information Sharing Systems RONALD MAYER | FORENSIC LOGIC

Law enforcement data has many interesting complexities for search. Cross-agency searches are even more challenging because each agency has its own shorthand. Many different types of similarity between search clauses and documents should influence the ranking of results. For example, a search clause mentioning a “tall suspect” might want to include results with “6 foot 4 suspect”. Spatial clusters are important, as are temporal patterns. Different fields may be more or less important depending on the type of crime; for example, a victim’s race may matter more than a vehicle’s make in a sex crime but less in an auto theft. Also, documents may be related to each other in various ways that may also affect their ideal search ranking. Solr’s great flexibility in its analyzers, filters, synonyms, and boosting makes it an excellent tool for such diverse requirements. We’ve contributed a patch to Solr (#SOLR-2058) that helped further improve search result ranking for cases where a search for a suspect with a “red baseball cap, black leather jacket” is compared against many documents mentioning red caps, black caps, etc. This presentation will describe how we addressed some domain-specific challenges of our data.

Using Solr/Lucene/LWE for eCommerce GRANT INGERSOLL | LUCID IMAGINATION

If your user can’t find it, they can’t buy it, right? In this talk, Apache Lucene and Solr committer Grant Ingersoll will discuss architecture, techniques and tips for successfully deploying search tools like Lucene, Solr and LucidWorks Enterprise in eCommerce environments.

Flexible Indexing in Lucene 4.0 UWE SCHINDLER | SD DATASOLUTIONS

Apache Lucene’s next major release, 4.0, will introduce lots of flexibility into indexing, but also fundamental changes to the well-known APIs: It features a new and consistent, 4-dimensional iteration API on top of a low-level, pluggable codec API giving applications full control over the postings data. Terms are now arbitrary opaque bytes enabling users to store terms in any encoding, not necessarily UTF-8, natively in the index (e.g. numeric fields). Currently under development is a higher performance postings iteration API, enabling interesting codecs based on recent encoding algorithms to work effectively. Several codecs have already been created, including the default “standard” codec, which enables sizable RAM reduction for searchers, and a “pulsing” codec that inlines postings data directly into the terms dictionary, which provides a solid performance boost for primary key fields. A lot of new codecs are under development like “PFOR”, “FOR”, “AFOR”, or “Simple64”. In this talk, Uwe presents an overview of all of these exciting changes, as well as several concrete, real-world examples of how applications can tap into these new features.

Transforming the House Hunting Experience: How Solr is Helping Trulia Reshape the Real Estate Industry ALEXANDER KANARSKY | TRULIA

Trulia is a real estate search company that helps customers find homes for sale or to rent and provides them with information to help them make better decisions in the process. It is also a hub for real estate professionals to market their listings, view real estate data and promote their services. The presentation describes how Solr helped Trulia to transform the traditional real estate experience and make real estate data accessible and understandable to millions of users. It discusses approaches we took to achieve this by using custom-built distributed index management, indexing integration with Hadoop and geospatial search enhancements to Solr.

Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platform TREY GRAINGER | CAREERBUILDER

For CareerBuilder, a 1% deviance in search relevancy can mean millions of missed job opportunities for our users. When CareerBuilder moved to Solr from an expensive, proprietary search vendor, our top priorities were maintaining the quality of our search results and drastically improving our agility. This talk will describe how we addressed both needs. For search quality, we’ll cover some of our internal studies and resulting methods for dealing with multi-lingual content across dozens of languages, as well as customizing and experimenting with relevancy calculations. For platform agility, we’ll discuss CareerBuilder’s cloud-like search API framework which seamlessly handles millions of searches an hour, processes hundreds of millions of documents, and is powered by hundreds of globally-distributed servers. Come hear the results of our studies and some best practices for quality and performance. Learn how our framework has led to staggering improvements in both maintainability and technology innovation, allowing us to learn from our content, not just find it.

Handy Installation Tool “Anuenue” for Solr Cluster & Implementation of “Did you mean” Facility for Queries in Japanese TAKAHIKO ITO | MIXI

mixi is one of the largest social networking services in Japan, providing various communication services for over 14M monthly active users. The latest internal mixi project is to replace the in-house search engine with Apache Solr. This session covers two topics: a simple packaging system for Solr that eases the installation process and daily operations, and the implementation of a “Did you mean” facility for Japanese queries using a log mining tool. These tools have been released as OSS projects.

Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise ANDRZEJ BIALECKI | LUCID IMAGINATION

This talk will explain what click-through events are and how to process them with LucidWorks Enterprise. This innovative technique puts powerful search and relevancy at your fingertips, at a fraction of the time and effort required to program them yourself with native Apache Solr. Andrzej will discuss and present how you can use LucidWorks Enterprise for:

• Click Scoring to automatically configure relevance for most popular results
• Simplified implementation of auto-complete and “did-you-mean” functionality
• Unsupervised feedback to automatically provide relevance improvement on every query

Using Solr to find the Right Person for the Right Job LAURA KANG | THELADDERS

In this talk, we’ll describe how TheLadders.com uses Lucene/Solr to instantly recommend candidates to a recruiter when he/she posts a job on the recruiter site. Our matching algorithm scores candidates from our job seeker site based on the criteria and description of jobs and job seekers’ resume and profile data. This helps recruiters quickly identify candidates that are right for the job and increases the chance of our job seekers getting hired. The talk covers an overview of our Solr architecture and a description of our matching algorithm. We’ll also discuss criteria for evaluating the algorithm, including an overview of our testing sessions and their format. Finally, we’ll demo the feature so you can see how it works in practice.

Using Solr For Enabling Highly Customized Sitewide Navigation SHANTANU DEO | AT&T

The organization needed to enable a very customizable form of Global Navigation for the various types of users (based on their profile and other factors). This would normally have involved complex logic to figure out the appropriate set of links to show for a customer, and would have been a maintenance nightmare. Instead we approached the problem as a search problem. Using Solr to index the data, coupled with a novel encoding scheme, we were able to solve the problem simply by searching on the customer’s profile groups and returning a coherent global navigation. This has resulted in a solution that is very simple to understand and maintain and that will stand us in good stead in the future. The presentation describes using Solr to implement a real-world application.

Building Specialized Industry Applications Using Solr, And Migration From FAST ESP RAHUL AGARWALLA | UCHIDA SPECTRUM INC.

Uchida Spectrum, Inc. is a leader in the Japan search market. USI provides SMART/InSight, a search application used by many Fortune 500 companies for specialized industry applications such as R&D, quality assurance for manufacturing, and claims and customer management. Originally SMART/InSight was based on Microsoft FAST. This talk will review how SMART/InSight has migrated from FAST ESP to LucidWorks Enterprise, and how SMART/InSight incorporates virtual data integration, enterprise search, and the ability for users to have a unified way to navigate diverse data sources, analyze data more easily, and personalize results. Several real-world use cases will be profiled with demonstrations.

The Seven Deadly Sins of Solr JAY HILL | LUCID IMAGINATION

Sloth. Greed. Pride. Lust. Envy. Gluttony. Wrath. Getting started with Solr can present some pitfalls and temptations, often turning into a trial and error process. (Confess - some or all of these may have been part of your development project.) Based on a broad swath of experience across Solr implementations running in some of the largest Fortune 500 companies as well as some of the smallest start-ups, this talk will cover common mistakes made by newbies and even veteran developers—and how to avoid them. You’ll learn how best to face the challenges that can occur either when starting out with a new Solr implementation, or in keeping up with the latest improvements and changes.

Advanced Search and Analytics in 20 Minutes MARK DAVIS | KITENGA

Kitenga’s ZettaVox and ZettaSearch products support Solr and Lucene ecosystems at both the ingestion point and for the search user. In this talk, I will show how ZettaVox, our professional content mining platform on Hadoop, can be used to index content and rich metadata into a LucidWorks Enterprise installation. Being built on Hadoop, ZettaVox scales up by scaling out. I will then create an end-user search and analytics experience using our ZettaSearch solution that leverages the faceted metadata to enhance information discovery and analysis. All in about 20 minutes.

Building SaaS Solutions for Online Media Using Apache Solr ALBERTO MIJARES | CANOO ENGINEERING AG

SaaS applications have the advantage of remote web deployment: they can be used instantly by potentially any consumer on the internet, and a Web-based deployment reduces cost. In this talk, the speaker explains the architecture of an innovative SaaS solution built for the Axel Springer media group (Switzerland). This application can remotely extract the content of multiple online newspaper articles, analyze and classify them, determine which articles are the most similar to a given one, and integrate the results back into the article to provide the user with a “related articles” feature. The core components of the analysis process are language-specific tools (used to filter out superfluous terms) and semantic knowledge bases (like Wikipedia, used to enrich the indexed information with new context-specific terms, or to disambiguate the extracted terms). At a more technical level, the speaker will explain the criteria for selecting the emerging enterprise search framework Apache Solr as the platform and how it drastically reduced the development effort required.

Solr Performance: Key Innovations YONIK SEELEY | LUCID IMAGINATION

Recent developments in Solr/Lucene have made significant contributions to distributed search processing, scalability, and throughput. In this talk, Yonik Seeley, creator of Solr, will survey key performance strategies for building search applications with Solr, and review innovations included in Solr 3.1, as well as forthcoming development work in Solr 4.0 and beyond.

Solr and Lucene at Etsy GREGG DONOVAN | ETSY

Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing). In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous deployment infrastructure (see: http://codeascraft.etsy.com/2010/05/20/quantum-of-deployment/), allowing for Solr configuration, Java-based indexers, and query parsing logic to go from passing tests to production code in minutes. We’ll also discuss how we’re leveraging Solr’s new Geo-search to power both local item search and GeoIP-personalized location autosuggest. We’ll also share how we’ve extended Solr, adding personalized faceting and filtering as well as multicurrency sorting and filtering that accounts for real-time currency fluctuation (contributed in SOLR-2202). [Note that code will be open-sourced/contributed for both of these features.] We will share our real-time monitoring techniques, including how we track Solr replication, query, and GC times in Ganglia. Finally, we’ll discuss how we’ve used Hadoop-based user analytics to improve relevance and power data-driven spelling corrections, autocomplete suggestions, and related searches.

Lucene @ Yelp SUDARSHAN GAIKAIWARI | YELP

This talk describes how Yelp uses Lucene to provide search services. It includes:

• Statistics of Yelp search usage
• Overview of Yelp search architecture: Yelp uses different services to provide searches for different types of data. Some are based on Lucene and some on Solr.
• Deeper dive into business and review search. This is the most important search service at Yelp. We will cover:
  • Yelp’s implementation of a micro-sharded architecture and differences with Katta.
  • Yelp extensions to Lucene to implement features such as filters, and performance comparison with Solr/Bobo.
  • Yelp’s implementation of index replication.
  • Various tricks used at Yelp to make the service faster.

Using Solr Cloud to Tame an Index Explosion JON GIFFORD | LOGGLY

We have hundreds of customers, each of whom may have dozens of shards. To manage this explosion of indexes, I’ll describe how we’re using Solr Cloud to manage every index - from creation, through migration from box to box, and finally destruction. I’ll describe some of the performance issues we had to deal with, especially with ZooKeeper.

Lots of Facets, Fast ANNE VELING | BEYONDTREES

We created a web application for a well-known US newspaper: a maps-like zooming interface on top of the 60,000 newspapers published since 1850, using Solr over the 28,000,000 articles to render an interactive heatmap. Using domain knowledge, we optimized the out-of-the-box faceting solution by an order of magnitude, which allowed us to create a great visual way of exploring trends in historical newspapers.

CPython Embedded in Solr - Search Solution for Python Lovers With the Speed of Native Java ROMAN CHYLA | CERN

SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest full-text repository for papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. We must work with result sets bigger than 1 million for citation-related queries, and our partners from Astrophysics work with sets of 6 million; however, INSPIRE is written in Python. So how do we move result sets of several million items between Solr and INSPIRE fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows the benefits and challenges of this surprisingly elegant solution.

Rahul Agarwalla HEAD OF INTERNATIONAL BUSINESS, UCHIDA SPECTRUM INC

www.spectrum.co.jp

Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he built and exited two content/technology ventures, including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies such as Verity, FAST ESP and Solr/Lucene.

Boris Aleksandrovsky SEARCH ARCHITECT, YAMMER

www.yammer.com

Boris Aleksandrovsky works for Yammer, the Enterprise Social Network company, where they are trying to bring the benefits of social media to enterprises by creating discoverable knowledge bases. He specializes in solving problems of search, machine learning and data analysis at large scale by employing distributed and scalable software architectures. Boris has almost completed his PhD in Computer Science and Neuroscience at the University of California at Irvine.

Josh Berkus CORE TEAM, POSTGRESQL

www.pgexperts.com

Josh Berkus has been working as a database application consultant for 8 years. Josh primarily builds applications for the legal and HR industries and does performance tuning. He was also head of Sun Microsystems’ PostgreSQL support staff for 2 years and helped launch BI startup Greenplum.

Ed Bueche DISTINGUISHED ENGINEER, EMC

www.emc.com

Ed Bueche is an EMC Distinguished Engineer and one of the architects of the Documentum xPlore search engine (part of EMC’s Information Intelligence Group). He has been with Documentum/EMC for 12+ years and has more than 23 years of experience in performance/development in the industry, including at companies like AT&T Bell Labs and Sybase. At Documentum he worked to improve performance and scalability for all previous Documentum full-text integrations (Verity and FAST). Ed has been a regular speaker for over 11 years at the Documentum worldwide user conferences (in both America and Europe) as well as at EMC World.

Andrzej Bialecki TECHNICAL ADVISOR, LUCID IMAGINATION

www.lucidimagination.com

Andrzej Bialecki, Apache Lucene PMC Member, also serves as project lead for Nutch, and as a committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as information retrieval, systems architecture, embedded systems, networking and business process/ecommerce modeling. He’s also the author of the popular Luke index inspection utility.

Roman Chyla RESEARCH FELLOW, CERN

www.cern.ch

Roman Chyla is a research fellow at CERN, Switzerland. He works in the INSPIRE team to build the biggest digital library for High Energy Physics. He is a developer and also an information specialist, and has presented at four conferences, two of them international: Knihovny soucasnosti 2006, CASLIN 2007, IKI 2009, CASLIN 2009.

Mark Davis CTO, KITENGA, INC

www.kitenga.com

Mark Davis is Founder and CTO of Kitenga, Inc. Previously he served as Principal Engineer at Xerox PARC spin-out InXight (acquired by Business Objects) and designed their enterprise product suite, as well as at Microsoft as a Program Manager for enterprise search and SharePoint. Mark spent nearly a decade as an academic researcher in the defense/intelligence community specializing in cross-language search and computational linguistics. He has extensive speaking experience in professional and academic forums.

Shantanu Deo TECHNICAL DIRECTOR, AT&T

www.att.com

Shantanu Deo is a Technical Director at AT&T, in charge of their ecommerce CMS team. He is a patent holder and has presented and published his work at the INFORMS conference on Optimization. His interests include web technologies, optimization and, lately, mobile web communications. Shantanu holds a BS in Computer Engineering from the University of Poona, India, and MS degrees in Operations Research and Computer Science from Louisiana State University.

Esteban Donato LEAD ARCHITECT, TRAVELOCITY

www.travelocity.com

Esteban Donato works as Lead Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 10 years in different industries. Esteban has been working with Solr and Lucene technology for the last 2 years, implementing it in different projects. He has given talks on Solr and data mining at Travelocity and at universities in Buenos Aires, Argentina.

Gregg Donovan TECHNICAL LEAD SEARCH, ETSY

www.etsy.com

Gregg Donovan is currently Technical Lead, Search at Etsy.com, the world’s most vibrant handmade marketplace. He has worked extensively with Solr and Lucene at Etsy, and, previously, at TheLadders.com. At Etsy, located in Brooklyn, NY, he leads the search engineering team as it tackles the challenges presented by a growing international marketplace with a half-million different sellers in 150 different countries selling tens of millions of items.

Stephen Dunn HEAD OF TECHNOLOGY STRATEGY, GUARDIAN NEWS AND MEDIA UK

www.theguardian.co.uk

Stephen Dunn is Head of Technology Strategy for Guardian News and Media in the UK. He joined The Guardian in 1999, where he helps guide the technology strategy for its multiple-award-winning network of web sites and services. His professional interests include open web technologies, digital identity and security. Prior to joining the Guardian, Stephen completed his PhD at the Center for Computational Neuroscience and Robotics at Sussex University, UK.

Sudarshan Gaikaiwari SOFTWARE ENGINEER, YELP INC

www.yelp.com

Sudarshan Gaikaiwari is a software engineer working on Yelp’s search team. Prior to Yelp he worked on various information retrieval technologies at Symantec’s Data Loss Prevention group.

Jon Gifford CO-FOUNDER, LOGGLY

www.loggly.com

Jon Gifford is the CTO and co-founder of Loggly, where he spends all day coercing Solr into playing nice with the cloud, and with high-volume real-time data streams. An active user and frequent hacker of Lucene since 2004, he’s happy to let Solr take care of some of the hard work for a change. Prior to Loggly, he spent more than a decade working on search systems at Minimal Loop, Scout Labs, Technorati and LookSmart. He is concerned that his near-complete web-anonymity is under threat.

Otis Gospodnetic FOUNDER, SEMATEXT

www.sematext.com

Otis Gospodnetic is a coauthor of Lucene in Action (1st and 2nd editions). He has been involved with Lucene since 2000 and Solr since 2006. He is also a member of the Nutch and Mahout development teams, as well as the Lucene Project Management Committee. Otis is an Apache Software Foundation member and the founder of Sematext, a software development and consulting company focused on Search & Analytics using open-source technologies like Lucene, Solr, Nutch, Hadoop, HBase, Flume, and more.

Trey Grainger SEARCH TECHNOLOGY DEVELOPMENT TEAM LEAD, CAREERBUILDER

www.careerbuilder.com

Trey Grainger leads the Search Technology Development group at CareerBuilder.com. He introduced Solr to CareerBuilder and led the successful conversion away from the Microsoft FAST ESP platform. He has been with CareerBuilder for 4 years, and his search experience includes handling multi-lingual content across dozens of markets/languages, genetic algorithm and user group based relevancy tuning, geo-spatial search and validation, and work on customized payload scoring models, data mining, clustering, and recommendations. He is responsible for architecting CareerBuilder’s cloud-like search API, exposing search as a simple, dynamic, and powerful generic service abstracted away from a large, globally-distributed architecture. Trey is also the founder and Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.

Eric Gries PRESIDENT AND CEO, LUCID IMAGINATION

www.lucidimagination.com

Eric Gries joined Lucid Imagination as the President and CEO, after spending more than 20 years in executive leadership roles, where he built high-growth technology-based businesses. Prior to joining the company, Eric was an Executive-in-Residence at Granite Ventures. Eric has served as CEO, general manager and vice president for companies in application development, systems management, networking, financial services and hardware systems, in both the U.S. and Europe. Prior to joining Granite Ventures, Eric led XACCT, a pioneering network mediation market leader, as its president and CEO. XACCT was acquired by Amdocs in 2004, at which time Eric joined Amdocs’ executive team as Senior Vice President. Earlier in his career, Eric served as general manager of Compuware’s Network and Systems Management division, and held product management, marketing, sales and engineering positions at companies such as ACI, Cullinet Software and DEC.

Erik Hatcher TECHNICAL STAFF, LUCID IMAGINATION

www.lucidimagination.com

Erik Hatcher is the co-author of two books, Lucene in Action and Java Development with Ant. Erik has been an active member of the Lucene community: a leading Lucene and Solr committer, member of the Lucene Project Management Committee, member of the Apache Software Foundation, and a frequent invited speaker at various industry events. Erik earned his B.S. in Computer Science from the University of Virginia, Charlottesville, VA.

Jay Hill SENIOR SEARCH ARCHITECT, LUCID IMAGINATION

www.lucidimagination.com

Jay Hill has been building enterprise search applications since 2003, and has worked extensively with Autonomy IDOL, Lucene, and Solr. He is a certified Solr trainer, and is lead author for Lucid Imagination’s Solr training courses.

Grant Ingersoll CO-FOUNDER, LUCID IMAGINATION

www.lucidimagination.com

Grant Ingersoll is a founder and member of the technical staff at Lucid Imagination. Grant’s programming interests include information retrieval, machine learning, text categorization, and extraction. Grant is a regularly featured speaker at ApacheCon and other industry events. He has been an active member of the Lucene community – a Lucene and Solr committer, co-founder of the Apache Mahout machine learning project, chairman of the Lucene Project Management Committee (PMC) as well as a Vice President at the Apache Software Foundation. He is also the co-author of Taming Text (Manning, forthcoming) covering open source tools for natural-language processing. Grant’s prior experience includes work at the Center for Natural Language Processing at Syracuse University in natural language processing and information retrieval. Grant earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse University, NY.

Takahiko Ito SOFTWARE ENGINEER, MIXI, INC

www.mixi.jp

Takahiko Ito received his Ph.D. in Engineering at Nara Institute of Science and Technology, specializing in graph mining. He was a specialist for Japanese and Asian language processing at Fast Search and Transfer prior to joining mixi, Inc as an R&D engineer. Selected papers include:

• Masashi Shimbo, Takahiko Ito, Daichi Mochihashi, Yuji Matsumoto. On the Properties of von Neumann Kernels for Link Analysis. Machine Learning, 75:37-67, 2009.
• Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto. Application of Kernels to Link Analysis. The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.

Alexander Kanarsky SENIOR SOFTWARE ENGINEER, TRULIA

www.trulia.com

Alexander Kanarsky is responsible for managing day-to-day operations of Trulia’s indexing and search infrastructure and oversees the search-related development there. Prior to Trulia he was a member of the core development team for Autonomy’s Digital Safe, the world’s largest private archive of electronic documents.

Laura Kang TECHNICAL LEAD, SEARCH AND MATCHING, THELADDERS

www.theladders.com

Laura Kang holds a B.A. in computer science, mathematics, and economics from the University of California at Berkeley, and an M.S. and Ph.D. in computational mechanism design from Harvard University. She has presented her work at several conferences, including the International Conference for Electronic Commerce and the ACM Conference on Electronic Commerce. Before joining TheLadders, she was a manager at a NYC technology startup. At TheLadders, she focuses on search and matching algorithms.

Sudhakara Karegowdra PRINCIPAL ARCHITECT, TRAVELOCITY

www.travelocity.com

Sudhakara Karegowdra works as Principal Architect for Travelocity. He has worked as a Java Developer, Technical Leader and Architect for the last 14 years in different industries, 10 of them in the travel industry. Sudhakara has been working with Solr and Lucene technology for the last 3 years, implementing it in different projects. He has given talks on Solr at Travelocity.

Steve Kearns ROSETTE PRODUCT MANAGER

www.basistech.com

Steve is the product manager for the Rosette Platform and is also the subject matter expert for the international compliance market within Basis Technology. Prior to Basis Technology, Steve worked at BBN Technologies where he worked on the Broadcast and Web Monitoring Systems, which capture and extract open-source intelligence from live television and internet news websites. He has experience in information visualization and distributed systems architecture, and received his MS in Information Technology and BS in Computer Information Systems from Bentley University. He also spoke at the Apache Lucene EuroCon 2010 in Prague, on the topic of Building Multilingual Search Based Applications.

Marc Krellenstein FOUNDER, LUCID IMAGINATION

www.lucidimagination.com

Marc Krellenstein is the founder of Lucid Imagination. Marc has 30 years’ experience in the computer industry, focusing for the last 20 years on information retrieval technology and applications. Marc was previously Chief Technology Officer and Vice President for Search and Discovery Technology at Elsevier, the scientific, technical and medical publishing division of Reed Elsevier. Prior to Elsevier, Marc was Chief Technology Officer and Senior Vice President of Engineering at Northern Light Technology, where he was the founding technologist and led the design and development of the Northern Light search service, including designing the data model, query interpretation, relevancy ranking, automatic document classification and patented technology for document clustering. Marc has an A.B. in philosophy from Cornell, an M.S. in computer science from the University of Wisconsin at Madison and a Ph.D. in psychology (cognitive science) from the New School for Social Research, NY.

Ronald Mayer CTO, FORENSIC LOGIC, INC.

www.forensiclogic.com

Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from medical devices to digital video to law enforcement software. Ron has also been involved in Open Source for decades, with code that has been incorporated in the LAME MP3 library, the PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement was when he gave a presentation on a broader aspect of this system to the SD Forum’s Emerging Tech SIG titled “Fighting Crime: Information Chokepoints & New Software Solutions”.

Alberto Mijares CANOO ENGINEERING AG

www.canoo.com

Alberto Mijares is a software engineer with more than 10 years of experience. He is a Scrum Master and an agile practitioner. He has an extensive background in Web technologies and Java, and has participated in W3C activities related to the Semantic Web. His usual role is either leading projects or designing architectures for web applications. He started working at Canoo Engineering AG (Switzerland) in 2008 and speaks Spanish, English and German. He has a degree in Computer Engineering. He has given talks at Java- and Web-related conferences and user groups in Switzerland and Spain.

Floyd Morgan INTUIT

www.intuit.com

Floyd is a Principal Software Engineer who works in the Central Technology Organization at Intuit, makers of TurboTax, QuickBooks, Quicken and Intuit Payroll, to name a few. Floyd has developed core features of the flagship TurboTax product line and recently co-founded Intuit’s newest social-driven technology, Live Community. Under Floyd’s direction, Live Community has gone from a small project to a widely adopted platform used by most Intuit products and services. Floyd earned his B.S. in Computer Science from San Diego State University.

Stephen O’Grady CO-FOUNDER AND PRINCIPAL ANALYST, REDMONK

www.redmonk.com

Stephen O’Grady is the co-founder and Principal Analyst of RedMonk, a boutique industry analyst firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to some of the most successful technology firms in the world. Stephen’s focus is on infrastructure software such as programming languages, operating systems and databases, with a special focus on open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata. Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration firms like Keane and consultancies like Blue Hammock. Regularly cited in publications such as the New York Times, NPR, the Boston Globe, and the Wall Street Journal, and a popular speaker and moderator on the conference circuit, Stephen is well respected throughout the industry for his advice and opinion.

Timothy Potter SENIOR ENGINEER, NATIONAL RENEWABLE ENERGY LABORATORY (NREL)

www.nrel.gov

Timothy is a highly skilled technologist with over 13 years of experience delivering innovative software solutions that encompass a wide range of technologies and business sectors. Currently, Mr. Potter is a Senior Engineer at the National Renewable Energy Laboratory (NREL), where he leads the effort to build a large-scale distributed platform for handling smart-grid-related energy data using Hadoop and NoSQL technologies. Prior to NREL, Timothy was the CTO for Viyya Technologies, where he developed a large-scale content recommendation system based on Solr, Mahout, and Hadoop running in the Amazon Cloud. As a Senior Software Engineer for the WebLogic Platform at BEA Systems, he was the chief inventor of several US Patents that helped revolutionize J2EE-based enterprise application integration. His technical blog (http://thelabdude.blogspot.com/) is highly respected as a guide for other developers in the open-source Java community. Mr. Potter has a BS in Mathematics and a BA in Economics with honors (summa cum laude) from the University of Colorado.

Daniel Potzinger AOE MEDIA GMBH

www.aoemedia.de

Daniel Potzinger has more than 10 years of web development experience under his belt. He is a skillful hand at developing clean solutions with a particular love of elegant, easily maintained and reusable code. Daniel is always open to new projects and development methods, such as Agile software development. Over the last few years since joining AOE media, Daniel has played “midwife” to more than 60 Enterprise CMS projects for such renowned clients as congstar, Cisco WebEx, VMware, Panasonic and the like: taking care of client requirements, directing the development and launching the results.

Craig Rees SENSIS

sensis.com.au

Craig Rees has been at Sensis since 2008. Craig heads up the content and search groups which manage the search capabilities, platforms and operational teams that support the Yellow Pages® and White Pages® businesses. Craig is the author of the Sensis Content Strategy and the technology owner of the Sensis Business Search API. Prior to joining Sensis, Craig worked in digital strategy development and implementation roles in the United Kingdom with companies including BBC, Sky and Argos.

Ramon Resma ARCHITECT, TRAVELOCITY

www.travelocity.com

Ramon Resma works as an Architect for Travelocity Mobile. He has over 22 years of experience in the travel industry and has worked in technical leadership roles for Travelocity Architecture, Sabre Airline Solutions Architecture, and American Airlines. Ramon has been working with Solr and Lucene technology for the last 2 years. Recently he worked on implementing Solr functions for serving location-based content on travel mobile applications.

Yonik Seeley CREATOR OF APACHE SOLR & CO-FOUNDER, LUCID IMAGINATION

www.lucidimagination.com

Yonik Seeley is the creator of Solr. He is an expert in distributed search systems architecture and performance. Yonik has been a prolific Lucene/Solr committer, a member of the Lucene PMC, and a member of the Apache Software Foundation. Yonik’s work experience includes CNET Networks, BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.

Uwe Schindler MANAGING DIRECTOR, SD DATASOLUTIONS GMBH

www.pangaea.de

Uwe is a committer and PMC member of Apache Lucene and Solr. His main focus is on development of Lucene Java. He implemented fast numerical search and maintains the new attribute-based text analysis API. He studied Physics at the University of Erlangen-Nuremberg and works as managing director for SD DataSolutions GmbH in Bremen, Germany, a company that provides consulting and support for Apache Lucene and Solr. A primary customer of his company is “PANGAEA – Publishing Network for Geoscientific & Environmental Data”, where he implemented the portal’s geo-spatial retrieval functions with Lucene Java. Uwe has given talks about Lucene at various international conferences, including the previous Lucene Revolution, ApacheCon EU/US, Lucene Eurocon and Berlin Buzzwords, as well as at various local meetups.

Tyler Tate HEAD OF USER EXPERIENCE, TWIGKIT

www.twigkit.com

Tyler Tate leads user experience at TwigKit where he has helped governments, not-for-profits, and blue-chip corporations build superb search experiences. Tyler also organises the Enterprise Search London meetup and has written for a number of publications including UX Magazine, Johnny Holland, Smashing Magazine, and UX Booth. Tyler lives in London with his wife Ruth and son Galileo, and you can keep up with him on Twitter.

Joshua Tuberville SEARCH ARCHITECT

www.eharmony.com

Joshua Tuberville is a Software Architect with eHarmony.com. With over 15 years of Internet technology experience, he specializes in high-scale online architectures. He has been with eHarmony for the past 9 years and previously worked with Sony, Disney, as well as several startups. He regularly speaks at user groups and conferences. His recent focus has been leading the architecture of jazzed.com, a new dating site, which uses Solr to allow people to find highly relevant profiles.

Anne Veling SEARCH ARCHITECT, BEYONDTREES

www.beyondtrees.com

After an M.Sc. in Computer Science/Artificial Intelligence, Anne worked for several years in the search engine industry, designing highly scalable knowledge extraction, clustering and visualization modules for search applications. Currently self-employed, Anne helps global companies create web applications that involve search. Anne is also busy doing performance troubleshooting, and gives Lucene and Solr workshops.

Dawid Weiss ASSOCIATE PROFESSOR, INSTITUTE OF COMPUTING SCIENCE, POZNAN UNIVERSITY OF TECHNOLOGY, POLAND

www.carrotsearch.com

Dawid Weiss combines an academic and an industrial background: he is an associate professor at the Institute of Computing Science of Poznan University of Technology in Poland (PhD in Information Retrieval) and co-owns Carrot Search, a company that provides commercial services revolving around text processing, text mining and text clustering. In his spare time Dawid contributes to several open source projects, including Carrot2.org, reads books and passionately plays basketball with a bunch of his old friends. He lives in Poznan, Poland with his wife and two children.

Simon Willnauer SOLR / LUCENE COMMITTER, APACHE LUCENE PMC

www.apache.org

Simon is a Lucene core committer and PMC member. During the last couple of years he has worked on the design and implementation of scalable software systems and search infrastructure. He studied Computer Science at the University of Applied Sciences Berlin. Currently, he works as a consultant for Apache Solr, Lucene Java and Hadoop and is a co-organizer of the “Berlin Buzzwords” conference on scalability, June 2011 in Berlin (Germany).

Olaf Zschiedrich HEAD OF TECHNOLOGY, EBAY KLEINANZEIGEN

kleinanzeigen.ebay.de

Olaf leads development for eBay Kleinanzeigen, Germany’s number one classifieds ad site. Before that he was part of the core architecture team at the mobile.international GmbH. He also worked for Siemens TS where he was involved in building the Customer Information System for the MTA New York City Transit subway system. He has a passion for high-traffic web applications, search technologies and agile development methods, and is a believer in open source.

Hotel Information

ADDRESS

Hyatt Regency San Francisco Airport
1333 Bayshore Highway, Burlingame, California, USA 94010
Tel: +1 650 347 1234
Fax: +1 650 696 2669
http://www.sanfranciscoairporthyatt.com

DIRECTIONS

FROM SAN FRANCISCO INTERNATIONAL AIRPORT (2 MILES):

Take 101 South toward San Jose. Exit Millbrae Ave. Turn left on Millbrae Ave. Turn right at the second stoplight onto Bayshore Hwy. Proceed through 4 stoplights. Our Burlingame, California hotel is on the right-hand side.

FROM OAKLAND AIRPORT (APPROXIMATELY 30 MILES) AND POINTS EAST:

Take I-880 South toward San Jose. Merge onto CA-92 W toward San Mateo Br. Merge onto US-101 N toward San Francisco to the Broadway Exit. Take the Airport Blvd ramp toward Bayshore Blvd, then turn left onto Bayshore Hwy to our Burlingame lodging.

FROM SAN JOSE AIRPORT (APPROXIMATELY 30 MILES) AND POINTS SOUTH:

Take 101 North to the Broadway Exit. Take the Airport Blvd ramp toward Bayshore Blvd, then turn left onto Bayshore Hwy to the hotel.

HOTEL MAPS

MEETING ROOMS

Hyatt Regency San Francisco Airport

DIRECTIONS

From San Francisco Int’l Airport (2 miles): Take 101 South. Exit Millbrae Ave. East. Turn right at stoplight onto Bayshore Hwy. Proceed through 4 stoplights. Hotel is on right.

MAP OF HOTEL AND AIRPORT

PUBLIC TRANSPORTATION (BART):

SAN FRANCISCO DOWNTOWN

Cloud-scale enterprise search begins here

Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company. Our Search Team is experienced, with deep architecture expertise. We’re dedicated to delivering the fastest, most reliable cloud-scale enterprise search. If you share our passion, come introduce yourself.

www.salesforce.com

Documill Visual Search enables you to see what you have found: browse and preview multi-page documents with web browser.

Visual Search Key Features

• Simultaneous previewing of multiple documents
• Embeddable thumbnail view rejuvenates search result lists
• Matching pages within documents highlighted
• Keywords highlighted in the page thumbnails
• Integrated viewer to access documents with browser
• Search engine independent service architecture: embed preview components into any search UI

Documill, Tekniikantie, Espoo, FINLAND
sales@documill.com
www.documill.com
