Mayer Ronald - Search Relevance for Law Enforcement by lucid imagination

Highly Relevant Search Result Ranking for Law Enforcement Ronald Mayer, Forensic Logic, Inc ramayer@forensiclogic.com, 2011-05-26

Police car photo by davidsonscott15 (Scott Davidson) on Flickr under (CC BY 2.0) license

What I Will Cover  Highly Relevant Search Result Ranking for Large Law Enforcement Information Sharing Systems  Who I am – Ron Mayer, CTO at Forensic Logic.  The challenge / problem • Ranking law enforcement documents has interesting challenges.

 3 interesting challenges: • Many factors affect relevance for a law-enforcement user • A mix of structured, unstructured, semi-structured data • Improving edismax sub-phrase boosting

 Conclusion • Solr's flexibility & community are both great.

My Background  Ron Mayer  CTO of Forensic Logic, Inc • We power crime analysis and cross-agency search tools for the LEAP (law enforcement analysis portal) project. • About 150 State, Local, and Federal law enforcement agencies use our SAAS software to analyze and share data

 My background • 8 years of delivering software technologies to law enforcement as SAAS solutions. • Use some F/OSS, quite a bit of proprietary. • Play well with F/OSS projects (contributed code back to PostgreSQL, PostGIS, a memcached client, and earlier contributions from school that found their way into various projects)

The Challenge  Problem I set out to solve • We had a good but complex database-based crime analysis package for investigators with good computer skills. • Needed an easy “Google-like” interface that any officer could use.

 Considerations • Most officers don't want to sit at a desk filling out search forms. • Want something like Google – type a guess, and get the most relevant documents on the first page.

 Key hurdles and obstacles • What factors even define “the most relevant” document? • Extremely disparate data (some almost totally structured; some totally unstructured; most a mix). • How do we implement ranking?

Project background  Started 8 years ago with a desktop Crime Analysis Application; ported to web application

 Big structured search forms worked well for crime analysts and detectives who can invest time at a desk  Some users wanted quicker/easier simple search

Project background  Prototyped with Project Blacklight • Wonderful F/OSS community • Just added to their facet list in a config file. • Constructuve feedback from customers in couple weeks.

Project background  Eventually rewrote with many law-enforcement-centric features.

Search Relevance for Law Enforcement Users

 Searches often contain multiple clauses

• 'red baseball cap black leather jacket tall male suspect short asian victim' • These search clauses are usually noun phrases (a few adjectives preceding a noun), and the clauses are often independent of each other.

 Fuzzy searches are common • Victims give incomplete descriptions • Suspects lie • Close counts.
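One way to picture the sub-phrases inside a multi-clause query like the one above is to list its runs of consecutive words. Solr's edismax parser has pf2 and pf3 parameters that boost on word pairs and word triples; the helper below (a hypothetical name, `word_shingles`) is only a plain-Python illustration of what those pairs and triples look like, not Solr code and not the ranking method used in this project.

```python
from typing import List


def word_shingles(query: str, size: int) -> List[str]:
    """Return every run of `size` consecutive words in the query.

    Hypothetical helper for illustration only.
    """
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


# A shortened version of the example query from the slide.
query = "red baseball cap black leather jacket"

bigrams = word_shingles(query, 2)
trigrams = word_shingles(query, 3)

for phrase in bigrams:
    print(phrase)
```

Note that some pairs, such as "cap black", straddle two independent clauses. That is the kind of sub-phrase a law enforcement user would probably not want boosted, which is part of why clause boundaries matter for ranking.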

Search Relevance for Law Enforcement Users

 Geospatial factors

• Officers are often interested in things near their own city or beat  Solr does this one well for 1 location of interest in a document: – bf=... recip(dist(2,primary_latlon,vector(#{lat},#{lon})),1,1,1)^0.5

 I haven't yet found a great solution for documents with many locations of interest (say, a document regarding a gang importing drugs from Ciudad Juárez, Mexico to Denver, which should be highly relevant to every city touching the southern half of I-25).
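To make the single-location boost above concrete, here is a minimal sketch of the arithmetic. Solr's recip(x,m,a,b) evaluates a/(m*x+b), and dist(2,...) is a straight-line (Euclidean) distance over the raw coordinate values, so the distance here is in degrees, not kilometres. The function names are hypothetical, and treating ^0.5 as a simple multiplier ignores any query normalization Solr applies.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x, m, a, b): a / (m * x + b)."""
    return a / (m * x + b)


def euclidean(p: LatLon, q: LatLon) -> float:
    """Straight-line distance over raw coordinates, like dist(2, ...)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def location_boost(doc: LatLon, user: LatLon, weight: float = 0.5) -> float:
    """Approximate boost contributed by recip(dist(...),1,1,1)^weight."""
    return weight * recip(euclidean(doc, user), 1, 1, 1)


def build_bf(lat: float, lon: float,
             field: str = "primary_latlon", weight: float = 0.5) -> str:
    """Build the bf parameter string shown on the slide for one user location."""
    return f"recip(dist(2,{field},vector({lat},{lon})),1,1,1)^{weight}"


# Example coordinates are illustrative, not taken from the talk.
print(build_bf(39.74, -104.99))
print(location_boost((39.74, -104.99), (39.74, -104.99)))
```

The boost is largest when the document sits at the user's location and falls off smoothly with distance, so nearby documents are favoured without excluding distant ones.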

• Often law enforcement officers want to search for documents near a certain type of landmark:  “near any elementary school in the school district”  “near a particular school”  “in a predominantly Hispanic neighborhood”  “near a freeway”

• Sometimes more convenient to interact with a map and use Solr's geospatial features. Sometimes more convenient to tag the documents with the relevant phrases.
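The tagging approach mentioned above could be sketched as follows: at index time, compare a document's location with known landmarks and attach the matching phrase so it becomes ordinary searchable text. Everything here is an assumption for illustration: the function names, the 0.5 km radius, the use of point landmarks (real freeways and neighborhoods are lines and polygons), and the sample coordinates.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def landmark_tags(doc: LatLon,
                  landmarks: Dict[str, List[LatLon]],
                  radius_km: float = 0.5) -> List[str]:
    """Return the phrases whose landmarks lie within radius_km of the document."""
    return sorted(
        phrase
        for phrase, points in landmarks.items()
        if any(haversine_km(doc, p) <= radius_km for p in points)
    )


# Hypothetical landmark table: phrase -> representative points.
landmarks = {
    "near a freeway": [(39.7400, -104.9900)],
    "near an elementary school": [(39.8000, -104.9000)],
}

tags = landmark_tags((39.7405, -104.9900), landmarks)
print(tags)
```

The resulting phrases could then be stored in a text field alongside the document, so that a query containing "near a freeway" matches without any geospatial computation at search time.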

Search Relevance for Law Enforcement Users

 Advanced geospatial searches

• Not having a lot of luck with Solr/Lucene here yet • Often intersecting polygons.  Just off I-5  Walking distance from a junior high school

• We do it in a more complex app with PostGIS.  Would love to be able to click a school or road on a map, and use that to filter or sort Solr results