Tuesday, March 29, 2016

Randomizing top-n results in Solr

So, after shuffling the top-n search results returned by Solr a bit [1], you may want to actually randomize them in a *non-repeatable* way. Why? I don't know... I'm just enjoying a coding experiment while I'm travelling :)

What I want to do is run a query and (pseudo) randomly reorder the top results. I will use the query re-ranking feature again, but this time I need a re-ranking query that produces different results each time it is invoked.

I created a simple function [2] (i.e. a ValueSourceParser plus a ValueSource subclass) based on a thread-local java.util.Random instance, which simply returns a (pseudo) random number each time it is invoked.
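
The actual classes are in the gist [2]. As a minimal, stand-alone sketch of the idea behind them (no Solr classes involved), a thread-local generator could look like the following. The class name ThreadLocalRandomSource and the choice of nextFloat(), which yields values in [0.0, 1.0), are assumptions made for illustration and may differ from what the gist does.

```java
import java.util.Random;

/** Stand-alone sketch (hypothetical class): one java.util.Random per thread. */
final class ThreadLocalRandomSource {

    // Each thread lazily creates its own unseeded Random, so there is no
    // contention between threads and the sequence is not repeatable.
    private static final ThreadLocal<Random> RANDOM =
            ThreadLocal.withInitial(Random::new);

    private ThreadLocalRandomSource() {
        // static access only
    }

    /** Returns a new pseudo-random float in [0.0, 1.0) on every call. */
    static float next() {
        return RANDOM.get().nextFloat();
    }
}
```

In a ValueSource, the per-document value would simply delegate to something like next(), so every document gets a different number on every request.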

The two classes are then packed in a jar, put under the lib folder and registered in solrconfig.xml with the name rnd:

<valueSourceParser name="rnd" class="com.faearch.search.function.RandomValueSourceParser"/>

I only need to use it in a re-rank query using the boost parser:

<requestHandler ...>
    <str name="rqq">{!boost b=rnd() v=$q}</str>
    <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1.2}</str>
...
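
To get a feeling for what this configuration does to the scores, here is a small simulation in plain Java, with no Solr involved. It assumes that the re-rank parser adds reRankWeight times the re-rank query score to the original score of each of the first reRankDocs hits, and that rnd() returns a float in [0.0, 1.0). The Hit type, the rerank method and the input scores are hypothetical, and real Solr scores will not match these numbers exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Plain-Java simulation of a random re-rank (hypothetical names, not Solr code). */
final class RandomRerankSketch {

    /** A search hit with its score. */
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Re-scores the first reRankDocs hits as
     * original + reRankWeight * (original * random) and re-sorts only that window.
     */
    static List<Hit> rerank(List<Hit> hits, int reRankDocs, float reRankWeight, Random random) {
        int window = Math.min(reRankDocs, hits.size());
        List<Hit> top = new ArrayList<>();
        for (Hit hit : hits.subList(0, window)) {
            // Mimics {!boost b=rnd() v=$q}: the original score times a random factor.
            float reRankScore = hit.score * random.nextFloat();
            top.add(new Hit(hit.id, hit.score + reRankWeight * reRankScore));
        }
        // Highest combined score first.
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Hit> result = new ArrayList<>(top);
        // Hits outside the window keep their original order and score.
        result.addAll(hits.subList(window, hits.size()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up first-pass scores, already sorted by score.
        List<Hit> hits = Arrays.asList(new Hit("shoes B", 0.30f), new Hit("shoes A", 0.25f));
        for (Hit hit : rerank(hits, 100, 1.2f, new Random())) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

Running main several times should print different scores each time and, now and then, a different order, which is the behaviour the configuration above is after.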

You can now start Solr, index some documents, run the same query several times (ordered by score, the default) and see what happens. Don't forget to include the score in the field list (fl) parameter; that way you will see the concrete effect of the multiplicative random boost:

http://...?q=shoes&fl=score,*

1st time

<result name="response" numFound="2" start="0" maxScore="0.32487732">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.32487732</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.22645184</float>
  </doc>
</result>

2nd time (oops, that's the same order... don't worry, that's randomness at work, and I indexed only 2 docs; look at the score values, which differ from the previous run)

<result name="response" numFound="2" start="0" maxScore="0.61873287">
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.61873287</float>
  </doc>
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.3067757</float>
  </doc>
</result>
  
3rd time

<result name="response" numFound="2" start="0" maxScore="0.24988756">
  <doc>
    <str name="product_name">shoes A</str>
    <float name="score">0.24988756</float>
  </doc>
  <doc>
    <str name="product_name">shoes B</str>
    <float name="score">0.22548665</float>
  </doc>
</result>

See you next time ;)

[1] http://andreagazzarini.blogspot.it/2015/11/shuffling-top-results-with-query-re.html
[2] https://gist.github.com/agazzarini/a802eff3b50c03fae2364458719be94e
