Thursday, July 31, 2014

How to sort a string field in Solr


We want to let site users sort the old Indian Arts records by the title, id and date fields. We use Apache Solr to store the records. Well!! Solr is a standalone, pre-configured product/webapp built on top of Lucene. It is ready to use out of the box: a web application that provides the surrounding infrastructure and many more features in addition to what Lucene offers. You can read more about Solr here.
Sorting on a text/string field behaves strangely in Solr. At first it seemed to work normally, but later I noticed the strange behavior explained below.
Suppose the records contain the titles "Aany ZeRasta" and "David Book". By default Solr shows "Aany ZeRasta" after "David Book". Why? Because of the Z in the second word of the first title: title is a tokenized text_general field, so the value is split into separate words and the sort is not done on the whole title. Obviously this is something we never want. Sorting by a function such as abs() is no help either, because it can't be used on a string type. I also tried changing the field type from text_general to string.
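To see why the order looks odd, here is a small Python sketch. It is a simplified model, my own assumption rather than Solr's actual code, in which a tokenized field sorts on the largest lowercased word of each title. That model reproduces the order described above. The second function shows why a plain string field is not a full fix: it sorts on the raw value, so uppercase letters come before lowercase ones. The titles "apple" and "Zebra" are hypothetical.

```python
import re

titles = ["Aany ZeRasta", "David Book"]

def tokenized_key(title: str) -> str:
    # Hypothetical, simplified model of sorting on a tokenized field:
    # split into lowercased word tokens and let the largest token decide.
    return max(re.findall(r"\w+", title.lower()))

def string_key(title: str) -> str:
    # A string field sorts on the raw, untokenized value, case-sensitively
    # (in code-point order 'Z' (90) sorts before 'a' (97)).
    return title

print(sorted(titles, key=tokenized_key))
print(sorted(["apple", "Zebra"], key=string_key))
```

Under this model "Aany ZeRasta" sorts by "zerasta" and "David Book" by "david", which is why David Book comes first.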

alphaOnlySort

I read somewhere that recent Solr versions include an alphaOnlySort field type in the example schema. alphaOnlySort uses the KeywordTokenizer together with several TokenFilterFactories to produce a sortable field that leaves out some properties of the source text. The KeywordTokenizer does no actual tokenizing, so the entire input string is kept as a single token. The filters then lowercase that token, so sorting is case-insensitive, remove leading and trailing whitespace, and strip every character that is not a letter from a to z.
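The analysis chain can be emulated in a few lines of Python to show what ends up in the index. This is a rough sketch assuming the alphaOnlySort definition from the Solr example schema (KeywordTokenizerFactory, LowerCaseFilterFactory, TrimFilterFactory, then a PatternReplaceFilterFactory with pattern ([^a-z]), an empty replacement and replace="all"); check the definition in your own schema.xml. The function name and the title "  apple pie" are hypothetical.

```python
import re

def alpha_only_sort_key(value: str) -> str:
    """Rough Python emulation of the alphaOnlySort analysis chain."""
    token = value              # KeywordTokenizer: the whole input is one token
    token = token.lower()      # LowerCaseFilter: makes the sort case-insensitive
    token = token.strip()      # TrimFilter: drops leading/trailing whitespace
    # PatternReplaceFilter, pattern ([^a-z]), replace="all":
    # deletes every character that is not a lowercase ASCII letter.
    return re.sub(r"[^a-z]", "", token)

titles = ["Aany ZeRasta", "David Book", "  apple pie"]
for t in titles:
    print(repr(t), "->", alpha_only_sort_key(t))
print(sorted(titles, key=alpha_only_sort_key))
```

Because everything outside a to z is deleted, spaces, digits and accented letters play no part in the sort order, so titles that differ only in those characters get the same sort key.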

Schema.xml

I created a new field, title_sort, of type alphaOnlySort, and copied the values from the title field into it with a copyField directive in schema.xml.

Remember that the Solr server must always be restarted after schema.xml or solrconfig.xml is edited, and the records must be reindexed so the new field gets populated. After reindexing, I simply changed my Solr sort query from the title field to title_sort. Phew!!!! I got the correct sort order for the title field.
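Switching the sort field only changes the sort request parameter. The Python sketch below just builds the select URL and sends nothing. The host, port, core name (collection1) and the helper build_sorted_query are hypothetical; the parameters q, sort, rows and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace host, port and core name with your own.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def build_sorted_query(query: str, sort_field: str, order: str = "asc", rows: int = 10) -> str:
    """Build a select URL that sorts results on the given field."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    params = {
        "q": query,
        "sort": f"{sort_field} {order}",  # e.g. "title_sort asc"
        "rows": rows,
        "wt": "json",
    }
    # urlencode escapes the value, so the space in "title_sort asc" becomes '+'.
    return SOLR_SELECT + "?" + urlencode(params)

print(build_sorted_query("*:*", "title_sort"))
```

The same helper with "title" as the sort field reproduces the old, misleading order, which makes the two easy to compare side by side.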
