Showing posts with label solr sorting text field. Show all posts

Thursday, July 31, 2014

How to sort string field in Solr


We want to let site users sort the old Indian Arts records by the title, id and date fields. We use Apache Solr to store the records. Well!! Solr is a standalone, pre-configured web application built on Lucene. It is ready to use out of the box and offers infrastructure features and a lot more on top of what Lucene provides. You can read more about Solr here.
Sorting on a text/string field behaves strangely in Solr. At first it seemed to work normally, but later I ran into the strange behavior explained below.
Suppose the records contain "Aany ZeRasta" and "David Book". By default Solr shows Aany ZeRasta after David Book. Why? Because there is a Z in the second word of the first string: a text field is tokenized, so the sort does not compare the title as one whole string. Obviously this is something we never want. A function such as abs does not help either, because it cannot be used on a string type. I tried changing the field type from text_general to string.

alphaOnlySort

I read somewhere that the latest Solr schema includes a type called alphaOnlySort. alphaOnlySort uses KeywordTokenizer along with various TokenFilterFactories to produce a sortable field that does not include some properties of the source text. KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token. The filters then remove any leading or trailing whitespace and lowercase the value, so the sort is no longer case sensitive.
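
For reference, this is roughly how the alphaOnlySort type is defined in the example schema.xml that ships with Solr 4.x. Treat it as a sketch: the exact attributes and filters in your own schema.xml may differ, so check your copy before relying on it.

```xml
<!-- alphaOnlySort as found (approximately) in the Solr example schema -->
<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keeps the whole input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercases the token so sorting ignores case -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strips leading and trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- drops every character that is not a-z -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

Note that the last filter removes digits and punctuation too, so titles that differ only in those characters will sort as equal.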

Schema.xml

I created a new field, title_sort, and copied the values from the title field into it.
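
The original snippet is not preserved here, so the following is only a minimal sketch of what such a declaration usually looks like. The indexed and stored values are assumptions; only the names title, title_sort and alphaOnlySort come from the post.

```xml
<!-- sort-only copy of the title; stored="false" is an assumption,
     since the field is used for sorting and never displayed -->
<field name="title_sort" type="alphaOnlySort" indexed="true" stored="false"/>

<!-- fill title_sort from title at index time -->
<copyField source="title" dest="title_sort"/>
```

The field must be single-valued for sorting to work, which is the default unless multiValued="true" is set.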
 
 

We always have to restart the Solr server after editing the schema.xml or solrconfig.xml files. After reindexing, I just changed the sort field in my Solr query from title to title_sort. Phew!!!! I got the correct sorting result for the title field.