Tuesday, December 23, 2014

Various ways to delete documents in Solr

The Php-Solr extension (SolrClient) provides the following three methods to delete records.
1. deleteByQuery: deletes all documents matching the given query. Passing '*:*' matches every document, so it erases the entire index. To remove a single record, pass a field name defined in schema.xml, followed by a colon, followed by the field's value; for example, 'id:20' removes the record whose id is 20.

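$client->deleteByQuery('*:*');                            // delete every document in the index
$this->solrClient->deleteByQuery('id:' . $entryId);        // delete the record with this id
$this->solrClient->deleteByQuery('fieldname:fieldvalue');  // generic form: field name, colon, value
$this->solrClient->commit();                              // make the deletes visible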
2. deleteById: deletes the document with the given id. The id must be a value of the uniqueKey field declared in schema.xml, passed on its own (not as 'id:...'). With any of these methods you must commit after the delete; otherwise the record will still show up in Solr. To delete all records except 1, 12 and 123, use deleteByQuery with a negated query instead:

$this->solrClient->deleteById($recordId);
$this->solrClient->deleteByQuery('*:* -id:(1 OR 12 OR 123)');
After committing, only the records with ids 1, 12 and 123 remain in your Solr core.
3. deleteByIds: deletes the documents whose ids are in the passed array. The ids must be values of the uniqueKey field declared in schema.xml, and the array should be a plain indexed array. Use it to delete multiple documents in one call:

$docIds = array('120', '121', '122', '10202', '12002'); // ids passed as strings
$this->solrClient->deleteByIds($docIds);
$this->solrClient->commit();
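All three methods take either a query string or plain ids. The sketch below shows one way to build those arguments in plain PHP. The function names are hypothetical helpers, not part of the extension (which does ship SolrUtils::escapeQueryChars() for escaping). It assumes ids may arrive as integers and converts them to strings before they reach deleteByIds().

```php
<?php
// Hypothetical helpers (NOT part of the PECL Solr API) for building the
// arguments passed to deleteByQuery() and deleteByIds().

// Backslash-escape characters the Lucene/Solr query parser treats as syntax
// (including spaces), so a raw value is matched literally in field:value.
function escapeSolrValue(string $value): string
{
    return addcslashes($value, '+-&|!(){}[]^"~*?:\\/ ');
}

// Build a "field:value" query, e.g. for deleting one record by a field.
function buildFieldQuery(string $field, string $value): string
{
    return $field . ':' . escapeSolrValue($value);
}

// Build a query matching every document EXCEPT the listed ids.
function buildDeleteAllExceptQuery(string $field, array $keepIds): string
{
    $escaped = array_map(function ($id) {
        return escapeSolrValue((string) $id);
    }, $keepIds);
    return '*:* -' . $field . ':(' . implode(' OR ', $escaped) . ')';
}

// Normalise ids for deleteByIds(): a zero-based (indexed) array of
// non-empty strings with duplicates removed.
function normaliseIds(array $ids): array
{
    $strings = array_map('strval', $ids);
    $nonEmpty = array_filter($strings, function ($id) {
        return $id !== '';
    });
    return array_values(array_unique($nonEmpty));
}
```

For example, $this->solrClient->deleteByQuery(buildFieldQuery('id', (string) $entryId)) followed by $this->solrClient->commit() deletes one record, and deleteByIds(normaliseIds($docIds)) accepts ids that started out as integers.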

Using Curl
You can also delete the entire index using curl:

curl "http://mysolrdomain.com/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
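The body that curl sends is a Solr XML delete command. A single <delete> element can hold several <id> (uniqueKey) and <query> elements. The sketch below builds that body in PHP; buildDeleteXml is a hypothetical helper name, and query text is XML-escaped so characters such as & do not break the request.

```php
<?php
// Hypothetical helper: build the XML body accepted by Solr's /update
// handler for deletes, e.g. <delete><query>*:*</query></delete>.
function buildDeleteXml(array $queries = array(), array $ids = array()): string
{
    $xml = '<delete>';
    foreach ($ids as $id) {
        // <id> matches the uniqueKey field exactly.
        $xml .= '<id>' . htmlspecialchars((string) $id, ENT_XML1 | ENT_QUOTES) . '</id>';
    }
    foreach ($queries as $query) {
        // <query> takes any Solr query; escape XML special characters.
        $xml .= '<query>' . htmlspecialchars($query, ENT_XML1 | ENT_QUOTES) . '</query>';
    }
    return $xml . '</delete>';
}
```

The returned string can be passed to curl as the --data-binary argument.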
Web interface
If you simply need to delete records from your Solr index through the browser, you can send the delete command in the stream.body parameter of an update URL. The following URL deletes the documents whose id field is 29999. To delete records matching any of several queries, add another query to the same delete command:

http://hostname/solr/update?stream.body=<delete><query>id:29999</query></delete>&commit=true

http://hostname/solr/update?stream.body=<delete><query>id:29999</query><query>name:amol</query></delete>&commit=true
If you want to delete all items in the index, just use this query:

<delete><query>*:*</query></delete>
Empty data:
http://hostname/solr/update?stream.body=<delete><query>*:*</query></delete>&commit=true
Empty data on a specific condition:
http://hostname/solr/update?stream.body=<delete><query>fieldname:fieldvalue</query></delete>&commit=true
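When these URLs are built from code rather than typed into a browser, the XML in stream.body should be URL-encoded so that <, >, : and / survive. The sketch below does that with http_build_query(); buildStreamBodyUrl is a hypothetical helper name. Note that recent Solr releases (7.0 onward, as far as I know) disable stream.body by default, so it may have to be enabled before these URLs work.

```php
<?php
// Hypothetical helper: build a GET URL that sends an XML update command
// in the stream.body parameter, URL-encoding the XML.
function buildStreamBodyUrl(string $updateUrl, string $xml, bool $commit = true): string
{
    $params = array('stream.body' => $xml);
    if ($commit) {
        $params['commit'] = 'true';
    }
    return $updateUrl . '?' . http_build_query($params);
}
```

For example, buildStreamBodyUrl('http://hostname/solr/update', '<delete><query>id:29999</query></delete>') produces the encoded form of the first web-interface URL above.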

Tuesday, September 30, 2014

How to insert data into Solr using PHP


Inserting data into Solr using PHP is very easy. Two files are especially important in Solr: solrconfig.xml and schema.xml. These files are the soul of Solr. schema.xml controls data insertion: it comes with predefined field types and defines the type of each field, which field is the unique (primary) key, and which fields are required.
Let's take a simple example of student information: roll number, name, medal, standard and date of birth. The schema definition for these fields is below. Find the <fields> section in schema.xml and put the following field definitions inside <fields> </fields>.




RollNumber should be unique and always an integer. Name is a string field holding alphanumeric characters. In the medal field definition we set multiValued to true, which means each record can store multiple values such as Gold, Silver and Bronze. We use a date field type to store the date of birth. That completes the Solr schema. Save the file and restart the Solr server. Remember to restart the server whenever you edit either schema.xml or solrconfig.xml.
Now let's add a PHP snippet that adds student records to the index. You need the Solr hostname, login, password, port and path to create the SolrClient object.
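Solr date fields expect values in ISO-8601 UTC form, such as 1998-05-14T00:00:00Z, so a date of birth usually needs converting before it is added to a document. The sketch below assumes the input is any string PHP's DateTimeImmutable can parse; toSolrDate is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a date string into the ISO-8601 UTC form
// that Solr date fields expect, e.g. 1998-05-14T00:00:00Z.
function toSolrDate(string $date): string
{
    $utc = new DateTimeZone('UTC');
    // Dates without an explicit offset are treated as UTC; dates with an
    // offset are shifted to UTC before formatting.
    $dt = new DateTimeImmutable($date, $utc);
    return $dt->setTimezone($utc)->format('Y-m-d\TH:i:s\Z');
}
```

The result can be passed straight to $doc->addField() for the date of birth field.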
$options = array
(
    'hostname' => SOLR_SERVER_HOSTNAME,
    'login'    => SOLR_SERVER_USERNAME,
    'password' => SOLR_SERVER_PASSWORD,
    'port'     => SOLR_SERVER_PORT,
    'path'     => SOLR_PATH_TO_SOLR,
);

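$client = new SolrClient($options);
$documents = array(); // Collect documents here and send them in one batch.

// You can start a loop for multiple record insertions here.

$doc = new SolrInputDocument(); // Create a Solr document object.

$doc->addField('RollNumber', 12);
$doc->addField('name', 'John Anderson');
$doc->addField('medal', 'Bronze'); // multivalued: call addField once per value
$doc->addField('date', '1998-05-14T00:00:00Z'); // example date of birth in Solr's ISO-8601 UTC format

$documents[] = $doc; // End the loop here.

Just add the list of field names (RollNumber, name etc.) exactly as defined in schema.xml.

if (!empty($documents))
{
    $client->addDocuments($documents);
    $client->commit();
    $client->optimize();
}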
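Inside the loop, each record has to be turned into addField() calls, and a multivalued field such as medal is written by calling addField() once per value. The sketch below assumes each student arrives as an associative array keyed by schema field names, with arrays for multivalued fields; studentToFieldPairs is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn one student record (field => value, or
// field => array of values for multivalued fields) into (field, value)
// pairs, one pair per SolrInputDocument::addField() call.
function studentToFieldPairs(array $student): array
{
    $pairs = array();
    foreach ($student as $field => $value) {
        $values = is_array($value) ? $value : array($value);
        foreach ($values as $v) {
            $pairs[] = array($field, (string) $v);
        }
    }
    return $pairs;
}

// Usage inside the insertion loop (requires the Solr extension):
// foreach ($students as $student) {
//     $doc = new SolrInputDocument();
//     foreach (studentToFieldPairs($student) as [$field, $value]) {
//         $doc->addField($field, $value);
//     }
//     $documents[] = $doc;
// }
```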
You can also refer to the Php-Solr manual for more details.

Thursday, July 31, 2014

How to sort a string field in Solr


We want to let site users sort the old Indian Arts records by the title, id and date fields. We use Apache Solr to store the records. Solr is a standalone, preconfigured web application built on Lucene; it is ready to use out of the box and offers infrastructure and many more features on top of what Lucene provides. You can read more about Solr on its official site.
Sorting on a text/string field works strangely in Solr. At first it seemed to work normally, but later I found the strange behaviour explained below.
If the records contain "Aany ZeRasta" and "David Book", Solr shows Aany ZeRasta after David Book by default. Why? Because a text_general field is split into separate words, and the sort picks up the Z in the second word of the first string. Obviously this is not what we want. Functions such as abs cannot be used on a string type. I tried changing the field type from text_general to string.

alphaOnlySort

I read somewhere that recent Solr example schemas include a field type called alphaOnlySort. alphaOnlySort uses the KeywordTokenizer along with several token filters to produce a sortable field that drops some properties of the source text. The KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token. The filters then lowercase the value, trim leading and trailing whitespace and, in the stock definition, strip every character outside a-z (including digits and inner spaces).
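To see which key each title ends up sorted on, the sketch below roughly emulates the stock alphaOnlySort chain (KeywordTokenizer, LowerCaseFilter, TrimFilter, then a PatternReplaceFilter that removes every character outside a-z) in plain PHP. It is only an approximation for illustration; the function names are hypothetical and Solr itself does this analysis at index time.

```php
<?php
// Rough emulation of the stock alphaOnlySort analysis chain:
// whole string as one token -> lowercase -> trim -> drop non a-z chars.
function alphaOnlySortKey(string $title): string
{
    return preg_replace('/[^a-z]/', '', strtolower(trim($title)));
}

// Sort titles by their alphaOnlySort-style key.
function sortTitles(array $titles): array
{
    usort($titles, function ($a, $b) {
        return strcmp(alphaOnlySortKey($a), alphaOnlySortKey($b));
    });
    return $titles;
}
```

With this key, "Aany ZeRasta" becomes aanyzerasta and sorts before davidbook. Because digits are stripped too, titles that differ only by numbers compare as equal.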

Schema.xml

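I created a new field, title_sort, and copied the values from the title field into it as shown below.
<field name="title_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
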
We always have to restart the Solr server after editing schema.xml or solrconfig.xml. After reindexing, I changed my Solr sort query from the title field to title_sort. Phew! I got the correct sort order for the title field.
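Since users pick the sort field on the site, it helps to map their choice to a Solr sort parameter through a whitelist, so a title sort always goes to title_sort. The sketch below is hypothetical: buildSortParam is an illustrative name, and the id and date field names are assumptions based on the fields mentioned above. With the PECL extension, the same field and direction could be passed to SolrQuery::addSortField('title_sort', SolrQuery::ORDER_ASC) instead of a raw sort string.

```php
<?php
// Hypothetical helper: map a user-facing sort option to a Solr "sort"
// parameter value. Field names other than title_sort are assumptions.
function buildSortParam(string $key, string $direction = 'asc'): string
{
    $fields = array(
        'title' => 'title_sort', // sort on the alphaOnlySort copy, not title
        'id'    => 'id',
        'date'  => 'date',
    );
    if (!isset($fields[$key])) {
        throw new InvalidArgumentException("Unsupported sort key: $key");
    }
    // Anything other than "desc" falls back to ascending order.
    $direction = strtolower($direction) === 'desc' ? 'desc' : 'asc';
    return $fields[$key] . ' ' . $direction;
}
```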