RUNNING LILY

About

This guide will take you through a first Lily experience, with a sample schema about books and authors. It only takes a few minutes, but it uses built-in versions of Hadoop/HBase, which means your data won't be saved between server restarts. It's a good way to familiarize yourself with the deployment of Lily before running it on a real install of Hadoop/HBase/ZooKeeper.

If at any point you run into problems, please let us know on the Lily mailing list.

Linux, Mac OS X, Windows

Linux is the only supported production platform for Hadoop.

For development purposes, you can also use other Unix variants like Mac OS X.

Windows is not supported.

Java 1.6

You need to have Sun/Oracle Java 1.6 installed, and the JAVA_HOME environment variable should point to its installation directory.

If everything is fine, you should be able to execute:

$JAVA_HOME/bin/java -version

and it should show something like:

java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) Server VM (build 17.0-b16, mixed mode)
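If you want to automate this check, for example in a provisioning script, the following Python sketch runs the same command and compares the reported version. It assumes the Sun/Oracle version line format shown above. The function names are hypothetical helpers and are not part of Lily.

```python
import os
import re
import subprocess

# Matches the first line of `java -version`, e.g.: java version "1.6.0_21"
_VERSION_RE = re.compile(r'java version "(\d+)\.(\d+)[^"]*"')


def parse_java_version(output):
    """Return (major, minor) from `java -version` output, or None if unrecognized."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_java_16():
    """Run $JAVA_HOME/bin/java -version and check that it reports Java 1.6."""
    java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        raise RuntimeError("JAVA_HOME is not set")
    java = os.path.join(java_home, "bin", "java")
    # `java -version` prints its banner on stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr) == (1, 6)
```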

Downloading Lily

Download the Lily binary distribution: lily-1.2.1.tar.gz.

Starting Lily

For testing purposes, Lily ships with a command called launch-test-lily, which starts Lily and all the services it depends on in one JVM. The started services are: HDFS, HBase, MapReduce's JobTracker and TaskTracker, ZooKeeper, Solr and the Lily server itself.

Start it now as follows:

bin/launch-test-lily -s samples/books/books_sample_solr_schema.xml -c 5

The -s option specifies the Solr schema needed for our demo; the -c option makes Solr auto-commit the index every 5 seconds.

Wait a few moments for it to start completely, until you see this:

-----------------------------------------------
Lily is running
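If you start Lily from a script rather than by hand, you can wait for the same banner programmatically. The Python sketch below assumes it runs from the directory where the Lily distribution was unpacked. It also assumes the banner appears on the process's standard output or standard error, so both streams are captured. `launch_test_lily` and `wait_until_ready` are hypothetical helpers.

```python
import subprocess
import threading

READY_MARKER = "Lily is running"


def wait_until_ready(lines, marker=READY_MARKER):
    """Consume output lines until one contains the marker; return True if seen."""
    for line in lines:
        if marker in line:
            return True
    return False


def _drain(stream):
    # Keep reading so the pipe buffer never fills up and blocks the JVM.
    for _ in stream:
        pass


def launch_test_lily(lily_home="."):
    """Start launch-test-lily with the books sample options and wait for the banner.

    Returns the running process; stop it later with proc.terminate().
    """
    cmd = [
        "bin/launch-test-lily",
        "-s", "samples/books/books_sample_solr_schema.xml",
        "-c", "5",
    ]
    proc = subprocess.Popen(cmd, cwd=lily_home, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    if not wait_until_ready(proc.stdout):
        raise RuntimeError("launch-test-lily exited before reporting it was running")
    threading.Thread(target=_drain, args=(proc.stdout,), daemon=True).start()
    return proc
```

Stopping the returned process discards all data stored by this test setup.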

This setup will store its data in a temporary directory which is lost each time you stop or restart launch-test-lily.

See further on for running against a 'real' HBase & co.

Create Field & Record Types

Before putting content in Lily, you need to create some field types and record types.

For the purpose of this first run, we will upload some types for managing books and authors using the import tool:

bin/lily-import -s samples/books/books_sample.json

The -s option specifies that we only want to upload the schema at this point (the JSON file contains records too).

Behind the scenes, this command connects to ZooKeeper to find out which Lily servers are available and picks one of them at random to talk to.

Define An Index

Define an index using:

bin/lily-add-index -n books -c samples/books/books_sample_indexerconf.xml -s shard1:http://localhost:8983/solr

The books_sample_indexerconf.xml file is the configuration for the indexer: it describes what records should be indexed and how the fields of the records should be mapped to Solr fields.

The lily-add-index command modifies the index configuration stored in ZooKeeper. In response, the Lily server(s) set up everything needed to keep the index up to date: they register a message queue subscription and start the indexing processes.

Loading Records Into Lily

Use the import tool to upload some records into Lily:

bin/lily-import samples/books/books_sample.json
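The three commands used so far can also be run as one script: upload the schema, define the index, then load the records. This Python sketch assumes the working directory is the unpacked Lily distribution. It also assumes that each tool exits with a non-zero status when it fails. `run_setup` is a hypothetical helper.

```python
import subprocess

# The three setup steps from this guide, in the order they must run.
SETUP_STEPS = [
    # 1. Upload only the field types and record types (-s = schema only).
    ["bin/lily-import", "-s", "samples/books/books_sample.json"],
    # 2. Define the 'books' index, pointing at the embedded Solr instance.
    ["bin/lily-add-index", "-n", "books",
     "-c", "samples/books/books_sample_indexerconf.xml",
     "-s", "shard1:http://localhost:8983/solr"],
    # 3. Import the sample records themselves.
    ["bin/lily-import", "samples/books/books_sample.json"],
]


def run_setup(lily_home=".", dry_run=False):
    """Run each setup command in turn, stopping at the first failure.

    With dry_run=True the commands are only returned, not executed.
    """
    executed = []
    for cmd in SETUP_STEPS:
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, cwd=lily_home, check=True)
        executed.append(" ".join(cmd))
    return executed
```

The order matters. The index is defined before the records are imported, so the new records are indexed incrementally. If you import the records first, you would need a batch index rebuild to get them into Solr.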

Querying The Solr Index

Browse to

http://localhost:8983/solr/admin/

Type 'frankenstein' in the input box and press Search; you should get a result with one document in it. In some browsers you need to use view-source to see the XML result.

As mentioned above, it can take up to 5 seconds for the new records to be visible in the index, so if you were very fast you may have to retry.
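The same query can be sent directly to Solr's standard /select request handler. The Python sketch below retries a few times to cover the 5-second auto-commit interval. It assumes the embedded Solr at http://localhost:8983/solr and the default XML response format, in which a `result` element named `response` carries a `numFound` attribute. The helper names are hypothetical.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SOLR_URL = "http://localhost:8983/solr"


def select_url(query, solr_url=SOLR_URL):
    """Build a URL for Solr's standard /select request handler."""
    return solr_url + "/select?" + urllib.parse.urlencode({"q": query})


def num_found(xml_text):
    """Extract numFound from Solr's default XML response format."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    return int(result.get("numFound"))


def _http_get(url):
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def search_with_retry(query, expected_min=1, attempts=4, delay=2.0, fetch=_http_get):
    """Query Solr until at least expected_min documents match, or give up.

    Retrying covers the auto-commit interval (-c 5): new records can take
    up to 5 seconds to become visible. `fetch` can be replaced in tests.
    """
    count = 0
    for attempt in range(attempts):
        count = num_found(fetch(select_url(query)))
        if count >= expected_min:
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return count
```

With the sample data loaded, `search_with_retry("frankenstein")` should report the single matching document.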

REST interface

There are two protocols available to talk to Lily: an RPC-style binary protocol based on Avro, which is used by the Java client API, and a REST-style API (HTTP+JSON).

The port on which the REST interface listens is printed when the repository starts; by default it is 12060:

Protocol [HTTP/1.1] listening on port 12060

For example, here is how you can access one of the records created earlier by the import:

http://localhost:12060/repository/record/USER.mary_shelley
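A script can pick up the REST port from the startup output and fetch records in the same way. In this Python sketch, the log-line pattern is taken from the example above. The record ID is percent-encoded as a precaution, which is an assumption about how Lily handles unusual IDs. The structure of the returned JSON is not described here, so the sketch simply decodes it into a Python object. All helper names are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

# Matches the startup line, e.g.: Protocol [HTTP/1.1] listening on port 12060
_PORT_RE = re.compile(r"listening on port (\d+)")


def rest_port(startup_log, default=12060):
    """Find the REST port in the repository startup output, else use the default."""
    match = _PORT_RE.search(startup_log)
    return int(match.group(1)) if match else default


def record_url(record_id, port=12060, host="localhost"):
    """Build the REST URL of a record, e.g. record_id='USER.mary_shelley'."""
    return "http://%s:%d/repository/record/%s" % (
        host, port, urllib.parse.quote(record_id, safe=""))


def get_record(record_id, port=12060):
    """Fetch a record over the REST interface and decode its JSON body."""
    with urllib.request.urlopen(record_url(record_id, port)) as response:
        return json.loads(response.read().decode("utf-8"))
```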

Rebuilding The Index

Usually an index is kept up to date incrementally by listening to repository events. Sometimes it is useful to rebuild the index: for example, when its configuration has changed, when it was defined after content had already been loaded into Lily, or when the Solr index has been lost. It is also possible to disable incremental index updating completely and update the index only through batch rebuilds.

Let's quickly run through how to trigger a batch index build.

A batch index build is triggered by changing the batch build state of an index to BUILD_REQUESTED, as follows:

bin/lily-update-index -n books --build-state BUILD_REQUESTED

In response to this state change, Lily will launch a Hadoop job to perform the index build, and change the batch build state to BUILDING. This can be observed by running lily-list-indexes:

bin/lily-list-indexes

which shows output like this:

books
  + General state: ACTIVE
  + Update state: SUBSCRIBE_AND_LISTEN
  + Batch build state: BUILDING
  + Queue subscription ID: IndexUpdater_books
  + Solr shards: 
    + shard1: http://localhost:8983/solr
  + Active batch build:
    + Hadoop Job ID: job_20101105103522869_0001
    + Submitted at: 2010-11-05T10:38:33.913+01:00
    + Tracking URL: http://localhost:45989/jobdetails.jsp?jobid=job_20101105103522869_0001

Notice that it also shows the ID of the Hadoop job and a tracking URL, which takes you to a web UI showing more information about the progress of the job.

After a little while the job will finish. When you run lily-list-indexes again, the batch build state will be INACTIVE and information about the last batch build will be shown:

books
  + General state: ACTIVE
  + Update state: SUBSCRIBE_AND_LISTEN
  + Batch build state: INACTIVE
  + Queue subscription ID: IndexUpdater_books
  + Solr shards: 
    + shard1: http://localhost:8983/solr
  + Last batch build:
    + Hadoop Job ID: job_20101105103522869_0001
    + Submitted at: 2010-11-05T10:38:33.913+01:00
    + Success: true
    + Job state: succeeded
    + Tracking URL: http://localhost:45989/jobdetails.jsp?jobid=job_20101105103522869_0001
    + Map input records: 2
    + Launched map tasks: 1
    + Failed map tasks: 0
    + Index failures: 0
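The rebuild steps above can be scripted. The script requests the build, then polls lily-list-indexes until the batch build state returns to INACTIVE. This Python sketch parses the text format shown above. That is an assumption, because the output of lily-list-indexes is meant for people and may change between versions. The sketch also assumes the tools are run from the Lily distribution directory and print to standard output. The helper names are hypothetical.

```python
import subprocess
import time


def parse_index_listing(text):
    """Parse `lily-list-indexes` output into {index_name: {label: value}}.

    Nested entries (shards, batch build details) are flattened into the
    same dict; section headers such as "Solr shards:" get an empty value.
    """
    indexes = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not line[0].isspace():
            # An unindented line starts a new index block, e.g. "books".
            current = indexes.setdefault(line, {})
            continue
        entry = line.strip()
        if current is None or not entry.startswith("+ "):
            continue
        entry = entry[2:]
        if entry.endswith(":"):
            current[entry[:-1]] = ""
        else:
            label, _, value = entry.partition(": ")
            current[label] = value
    return indexes


def index_info(index="books", lily_home="."):
    """Run bin/lily-list-indexes and return the parsed entry for one index."""
    output = subprocess.run(["bin/lily-list-indexes"], cwd=lily_home,
                            capture_output=True, text=True, check=True).stdout
    return parse_index_listing(output)[index]


def rebuild_index(index="books", lily_home=".", poll_seconds=10, timeout=3600):
    """Request a batch build, then poll until the build state is INACTIVE again."""
    subprocess.run(["bin/lily-update-index", "-n", index,
                    "--build-state", "BUILD_REQUESTED"],
                   cwd=lily_home, check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(poll_seconds)
        info = index_info(index, lily_home)
        if info.get("Batch build state") == "INACTIVE":
            return info
    raise TimeoutError("batch build of %r did not finish in time" % index)
```

`rebuild_index()` returns the parsed entry for the index, so checking `info.get("Success") == "true"` tells you whether the last batch build succeeded. Flattening the nested entries loses the grouping, but that is enough for this purpose.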

Next steps

Now you know the basics of running Lily. Next steps include:

As mentioned before, the HBase, Hadoop, ZooKeeper and Solr instances launched using launch-hadoop and launch-solr store their data into a temporary directory which is lost when you stop them.