Elliptics: server-side scripting

I added binary data support into server-side scripting (currently we only support Python) as well as extended HTTP fastcgi proxy to support server-side script execution.

Previously we only were able to use C/C++/Python to force some server to execute script on our data – API is rather simple and can be found in examples in binding dirs.
Now we can do that through HTTP.

Here is an example of how to setup python server-side scripting and run scripts on posted through HTTP data.
First of all, you have to put python.init script into ‘history’ directory (specified in server config).
It may look like this:

import sys
sys.path.append('/usr/lib')
from libelliptics_python import *

log = elliptics_log_file('/tmp/data/history.2/python.log', 40)
n = elliptics_node_python(log)
n.add_groups([1,2,3])
n.add_remote('elisto19f.dev', 1030)
__return_data = 'unused'

This creates global context, which is copied for every input request.
Second, you may add some scripts into the same dir, let’s add test script named test_script.py:

__return_data = "its nothing I can do for you without cookies: "

if len(__cookie_string) > 0:
       # here we can add arbitrary complex cookie check, which can go into elliptics to get some data for example
       # aforementioned global context ('n' in above python.init script) is available here
       # so you can do something like data = n.read_data(...)
        __return_data = "cookie: " + __cookie_string + ": "

__return_data = __return_data + __input_binary_data_tuple[0].decode('utf-8')

HTTP proxy places cookies specified in its config into variable __cookie_string, and it is accessible from your script. Anything you put into __return_data is returned to the caller (and eventually to the client who started request).

__input_binary_data_tuple[0] is a global tuple which contains binary data from your request.
When using high-level API you may find, that we send 3 parameters to the server:

  • script name to invoke (can be NULL or empty)
  • script content – it is code which is invoked on the server before named script (if present), its context is available for called script
  • binary data

That binary data is placed into global tuple without modification. In Python 2.6 it holds byte array, I do not yet know what it will be in older versions though (basically, we limited support to python 2.6+ for now).

HTTP proxy puts your POST request data into aforementioned binary data. It also puts simple string like __cookie_string = 'here is your cookie' into ‘script content’ part of the request.
One can put there whatever data you like if using C/C++/Python API of course.

Here is example HTTP request and reply:

$ wget -q -O- -S --post-data="this is some data in POST request" "http://elisto19f.dev:9000/elliptics?exec&name=test_script.py&id=0123456789"
  HTTP/1.0 200 OK
  Content-Length: 79
  Connection: keep-alive
  Date: Wed, 02 Nov 2011 23:24:10 GMT
  Server: lighttpd/1.4.26
its nothing I can do for you without cookies: this is some data in POST request

As you see, we do not have cookies, so our script just concatenated some string with binary data and returned that data to the client. Name used in POST parameters is actually a name of the server-side script to invoke. ID is used to select server to run this code, otherwise it will run on every server in every group.

It is now possible to run our performance testing tools against server-side scripting implementation (it is not straightforward – there is separate SRW library which implements pool of processes each of which has own initialized global context, which is copied for every incoming request and so on

We believe that numbers will be good out of the box, and of course I expect that performance will suffer compared to plain data read or write, but it should not be that slow – we still expect that every request completes within milliseconds even when it uses python to work with data.
Our microbenchmark (every command executed on server is written into log with time it took to complete) shows that timings are essentially the same – hundreds of microseconds to complete simple scripts.
So I believe with IO intensive tasks we will not be limited by python server scripts and/or various reschedulings.

I’m quite excited with idea of adding full-text (rather simple though) search for all uploaded content into elliptics.
And I’m working hard on this – expect some results really soon!

P.S. We have almost finished directory support for POHMELFS via server-side scripts, expect it quite soon also!