Tag Archives: rift

Elliptics, golang, GC and performance

Elliptics distributed storage has native C/C++ client API as well as Python (comes with elliptics sources) and Golang bindings.

There is also Rift, an elliptics HTTP proxy.

I like golang because of its static type system, garbage collection and built-in lightweight threading model. Let’s test its HTTP proxying capabilities against an elliptics node. I had already tested the elliptics cache with the native C++ client: it showed an impressive 2 million requests per second from 10 nodes, or about 200-220 krps per node, using the native API (very small requests, up to 100 bytes each). What would the HTTP proxying numbers be?

First, I ran a single-client, single-Rift-proxy, single-elliptics-node test. After some tuning I got 23 krps for random writes of 1k-5k bytes per request (a very realistic load). I tested two cases: elliptics node and Rift server on the same machine, and on different physical servers. Maximum latencies at the 98th percentile were about 25 ms at the end of the test (about 23 krps) and 0-3 ms at 18 krps, not counting rare spikes on the graph below.

Rift HTTP proxy writing data into elliptics cache, 1k-5k bytes per request

Second, I tested a simple golang HTTP proxy with the same setup – single elliptics node, single proxy node and the Yandex Tank benchmark tool.

I ran tests with the following setups: golang 1.2 with GC=100 and GC=off, and golang 1.3 with the same garbage collection settings. The results are impressive: without garbage collection (GC=off) the golang 1.3 test ran with the same RPS and latencies as the native C++ client, although the proxy ate 90+ GB of RAM. Golang 1.2 showed 20% worse numbers.

Golang HTTP proxy (garbage collection turned off) writing data into elliptics cache, 1k-5k bytes per request

Turning garbage collection on with the GC=100 setting led to much worse results than the native C++ client, but they are still quite impressive. I got the same RPS of about 23 krps, but latencies at 20 krps were close to 80-100 ms, and about 20-40 ms in the middle of the test. Golang 1.2 showed 30-50% worse results here.

Golang HTTP proxy (GC=100 garbage collection setting) writing data into elliptics cache, 1k-5k bytes per request

These numbers are not bad at all for a single-node setup. Writing asynchronous parallel code in Golang is far simpler than in C++ with its forest of callbacks, so I will stick with Golang for async network code for now. I will wait for Rust to stabilize though.
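For the curious, the proxy in such a test is conceptually tiny. Below is a minimal sketch of the write path in Go; memClient is a toy in-memory stand-in for the real elliptics bindings, and the per-request body allocation is exactly the kind of garbage the GC has to keep up with:

package main

import (
    "io/ioutil"
    "log"
    "net/http"
    "sync"
)

// memClient is a toy in-memory stand-in for an elliptics client binding,
// used only to keep the sketch self-contained and runnable.
type memClient struct {
    mu   sync.Mutex
    data map[string][]byte
}

func (m *memClient) WriteKey(key string, value []byte) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = value
    return nil
}

func main() {
    c := &memClient{data: make(map[string][]byte)}
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Read the whole 1k-5k request body; every such buffer becomes
        // garbage, which is exactly what stresses the Go GC in this test.
        body, err := ioutil.ReadAll(r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        if err := c.WriteKey(r.URL.Path, body); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusOK)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}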

RIFT is now fully API-backed: buckets, ACL, bucket directories, listing and so on

This day has come – we have made all the features of RIFT, the elliptics HTTP frontend (those in the title and more), accessible via REST APIs.

This required URL format changes; URLs now look much more like S3 and REST in general:
http://reverbrain.com/get/example.txt?bucket=testns
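Fetching an object with the new scheme is a plain HTTP GET; as a small illustration in Go (using the sample URL above):

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    // S3-like scheme: /get handler, object name in the path,
    // bucket passed as a query parameter.
    resp, err := http.Get("http://reverbrain.com/get/example.txt?bucket=testns")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%d: %s\n", resp.StatusCode, body)
}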

The cornerstone of RIFT is the bucket – a metadata entity which describes where your data lives (the group list) and how to access it (ACLs); it also hosts a secondary index of the keys uploaded into that bucket (if configured to do so).
We now also have the bucket directory – an entity which lists your buckets.

Buckets, directories, files and indexes – everything can be created, processed and deleted via REST API calls.
Basically, RIFT + elliptics allow you to create your own private cloud storage and put your data replicas into whatever safe locations you like.

It is like having your own Amazon S3 in the pocket :)

Soon we will set up a test cloud at reverbrain.com where everyone can check out our technologies before digging deeper: you will be able to create (limited) buckets and upload/download data, which will be stored in Germany and Russia for a limited period of time.

For more details about RIFT please check our documentation page: http://doc.reverbrain.com/rift:rift

Stay tuned!

Rift persistent caching

Rift allows you to store popular content in separate groups for caching. This is quite different from the elliptics cache, where data is stored in memory in segmented LRU lists. Persistent caching lets you temporarily put your data into additional elliptics groups, which will then serve IO requests. This is usually very useful for heavy content like big images or audio/video files, which are rather expensive to put into a memory cache.

One can update the list of objects to be cached as well as the per-object list of additional groups. There is a cache.py tool with extensive help for working with cached keys. This tool grabs the requested keys from the source groups and puts them into the caching groups, and also updates a special elliptics cache list object which Rift checks periodically (the timeout option in the cache configuration block of Rift). As soon as Rift finds new keys in the elliptics cache list object, it starts serving IO from those cached groups as well as from the original groups specified in the bucket. When using the cache.py tool please note that its file-namespace option is actually a bucket name.

To remove an object from cache one uses the same cache.py tool – it removes data from the caching groups (please note that physically removing objects from disk in elliptics may require running online eblob defragmentation) and updates the special elliptics cache list object. This object will be reread some time later, so if a requested key cannot be found in cache, it will be automatically served from the original bucket groups.
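To illustrate the proxy-side mechanics, here is a rough Go sketch of the described behaviour – not Rift’s actual code; readCacheList is a hypothetical helper standing in for fetching and parsing the cache list object:

package cache

import (
    "log"
    "sync"
    "time"
)

// cacheList maps an object key to the extra groups it is currently cached in.
type cacheList map[string][]uint32

var (
    mu     sync.RWMutex
    cached = cacheList{}
)

// readCacheList is a hypothetical helper standing in for fetching and
// parsing the special elliptics cache list object.
func readCacheList() (cacheList, error) {
    return cacheList{}, nil // stub
}

// Poll rereads the cache list object every interval, mirroring the timeout
// option in the cache configuration block of Rift.
func Poll(interval time.Duration) {
    for range time.Tick(interval) {
        cl, err := readCacheList()
        if err != nil {
            log.Printf("cache list read failed: %v", err)
            continue
        }
        mu.Lock()
        cached = cl
        mu.Unlock()
    }
}

// GroupsFor returns the caching groups for a key followed by the bucket's
// original groups, so keys missing from the cache are still served.
func GroupsFor(key string, original []uint32) []uint32 {
    mu.RLock()
    defer mu.RUnlock()
    return append(append([]uint32{}, cached[key]...), original...)
}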

Elliptics HTTP frontend RIFT got full per-bucket ACL support

The second most wanted feature in RIFT, the elliptics HTTP frontend, is ACL support.

Rift already provides S3-like buckets – namespace metadata which lets you store data in separate per-bucket unique groups, fetch the whole list of objects stored in a given bucket, and of course use the same object names in different namespaces. Rift also allows you to have a set of objects cached in special groups; each ‘cached’ object may have its own set of groups to check first. This option can be used to cache commonly used objects in additional groups like temporary in-memory or SSD groups.

And now I’ve added ACL support to buckets. As stated, an ACL is an access control list where each username is associated with a secure token, used to check the Authorization header, and with auth flags which allow bypassing some or all auth checks.

An admin must set up the per-bucket ACL using the rift_bucket_ctl tool, where the --acl option can be specified multiple times. The control tool uses the following format: user:secure-token:flags

user is the username provided both in the URI (&user=XXX) and in the ACL (--acl XXX:token:0). token is used to check the Authorization header.

Here is the whole state machine of Rift’s authentication checker, run once the bucket has been found and successfully read and parsed by the server (a Go sketch follows the list):

  1. if the group list is empty, a not-found error is returned
  2. if the ACL is empty, ok is returned – there is nothing to check against
  3. if no user= URI parameter is found, a forbidden error is returned – one must provide a username when an ACL is configured
  4. the user is looked up in the ACL; if no match is found, a forbidden error is returned
  5. if the flags have bit 1 (counting from zero) set, the security check is bypassed for the given user – ok is returned
  6. the Authorization header is looked up; if there is no such header, a bad-request error is returned
  7. the security data in the Authorization header is checked against the secure token found in the ACL entry; if the auth data mismatches, forbidden is returned
  8. ok is returned
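As a sketch, here is the same state machine in Go – an illustration of the list above, not Rift’s actual C++ implementation; all type and function names are made up:

package acl

import "net/url"

type verdict int

const (
    verdictOK verdict = iota
    verdictNotFound
    verdictForbidden
    verdictBadRequest
)

type aclEntry struct {
    Token string
    Flags uint // bit 0: allow reads without auth, bit 1: bypass all checks
}

type bucket struct {
    Groups []uint32
    ACL    map[string]aclEntry
}

// checkAuth mirrors the state machine above; authMatches stands in for the
// real verification of the Authorization header against the secure token.
func checkAuth(b *bucket, query url.Values, authHeader string,
    authMatches func(header, token string) bool) verdict {

    if len(b.Groups) == 0 { // 1. empty group list
        return verdictNotFound
    }
    if len(b.ACL) == 0 { // 2. empty ACL: nothing to check against
        return verdictOK
    }
    user := query.Get("user")
    if user == "" { // 3. no user= URI parameter
        return verdictForbidden
    }
    entry, ok := b.ACL[user]
    if !ok { // 4. user not found in ACL
        return verdictForbidden
    }
    if entry.Flags&(1<<1) != 0 { // 5. bit 1: bypass the security check
        return verdictOK
    }
    if authHeader == "" { // 6. no Authorization header
        return verdictBadRequest
    }
    if !authMatches(authHeader, entry.Token) { // 7. auth data mismatch
        return verdictForbidden
    }
    return verdictOK // 8.
}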

The result of this check can be found in the log with the verdict: prefix at the ERROR/INFO (1/2) log levels and higher.

But even if the state machine returned a non-ok verdict, the operation may still be processed. This can happen when the per-bucket ACL flags allow not all requests (bit 1) but only read requests (bit 0). In this case /get, /list, /download-info and other read-only handlers check the ACL flag bits and optionally rewrite the verdict. You will find something like this in the logs:

[NOTICE] get-base: checked: url: /get?name=test.txt, original-verdict: 400, passed-no-auth-check

That’s it.
Check out more info about Rift – our full-featured elliptics HTTP frontend: http://doc.reverbrain.com/rift:rift

Listing of the keys uploaded into elliptics

I often get asked how to obtain a list of the keys written into elliptics. I do not really understand why this is needed, especially considering storage setups where billions of keys have been uploaded, but nevertheless it is one of the most frequently asked questions.

Elliptics has secondary indexes for that purpose. Indexes are automatically sharded and evenly distributed across the nodes in the group.

One can tag uploaded keys with special indexes and then intersect those indexes on the servers or read a whole index key-by-key. That’s essentially what RIFT – the elliptics HTTP frontend – does when you upload a file through its HTTP interface.

And I’ve added listing support to the RIFT proxy via the /list URI – it reads an index from the server, iterates over the keys and produces nice JSON output. It also prints the timestamp of the key’s update in the index, both in seconds and in the current timezone.

The URI accepts namespace – the bucket name to read indexes from – and name, a placeholder for future index names (if we ever support multiple indexes).

$ curl "http://example.com/list?namespace=testns&name="

{
    "indexes": [
        {
            "id": "4e040aa8a798d04d56548d4917460f5759434fdf3ed948fd1cf35fd314cad3290e69b80deb0fc9b87a6bfbcbd08583919eb5b966658b3ed65e127236e1632525",
            "key": "test1",
            "timestamp": "1970-01-01 03:00:00.0",
            "time_seconds": "0"
        },
        {
            "id": "e5b7143155f46c9e9023cbf5e04be7276ae2e9a7583fee655c32aaff39755fa213468217291f0e08428a787bf282b416be1d26a5211f244fc66d1ce8ce545382",
            "key": "test7",
            "timestamp": "2014-02-18 03:29:44.835283",
            "time_seconds": "1392679784"
        }
    ]
}

The zero timestamp appears for older indexes, created before timestamps were supported. key is the object name given at upload time; id is the numeric elliptics ID (one can read those objects directly from elliptics without the namespace name); time_seconds is a coarse-grained timestamp in seconds since the Epoch; timestamp is the fully parsed timestamp with microsecond resolution.
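The reply is easy to consume from code. For instance, a minimal Go decoder for this JSON could look like this (the struct fields follow the reply above, the URL is the sample one):

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// indexEntry matches one element of the "indexes" array in the /list reply.
type indexEntry struct {
    ID          string `json:"id"`
    Key         string `json:"key"`
    Timestamp   string `json:"timestamp"`
    TimeSeconds string `json:"time_seconds"`
}

type listReply struct {
    Indexes []indexEntry `json:"indexes"`
}

func main() {
    resp, err := http.Get("http://example.com/list?namespace=testns&name=")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    var reply listReply
    if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
        log.Fatal(err)
    }
    for _, e := range reply.Indexes {
        fmt.Println(e.Key, e.Timestamp)
    }
}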

There is also an example python script which does basically the same – reads an index, unpacks it and prints it to the console: https://github.com/reverbrain/rift/blob/elliptics-2.25/example/listing.py

New elliptics HTTP proxy, authentication, caching and Go bindings

I have separated the elliptics HTTP proxy from TheVoid, our high-performance HTTP server framework. TheVoid continues to be a framework for writing HTTP servers in C++, whilst Rift becomes the elliptics HTTP access point.

Rift supports the usual object upload/get/removal as well as upload/download flow control. The latter (soon to be the default and only mode) is basically an arbiter which does not read more data from the client until the current chunk has been written. It uses chunked upload and download. Rift supports range requests (the Range: HTTP header).
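The flow-control idea can be sketched in a few lines of Go: read a chunk, write it, and only then read more, so a slow storage write naturally throttles the client. In this sketch writeChunk is a hypothetical stand-in for the actual storage write:

package upload

import "io"

// UploadChunked reads the request body chunk by chunk and does not pull the
// next chunk from the client until the current one has been written: the
// backpressure described above.
func UploadChunked(body io.Reader, writeChunk func([]byte) error) error {
    buf := make([]byte, 64*1024) // the chunk size here is an arbitrary choice
    for {
        n, err := body.Read(buf)
        if n > 0 {
            if werr := writeChunk(buf[:n]); werr != nil {
                return werr
            }
        }
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
    }
}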

There is basic authentication support in Rift. I will extend it to a per-bucket fashion similar to what Amazon S3 has (not the same API though). Rift also supports multiple-group caching; this is extremely useful for bigger content, when you suddenly decide that given objects have to be spread into many groups instead of just those they were originally written into. There is ‘a button’ (basically a python script) which copies given keys from their original groups into the caching ones and broadcasts updates to all Rift proxies by updating special keys which are periodically checked by the proxies. Caching can be turned on and off on a per-key basis.

One can create special SSD caching groups, for example, and keep the needed files there for some time. Or those can be commodity spinning disks for larger files like video content.
More details on this later at the documentation site.

Everything above, plus several new features, will be available both in Rift proxies and in the new cluster we are currently developing. Not that it will be plain Amazon S3, but something similar. More details later this year :)

And right now you can check out the new Go elliptics bindings project started by Anton Tyurin. That will be the base for our HTTP entry point.