We’ve updated RIFT documentation at http://doc.reverbrain.com/rift:rift
It includes the new Authorization header (only the riftv1 method is described; S3 methods are coming) and handlers ACL updates.
Enjoy and stay tuned!
Rift allows you to store popular content in separate groups for caching. This is quite different from the elliptics cache, where data is stored in memory in segmented LRU lists. Persistent caching allows you to temporarily put your data into additional elliptics groups, which will serve IO requests. This is usually very useful for heavy content like big images or audio/video files, which are rather expensive to put into the memory cache.
One can update the list of objects to be cached as well as the per-object list of additional groups. There is a cache.py tool with extensive help for working with cached keys. This tool grabs the requested keys from the source groups and puts them into the caching groups, and also updates the special elliptics cache list object, which Rift checks periodically (see the timeout option in the cache configuration block of Rift). As soon as Rift finds new keys in the elliptics cache list object, it starts serving IO from those cached groups as well as from the original groups specified in the bucket. When using the cache.py tool, please note that its file-namespace option is actually a bucket name.
To remove an object from the cache, one should use the same cache.py tool: it removes data from the caching groups (please note that physically removing objects from disk in elliptics may require running online eblob defragmentation) and updates the special elliptics cache list object. This object will be reread sometime in the future, so if a requested key cannot be found in the cache, it will be automatically served from the original bucket groups.
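The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not Rift's actual code: the function name groups_for_key and the dictionary layout of the cache list are assumptions made for the example. It assumes cached groups are tried first and the bucket's original groups remain available as a fallback.

```python
def groups_for_key(key, bucket_groups, cache_list):
    """Return the groups to try for `key`, cached groups first.

    `cache_list` is a hypothetical in-memory copy of the cache list
    object: a dict mapping object name -> list of caching groups.
    """
    ordered = []
    # Caching groups (if any) come first, then the bucket's own groups.
    for group in list(cache_list.get(key, [])) + list(bucket_groups):
        if group not in ordered:  # keep order, drop duplicates
            ordered.append(group)
    return ordered


cache_list = {"video.mp4": [100, 101]}
print(groups_for_key("video.mp4", [1, 2, 3], cache_list))
print(groups_for_key("notes.txt", [1, 2, 3], cache_list))
```

A key that is not in the cache list simply falls back to the bucket groups, which matches the behaviour after an object has been removed from the cache.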
The second most wanted feature in elliptics HTTP frontend RIFT is ACL support.
Rift already provides S3-like buckets: namespace metadata which allows you to store data in separate per-bucket unique groups, to fetch the whole list of objects stored in a given bucket, and of course to use the same object names in different namespaces. Rift also allows you to have a set of objects cached in special groups; each ‘cached’ object may have its own set of groups to check first. This option can be used to cache commonly used objects in additional groups such as temporary in-memory or SSD groups.
And now I’ve added ACL support to buckets. As stated, ACL is an access control list where each username is associated with a secure token, used to check the Authorization header, and auth flags, which allow bypassing some or all auth checks.
The admin must set up the per-bucket ACL using the rift_bucket_ctl tool, where multiple --acl options can be used. This control tool uses the following format: user:token:flags
user is a username provided both in the URI (&user=XXX) and in the ACL (--acl XXX:token:0).
token is used to check the Authorization header.
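As an illustration of the ACL entry layout, here is a minimal Python sketch that splits a value such as XXX:token:0 into its three parts. The names AclEntry and parse_acl are hypothetical, and the sketch assumes that neither the username nor the token contains a colon; it does not reproduce how rift_bucket_ctl itself parses the option.

```python
from typing import NamedTuple


class AclEntry(NamedTuple):
    user: str   # username, also passed as user= in the URI
    token: str  # secure token used to check the Authorization header
    flags: int  # auth flags bitmask


def parse_acl(value: str) -> AclEntry:
    """Parse a 'user:token:flags' string (assumed format)."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected user:token:flags, got {value!r}")
    user, token, flags = parts
    return AclEntry(user, token, int(flags))


print(parse_acl("XXX:token:0"))
```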
Here is the whole state machine of the Rift’s authentication checker (when bucket has been found and successfully read and parsed by the server):
- if group list is empty, not found error is returned
- if ACL is empty, ok is returned – there is nothing to check against
- if no user= URI parameter found, forbidden error is returned – one must provide username if ACL is configured
- user is being searched in ACL, if no match was found, forbidden error is returned
- if flags has bit 1 (starting from zero) set, this means bypass security check for given user – ok is returned
- Authorization header is being searched for, if there is no such header bad request is returned
- security data in Authorization header is being checked using secure token found in ACL entry, if auth data mismatches, forbidden is returned
- ok is returned
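The steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not Rift's implementation: the function name check_auth, the dictionary-based ACL, and the mapping of "not found", "forbidden" and "bad request" to HTTP status codes 404, 403 and 400 are choices made for the example. The actual signature check is not described here, so it is passed in as a caller-supplied verify callable.

```python
from http import HTTPStatus

BYPASS_ALL = 1 << 1  # bit 1 (counting from zero): skip the security check


def check_auth(groups, acl, query, headers, verify):
    """Return a verdict for a request against an already parsed bucket.

    groups  -- list of bucket groups
    acl     -- dict: username -> (token, flags)   (hypothetical layout)
    query   -- dict of URI parameters
    headers -- dict of request headers
    verify  -- callable(auth_header, token) -> bool, supplied by the caller
    """
    if not groups:
        return HTTPStatus.NOT_FOUND
    if not acl:
        return HTTPStatus.OK            # nothing to check against
    user = query.get("user")
    if user is None:
        return HTTPStatus.FORBIDDEN     # username is mandatory with an ACL
    entry = acl.get(user)
    if entry is None:
        return HTTPStatus.FORBIDDEN     # unknown user
    token, flags = entry
    if flags & BYPASS_ALL:
        return HTTPStatus.OK            # security check bypassed for this user
    auth = headers.get("Authorization")
    if auth is None:
        return HTTPStatus.BAD_REQUEST
    if not verify(auth, token):
        return HTTPStatus.FORBIDDEN     # auth data mismatch
    return HTTPStatus.OK


# Toy verifier for demonstration only: the header must equal the token.
same = lambda auth, token: auth == token
acl = {"alice": ("s3cr3t", 0), "bob": ("other", 2)}
print(check_auth([1, 2], acl, {"user": "alice"}, {"Authorization": "s3cr3t"}, same))
print(check_auth([1, 2], acl, {"user": "alice"}, {}, same))
```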
The result of this check can be found in the log with the verdict: prefix at ERROR/INFO (1/2) log level and higher.
But even if the state machine returned a non-ok verdict, the operation can still be processed. This may happen if the per-bucket ACL flags do not allow everything (bit 1) but only read requests (bit 0). In this case /get, /list, /download-info and other read-only handlers check the ACL flag bits and optionally rewrite the verdict. You will find something like this in the logs:
[NOTICE] get-base: checked: url: /get?name=test.txt, original-verdict: 400, passed-no-auth-check
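The verdict rewrite performed by read-only handlers might look roughly like the Python sketch below. The function name rewrite_verdict and the constant names are hypothetical, and the exact set of verdicts Rift is willing to rewrite is an assumption: the sketch rewrites any non-ok verdict, provided a user was matched in the ACL and bit 0 is set in that user's flags.

```python
ALLOW_READ_NO_AUTH = 1 << 0  # bit 0: read requests may skip the auth check
ALLOW_ALL_NO_AUTH = 1 << 1   # bit 1: every request may skip the auth check


def rewrite_verdict(verdict, flags, is_read_handler):
    """Optionally turn a failed auth verdict into 200 for read-only handlers.

    verdict         -- status produced by the authentication checker
    flags           -- ACL flags of the matched user, or None if no user matched
    is_read_handler -- True for handlers such as /get, /list, /download-info
    """
    if verdict == 200 or flags is None:
        return verdict
    if is_read_handler and flags & ALLOW_READ_NO_AUTH:
        return 200  # corresponds to "passed-no-auth-check" in the log
    return verdict


print(rewrite_verdict(400, ALLOW_READ_NO_AUTH, True))
print(rewrite_verdict(400, ALLOW_READ_NO_AUTH, False))
```

In the log line above, the original verdict was 400 (no Authorization header), and the /get handler let the request through because the flags permitted unauthenticated reads.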
Check out more info about Rift – our full-featured elliptics HTTP frontend: http://doc.reverbrain.com/rift:rift
I often get requests on how to get a list of keys written into elliptics. I do not really understand why this is needed, especially considering storage setups where billions of keys have been uploaded, but it is still one of the most frequently asked questions.
Elliptics has secondary indexes for that purpose. Indexes are automatically sharded and evenly distributed across the nodes in the group.
One can tag one's own uploaded keys with special indexes and then intersect those indexes on servers or read the whole index key by key. That’s essentially what RIFT, the elliptics HTTP frontend, does when you upload a file through its HTTP interface.
And I’ve added listing support into the RIFT proxy via the /list URI – it reads an index from the server, iterates over the keys and creates nice JSON output. It also prints the timestamp of the key update in the index, both in seconds and in the current timezone.
The URI accepts namespace – the bucket name to get indexes from – and name – a placeholder for future index names (if we support multiple indexes).
$ curl "http://example.com/list?namespace=testns&name="
"timestamp": "1970-01-01 03:00:00.0",
"timestamp": "2014-02-18 03:29:44.835283",
The 1970-01-01 03:00:00.0 timestamp is for older indexes, written when timestamps were not yet supported.
key is the object name given at upload time.
id is the numeric elliptics ID (one can read those objects directly from elliptics without the namespace name).
time_seconds is a coarse-grained time in seconds since the Epoch.
timestamp is the real parsed timestamp with microsecond resolution.
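For scripting against this handler, a small Python sketch may help. It only builds the request URL from the two documented parameters and converts a time_seconds value into a datetime; the helper names build_list_url and seconds_to_utc are hypothetical, and the sketch deliberately does not assume anything about the layout of the returned JSON beyond the field names described above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_list_url(base, namespace, name=""):
    """Build a /list URL for the given bucket (namespace)."""
    query = urlencode({"namespace": namespace, "name": name})
    return f"{base}/list?{query}"


def seconds_to_utc(time_seconds):
    """Convert a time_seconds value (seconds since the Epoch) to UTC."""
    return datetime.fromtimestamp(time_seconds, tz=timezone.utc)


print(build_list_url("http://example.com", "testns"))
print(seconds_to_utc(0).isoformat())
```

Note that the server prints timestamp in its own timezone, while this helper converts to UTC, so the two strings may differ by the server's UTC offset.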
There is also an example Python script which does basically the same: it reads an index, unpacks it and prints it to the console: https://github.com/reverbrain/rift/blob/elliptics-2.25/example/listing.py