ioremap.net

Storage and beyond

Elliptics: datacenters, open functionality, changes, future

Our team has grown up a little and we started to make things faster.

External projects

First, our friends in Yandex open-sourced FCGI management daemon: fastcgi-daemon
It allows a rapid deployment of C++ FCGI applications and has flexible configuration and management tools.
Kudos to Ilya Golubtsov and Vasily Kiritsev from Yandex team.

The first external elliptics application is HTTP proxy written on top of fastcgi-daemon by Leonid Movsesjan from Yandex team.
Source code contains example configs, which allow to read/write/delete data, get direct link to the object on the server in cluster, which can be streamlined or downloaded directly by client.
It easily handles 1000 random read RPS being connected to 2 elliptics nodes, where each request is handled just within tens of milliseconds. And although data set is not very large (just hundreds of thousands of small files from 10 to 50 kbytes each), it is still a spectacular number.

Datacenters

in a meantime our production clusters grew up. We will get first abroad datacenter this year, which will host content for international users. This cluster currently hosts about 150 millions of rather small objects (10-50 kbytes each) with 3 copies in physically distributed datacenters.
Our second production cluster comes close to half-petabyte scale (4 datacenters, about 200 elliptics nodes), its network load reaches 3.5 gbit/s daytime, although this scale took a fair technical price – we had to rewrite IO model again.

IO thread models

We recently switched IO model in elliptics core from thread-per-client to single network IO thread, which implements non-blocking reactor, and pool of disk IO threads. Elliptics maintains O(1) lookup times which implies that every node knows about every other, which in turn requires N^2 network connections in cluster and thus the same number of IO threads in thread-per-client model.
For 200 nodes this is 4k threads/connections, each machine with 24 disks starts 24 elliptics nodes, ending up with almost 100.000 sockets and threads in peaks. Linux just can not start that much threads by default (we used 2.6.32 and .38 kernels) on 64-bit servers with 24 Gb of RAM.
And even if we tune some things, it will stop at 500 nodes limit and so forth. Eventually this is a dead end.

Now we use IO thread pool and network processing thread, which handles sockets via epoll(). We dropped libevent and friends, since it is still extremely unconvenient and ugly to add events from one thread into other’s processing loops.

Future plans

We started to think about secondary indexes in elliptics network. We can easily build slow-enough map-reduce-like processing on top of our data, but for example it will not quickly return prefix-search data, although we can run brute-force regex-like search.

I will continue PAXOS implementation – we do want to build Chubby-like service for small synchronized data storages as well as locks and counters. Combined with elliptics it should provide a full operation stack for data storage.

POHMELFS is not dead, IT IS NOT DEAD, please remember this :)
I’m just a little bit busy with things.

I think about dropping transactions support in elliptics and always perform rewrite-in-place. Ext* filesystems live without this feature compared to Btrfs and everyone is happy. Well, this is a cool feature, but hardly needed for everyone.

Hmm, I bought myself a small Yamaha KX-49 MIDI-keyboard and it even works in Linux with Rosegarden (modulo non-working controls) and continue to play trumpet, but this is another story.
I will try to post more regulary here, although most of the time twitter wins :)

, ,

Comments are currently closed.

2 Responses to “Elliptics: datacenters, open functionality, changes, future”

  • Anonymous says:

    А есть документация какая нибудь по настройке elliptics? пробовал запустить сервер из elliptics/example всё хорошо компилируется и запускается а вот пробую записать файл не получается что не находит файл…
    dnet_ioclient -r 127.0.0.1:1025:2 -W /home/test.txt если я правильно понял как записать файл в хранилище…

  • zbr says:

    Мы работаем над красивой документацией, но пока есть только example/ioserv.conf – конфиг для сервера, который можно подкуручивать под собственные нужды
    И примеры для клиента, но проще запускать fcgi прокси, сейчас их две, и их опции описаны в соответствующих примерах конфигов. Скоро останется только та, что живет на базе fastcgi-daemon