Elliptics: datacenters, open functionality, changes, future
Our team has grown a little, and we have started to make things faster.
First, our friends at Yandex open-sourced their FCGI management daemon: fastcgi-daemon
It allows rapid deployment of C++ FCGI applications and comes with flexible configuration and management tools.
Kudos to Ilya Golubtsov and Vasily Kiritsev from the Yandex team.
The first external elliptics application is an HTTP proxy written on top of fastcgi-daemon by Leonid Movsesjan from the Yandex team.
The source code contains example configs which allow you to read/write/delete data and to get a direct link to an object on a server in the cluster, so the client can stream or download it directly.
Connected to 2 elliptics nodes, it easily handles 1000 random-read RPS, with each request served within tens of milliseconds. And although the data set is not very large (just hundreds of thousands of small files, 10 to 50 kbytes each), it is still a spectacular number.
In the meantime our production clusters grew up. We will get our first datacenter abroad this year, which will host content for international users. This cluster currently hosts about 150 million rather small objects (10-50 kbytes each), with 3 copies in physically distributed datacenters.
Our second production cluster comes close to half-petabyte scale (4 datacenters, about 200 elliptics nodes), and its network load reaches 3.5 gbit/s during the day. This scale took a fair technical price though – we had to rewrite the IO model again.
IO thread models
We recently switched the IO model in the elliptics core from thread-per-client to a single network IO thread, which implements a non-blocking reactor, plus a pool of disk IO threads. Elliptics maintains O(1) lookup times, which implies that every node knows about every other node; this in turn requires N^2 network connections in the cluster, and thus the same number of IO threads in the thread-per-client model.
For 200 nodes this means about 4k threads and connections per machine, since each machine with 24 disks starts 24 elliptics nodes – ending up with almost 100,000 sockets and threads at peak. Linux simply cannot start that many threads by default (we used the 2.6.32 and .38 kernels) on 64-bit servers with 24 GB of RAM.
And even if we tuned some things, we would just hit the limit again at 500 nodes and so forth. Eventually this is a dead end.
Now we use an IO thread pool plus a network processing thread, which handles sockets via epoll(). We dropped libevent and friends, since it is still extremely inconvenient and ugly to add events from one thread into another thread's processing loop.
We have started thinking about secondary indexes in the elliptics network. We can easily build slow-enough map-reduce-like processing on top of our data, but it will not, for example, quickly return prefix-search results, although we can run a brute-force regex-like search.
I will continue the PAXOS implementation – we do want to build a Chubby-like service for small synchronized data storage as well as locks and counters. Combined with elliptics, it should provide a full operational stack for data storage.
POHMELFS is not dead, IT IS NOT DEAD, please remember this :)
I’m just a little bit busy with things.
I am thinking about dropping transaction support in elliptics and always performing rewrite-in-place. The ext* filesystems live without this feature, compared to Btrfs, and everyone is happy. Well, it is a cool feature, but hardly needed by everyone.
Hmm, I bought myself a small Yamaha KX-49 MIDI keyboard, and it even works in Linux with Rosegarden (modulo non-working controls). I also continue to play trumpet, but that is another story.
I will try to post here more regularly, although most of the time twitter wins :)