Yfrog presentation about HBase usage
They removed my drawings from yfrog, so I will say a few words about their presentation, mostly about how great we are.
First, the setup: 60 servers for 250 million photos. I do not really know whether this number includes copies, but we use ONE server with elliptics to host 200+ million small objects. And we actually keep 3 copies, which comes to 600 million records across 3 different datacenters. Well, that server has 48 GB of RAM and 24 disks in RAID6, so it is kind of a big box.
But our 1 PB cluster contains not only machines like that, but also smaller ones with 4 disks and less RAM.
The metadata cluster with 1 billion records on 20 nodes is much more interesting, but I'm afraid most of it fits in RAM, since metadata records are small.
The presentation talks about 10 krps. A great number, but wait a minute: from 20 or even 60 nodes? The above-mentioned setup handles 3 krps (2 krps write and 1 krps read) easily. And I do not call it 'Super fast!'
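To make the comparison concrete, here is a rough back-of-the-envelope sketch of requests per second per node. It assumes the 3 krps figure is served by the single big box described above, and it ignores differences in workload and record size, so treat it as an illustration only. The helper name `per_node_rps` is made up for this example.

```python
def per_node_rps(total_rps: float, nodes: int) -> float:
    """Average requests per second handled by one node."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_rps / nodes


# Yfrog: 10 krps spread over 20 nodes (metadata cluster) or 60 nodes (photos).
yfrog_20_nodes = per_node_rps(10_000, 20)
yfrog_60_nodes = per_node_rps(10_000, 60)

# Our setup: 2 krps write + 1 krps read, assumed to hit ONE server.
elliptics_one_node = per_node_rps(2_000 + 1_000, 1)

# Total records when 200 million objects are stored in 3 copies.
total_records = 200_000_000 * 3

print(f"yfrog, 20 nodes:    {yfrog_20_nodes:.0f} rps per node")
print(f"yfrog, 60 nodes:    {yfrog_60_nodes:.0f} rps per node")
print(f"elliptics, 1 node:  {elliptics_one_node:.0f} rps per node")
print(f"records in 3 copies: {total_records:,}")
```

Under these assumptions a single node of ours takes several times more requests than one of theirs, which is the whole point of the paragraph above.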
We tested Cassandra in a lab. It handled 2 krps great (1400 rps read, 600 rps write) on 4 nodes (24 GB of RAM, 4-disk RAID10, 15 GB of data on every node), but only as long as the data fit in memory. Overall it was not very stable and required a huge amount of tuning so that it would not end up in OOM or get hit by the JVM GC killer.
Huge MongoDB tests are on the way.
Using eblob as the low-level IO backend pushed the disk bottleneck away, but it limits the number of records that can be stored effectively, since we keep the key index in RAM. Now that I have added an on-disk index, we are about to rerun our benchmarks, which will likely show performance degradation. Although this change allows putting more objects into a single node without depending heavily on the amount of RAM, it may turn out to be rather slow.
The next change will be to add a fast in-memory index for frequently used keys.
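The idea of a small in-memory index in front of a slower on-disk one can be sketched roughly like this. This is NOT how eblob does or will do it: the class `HotKeyIndex`, the LRU eviction policy and the `(offset, size)` record location are all assumptions made up for the illustration, and the on-disk index is simulated with a plain dict.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

# Hypothetical record location: (offset, size) inside a data file.
Location = Tuple[int, int]


class HotKeyIndex:
    """Bounded LRU cache of key -> location in front of a slow lookup."""

    def __init__(self, disk_lookup: Callable[[bytes], Optional[Location]],
                 capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._disk_lookup = disk_lookup  # slow path, e.g. on-disk index
        self._capacity = capacity        # max number of keys kept in RAM
        self._hot: "OrderedDict[bytes, Location]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key: bytes) -> Optional[Location]:
        loc = self._hot.get(key)
        if loc is not None:
            # Fast path: key is hot, mark it as most recently used.
            self._hot.move_to_end(key)
            self.hits += 1
            return loc
        # Slow path: ask the on-disk index, then remember the answer.
        self.misses += 1
        loc = self._disk_lookup(key)
        if loc is not None:
            self._hot[key] = loc
            if len(self._hot) > self._capacity:
                # Evict the least recently used key.
                self._hot.popitem(last=False)
        return loc


# Simulated on-disk index; a real one would read from a file.
on_disk_index = {b"a": (0, 10), b"b": (10, 20), b"c": (30, 5)}
index = HotKeyIndex(on_disk_index.get, capacity=2)
for key in (b"a", b"a", b"b", b"a", b"c"):
    index.lookup(key)
print(f"hits={index.hits} misses={index.misses}")
```

With a layout like this, RAM usage is bounded by `capacity` instead of the total number of records, and only the cold keys pay the price of the on-disk lookup.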
Our main disadvantage is the lack of documentation. Nobody uses elliptics except friends of our company, who keep bugging me about how things are implemented and what they mean. But this will change very soon! We are working on it, and that's actually kinda harder than code :)