Elliptics vs HBase on hundreds of millions of small records
We recently ran a test on 215 million production records, which we wanted to store on a single server.
Objects are rather small, about 200 bytes each. Server hardware is quite common in our environment: 24 GB of RAM, a handful of mostly idle CPU cores, 4 SATA disks of 1-2 TB in RAID10.
Unfortunately, due to configuration issues only 2 disks were effectively working in the elliptics server (RAID near layout sucks), while HBase had its load spread fairly over all 4 disks.
Use case: random reads and random range reads.
Out of the box elliptics does not _yet_ provide a good enough on-disk index, so a lookup is usually a binary search on disk, which is costly. To compensate, we warmed the page cache by reading the index files into /dev/null.
Upload speed was roughly the same in HBase and elliptics: 2.5-3 Krps. But using HBase batch upload it was possible to write up to 30,000 objects per second, with objects packed into blocks of 5,000. Elliptics does not yet support such a batch upload mode.
So, reading. First of all, we started a plain run of a 200 random-IO-rps test. It lasted about 10 minutes.
Elliptics showed a median reply time of about 30 ms. HBase was closer to 100-150 ms.
Elliptics latency graph (the timing scale on the graph is wrong, though; the median time is 30 ms).
The second test was set up to find the maximum RPS rate of random IOs, starting from cold caches and cold data.
I will not post graphs though, only numbers. Graphs on demand.
Elliptics sustained 115 RPS with each request served within 100 ms.
HBase reached almost 300 RPS, but with a 100-200 ms median reply time.
And that was with only 2 working disks in the elliptics setup (blame me), while HBase used 4 disks in RAID. Here are the proof pictures (the first one is elliptics dstat output, the second is HBase).
HBase also supports index compression, which ended up delivering 1500 RPS of random IO within 100 ms. Pretty good numbers for a single server with 215 million records. Our objects are easily compressible, so HBase’s database occupied only about 15 GB of space.
Elliptics does not compress the index, only the data, so compression would not help us here. Instead we plan to replace the current index lookup (a binary search on disk; roughly the same happens in every filesystem with a b-tree when you look a file up by name). The new scheme will cache every Nth key in RAM, along with the ID range stored behind the given key. Keys are rather small, 64 bytes by default (specified at compile time), so we could pack a bunch of them into a 4k page, for example. Reading that page from disk takes a single IO, and searching for the key within the page (once it is in RAM) is very fast.
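To make the idea concrete, here is a minimal sketch of such a sparse index lookup. The names (sparse_entry, index_lookup), the record layout and the 4k page size are my own illustration, not the actual eblob on-disk format: keep the first key of every index page in RAM, binary-search that array to pick the page, read the page with a single pread() and binary-search the keys inside it.

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define KEY_SIZE        64      /* default elliptics key size, set at compile time */
#define INDEX_PAGE_SIZE 4096    /* one on-disk index page */

/* On-disk index record: key plus the object's location inside the blob. */
struct disk_key {
	uint8_t  id[KEY_SIZE];
	uint64_t data_offset;
	uint64_t data_size;
};

#define KEYS_PER_PAGE   (INDEX_PAGE_SIZE / sizeof(struct disk_key))

/* Every Nth key kept in RAM: the first key of each on-disk index page. */
struct sparse_entry {
	uint8_t  first_id[KEY_SIZE];
	uint64_t page_offset;       /* where this page starts in the index file */
};

struct sparse_index {
	struct sparse_entry *entries;   /* sorted, one entry per on-disk page */
	size_t               nr_entries;
	int                  index_fd;  /* sorted on-disk index file */
};

/* Pick the page that may contain @id: the last sparse entry whose key <= @id. */
static ssize_t sparse_find_page(struct sparse_index *idx, const uint8_t *id)
{
	ssize_t low = 0, high = (ssize_t)idx->nr_entries - 1, found = -1;

	while (low <= high) {
		ssize_t mid = low + (high - low) / 2;

		if (memcmp(idx->entries[mid].first_id, id, KEY_SIZE) <= 0) {
			found = mid;
			low = mid + 1;
		} else {
			high = mid - 1;
		}
	}
	return found;
}

/* One disk IO: read the page, then binary-search the keys inside it in RAM. */
static int index_lookup(struct sparse_index *idx, const uint8_t *id,
			struct disk_key *result)
{
	struct disk_key page[KEYS_PER_PAGE];
	ssize_t pidx = sparse_find_page(idx, id);
	ssize_t nr, low, high;

	if (pidx < 0)
		return -1;

	nr = pread(idx->index_fd, page, sizeof(page),
		   (off_t)idx->entries[pidx].page_offset);
	if (nr <= 0)
		return -1;
	nr /= (ssize_t)sizeof(struct disk_key);

	low = 0;
	high = nr - 1;
	while (low <= high) {
		ssize_t mid = low + (high - low) / 2;
		int cmp = memcmp(page[mid].id, id, KEY_SIZE);

		if (cmp == 0) {
			*result = page[mid];
			return 0;
		}
		if (cmp < 0)
			low = mid + 1;
		else
			high = mid - 1;
	}
	return -1;	/* key is not in this blob */
}
```

With 64-byte keys the in-RAM part stays tiny: one sparse entry per ~50 records means a few megabytes of RAM even for hundreds of millions of keys, while every lookup costs at most one IO.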
This optimization alone will kick HBase’s ass, I think. Given that HBase cannot safely live in a multi-datacenter environment, elliptics will fit all the needs of huge-number-of-small-records data storage.
There is also a plan to add a bloom filter for faster detection of whether a given key is present in a blob or not.
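A sketch of what that could look like, again purely illustrative (the FNV-1a hash and the seeding trick are my choice for the example, not necessarily what will land in eblob): a bit array sized for the expected number of keys, a handful of hash functions, and a cheap "definitely not here" answer before any disk IO.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal bloom filter: @bits is sized for the expected number of keys,
 * @nr_hashes hash functions are simulated by seeding one FNV-1a hash. */
struct bloom {
	unsigned char *bits;
	uint64_t       nr_bits;
	int            nr_hashes;
};

static uint64_t bloom_hash(const void *key, size_t len, uint64_t seed)
{
	const unsigned char *p = key;
	uint64_t h = 14695981039346656037ULL ^ seed;
	size_t i;

	for (i = 0; i < len; ++i) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

static void bloom_add(struct bloom *b, const void *key, size_t len)
{
	int i;

	for (i = 0; i < b->nr_hashes; ++i) {
		uint64_t bit = bloom_hash(key, len, (uint64_t)i) % b->nr_bits;

		b->bits[bit / 8] |= 1u << (bit % 8);
	}
}

/* Returns 0 when the key is definitely absent, 1 when it may be present. */
static int bloom_maybe_present(struct bloom *b, const void *key, size_t len)
{
	int i;

	for (i = 0; i < b->nr_hashes; ++i) {
		uint64_t bit = bloom_hash(key, len, (uint64_t)i) % b->nr_bits;

		if (!(b->bits[bit / 8] & (1u << (bit % 8))))
			return 0;
	}
	return 1;
}
```

A lookup would consult the filter first and go to the index only on a "maybe", so a miss for a key that was never written to the blob costs no disk IO at all.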
Since a blob file is ‘closed’ for writes after reaching its maximum size or number of records, and no more writes go into it (except when overwrite is turned on or a deletion sets a bit, neither of which changes the keys), it is possible to build a rather static and very optimized index. This actually follows HBase’s index design, where blocks are immutable and so are the index ranges.
We currently cache in RAM every key we read from disk, but practice shows that the overwhelming majority of old enough records (whose keys have already moved from RAM to disk) are never, or extremely rarely, read again, so we will add a cache timeout for keys to drop them back to disk.
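Roughly like the sketch below, with invented names (key_cache_entry, key_cache_expire) and a plain last-access timestamp standing in for whatever eviction policy we actually end up shipping; "dropping" a key simply means freeing its in-RAM copy, since the on-disk index still has it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define KEY_SIZE 64

/* A key cached in RAM after it was read from the on-disk index. */
struct key_cache_entry {
	uint8_t  id[KEY_SIZE];
	uint64_t data_offset;
	uint64_t data_size;
	time_t   last_access;           /* updated on every read hit */

	struct key_cache_entry *next;   /* simple singly-linked list for the sketch */
};

struct key_cache {
	struct key_cache_entry *head;
	time_t                  timeout; /* how long an untouched key stays in RAM */
};

/* Drop entries not touched for @timeout seconds: they remain in the on-disk
 * index, we just stop holding them in RAM. */
static void key_cache_expire(struct key_cache *cache)
{
	time_t now = time(NULL);
	struct key_cache_entry **prev = &cache->head;
	struct key_cache_entry *e;

	while ((e = *prev) != NULL) {
		if (now - e->last_access > cache->timeout) {
			*prev = e->next;
			free(e);
		} else {
			prev = &e->next;
		}
	}
}
```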