LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Smack is a low-level backend for elliptics, designed around HBase-style sorted tables for maximum write performance.
Anton Kortunov wrote a LevelDB backend last Friday, and here are the performance results for both.
Smack was not tuned for maximum write performance; instead I tried to increase read speed. Write performance can easily reach 20 krps, but then reads drop to only 300-500 rps. Smack uses default zlib compression; LevelDB uses Snappy.
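Since compression is one of the differences between the two backends, here is a small Python sketch of what "default zlib compression" means for a single record. The record contents and the make_record/compress_default helpers are hypothetical and only mimic the 300-900 byte compressible records used in these tests; Snappy is left out because it is not in the Python standard library, and this is not a claim about how Smack calls zlib internally.

```python
import zlib


def make_record(size: int) -> bytes:
    """Build a hypothetical compressible record of exactly `size` bytes.

    A repeated text pattern stands in for "compressible" data; it is not
    the data used in the benchmark.
    """
    pattern = b"user_id=12345;status=active;region=eu-west;"
    repeats = size // len(pattern) + 1
    return (pattern * repeats)[:size]


def compress_default(record: bytes) -> bytes:
    # zlib.compress() without a level argument uses Z_DEFAULT_COMPRESSION,
    # which zlib maps to level 6.
    return zlib.compress(record)


if __name__ == "__main__":
    for size in (300, 600, 900):
        rec = make_record(size)
        packed = compress_default(rec)
        assert zlib.decompress(packed) == rec  # lossless round trip
        print(f"{size} bytes -> {len(packed)} bytes compressed")
```

Snappy is designed for speed rather than maximum compression ratio, so the two backends are not making the same CPU/size trade-off.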
We wrote about 70 million records (300-900 compressible bytes each) into a single node with 64 GB of RAM. The numbers above pretty much correspond to page cache IO (the LevelDB directory is about 30 GB in size).
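A quick back-of-the-envelope check of the page-cache claim, treating all sizes as decimal gigabytes (our assumption) and ignoring keys and per-record overhead in the raw estimate:

```python
GB = 10 ** 9  # decimal gigabytes (assumption)

records = 70_000_000
raw_low = records * 300    # every record at the 300-byte minimum
raw_high = records * 900   # every record at the 900-byte maximum
on_disk = 30 * GB          # reported LevelDB directory size
ram = 64 * GB              # RAM on the test node


def fits_in_page_cache(dataset_bytes: int, ram_bytes: int) -> bool:
    """Optimistic check: could the whole dataset be cached, ignoring other RAM users?"""
    return dataset_bytes <= ram_bytes


print(f"raw payload: {raw_low / GB:.0f}-{raw_high / GB:.0f} GB")
print(f"on disk: {on_disk / GB:.0f} GB, about {on_disk / records:.0f} bytes per record")
print("on-disk data fits in RAM:", fits_in_page_cache(on_disk, ram))
```

The raw payload comes out at 21-63 GB, while the 30 GB on disk (about 429 bytes per record, including LevelDB's own overhead) is less than half of the node's RAM, which is consistent with reads being served mostly from the page cache.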
I’m curious whether we might drop the Smack backend in favour of the well-supported LevelDB code?
I’m pleased to announce a new backend for elliptics, codenamed Smack.
It was created to host huge amounts of rather small, compressible data in elliptics. The backend architecture was designed with HBase in mind, although some parts were tested and ended up implemented differently. This is actually the fourth implementation of the backend :)
So, the test case: a single load-generator machine (most likely the bottleneck), 6 different Smack backends (each using a different compression algorithm), ~500 million records on each node (300-900 bytes each, possibly with some bigger keys), 64 GB of RAM (lxctl virtual hosts), and 4-way RAID10 storage with ext4.
Data is compressed and sorted by key, so you get HBase-like scans for free (although this is not yet exposed through the elliptics API). We may add columns, although no one uses them in practice.
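To illustrate why sorted keys make scans cheap, here is a toy Python sketch: with keys kept in order, a range or prefix scan is one binary search followed by a sequential walk. The SortedTable class is hypothetical, keeps everything in memory and compresses each value with zlib; it is not Smack's on-disk format and not an elliptics API.

```python
from __future__ import annotations

import bisect
import zlib
from typing import Iterator


class SortedTable:
    """Toy in-memory table that keeps keys sorted and values zlib-compressed.

    Hypothetical illustration of a sorted key-value layout; not Smack's code.
    """

    def __init__(self) -> None:
        self._keys: list[bytes] = []    # always kept in ascending order
        self._values: list[bytes] = []  # compressed values, parallel to _keys

    def _find(self, key: bytes) -> int:
        # Index of the first stored key >= key (binary search).
        return bisect.bisect_left(self._keys, key)

    def put(self, key: bytes, value: bytes) -> None:
        i = self._find(key)
        packed = zlib.compress(value)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = packed  # overwrite an existing key
        else:
            self._keys.insert(i, key)  # insert while preserving sort order
            self._values.insert(i, packed)

    def get(self, key: bytes) -> bytes | None:
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            return zlib.decompress(self._values[i])
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = self._find(start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], zlib.decompress(self._values[i])
            i += 1


if __name__ == "__main__":
    table = SortedTable()
    for k in (b"user:003", b"user:001", b"item:042", b"user:002"):
        table.put(k, b"value of " + k)
    # A prefix scan is just a range scan: b";" is the byte right after b":".
    print([k for k, _ in table.scan(b"user:", b"user;")])
    # prints [b'user:001', b'user:002', b'user:003']
```

A real HBase-style store would not insert into a Python list (that is O(n) per write); it buffers writes in a sorted in-memory structure and flushes them as immutable sorted files. The scan over sorted data is the same idea, though.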
So, here is the first graph: write performance.
In total, this database does not fit into RAM on each node: its size is about 70-90 GB per node, while we only have 64 GB of memory. So reads frequently go to disk, and this has to be taken into account.
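Rough arithmetic on how much of a 70-90 GB node can possibly stay in 64 GB of RAM. This is an optimistic upper bound that assumes all memory is available as page cache and that reads are spread uniformly over the dataset; both are simplifying assumptions, not measurements.

```python
GB = 10 ** 9  # decimal gigabytes (assumption)


def max_cached_fraction(dataset_bytes: float, ram_bytes: float) -> float:
    """Upper bound on the share of the dataset that can be resident in the page cache.

    Optimistic: assumes every byte of RAM is usable as page cache.
    """
    return min(1.0, ram_bytes / dataset_bytes)


if __name__ == "__main__":
    ram = 64 * GB
    for size_gb in (70, 90):
        frac = max_cached_fraction(size_gb * GB, ram)
        # With uniformly random reads, the miss rate is at least 1 - frac.
        print(f"{size_gb} GB dataset: at most {frac:.0%} cached, "
              f"at least {1 - frac:.0%} of reads go to disk")
```

This gives at most about 91% cached for 70 GB and 71% for 90 GB, so even in the best case roughly one read in eleven to one in three has to hit the RAID10 array.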
And here is read performance (separately for each node, each running a different compression algorithm).
So far this backend does not support data recovery. Yup, we do not have enough hands for this; the new recovery process in elliptics will see the light later this year. But it definitely will.
Smack is therefore recommended for cache-like workloads, or when you know that your data is safe: using multiple replicas will not guarantee data consistency among them. But this is only for now; we will test this further…
Source code: https://github.com/ioremap/smack
Stay tuned, we will show new results really soon!