Kyoto Cabinet in Elliptics network

We have started using the Kyoto Cabinet database in the elliptics distributed hash table storage, and so far it shows decent performance and stability.

So far our test system has uploaded more than 2 TB of data (in 2 copies, about 200 million objects), and the database size is close to 30 GB. We use a small 4-node cluster (2 groups in 2 different datacenters, each DC containing just 2 nodes). Data is stored in eblob storage, which guarantees a single seek to get data from disk.

Transaction logs and metadata (object names, reference counters, number of copies and so on) are stored in Kyoto Cabinet, since this information is accessed only during recovery or fsck, which are rather rare operations compared to data access.
Thus we do not want the metadata index to be held in memory, the way eblob holds its index.

One of the tests related to recovery performance is parallel database reading. A local check of 20 million records took about 6 minutes when run with 50 threads. BerkeleyDB was an order of magnitude slower (even when configured with a sufficient number of locks, which is tricky business in itself).
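For scale, 20 million records in about 6 minutes works out to roughly 55,000 records per second overall, or about 1,100 per second per thread. A parallel check of this kind can be sketched as below. This is only an illustration under stated assumptions, not the elliptics code: the in-memory dict stands in for the metadata database, and `check_record`, `check_chunk`, `parallel_check` and `throughput` are hypothetical names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def check_record(key, value):
    """Hypothetical per-record check: here, just 'value is non-empty'."""
    return len(value) > 0


def check_chunk(db, keys):
    """Check one slice of keys and return the keys that failed."""
    return [k for k in keys if not check_record(k, db[k])]


def parallel_check(db, num_threads=50):
    """Split the key space into num_threads slices and check them concurrently."""
    keys = list(db)
    chunks = [keys[i::num_threads] for i in range(num_threads)]
    failed = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for bad in pool.map(lambda chunk: check_chunk(db, chunk), chunks):
            failed.extend(bad)
    return failed


def throughput(records, seconds, threads):
    """Return (records per second overall, records per second per thread)."""
    total = records / seconds
    return total, total / threads


# Stand-in for the metadata database (assumption: any key/value mapping).
db = {"key%d" % i: b"meta" for i in range(1000)}
db["key7"] = b""  # one deliberately broken record

failed = parallel_check(db)
total_rps, per_thread_rps = throughput(20_000_000, 6 * 60, 50)
```

With a real on-disk database the threads would spend most of their time waiting for reads, which is where running 50 of them in parallel pays off; the dict here only demonstrates the partitioning.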

Kyoto Cabinet has some questionable design decisions though. For example, it uses RW locks to guard multithreaded IO operations, and when disk arrays get saturated, applications start burning CPU instead of quietly waiting for IO completion. It also always opens database files in read-write mode, which makes it impossible to run 'external' applications without granting them full access.

But those are quite minor issues. If the read/write tests (to be run in a few days, I believe) show stable performance like what we observe now, we will start uploading more production data. This time into huge disk arrays, since people expect a virtually infinite amount of storage out of elliptics :)

And what is your final conclusion?
Did you start using Kyoto Cabinet, or was it discarded for some reason?
Did you compare it with Google's LevelDB?

We dropped KC support for metadata storage, since on our volumes it unfortunately turned out to be too slow.
Each metadata record is about 200-500 bytes, and with more than several million records we end up with only a dozen or so rps for random read requests.
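To put that rate in perspective, here is a rough back-of-the-envelope sketch. The concrete inputs (5 million records, 12 requests per second) are assumptions picked to match "several million" and "a dozen or so", not measured values, and `random_read_days` and `dataset_size_gb` are hypothetical helper names.

```python
SECONDS_PER_DAY = 86400


def random_read_days(records, rps):
    """Days needed to read every record once at a given random-read rate."""
    return records / rps / SECONDS_PER_DAY


def dataset_size_gb(records, bytes_per_record):
    """Total payload size in GB (10**9 bytes) for fixed-size records."""
    return records * bytes_per_record / 10**9


# Assumed inputs: 5 million records at 12 random reads per second.
days = random_read_days(5_000_000, 12)

# Record sizes from the text: 200 to 500 bytes each.
smallest_gb = dataset_size_gb(5_000_000, 200)
largest_gb = dataset_size_gb(5_000_000, 500)
```

Under these assumptions a full random pass over a dataset of only 1 to 2.5 GB would take close to five days.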

At that scale it also frequently corrupts the database, so the next start tries to repair it, which takes hours and does not always complete. Dump-and-restore frequently fixes it, but a couple of times we could not recover the database. Fortunately that is not a major problem, since we have replicas.

We did not compare it to LevelDB directly, but we used LevelDB in another project, where it showed an _extremely_ slow split/merge process when the low levels were compacted. We ended up with a rule of thumb: as long as a LevelDB database fits in RAM it is fine; otherwise it should be split into separate databases (on different nodes).
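That rule of thumb can be written down as a small sizing helper. This is a sketch only: `shards_needed` is a hypothetical name, and the 30 GB and 8 GB figures in the example are illustrative assumptions, not numbers from that project.

```python
import math

GB = 10**9


def shards_needed(db_bytes, ram_bytes_per_node):
    """Number of separate databases so that each one fits in a node's RAM budget."""
    if ram_bytes_per_node <= 0:
        raise ValueError("RAM budget must be positive")
    return max(1, math.ceil(db_bytes / ram_bytes_per_node))


# Assumed example: a 30 GB database with 8 GB of RAM available per node.
split_case = shards_needed(30 * GB, 8 * GB)

# A database that already fits in RAM stays as a single database.
single_case = shards_needed(4 * GB, 8 * GB)
```

In practice the RAM budget should leave headroom for the rest of the process, so the usable figure is lower than the node's physical memory.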