BoltDB is a local key/value store with a simple API, powerful transactions, and ACID support.
The main goal of the project is to provide a simple, fast, and reliable database for projects that don't require a full database server such as Postgres or MySQL. It is used in high-load production setups with database sizes of up to 1 TB.
1 TB is not that much actually, but since its main competitors are RocksDB/LevelDB and LMDB, it is quite a spectacular volume. The main differences between BoltDB and RocksDB/LevelDB are the on-disk structure and transaction support: RocksDB and LevelDB are built on top of LSM trees and are thus well suited for write-heavy loads, while BoltDB uses a B+-tree, which generally scales better for read workloads. LevelDB doesn't support transactions, while BoltDB does (and its API is very nice and convenient).
The main difference between BoltDB and relational databases like Postgres or MySQL is, well, relations: BoltDB is a plain key/value store (although its API supports cursors, transactions, and prefix key search).
Worth considering for local storage: https://github.com/boltdb/bolt
LevelDB (and its fork RocksDB) is a very well-known local storage engine for small records. It is implemented on top of log-structured merge trees and optimized for write performance.
LevelDB is quite horrible for records larger than 1 KB because of its merge operation: it quickly reaches a level where it constantly has to merge trees, and that takes seconds to complete.
But for small records it is very useful and fast.
RebornDB is a proxy storage which operates on top of LevelDB/RocksDB and provides a Redis API to clients. Basically, it is Redis on top of an on-disk LevelDB store.
It has an interesting sharding scheme: there are 1024 slots, each backed by its own replica set, which can be configured by the admin. When a client writes a key, the key is hashed and one of the 1024 slots is selected (via a modulo (% 1024) operation). When the admin decides that some slots should be moved to a new or different machine, he uses a command-line tool to reconfigure the proxies. During migration the slots in question remain available for IO, although a request may span multiple servers, since the proxy does not yet know whether the required key has already been moved to the new destination.
Having many slots is a bit more flexible than old-school sharding by the number of servers, although it is still quite far from automatic ID range generation: manual resharding doesn't scale for admins.
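The slot-selection step can be sketched as follows (RebornDB's Codis lineage uses a crc32-based hash; treat the exact hash function as an assumption here):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// numSlots is the fixed slot count used by the sharding scheme.
const numSlots = 1024

// selectSlot hashes a key and maps it onto one of the 1024 slots
// via the modulo operation described above.
func selectSlot(key string) int {
	return int(crc32.ChecksumIEEE([]byte(key))) % numSlots
}

func main() {
	for _, key := range []string{"foo", "user:1000", "session:abc"} {
		fmt.Printf("%-12s -> slot %d\n", key, selectSlot(key))
	}
}
```

Because the slot count is fixed, moving data around only changes which replica set serves a slot, never which slot a key belongs to.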
RebornDB uses zookeeper/etcd to store the slot-to-server mapping and the per-slot replication policies. This does not force every operation to contact zookeeper (that would actually kill the service); instead, every proxy holds the abovementioned info locally, and every reconfiguration also updates the info on every proxy.
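A proxy's local copy of that mapping might look like the following sketch (type and method names are hypothetical; the real RebornDB keeps this state synchronized from zookeeper/etcd rather than mutating it directly):

```go
package main

import "fmt"

// routingTable is a per-proxy local copy of the slot -> replica-set
// mapping; lookups on the hot path never touch zookeeper/etcd.
type routingTable struct {
	slots [1024]string // replica-set address serving each slot
}

// lookup returns the replica set currently serving the given slot.
func (rt *routingTable) lookup(slot int) string {
	return rt.slots[slot]
}

// reconfigure is what a coordinator-driven update applies on every
// proxy when the admin moves a slot to another replica set.
func (rt *routingTable) reconfigure(slot int, replicaSet string) {
	rt.slots[slot] = replicaSet
}

func main() {
	var rt routingTable
	for i := range rt.slots {
		rt.slots[i] = "rs-a"
	}
	rt.reconfigure(289, "rs-b") // admin moved slot 289 elsewhere
	fmt.Println(rt.lookup(289), rt.lookup(290))
}
```

The design choice is the usual one: pay a small consistency cost (proxies may briefly lag a reconfiguration) to keep the coordination service off the request path.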
There is not that much information about data recovery (and migration), except that it is implemented on a key-by-key basis. Given that LevelDB databases usually contain tens to hundreds of millions of keys, recovery may take a real while; snapshot migration is on the todo list.
We have the following config for the Elliptics LevelDB backend (cache_size is 60 GiB, write_buffer_size is 1 GiB, block_size is 10 MiB):
sync = 0
root = /opt/elliptics/leveldb
log = /var/log/elliptics/leveldb.log
cache_size = 64424509440
write_buffer_size = 1073741824
block_size = 10485760
max_open_files = 10000
block_restart_interval = 16
And with 1 KB chunks of pretty compressible data (ASCII strings) pushed into a single server with 128 GB of RAM and 4 SATA disks combined into RAID10, we end up with a poor 6-7 krps.
If the request rate is about 20 krps, the median reply time is about 7 seconds (!)
Elliptics with the Eblob backend on the same machine easily handles the same load.
dstat shows that the bottleneck is not the disk (well, at 20 krps it is the disk, but below that it is neither CPU nor disk): LevelDB just doesn't allow more than 5-7 krps with 1 KB data chunks from parallel threads (we have 8-64 IO threads depending on config). When Snappy compression is enabled, things get worse.
Is it ever possible to push 20 MB/s into LevelDB with small-to-medium (1 KB) chunks?
Alexey Ozeritsky added read/del range requests to the Elliptics LevelDB backend.
We are evaluating whether the Smack backend is worth keeping in our tree. I created it to compete with HBase's ordered columns, while it happens that LevelDB makes things even better for read workloads (although it is slower at writes).
Kudos to Alexey, Elliptics with the LevelDB backend supports range requests now. Which means one can run queries like 'get me all data starting from key xxxx000 up to xxxxfff'. Although by default Elliptics uses a hash of the string as a key, nothing prevents users from filling the ID (struct dnet_id) manually, specifying their own keys. Keys are limited to 64 bytes by default (the maximum key size is specified at compile time; 64 bytes were selected because of the default sha512 hash function used).
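To show why manual keys make range queries meaningful, here is a sketch in Go (not the actual Elliptics C API; `makeID` and `inRange` are illustrative helpers) of padding user keys into a fixed 64-byte dnet_id-style array so that keys with a common prefix sort adjacently:

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// idSize mirrors the default compile-time maximum key size.
const idSize = 64

// makeID copies a user-supplied key into a fixed 64-byte array,
// zero-padded, so keys compare lexicographically as raw bytes.
func makeID(key string) [idSize]byte {
	var id [idSize]byte
	copy(id[:], key)
	return id
}

// inRange reports whether id falls into [from, to], which is what a
// backend range request boils down to over an ordered key space.
func inRange(id, from, to [idSize]byte) bool {
	return bytes.Compare(id[:], from[:]) >= 0 && bytes.Compare(id[:], to[:]) <= 0
}

func main() {
	keys := []string{"xxxx001", "xxxxabc", "yyyy000", "xxxxfff"}
	ids := make([][idSize]byte, 0, len(keys))
	for _, k := range keys {
		ids = append(ids, makeID(k))
	}
	sort.Slice(ids, func(i, j int) bool {
		return bytes.Compare(ids[i][:], ids[j][:]) < 0
	})

	// "Get me all data starting from key xxxx000 up to xxxxfff".
	from, to := makeID("xxxx000"), makeID("xxxxfff")
	for _, id := range ids {
		if inRange(id, from, to) {
			fmt.Printf("%s\n", bytes.TrimRight(id[:], "\x00"))
		}
	}
}
```

With hashed keys this kind of query is useless (sha512 destroys ordering), which is exactly why manually assigned IDs are the interesting case for the range API.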
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Smack is a low-level backend for Elliptics created with HBase's sorted tables in mind, aiming for maximum write performance.
Anton Kortunov wrote a LevelDB backend last Friday, and here are the performance results for both.
Smack was not tuned for maximum write performance; instead I tried to increase read speed. Its write performance can easily reach 20 krps, but then with only 300-500 read rps. Smack uses default zlib compression, LevelDB uses Snappy.
We wrote about 70 million records (300-900 compressible bytes each) into a single node with 64 GB of RAM. The numbers above correspond pretty much to page-cache IO (the LevelDB directory is about 30 GB in size).
I'm curious whether we might drop the Smack backend in favour of the well-supported LevelDB code?..