Hi, this is rbtz speaking again. I’ve been the engineer responsible for the eblob codebase for
almost a year now. Here is a small recap of what has happened with eblob
since v0.17.2, with some commit highlights.
* Eblob now builds under Mac OS X. This improves the experience of developers on Macs.
* Changed most links to point to the newly created http://reverbrain.com.
* Added comments to all eblob subsystems: e254fc3. This eases the learning curve for new developers.
* Added l2hash support: c8fa62c. This reduces the memory consumption of elliptics metadata by 25% on LP64.
* Added the first edition of the eblob stress test: 8eab8ed. A year later it is responsible for catching 99% of the bugs that would otherwise reach testing.
* Added config variables for index blocks and the bloom filter: a106d9d. This allows sysadmins to limit the memory occupied by the bloom filter.
* Added a config variable to limit total blob size: f7da001. This allows sysadmins to limit eblob size when many databases are located on one shared drive.
* Reduced memory consumption of “unsorted” blobs by 20% on LP64: 19e8612.
* First static analyzer crusade (feat. clang static analyzer): a number of “almost impossible to spot” bugs were found.
* Added data-sort and binlog v1. This allows “on the fly” eblob defragmentation and memory cleanups.
* Added Travis CI tests that run after each commit: f08fea2.
* Removed the custom in-memory cache in favor of the OS page cache: a7e74a7. This removed a number of nasty races in eblob code and also opened the way for some future optimizations.
* Added a Doxyfile stub, so that libeblob man pages may be autogenerated in the future: aac9cb3.
* Decreased memory consumption of in-memory data structures by 10% on LP64: c6afffa.
* Replaced core mutexes with rwlocks. This improves our Intel VTune concurrency benchmarks, along with our QA tests.
* Second static analyzer crusade (feat. Coverity).
* Switched to adaptive mutexes (https://en.wikipedia.org/wiki/Spinlock#Alternatives) when available: 43b35d8.
* Sped up statistics update v1: 40a60d7. The global lock is no longer held while computing and writing stats to disk.
* Rewrote the bloom filter (v1): 6f08e07. This improves speed and reduces memory fragmentation.
* Index blocks are now allocated in one big chunk instead of millions of small ones, speeding up initialization and reducing memory fragmentation: b87e273.
* The global lock is no longer held for the whole duration of sync(): 6f6be68. This removes “stalls” in configs where sync > 0.
* Switched to POSIX.1-2008 + XSI extensions: 6ece045.
* Build with -Wextra and -Wall by default: 0e8c713. In the long term this should substantially improve code quality.
* Added options to build with hardening and sanitizers: c8b8a34, 2d8a42c. This improves our internal automated tests.
* Bloom filter bits are no longer set on start for removed entries: 36e7750. This improves lookup times for entries that were removed long ago but not yet defragmented.
* Added a separate thread for small periodic tasks: ea17fc0. In the future this can be upgraded to a simple background task manager.
* Moved statvfs(3) to the periodic thread, which speeds up write-only micro benchmarks by 50%: f36ab9d.
* The database is now locked on init to prevent data corruption from simultaneous access to the same database by different processes: 5e5039d. See the KB article for more about EB0000.
* Removed columns, aka types: 6b1f173. This greatly simplifies the code and, as a side effect, improves elliptics memory usage and startup times.
* Removed compression: 35ac55f. This removes a dependency on …
* Added bsize knob for write alignment: 8d87b32.
* Rewrote stats (v2): 94c85ec. Stats updates are now very lightweight and atomic.
* Added a writev(2)-like interface to eblob, so that the elliptics backend can implement very efficient metadata handling: b9e0391.
* Replaced the complex binlog with a very tiny binlog v2: 1dde6f3. This greatly simplifies the code and improves data-sort speed and memory efficiency.
* Made tests multithreaded: 1bd2f43. Now we can spot even more errors via automated tests before they hit staging.
* Moved to the GNU99 standard: f65955a. It’s already 15 years old =)
* Fixed a very old bug with log message truncation/corruption on multithreaded workloads: 10b6d47.
* Bloom filter rewrite v2: 1bfadaf. Now we use many hash functions instead of one, trading CPU time for improved IO efficiency. This improved bloom filter efficiency by an order of magnitude.
* Small blobs are now merged into one on defrag: ace7ca7. This improves eblob performance on databases with high record rotation by keeping the number of blobs almost fixed.
* Added a record validity check on start: bcdb0be. See the KB article for more about database corruption EB0001.
* Made the eblob_merge tool more robust; it can be used to recover corrupted blobs.
* Reduced memory consumption of in-memory data structures by 10% on LP64: e851820.
* Added schedule-based data-sort: 2f457b8. More on this topic in the previous post: data-sort implications on eblob performance.
Here I’ve mentioned only the most notable commits, mostly changes oriented at performance and memory usage. There is, of course, a lot of other stuff going on, like bugfixes, minor usability improvements and some internal changes.
Here are some basic stats for this year:
Total commits: 1375
Stats total: 65 files changed, 8057 insertions(+), 4670 deletions(-)
Stats excl. docs, tests and ci: 39 files changed, 5368 insertions(+), 3782 deletions(-)
Also, if you are interested in what’s going to happen in the eblob world in the near future, you should probably take a look at its roadmap.
By the way, for those of you who are interested in numbers and pretty graphs: after the recent upgrade of our internal elliptics cluster, which stores billions of records, to the new LTS releases of elliptics 2.24.X and eblob 0.22.X, we’ve got:
Response time reduction (log scale):
Disk IO (linear scale):
Memory (linear scale):