POHMELFS and elliptics server-side
In the meantime, POHMELFS got ‘noreadcsum’ mount option support, as well as some other tricky bits.
This improves read performance several times over. Eblob (the elliptics backend) stores data in contiguous chunks, which speeds up reads and writes but occasionally forces it to copy data when the preallocated space is not enough.
Reading with checksums enabled forces a checksum of the whole written area, which in turn requires populating it from disk into the page cache and so forth. For a gigabyte file this takes about 5-6 seconds (the first time).
And that is just to read one page, i.e. the cost is paid for every page (or every readahead-requested chunk) that is read.
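To make the cost concrete, here is a minimal Python sketch of the two read paths. It is not POHMELFS or eblob code: the function names, the 4096-byte page size and the use of SHA-512 are assumptions made only for illustration. The point is that the checksummed path has to hash the whole record even when a single page is requested.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def read_page_with_csum(record: bytes, expected_csum: bytes, page_index: int) -> bytes:
    """Verify the checksum of the whole record, then return one page."""
    # The checksum covers the entire written area, so every byte is hashed
    # even though the caller wants only PAGE_SIZE bytes.
    if hashlib.sha512(record).digest() != expected_csum:
        raise IOError("checksum mismatch")
    start = page_index * PAGE_SIZE
    return record[start:start + PAGE_SIZE]


def read_page_noreadcsum(record: bytes, page_index: int) -> bytes:
    """Return one page without verifying anything (the 'noreadcsum' idea)."""
    start = page_index * PAGE_SIZE
    return record[start:start + PAGE_SIZE]


def bytes_hashed(record_size: int, pages_read: int, csum_enabled: bool) -> int:
    """Total bytes hashed when each page read re-verifies the whole record."""
    return record_size * pages_read if csum_enabled else 0
```

With checksumming on, the amount of hashed data grows as record size times the number of page reads, which is why skipping it changes read performance so much.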
I'm not sure that disabling read checksumming is a good idea, so I made it a mount option. Maybe we will eventually end up with a better solution.
I also fixed a nasty remount bug in POHMELFS, which uncovered a really unexpected (for me at least) behaviour of the Linux VFS.
Every inode may have a ->drop_inode() callback, which is called each time its reference counter reaches 0 and the inode is about to be freed.
But sometimes, when an inode was recently accessed, it is not evicted but placed into an LRU list with special bits set.
The inode cache shrink code (invoked from the umount path in particular, but likely callable from the memory shrink path too) grabs all inodes in that LRU list and later calls plain iput() on them, which in turn invokes the ->drop_inode() callback for the inode in question.
Thus it is possible to get multiple invocations of the callback without the inode being reinitialized between them. This crashed pohmelfs in some cases. Now it is fixed, with an appropriate comment in the code, but I wonder how many other such tricks are yet to be discovered?
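The sequence above can be modelled with a toy Python sketch. This is not kernel code and not the actual pohmelfs fix: ToyInode, both drop functions and iput_twice are hypothetical names that only mimic the described order of calls, to show why a drop callback has to tolerate being invoked twice on the same inode.

```python
class ToyInode:
    """Stand-in for an inode with filesystem-private state attached."""

    def __init__(self):
        self.private = {"state": "live"}
        self.drop_calls = 0


def drop_inode_unsafe(inode):
    """Assumes it runs once per inode lifetime; breaks on a second call."""
    inode.drop_calls += 1
    inode.private["state"] = "dropped"  # fails if private is already None
    inode.private = None                # private state released


def drop_inode_guarded(inode):
    """Tolerates repeated calls by checking whether cleanup already ran."""
    inode.drop_calls += 1
    if inode.private is None:
        return
    inode.private["state"] = "dropped"
    inode.private = None


def iput_twice(inode, drop):
    """Mimic the described sequence of two drops with no reinit in between."""
    drop(inode)  # refcount reaches 0, inode is parked on the LRU list
    drop(inode)  # cache shrink path calls iput() on the parked inode
```

The guarded variant simply makes the cleanup idempotent; the unsafe one fails on the second call because its private state is already gone.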
POHMELFS is a great tester for elliptics server-side scripting support. I was lazy and put all the somewhat complex processing into server-side scripts written in Python. Anton Kortunov implemented a simple sstable-like structure in Python for the directory entries used by pohmelfs.
Since every command is processed atomically (on a single replica) in elliptics, we can put a complex directory update mechanism into these ‘kind-of-transactions’. In particular, server-side scripting is used to insert and remove inodes from a directory. Lookup is also implemented using server-side scripting: we read the directory inode in Python code, search for the requested name, and, if something is found, return the inode information, which is sent back to pohmelfs.
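A rough idea of what such a sorted directory structure could look like is sketched below. This is not Anton's implementation and does not use the elliptics API: a plain dict stands in for the storage, the directory is serialized as a JSON list of [name, info] pairs kept sorted by name, and the function names are hypothetical. Each function is a single read-modify-write, which is the part that relies on the command being processed atomically.

```python
import bisect
import json


def _load(storage, dir_key):
    """Read and parse the directory; return (entries, sorted names)."""
    raw = storage.get(dir_key)
    entries = json.loads(raw) if raw else []
    names = [name for name, _ in entries]
    return entries, names


def _store(storage, dir_key, entries):
    """Serialize the whole directory and write it back."""
    storage[dir_key] = json.dumps(entries)


def dir_lookup(storage, dir_key, name):
    """Binary-search the sorted entries; return inode info or None."""
    entries, names = _load(storage, dir_key)
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return entries[i][1]
    return None


def dir_insert(storage, dir_key, name, info):
    """Add a new entry or update an existing one, keeping names sorted."""
    entries, names = _load(storage, dir_key)
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        entries[i] = [name, info]
    else:
        entries.insert(i, [name, info])
    _store(storage, dir_key, entries)


def dir_remove(storage, dir_key, name):
    """Remove an entry if present; return True when something was removed."""
    entries, names = _load(storage, dir_key)
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        del entries[i]
        _store(storage, dir_key, entries)
        return True
    return False
```

Keeping the entries sorted makes lookup a binary search, while insert and remove pay for rewriting the whole directory object.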
Overall, a lookup takes about 2-13 msecs. That covers receiving the command from pohmelfs and ‘forwarding’ it to the pool of Python executors (the srw project), where Python code reads the directory inode data from elliptics (using elliptics_node.read_data_wait()), searches for the inode with the given name there and sends it back to pohmelfs.
Insert takes about 30-150 msecs: the script reads the directory content, adds a new entry (or updates an old one) and then writes it back into the storage.
That’s how it looks in python –
Given that we spend 10 msecs in such a not really trivial piece of code, I believe that my implementation is actually not that bad.
That is the recent news. Stay tuned for more!