POHMELFS in the 2.6.30 kernel


2.6.30 is out and contains both POHMELFS and DST in the staging directory.

While DST is a rather mature project and I do not expect it to change, POHMELFS will evolve into a quite different entity.
To date it is a simple NFS-like filesystem with several features. The most interesting ones are a local writeback cache for data and metadata, and the ability to write to multiple servers and balance reads among them (you can find the others on the homepage). It is really not a distributed filesystem, although it is a parallel one to some degree.
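To illustrate the "write to multiple servers, balance reads among them" idea, here is a minimal sketch. It is not POHMELFS code. The class name, the in-memory dicts standing in for servers, and the plain round-robin read policy are all hypothetical simplifications.

```python
import itertools


class ReplicatedStore:
    """Hypothetical model: every write goes to all servers, reads rotate among them."""

    def __init__(self, server_names):
        # Each "server" is modelled as an in-memory dict (an assumption for illustration).
        self.servers = {name: {} for name in server_names}
        self._reader = itertools.cycle(server_names)

    def write(self, key, data):
        # Write the same data to every server so that any of them can serve reads.
        for store in self.servers.values():
            store[key] = data

    def read(self, key):
        # Pick the next server in round-robin order to spread the read load.
        name = next(self._reader)
        return name, self.servers[name][key]


store = ReplicatedStore(["srv-a", "srv-b", "srv-c"])
store.write("/etc/motd", b"hello")
for _ in range(4):
    print(store.read("/etc/motd"))
```

A real client would also need to handle a server failing mid-write and choose readers by load rather than strict rotation. Round-robin is used here only because it is the simplest balancing policy to show.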

And reality says that no one really wants to replace their existing NFS network with something new, even something with higher performance, since the old systems already work and all their hidden pitfalls have already been found.

Instead, POHMELFS will move into truly distributed territory with the elliptics network, which will become its distributed hash table (DHT) data storage. This is the main reason I will not ask to move POHMELFS out of the staging tree for some time (at least not in the upcoming merge window), and will keep changing it there.

The feature set the elliptics network provides is not really comparable with existing old-school distributed systems; it is simply a completely new and very different solution.

There are a number of potentially complex parts, but they are all solvable, so stay tuned for new results!

EDITED TO ADD: Someone created a Wikipedia article with a spartan description :)

Cool! Congratulations!

Did you push it in yourself, or did someone else do it?

I remember your previous battles to get it into the kernel... and now it went in right away!

Not right away at all :)

There were a couple dozen attempts. Then I gave up on going straight into fs/ and moved it to drivers/staging.
That is enough for now, since there are still serious changes to come.

Thanks for the congratulations!

In one of the discussions (on LOR, I think) people were sure this is a Yandex project :)

In general, people on forums voice some pretty funny opinions, ranging from "don't reinvent the wheel" to "one person can't possibly write anything worthwhile" :))

Of course, there are people who do understand, but as a rule they keep quiet.

That part is actually easy :)

Hi,

The subject says it all... what is so new and very different in POHMELFS? I'm speaking from a distributed {data structures, systems} perspective...

Actually, what is new is not in POHMELFS but in the elliptics network: its distributed hash table approach. I do not know of anything stable enough that is implemented using this approach. I also do not know of any POSIX FS with such a backend (GlusterFS 2.0 has something similar, though my experience with older releases was not great).
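For readers new to the DHT idea, here is a generic consistent-hashing sketch showing how object names can be mapped to storage nodes. It is not the elliptics routing code. SHA-1 IDs, one ring position per node (no virtual nodes), and the `HashRing` name are assumptions chosen for brevity.

```python
import bisect
import hashlib


def key_id(name: str) -> int:
    # 160-bit ID derived from the name with SHA-1 (an assumption for this sketch).
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes):
        self._ring = sorted((key_id(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self._ring, (key_id(node), node))

    def owner(self, name):
        # The owner is the first node whose ID is >= the key's ID, wrapping around.
        ids = [node_id for node_id, _ in self._ring]
        i = bisect.bisect_left(ids, key_id(name))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node1", "node2", "node3"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node4")
moved = sum(before[k] != ring.owner(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

The property this illustrates is that adding a node only moves the keys that fall into the new node's range; every other key keeps its owner. This is what lets a DHT-backed store grow by adding servers without reshuffling all data.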

You can check the other features of the storage backend on its homepage, but I would be glad to be corrected if this is already implemented with POSIX (and/or MPI) interfaces.

POHMELFS does not yet support the elliptics network backend, though.

A quick Google search for "dht filesystems" revealed many hits.
But you are absolutely correct: I'm also not aware of any publicly available DHT-based filesystem that is known and used "in the field".

Maybe I missed this detail, but basically what you will be doing is overlaying a tree on top of elliptics' DHT, right? If so, there must be common ground for clients sharing a filesystem: the value of the root directory object. How will you manage this value, or have you found a way around it? How will you store it, and, more importantly, how will you propagate updates and handle update collisions?
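One common way to address the root-object question is to give the root a well-known ID that every client can compute, and to resolve update collisions with versioned compare-and-swap plus retry. The sketch below shows that generic pattern. It is not the design POHMELFS or elliptics actually uses. The path-hash ID scheme, the `VersionedDHT` class, and its atomic `compare_and_swap` primitive are all hypothetical.

```python
import hashlib


def object_id(path: str) -> str:
    # Hypothetical scheme: a directory object's ID is the SHA-1 of its full path,
    # so the root "/" lives at a well-known ID every client can compute.
    return hashlib.sha1(path.encode()).hexdigest()


class VersionedDHT:
    """In-memory stand-in for a DHT whose objects carry version numbers."""

    def __init__(self):
        self._data = {}  # id -> (version, value)

    def get(self, oid):
        return self._data.get(oid, (0, None))

    def compare_and_swap(self, oid, expected_version, value):
        # Assumed atomic on the storage node: refuse the update if someone
        # else changed the object since the caller read it.
        version, _ = self.get(oid)
        if version != expected_version:
            return False
        self._data[oid] = (version + 1, value)
        return True


def add_entry(dht, dir_path, name):
    # On a collision, re-read the directory and re-apply the change.
    oid = object_id(dir_path)
    while True:
        version, entries = dht.get(oid)
        new_entries = dict(entries or {})
        new_entries[name] = object_id(dir_path.rstrip("/") + "/" + name)
        if dht.compare_and_swap(oid, version, new_entries):
            return new_entries


dht = VersionedDHT()
add_entry(dht, "/", "home")
add_entry(dht, "/", "etc")
version, root = dht.get(object_id("/"))
print(version, sorted(root))  # 2 ['etc', 'home']
```

Hashing full paths makes lookups a single DHT request per directory, but a rename then changes the IDs of everything below it. That is one of the trade-offs such a design would need to settle.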

Nevertheless, I think your road is the right one: extending the DHT with user-definable functions to hash elliptics' objects, and gaining a lot from a "more intelligent" DHT.
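Here is a sketch of what "user-definable functions to hash objects" could look like. Each transform function turns an object name into an ID, and storing a copy under each resulting ID yields replicas at different points of the ID space. The idea of several transforms per object follows the replication description in this thread, but the function names, the SHA-256 truncation, and the locality scheme are assumptions, not the elliptics API.

```python
import hashlib
from typing import Callable, List

# A transform maps an object name to a 20-byte (160-bit) ID.
Transform = Callable[[str], bytes]


def sha1_transform(name: str) -> bytes:
    return hashlib.sha1(name.encode()).digest()


def sha256_transform(name: str) -> bytes:
    # Truncated to 20 bytes so all IDs share the same 160-bit width.
    return hashlib.sha256(name.encode()).digest()[:20]


def locality_transform(name: str) -> bytes:
    # Hypothetical: the first 8 bytes come from the parent directory and the
    # rest from the full path, so files in one directory get neighbouring IDs.
    parent = name.rsplit("/", 1)[0] or "/"
    return (hashlib.sha1(parent.encode()).digest()[:8]
            + hashlib.sha1(name.encode()).digest()[:12])


def object_ids(name: str, transforms: List[Transform]) -> List[str]:
    # Each transform yields an independent ID; a copy stored under every ID
    # lands at a (very likely) different point of the ID space.
    return [t(name).hex() for t in transforms]


for oid in object_ids("/home/user/file.txt", [sha1_transform, sha256_transform]):
    print(oid)
```

A locality-style transform trades even distribution for keeping related objects on nearby nodes. That is the kind of choice user-provided functions make possible.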

Actually, not that many: Magma (which has not been updated since 2007, according to its homepage) and GlusterFS, which only started to support DHT in version 2.0, released a month ago.

POSIX is a challenging feature to add to a storage system (even through FUSE, which is a performance killer), but I agree that there are many non-POSIX DHT-based storage systems. The elliptics network is one of them, but it has additional features: different replication methods, object identification based on user-provided data transformation functions, flexible ID assignment used for data locality and load balancing (to name just one example), and transaction storage with log-structured updates. Many of these features may exist here and there, but I have not found a complete solution.
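"Transaction storage with log-structured updates" generally means that writes are appended as records instead of overwriting data in place, and the current state is obtained by replaying the log. The sketch below shows that general technique for a single object. The record format and the `LogStructuredObject` class are hypothetical and do not describe elliptics' on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LogStructuredObject:
    """Hypothetical sketch: updates are appended as (offset, data) records
    and never overwrite earlier ones; the current content is a replay."""

    log: List[Tuple[int, bytes]] = field(default_factory=list)

    def write(self, offset: int, data: bytes) -> int:
        # Append the transaction and return its sequence number.
        self.log.append((offset, data))
        return len(self.log) - 1

    def read(self, upto: Optional[int] = None) -> bytes:
        # Replay records in order; 'upto' lets a reader see an older version.
        records = self.log if upto is None else self.log[: upto + 1]
        buf = bytearray()
        for offset, data in records:
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))  # zero-fill any hole
            buf[offset:end] = data
        return bytes(buf)


obj = LogStructuredObject()
first = obj.write(0, b"hello world")
obj.write(6, b"there")
print(obj.read())       # b'hello there'
print(obj.read(first))  # b'hello world'
```

Keeping old records makes it cheap to roll back or inspect earlier versions. The cost is that the log must eventually be compacted into a snapshot.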

Yes, that's reason enough to move from an old installation to a new one!!!!

Right now I'm having huge problems scaling our NFS servers horizontally without investing huge amounts of money.

Even if it is not distributed, a parallel filesystem could help us a lot.

pNFS, iirc (not 100% sure though), will not allow reading the same object from multiple hosts, for example. So while it is a good step forward, it is still far from existing parallel solutions.

pNFS also allows private, closed extensions to still be certified as pNFS-compatible, which makes it possible to tie a solution to a specific vendor.

Once POHMELFS is moved to the elliptics network, it will allow effectively infinite scaling and parallel data processing. At the moment, though, POHMELFS looks more like NFS than a distributed FS.

Is there an approximate schedule for a release of POHMELFS based on elliptics?

I expect to start porting POHMELFS to the elliptics storage in a week, and I plan to finish it completely (including testing) before the next kernel summit, i.e. before October.

Tried to unspartan it. You're famous now. :-)

Thanks for the extensions, and to the original author for the article :)

http://linux.slashdot.org/article.pl?sid=09/06/10/1243232

Keep up the good work. :)

Congrats on the inclusion in the staging directory!
You're right that no one wants to change their existing NFS setups, but don't forget about new installs - I think lots of people are watching POHMELFS!

Gratz too! I'm actually in the process of trying to get a couple of monster servers to 'mirror' each other with a distributed block system, and DST + POHMELFS looks like the fix for me. Thanks!!

(and, of course, more docs would be helpful as always ;))

DST and the device mapper (or the MD stack for other RAID types) should make a good pair for this task.

I'm rather bad at writing documentation, so there is not much additional info beyond tiny examples and a short README.

I fixed a few things over on Wikipedia; hopefully they won't get cut :)

It's not hard to guess what you fixed there :)