POHMELFS update and thoughts on moving POHMELFS and DST outside of the staging tree
I synced my local tree with the changes made in the staging tree of the vanilla kernel.
The patch is rather small, but includes several bugfixes and a command extension made by Pierpaolo Giacomin (yrz_anche.no), which allows dumping and deleting all configured indexes.
It is already included in the staging tree and will be pushed upstream when the merge window opens.
This raises the question of whether DST and POHMELFS should be pushed out of the staging tree into the main branch.
DST is a block-level network device. It has a fair number of interesting features, like reconnection, large IO support and no need to copy data from userspace, but overall it is still a simple point-to-point network block device. My opinion is that it is not really needed in the modern environment.
POHMELFS is a distributed parallel filesystem. Its current state is closer to parallel NFS than to a real distributed filesystem like Lustre. But I am starting integration with the elliptics network, which is a real distributed network hash table storage, and this will move POHMELFS to a completely new storage level not actually accessible to existing distributed filesystems. Such storage systems were only made for extremely large amounts of web 2.0 data, which does not require POSIX or the ability to work with the storage as a convenient filesystem. In contrast, existing distributed filesystems are mostly made for a non-faulty environment, i.e. one where the network does not disappear frequently, where dedicated servers do not break frequently and so on. Here ‘frequently’ is a rather subjective measure: for example, I work with people who, once every several days, deliberately break network connectivity between major parts of their infrastructure to be sure that the system continues to work as expected. How do you expect cool-named vendors and bought solutions to work in that environment?
I designed the elliptics network without any assumptions or requirements about the stability of its subsystems. The same should be done for POHMELFS, which basically means that its whole network protocol and existing usage model will be completely changed.
So, I’m rather stumped about pushing both projects out of the staging tree. DST is likely not needed in the vanilla tree, while POHMELFS will be changed dramatically over the coming days and weeks (but I will probably not complete it before the 2.6.31 kernel release and the subsequent merge window).
It is possible, though, to move POHMELFS into fs/ but add a huge warning during module load, which will scream that POHMELFS will be changed completely in the next kernel version and will not be compatible with the existing use case.
Opinions?

I think DST does have a role in the modern environment simply because of the fact that it’s network agnostic. The TCP-based NBD device isn’t going to have high enough performance for many high-bandwidth I/O streams. Being able to configure the network transport to use something like DCCP is extremely valuable. Nice job, please continue the work!
At first blush it may seem that AoE/vblade are a viable high-throughput network block device, but the AoE development effort seems to have stalled in the kernel and vblade is not a high performance server. Again, DST promises to be a viable alternative.
Hi, I think DST is good. I haven’t tried it in this role yet, but I think it would be good as a network swap device for a heavily used server. It would be a better alternative to NBD and is newer as well, so push it into mainline.
Brian
I agree with Brian regarding DST. I would suggest keeping POHMELFS in staging until the major architectural changes are completed.
My opinion is to not push incomplete or unstable projects into the mainline. So DST is OK, but POHMELFS may wait for a bit.
Link as requested: http://lkml.org/lkml/2009/9/3/3. I think DST has made headway over NBD and AoE. You have probably developed DST for the experience and moved on to more complex projects, but I think it is very useful to have something like DST in the kernel, even if only as a reference for other developers, since the client and server are both in the kernel and it is very well coded.
My thoughts for DST are at the low end: a number of cheap ramdisk-backed DST nodes supporting a powerful server with swap space. There is no application-level client/server between machines, so the server can hold a large database in memory with the overflow going to swap space on the nodes, and latency should be good.
I’ll maintain DST just to keep it alive as I think it has good value.
Brian
Hi, looking at LKML, the .32 kernel release mentions that DST will be included in .32 but dropped in .33 because it appears to be dead. I am wondering why. Isn’t DST pretty stable, with most of the work finished, so that it should be OK to use apart from the bugs that appear from time to time in all drivers? I’ll volunteer to maintain DST if that is the problem, and I can write some thorough documentation to help give a better understanding of DST. Thoughts??
Brian
Hard to tell if it is dead or not. I just do not see why it can be useful for people.
Frankly, no one uses NBD now. There is DRBD, which fills a very small and shrinking niche, and that’s all.
For high-end production systems there is iSCSI, which, although it has problems, is kind of a standard. AoE is another niche product, whose development does not look active either.
DST solves many of the problems in low-end peer-to-peer storage systems, but it is still not a step forward; it stands in the same place and makes no headway. This is a dying storage design and quite soon it won’t be used at all, which is why I believe DST should not be merged.
Feel free to maintain and continue development if you like of course :)
But I’m quite surprised about its inclusion status in .32, can you drop me a link to this discussion?
Can you explain a bit what the POHMELFS/elliptics merge means for someone using POHMELFS? For example which use cases are being enabled/removed by joining POHMELFS with the elliptics network? Will it still be possible to use POHMELFS in the basic “parallel nfs” mode after the merge, or does it then require more complex server structures backing it? Am I wrong in thinking that adding the elliptics network will also make the POHMELFS project that much more complex?
I’m asking because it seems like POHMELFS is very usable as it is (it rocks as a “better nfs”), and should be pushed to mainline as is. Perhaps it would make sense to make elliptics-backed-POHMELFS a separate project existing alongside of the pushed-to-mainline POHMELFS?
The whole usage model will be completely changed: there will be no ‘network filesystem’ mode where an existing directory tree is exported over the network; instead, data will be stored in the elliptics network cloud according to its configuration.
There will be no special POHMELFS server anymore; the client will connect to existing elliptics servers.
Thanks for clarifying this. I would then like to strengthen my suggestion to push POHMELFS as is, and perhaps fork and rename the elliptics-backed project. EllipticsFS sounds good to me. The reason is that I believe the “export existing directory tree” is a valuable use case, and having a simple, “better than NFS” filesystem is good enough. Often one does not want anything more, and in these cases setting up an elliptics network would be overkill. They are different beasts with different use cases, and so should be different projects.
Oh, and I managed to fail one of your logic-based captchas. Beware. I may be a bot ;-)
Or, I may be illogical. Of course I just realized that it is only the POHMELFS _server_ which will change, right? And perhaps some part of the protocol? Presumably it is still possible to use the old server if one wants to export a directory tree. Is this correct?
Actually POHMELFS will be rewritten, if not from scratch, then with lots of its functionality changed.
So effectively neither client nor server will stay intact. Actually there will be no server at all.
I’m not sure that POHMELFS as parallel NFS will have a user base. There are known issues which are hard to resolve in this design, namely replication issues and distributed locking order. And those problems are real already, not just issues one may or may not encounter.
Using path-based indexing requires a special metadata server to maintain locking and integrity, which is an error-prone design. And by path-based I actually mean any index known to both server and client, whether it was generated at run time (like an NFS file handle) or at creation time (like inode numbers in Lustre).
A DHT system, by contrast, does not need a special server to maintain metadata. Instead the client itself can determine where to get its data, which requires changing the addressing model.
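To illustrate what "the client itself can determine where to get its data" can look like, here is a minimal sketch of a generic consistent-hashing lookup in Python. It is not the elliptics implementation and does not describe its protocol. The `Ring` class, the `key_id` helper, the node names and the use of SHA-1 are all assumptions made for this example only.

```python
import bisect
import hashlib


def key_id(data: bytes) -> int:
    """Map arbitrary bytes to a point on the ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class Ring:
    """Hypothetical hash ring: every client holding the same node list
    computes the same owner for a key, with no metadata server involved."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Place each node on the ring at the hash of its name.
        self._points = sorted((key_id(n.encode()), n) for n in nodes)
        self._ids = [point for point, _ in self._points]

    def lookup(self, name: str) -> str:
        # The owner is the first node at or after the key's position,
        # wrapping around to the first node at the end of the ring.
        i = bisect.bisect_left(self._ids, key_id(name.encode()))
        return self._points[i % len(self._points)][1]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])
    for path in ("/etc/passwd", "/home/user/file.txt"):
        print(path, "->", ring.lookup(path))
```

The object name is hashed instead of being resolved through a directory tree, so two clients never have to ask a central server who owns an object. The cost is the one mentioned above: data is addressed by key, not by a server-maintained path index.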