There is a new distributed file system called POHMELFS in the staging area of the 2.6.30 kernel. Sporting better performance than classic NFS, it’s definitely worth a look.
...
Evgeniy Polyakov, a long-time Linux hacker, has recently contributed a new distributed file system called POHMELFS (Parallel Optimized Host Message Exchange Layered File System). It has appeared in the staging area of the “chock-full-of-filesystems” 2.6.30 kernel. It is ready for testing and can give you a boost in performance (remember - it’s parallel!). This article will discuss POHMELFS and where it is headed.
An interesting article about POHMELFS: its current state, its future (namely elliptics network integration, with details on how it works and what to expect), and NFS and its limitations.
Many thanks to Jon Smirl for the link.
Great that it's getting more press! :)
I've been thinking about it a lot over the past week or so, particularly about how it could be used more generically from programming languages. Elliptics has a lot to offer above and beyond the file interface.
A native interface for each language would be a tall order. However, virtually every language can work with network streams, and a simple stream interface would let almost any language piggyback data storage and processing on elliptics. This would be hugely valuable for developing and exploring the elliptics system.
I was thinking along the lines of a simple persistent message protocol to provide CRUD with serial processing. Something along the lines of:
Message:
    Message Header:
        activity: [CREATE|RETRIEVE|UPDATE|DELETE|TRANSACTIONS|STATUS]
        objectID: pre-transformed ID
        [optional transactionID for retrieve & update]
    Message Body:
        content
The content would be UTF-8 strings and would need either a terminator sequence or an explicit size.
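To make the message layout concrete, here is a minimal Python sketch of one possible wire encoding. The layout is an assumption, not something elliptics defines: a space-separated header line (activity, objectID, optional transactionID), then a 4-byte big-endian body size, then the UTF-8 body. It picks the "size" option rather than a terminator, so the body can contain any character. Object IDs are assumed to contain no spaces or newlines (e.g. hex-encoded).

```python
import struct
from dataclasses import dataclass
from typing import Optional

# The six activities proposed above; the wire layout below is a hypothetical choice.
ACTIVITIES = {"CREATE", "RETRIEVE", "UPDATE", "DELETE", "TRANSACTIONS", "STATUS"}


@dataclass
class Message:
    activity: str
    object_id: str                        # pre-transformed ID; assumed free of spaces/newlines
    transaction_id: Optional[int] = None  # only meaningful for RETRIEVE and UPDATE
    content: str = ""


def encode(msg: Message) -> bytes:
    """Header line, then a 4-byte big-endian body size, then the UTF-8 body."""
    if msg.activity not in ACTIVITIES:
        raise ValueError(f"unknown activity: {msg.activity!r}")
    fields = [msg.activity, msg.object_id]
    if msg.transaction_id is not None:
        fields.append(str(msg.transaction_id))
    header = " ".join(fields).encode("utf-8") + b"\n"
    body = msg.content.encode("utf-8")
    # A size prefix (rather than a terminator) lets the body contain any character.
    return header + struct.pack(">I", len(body)) + body


def decode(frame: bytes) -> Message:
    """Inverse of encode(); raises ValueError on a truncated frame."""
    header, sep, rest = frame.partition(b"\n")
    if not sep or len(rest) < 4:
        raise ValueError("truncated header")
    fields = header.decode("utf-8").split(" ")
    (size,) = struct.unpack(">I", rest[:4])
    body = rest[4:4 + size]
    if len(body) != size:
        raise ValueError("truncated body")
    txid = int(fields[2]) if len(fields) > 2 else None
    return Message(fields[0], fields[1], txid, body.decode("utf-8"))
```

For example, `decode(encode(Message("UPDATE", "3f2a9c", 7, "grüße")))` gives back an equal `Message`; the size counts UTF-8 bytes, not characters.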
The transactions activity would return all transactions for an objectID.
The status activity would return the client's current connection status to the network. (I'm thinking it would be handy to be able to disconnect a client from network writes while it does some local processing; at the end of that process the pending writes could either be discarded or committed.)
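The "disconnected for writes" idea can be sketched as a client-side write queue. This is a hypothetical Python model, not an elliptics feature. `send` is a stand-in for whatever actually forwards a write to the network. While disconnected, writes are queued locally, and `commit()` replays them in order while `discard()` drops them. Reads during the disconnected period are not modelled.

```python
from typing import Callable, List, Tuple

Write = Tuple[str, str, str]   # (activity, objectID, content)


class BufferedClient:
    """Hypothetical client that can be 'disconnected for writes' and later commit or discard."""

    def __init__(self, send: Callable[[str, str, str], None]) -> None:
        self._send = send                # stand-in: forwards one write to the network
        self._connected = True
        self._pending: List[Write] = []

    def status(self) -> str:             # what a STATUS request might report
        return "CONNECTED" if self._connected else "DISCONNECTED"

    def disconnect(self) -> None:
        self._connected = False

    def write(self, activity: str, object_id: str, content: str) -> None:
        if self._connected:
            self._send(activity, object_id, content)
        else:
            self._pending.append((activity, object_id, content))

    def commit(self) -> None:
        """Replay the queued writes in order, then reconnect."""
        for pending in self._pending:
            self._send(*pending)
        self._pending.clear()
        self._connected = True

    def discard(self) -> None:
        """Drop the queued writes, then reconnect."""
        self._pending.clear()
        self._connected = True
```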
If a transactionID is passed with retrieve, the object is returned as it was at that point in time.
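The transactions and point-in-time retrieve semantics can be modelled with a per-object version history. This is a toy in-memory Python sketch, not how elliptics stores data. The following choices are assumptions: transaction IDs come from one global increasing counter, a delete writes a tombstone so history survives, and retrieve with a transactionID returns the newest version at or before that ID. The optional transactionID on update is left out because the proposal doesn't say what it should mean.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy in-memory model of the proposed CRUD-with-history semantics (not elliptics)."""

    def __init__(self) -> None:
        # object ID -> list of (transactionID, content); None content marks a delete
        self._history: Dict[str, List[Tuple[int, Optional[str]]]] = {}
        self._txids = itertools.count(1)   # hypothetical: one global, increasing counter

    def _live(self, object_id: str) -> bool:
        versions = self._history.get(object_id)
        return bool(versions) and versions[-1][1] is not None

    def _write(self, object_id: str, content: Optional[str]) -> int:
        txid = next(self._txids)
        self._history.setdefault(object_id, []).append((txid, content))
        return txid

    def create(self, object_id: str, content: str) -> int:
        if self._live(object_id):
            raise KeyError(f"{object_id} already exists")
        return self._write(object_id, content)

    def update(self, object_id: str, content: str) -> int:
        if not self._live(object_id):
            raise KeyError(object_id)
        return self._write(object_id, content)

    def delete(self, object_id: str) -> int:
        if not self._live(object_id):
            raise KeyError(object_id)
        return self._write(object_id, None)   # tombstone: history is kept

    def retrieve(self, object_id: str, txid: Optional[int] = None) -> Optional[str]:
        """Latest content, or the content as of transaction `txid` if given."""
        for version_txid, content in reversed(self._history.get(object_id, [])):
            if txid is None or version_txid <= txid:
                return content
        return None

    def transactions(self, object_id: str) -> List[int]:
        """All transaction IDs recorded for an object, oldest first."""
        return [txid for txid, _ in self._history.get(object_id, [])]
```

Keeping tombstones rather than erasing the object is what lets a retrieve "from that point in time" still work after a delete.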
Responses would be a negative number indicating the error code on failure; otherwise they would be the content, or null.
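A bare number would be ambiguous if the content itself were text like "-2". So this Python sketch assumes a fixed-size signed status comes first: a negative value is the negated errno, zero means null, and a positive value is the size of the UTF-8 body that follows. The client side maps negative statuses to Python's `OSError`, which picks the matching subclass (e.g. `FileNotFoundError` for `ENOENT`). In this sketch, empty content and null are not distinguished.

```python
import errno
import os
import struct
from typing import Optional


def encode_response(content: Optional[str] = None, error: int = 0) -> bytes:
    """Signed 32-bit big-endian status, optionally followed by a UTF-8 body."""
    if error:
        return struct.pack(">i", -abs(error))      # negative: the error number
    if content is None:
        return struct.pack(">i", 0)                 # zero: null (empty content looks the same)
    body = content.encode("utf-8")
    return struct.pack(">i", len(body)) + body      # positive: body size


def decode_response(frame: bytes) -> Optional[str]:
    """Return the content (or None for null); raise OSError for a negative status."""
    (status,) = struct.unpack(">i", frame[:4])
    if status < 0:
        raise OSError(-status, os.strerror(-status))
    if status == 0:
        return None
    body = frame[4:4 + status]
    if len(body) != status:
        raise ValueError("truncated response")
    return body.decode("utf-8")
```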
The interface would run as a daemon, with the transformation functions specified at startup. Any language could then connect to the network stream and use the elliptics network.
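Here is a minimal sketch of such a daemon using Python's `asyncio` streams. It rests on several assumptions. Requests are framed as a header line, a 4-byte big-endian body size and the UTF-8 body. Replies use a signed 32-bit status (negative errno, 0 for null, otherwise the body size). Port 7777 is arbitrary, and a plain dict stands in for elliptics. No real elliptics calls are made, the startup transformation functions are not modelled, and TRANSACTIONS and STATUS are answered with `EINVAL`.

```python
import asyncio
import errno
import struct
from typing import Dict, Optional

# Stand-in for the elliptics network. A real daemon would call the elliptics
# library instead, configured at startup with its transformation functions.
STORE: Dict[str, str] = {}


def reply(content: Optional[str] = None, error: int = 0) -> bytes:
    """Signed 32-bit status: -errno on error, 0 for null, else the body size."""
    if error:
        return struct.pack(">i", -error)
    if content is None:
        return struct.pack(">i", 0)
    body = content.encode("utf-8")
    return struct.pack(">i", len(body)) + body


def dispatch(activity: str, object_id: str, content: str) -> bytes:
    exists = object_id in STORE
    if activity == "CREATE":
        if exists:
            return reply(error=errno.EEXIST)
        STORE[object_id] = content
        return reply()
    if activity in ("RETRIEVE", "UPDATE", "DELETE") and not exists:
        return reply(error=errno.ENOENT)
    if activity == "RETRIEVE":
        return reply(STORE[object_id])
    if activity == "UPDATE":
        STORE[object_id] = content
        return reply()
    if activity == "DELETE":
        del STORE[object_id]
        return reply()
    return reply(error=errno.EINVAL)   # TRANSACTIONS/STATUS are not modelled here


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Serve one client. Each client has its own connection, and asyncio runs
    handlers on one thread, so no explicit locking around STORE is needed."""
    try:
        while True:
            header = await reader.readline()       # "ACTIVITY objectID [txid]\n"
            if not header:                         # client closed the connection
                break
            fields = header.decode("utf-8").split()
            (size,) = struct.unpack(">I", await reader.readexactly(4))
            content = (await reader.readexactly(size)).decode("utf-8")
            if len(fields) < 2:
                writer.write(reply(error=errno.EINVAL))
            else:
                writer.write(dispatch(fields[0], fields[1], content))
            await writer.drain()
    except asyncio.IncompleteReadError:
        pass                                       # client vanished mid-message
    finally:
        writer.close()
        await writer.wait_closed()


async def main(host: str = "127.0.0.1", port: int = 7777) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. Any language that can open a TCP socket could then talk to it by writing the framing by hand, with no language-specific elliptics binding.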
It is not tied to files in any way; there is a special file API built on top of the low-level data API.
There are many different exported functions which operate on data. Please check the header, which contains the documentation.
You can build any protocol on top of low-level elliptics network API the same way I implemented file store/retrieve/delete/lookup functions.
There is work in progress to provide Perl and Python bindings for both the low-level API (the one which works with generic data) and the high-level (file) API.
Aye - I've checked out the header docs already.
I understand it's not tied to files in any way - it's just that the current samples appear to be mostly file oriented.
More stream-oriented examples would raise awareness of elliptics among other audiences (most of the discussion of your work revolves around comparisons with NFS, which misrepresents the power of what you've developed).
A stream-based approach would eliminate much of the need for language-specific bindings and would open up the system to all languages, not just those you've listed. No? ;)
It is just another frontend, like the library API described. And only POHMELFS was compared with NFS, since elliptics storage is a very different entity from any filesystem.
Each computer language has its own ABI for working with data objects, so no matter what, you will have to implement special bindings for them. Even if you convert binary data into base64 and use UTF-8 text to represent it (which is a huge overhead), it still requires different methods to pass that data into the Unix shared library.
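To put a number on the base64 overhead: padded base64 (RFC 4648) turns every 3 input bytes, or fraction thereof, into 4 ASCII characters. Since ASCII is one byte per character in UTF-8, binary payloads grow by roughly a third. The payload below is arbitrary test data.

```python
import base64


def base64_size(n: int) -> int:
    """Padded base64 output length: 4 chars per 3-byte group, rounding up."""
    return 4 * ((n + 2) // 3)


payload = bytes(range(256)) * 12          # 3072 bytes of arbitrary binary data
encoded = base64.b64encode(payload)       # ASCII, so 1 byte per char in UTF-8

assert len(encoded) == base64_size(len(payload))
overhead = len(encoded) / len(payload) - 1
print(f"{len(payload)} -> {len(encoded)} bytes ({overhead:.0%} larger)")
# prints "3072 -> 4096 bytes (33% larger)"
```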
Aye - I know that the FS implementation isn't elliptics, but I think it's a distinction many people fail to realise. Maybe general awareness is better than I think :) (I'm happy to be wrong)
I think most languages have an implementation to serialise their data, which is what I had in mind with the stream implementation. The language would serialise the data and pass it to elliptics to be stored, then rehydrate it when needed. Isn't this what most language bindings will be doing anyway?
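From the language side, the serialise/rehydrate step might look like the following Python sketch. JSON is an assumed choice of serialiser, and `Order` is a hypothetical application object. The daemon would only ever see the resulting UTF-8 string as a message body. JSON has no raw-bytes type, so binary fields would need a text encoding such as base64, with its size cost.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Order:                 # hypothetical application object
    order_id: int
    customer: str
    items: List[str]


def serialise(obj: Order) -> str:
    """Language-level serialisation to a UTF-8-safe string for a message body."""
    return json.dumps(asdict(obj), ensure_ascii=False, sort_keys=True)


def rehydrate(content: str) -> Order:
    """Rebuild the object from a body returned by a RETRIEVE."""
    return Order(**json.loads(content))


order = Order(42, "Zoë", ["widget", "gadget"])
content = serialise(order)   # would be sent as the body of a CREATE or UPDATE
assert rehydrate(content) == order
```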
With this kind of message loop the data could be a file, an object or some arbitrary data.
Do you mean some kind of pipe interface, listened to by a daemon linked with the elliptics library, to which all other clients on the same host (like those invoked from different computer languages) write their data using a known format?
If yes, then it is not that hard to implement, but there will be a lot of problems with this kind of interface, namely performance and locking. The native interface does not have those problems, although such a daemon could be a good project for some kinds of workloads.
You are welcome to implement it :)
Aye - something along those lines, although on reflection a socket-based approach might be cleaner to implement and would help eliminate the performance and locking issues.
heh - I thought that might happen. Although it's been a few years since I had to get my hands dirty at this level - so it would probably take me much longer to knock something up than it would for you :)