Data de-duplication in ZFS and elliptics network (POHMELFS)

Jon Smirl sent me a link describing a new ZFS feature: data deduplication.

This is a technique that stores multiple data objects with identical content in a single place, thus saving space. Data deduplication can work at three levels: files (objects, actually), blocks and bytes. At every level a single entity is stored for multiple identical ones, e.g. a single block for several equal data blocks, a single byte range for several equal ranges, and so on. ZFS supports block-level deduplication.
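
To make block-level deduplication concrete, here is a minimal sketch, assuming fixed 4 KiB blocks identified by their SHA-256 digest. The class name BlockDedupStore and the block size are hypothetical illustrations, not ZFS's actual on-disk design: each unique block is kept once, and each file is just an ordered list of block digests.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for illustration


class BlockDedupStore:
    """Toy block-level dedup store: each unique block is kept exactly once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if no identical block is already present.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its shared blocks.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = BlockDedupStore()
a = b"A" * 4096 + b"B" * 4096
b = b"B" * 4096 + b"A" * 4096
store.write("a", a)
store.write("b", b)
print(store.stored_bytes())  # 8192: both files share the same two blocks
```

Two 8 KiB files built from the same two blocks occupy only 8 KiB in total, even though the blocks appear in a different order in each file.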

This feature has effectively existed from the beginning in the elliptics network distributed hash table storage, where it works at two levels: object and transaction. Strictly speaking there are only transactions, but the maximum transaction size can be limited to some large enough block (megabytes or more, or unlimited if needed), so an object smaller than that limit is deduplicated automatically.

This means that if multiple users write the same content into the storage using the same ID, no new storage space is used; instead, the transaction log for the selected object is updated to show that two external objects refer to the given transaction.
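
A minimal sketch of this behaviour follows. It assumes the transaction ID is derived from the content (SHA-1 here is an arbitrary choice); the text above only says the writers use the same ID. TransactionStore and its fields are hypothetical names, not the elliptics API. The first writer pays for the data, and every writer, including the first, only adds a log record pointing at the transaction.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TransactionStore:
    # Hypothetical model: transaction ID -> payload, stored once.
    data: dict = field(default_factory=dict)
    # Hypothetical model: transaction ID -> external objects referring to it.
    log: dict = field(default_factory=dict)

    def write(self, obj_name, payload):
        # Assumption: the ID is derived from the content, so identical
        # payloads always map to the same transaction ID.
        tid = hashlib.sha1(payload).hexdigest()
        if tid not in self.data:
            self.data[tid] = payload  # only the first writer consumes space
        # Every writer just records a reference in the transaction log.
        self.log.setdefault(tid, []).append(obj_name)
        return tid


ts = TransactionStore()
t1 = ts.write("alice/report.txt", b"same content")
t2 = ts.write("bob/copy.txt", b"same content")
print(t1 == t2, len(ts.data), ts.log[t1])
```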

Depending on the transaction size, this may have a negative impact: in particular, when a transaction is smaller than a log entry, it is actually a waste of space. But transactions are required for the log-structured filesystem and to implement things like snapshots and update history. By default a log entry is 56 bytes, so this should not be a problem in the common case.
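
The trade-off is simple arithmetic: each transaction carries a 56-byte log entry, so the metadata overhead per byte of data is 56 divided by the transaction size. The sketch below only computes that ratio; the function name is hypothetical and it ignores any other per-transaction costs the real storage may have.

```python
LOG_ENTRY_SIZE = 56  # default log entry size in bytes, per the text above


def log_overhead_ratio(transaction_size, log_entry_size=LOG_ENTRY_SIZE):
    """Bytes of log metadata per byte of transaction data (hypothetical helper)."""
    if transaction_size <= 0:
        raise ValueError("transaction_size must be positive")
    return log_entry_size / transaction_size


for size in (16, 56, 4096, 1 << 20):
    print(f"{size:>8} bytes -> {log_overhead_ratio(size):.2%} overhead")
```

Below 56 bytes the log entry outweighs the data itself, while for page-sized or larger transactions the overhead drops to around one percent or less.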

POHMELFS, as an elliptics network frontend, will support this feature out of the box, without any extra steps.

Hello,

Please correct me if I am wrong.

I assume that Pohmelfs is going to be a Linux cloud storage solution with any filesystem or block device as the back end.

Then I think we can design the storage solution as follows (RAID + RAIN, Redundant Array of Independent Nodes): btrfs RAID + LVM + Pohmelfs elliptics network. This would be a much more fault-tolerant solution, since btrfs provides snapshots, compression and multi-device (RAID) support, while Pohmelfs provides RAIN, deduplication, parallel execution and more.

Please post the Pohmelfs roadmap, including a TODO list for community members covering development, testing and documentation.

Thank you for such a wonderful project.

Hello,

I am curious to know more about Pohmelfs, so please bear with my naive questions.

Can we use POHMELFS as a clustered NAS solution?

Can POHMELFS make use of btrfs?

How does POHMELFS compare against NetApp, EMC, EqualLogic ... storage solutions?

No, POHMELFS is not a NAS solution right now; it is a kind of parallel NFS with a fair number of limitations.

VM images are a special problem. They are file systems inside of files, so object-level dedupe is not going to work for them. These files contain a huge amount of duplication, but it is not going to be transaction-aligned. If you care about VM images, you are probably going to be stuck making a storage class that does block-based dedupe. The kernel already does page-based dedupe in RAM for VM images.
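
The difference between object-level and block-level dedupe for VM images can be sketched as follows, assuming the guest filesystem uses 4 KiB blocks. The helper names and the synthetic "image" are hypothetical. Flipping a single byte in a cloned image changes the whole-object hash, so object-level dedupe finds nothing to share, while block-level hashing still matches every untouched block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: guest filesystem block size


def block_digests(image, block_size=BLOCK_SIZE):
    return [hashlib.sha256(image[i:i + block_size]).digest()
            for i in range(0, len(image), block_size)]


def shared_blocks(a, b, block_size=BLOCK_SIZE):
    """Count unique blocks of image b that already exist somewhere in image a."""
    seen = set(block_digests(a, block_size))
    return sum(1 for d in set(block_digests(b, block_size)) if d in seen)


# Synthetic 32 KiB "image" made of 8 distinct blocks.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
clone = bytearray(base)
clone[5000] ^= 0xFF  # one modified byte inside block 1
clone = bytes(clone)

same_object = hashlib.sha256(base).digest() == hashlib.sha256(clone).digest()
print(same_object, shared_blocks(base, clone))  # object differs, 7 of 8 blocks shared
```

This only works because the duplicated data sits at the same block-aligned offsets in both images; data shifted by a few bytes would defeat fixed-size blocks as well.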

I'm a fan of offline deduping: store everything uncompressed initially, then use idle cycles to hunt for duplication. Idle time lets you try multiple schemes and pick the best one. You could also plug in special compression algorithms for things like genetic or telemetry data, or even a special algorithm for VM images that recognizes they are file systems inside of files.
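
An offline pass of this kind could look roughly like the sketch below, assuming blocks were written as-is and are later scanned during idle time. The function name and return shape are hypothetical, and hash collisions are ignored for brevity; a real implementation would likely byte-compare candidates before merging them.

```python
import hashlib


def offline_dedup(blocks):
    """One idle-time pass over blocks that were stored without any dedup.

    Returns (unique, refs): refs[i] is the index in `unique` that now holds
    the content of original block i. Assumes SHA-256 collisions never occur.
    """
    index_of = {}  # digest -> index into `unique`
    unique = []
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in index_of:
            index_of[digest] = len(unique)
            unique.append(block)
        refs.append(index_of[digest])
    return unique, refs


raw = [b"x" * 10, b"y" * 10, b"x" * 10]
unique, refs = offline_dedup(raw)
print(len(unique), refs)
```

Because the write path stays untouched, the same pass could be swapped for a different scheme (or a specialised compressor) without affecting clients, which is the flexibility the comment above is arguing for.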

The elliptics network is transaction-based, so whatever transaction the client uses is what will be stored on the servers.

So the filesystem frontend can select sector- or page-based alignment, so that every transaction has those boundaries. Or it can rely solely on how the client writes data, in which case transactions will potentially be only byte-aligned. Or I can implement transaction splitting at write time to get the same block-based deduplication as in ZFS.

Initially, though, I do not plan to implement any special heuristics, and will rely on how the client writes data to create transactions.
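
Splitting transactions at write time could be sketched like this, assuming the frontend picks a 4 KiB page boundary. The function split_write is a hypothetical illustration, not part of POHMELFS: an arbitrary client write is cut so that no transaction crosses an aligned boundary, which means identical pages at aligned offsets always become identical transactions.

```python
PAGE_SIZE = 4096  # hypothetical alignment chosen by the filesystem frontend


def split_write(offset, data, boundary=PAGE_SIZE):
    """Split a client write into (offset, chunk) transactions that never
    cross a `boundary`-aligned edge."""
    parts = []
    pos = 0
    while pos < len(data):
        abs_off = offset + pos
        # Bytes remaining until the next aligned boundary.
        room = boundary - (abs_off % boundary)
        chunk = data[pos:pos + room]
        parts.append((abs_off, chunk))
        pos += len(chunk)
    return parts


parts = split_write(4000, b"a" * 10000)
print([(off, len(chunk)) for off, chunk in parts])
```

A write of 10000 bytes at offset 4000 becomes a 96-byte head, two full pages and a 1712-byte tail; only the full, aligned pages are good deduplication candidates.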