NTT Cyber Space Labs presents Sheepdog - distributed storage system for KVM

MORITA Kazutaka wrote:

Sheepdog is a distributed storage system for KVM/QEMU. It provides
highly available block-level storage volumes to VMs, similar to Amazon EBS.
Sheepdog supports advanced volume management features such as snapshot,
cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
of nodes, and the architecture is fully symmetric; there is no central
node such as a meta-data server.

The following list describes the features of Sheepdog.

* Linear scalability in performance and capacity
* No single point of failure
* Redundant architecture (data is written to multiple nodes)
- Tolerance against network failure
* Zero configuration (newly added machines will join the cluster automatically)
- Autonomous load balancing
* Snapshot
- Online snapshot from qemu-monitor
* Clone from a snapshot volume
* Thin provisioning
- Amazon EBS API support (to use from a Eucalyptus instance)

(* = current features, - = on our todo list)
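
As an illustration of how a fully symmetric cluster can write data to multiple nodes without a meta-data server, here is a minimal sketch. The announcement does not say how Sheepdog actually chooses nodes, so the technique used here (rendezvous hashing) is an assumption, and the names `replicas_for`, `_score`, and the node and object identifiers are hypothetical. The point is only that every node can compute the same placement from the member list alone.

```python
import hashlib


def _score(object_id: str, node: str) -> int:
    # Hypothetical scoring function: hash the (node, object) pair so that
    # every cluster member computes the same number independently.
    digest = hashlib.sha1(f"{node}/{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(object_id: str, nodes: list[str], copies: int = 3) -> list[str]:
    # Rank all nodes by their score for this object and keep the top
    # `copies` of them. No central lookup table is consulted.
    ranked = sorted(nodes, key=lambda node: _score(object_id, node), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    print(replicas_for("volume1-block42", cluster))
```

With this scheme, removing a node that does not hold a given object leaves that object's placement unchanged, which is one way to keep rebalancing small when machines join or leave.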

More details and download links are here:
http://www.osrg.net/sheepdog/

Note that the code is still in an early stage.
There are some critical TODO items:

- VM image deletion support
- Support architectures other than X86_64
- Data recovery
- Free space management
- Guarantee reliability and availability under heavy load
- Performance improvement
- Reclaim unused blocks
- More documentation

IMHO, block-level distributed systems are dead overall, although they have their niche.

Why do you think distributed block storage is a dead end? Consider virtualization. Today, if you want any kind of shared (block) storage you need a SAN. You can do some tricks with DRBD, but for bigger deployments that's not really a manageable long-term solution.

Whether you choose iSCSI or FC for your SAN, these solutions are too expensive for start-ups and small companies, which also want a high level of availability but can't afford it. Being able to restart your server on other physical hardware in case of trouble or hardware maintenance is great, especially for the sysadmins doing the job :)

The nice thing about distributed block storage is that you can simply re-use your existing server hardware to also function as a storage device. Because you can expand your capacity horizontally, it's potentially cheaper, it scales better, and you can use generic cheap server hardware while you do it. Sounds great to me, although there are of course lots of design challenges in making something like this really work (stability- and performance-wise).

I was kind of hoping your work on POHMELFS and the elliptics network would lead to something like that, which would be of great use to me and my start-up company. Unfortunately I noticed you stopped blogging about it, and now you're saying you think it's a dead end. If that's your choice, I respect that, but nevertheless I'm disappointed :(

The problem is locking: the block level does not provide fine-grained locks, or the locks it does provide are deadlock-prone (look at Berkeley DB, for example). Any general-purpose filesystem that lives on top of this storage will not be multi-client; otherwise it has to be a special filesystem.

This is the same problem as with DRBD, DST and everything else of the kind: the block level does not know about the object nature of the data it stores, so it cannot properly guard that data.
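
To make the locking argument concrete, here is a minimal sketch of a lost update on a shared block device. It is a toy in-memory model, not code from Sheepdog, DRBD or DST; `BlockDevice`, `update_record` and the 16-byte block size are hypothetical names and values chosen for illustration. Two clients each change a different record that happens to live in the same block, and because the device only sees whole blocks, the second write silently discards the first.

```python
BLOCK_SIZE = 16  # hypothetical, tiny block size to keep the example readable


class BlockDevice:
    """Toy shared block device: it stores whole blocks and nothing else."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = bytes(data)


def update_record(cached_block: bytes, offset: int, payload: bytes) -> bytes:
    # A client modifies "its" record inside its private copy of the block.
    buf = bytearray(cached_block)
    buf[offset:offset + len(payload)] = payload
    return bytes(buf)


def simulate_lost_update() -> bytes:
    dev = BlockDevice(1)
    # Both clients read block 0 before either of them writes it back.
    view_a = dev.read_block(0)
    view_b = dev.read_block(0)
    # Client A updates the record at offset 0, client B the one at offset 8.
    dev.write_block(0, update_record(view_a, 0, b"AAAA"))
    dev.write_block(0, update_record(view_b, 8, b"BBBB"))
    return dev.read_block(0)


if __name__ == "__main__":
    final = simulate_lost_update()
    print(final[0:4], final[8:12])
```

After the run, the record at offset 8 holds client B's data while the record at offset 0 is back to zeros: client A's update is gone even though the two clients never touched the same record. An object-aware layer could lock or version each record separately; the block layer has no such notion.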

Perhaps he is of the opinion that distributed BLOCK-level storage is a dead end (DST), but OBJECT-based distributed storage is not (POHMELFS).