Why DHTs suck or do they?

Tagged:  

DHT is virtually the same hash table spread over multiple nodes. Thus it shares its advantages, like O(1) access when properly configured, and its weak parts, like full absence of scalability.

Yes, hash table does not scale. Because to change its size one has to perform lots of tricks which are generally end up in a full table content rehash. With millions of entries on disks this may take a while...

Thus we can not easily extend hash table storage - each node addition will require table rebuild. Depending on DHT configuration and routing protocol used, it can be full or partial content copying.

Contrary to hash table, classical distributed storage with dedicated master server is able to scale without need to copy data each time new server is added. Master server will return this new node's address for each new write, then next node and so on.

This looks shine and well in theory. Let's face the practice.

Data tends to wear down with time - we do not lok at old photos and do not read old mails as frequently as access recently written information. Thus servers which host old data will not be loaded compared to the new nodes in the described master server example.
To fix this issue master server has to start data copying - old servers move their old unaccesible data to some new nodes, and new writes can go to old nodes too. This task is full of non-trivial heuristics about what data to move and what server to use. With time it ends up with data copy for each new write (or usually sufficiently large chunk write).

In distributed hash table this copy is not needed, since by design writes are balanced across the whole storage (when cryptographically strong hash function and storage size are configured of course).
In DHT there is no problem of data wearing, since each write goes to (kind of) random node, thus all nodes will contain roughly the same amount of old and new data.

Drawing the line, in DHT we have to copy data when new node is added to take some load from the neighbours, while in master-server scenario we have to copy each time new data is added (this can be limited to large chunks of course). In theory master-server scenario will copy data to the distribution DHT provides out of the box, and still will have to copy again when new nodes added.

In some cases this is not an issue - we may want not to start data redistribution in master-server storage because of some reasons, or may not foresee that this demand will appear though, while need for data copy in DHT when new node is added is a must - otherwise data will not be accessible because of changed hash distribution over nodes.

This two cases frequently (if not all the time :) becomes the most significant corner cases when DHT is not selected to be used for distributed storage.

Evgeniy,

the classical approach to enabling DHT to handle scalability is to use virtual nodes: for 1,000 serves have 100,000 virtual nodes, assigned to the servers (100 VNs on each physical server on average).
When the server is added, it can take over ~99 virtual nodes from other physical server.
Also, virtual nodes can be mirrored for handling load spikes.

It is a kind of middle path between a central metadata server, and fully distributed DHT. In the case
of VNs, only the VN to physical server assignment should be centrally available. Which is way more
stable and way smaller data than full metadata for 1000,000,000s of objects.

-Yenya, http://www.fi.muni.cz/~kas/blog/

Virtual nodes are just aggregators for data - we copy either million of data objects or 1000 virtual nodes, each of which contains 1000 of data object.