Tachyon is a distributed filesystem that runs on top of HDFS and caches aggressively. By ‘filesystem’ one should understand Java File objects and related methods, plus Hadoop compatibility.
Tachyon can run on top of HDFS, S3 and GlusterFS. In fact, its fault tolerance is what HDFS provides: a single name node. Tachyon requires ZooKeeper to maintain the master server, which performs all operations and consistency checks.
According to their own benchmark, it scales linearly as new nodes are added.
Because of aggressive caching, Tachyon’s performance is way ahead of in-memory HDFS. Transparent in-memory caching is a great way to speed up Hadoop and particularly Spark, which was designed for immutable datasets with multiple access patterns.
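To make the idea of transparent caching concrete, here is a minimal sketch of a read-through cache placed in front of a slow store. It is not Tachyon's implementation: the class names (`SlowStore`, `ReadThroughCache`), the paths and the LRU eviction policy are all assumptions chosen for illustration. Because the data is treated as immutable, the sketch needs no invalidation logic, which is what keeps the cache invisible to the caller.

```python
from collections import OrderedDict


class SlowStore:
    """Hypothetical stand-in for the underlying storage; counts reads."""

    def __init__(self, files):
        self._files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self._files[path]


class ReadThroughCache:
    """Serves reads from memory and falls back to the store on a miss."""

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._cache = OrderedDict()  # path -> bytes, least recently used first

    def read(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        data = self._store.read(path)  # miss: go to the slow store
        self._cache[path] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used
        return data


store = SlowStore({"/data/a": b"alpha", "/data/b": b"beta", "/data/c": b"gamma"})
fs = ReadThroughCache(store, capacity=2)

for _ in range(3):
    fs.read("/data/a")  # only the first read reaches the store
fs.read("/data/b")
fs.read("/data/c")  # cache is full, so /data/a is evicted
fs.read("/data/a")  # miss again, goes back to the store
print(store.reads)  # prints 4
```

The caller uses the same `read(path)` call in both cases and never says what to cache. Six reads cost only four trips to the store, and the repeated access pattern is where the saving comes from.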
I can also recommend the Andreessen Horowitz article as an example of high-profile investment and management language; it even seems to have introduced a new technical term, ‘memory-centric’.
As a side note, I cannot help comparing this caching system with the Elliptics distributed cache. Instead of highlighting and providing a Java File API, we concentrated on HTTP access through Backrunner, the Elliptics HTTP load balancer and proxy.
Building transparent subsystems that do not require semi-manual caching policies and additional configuration layers is the way forward for distributed systems, which already require quite a lot of configuration to run and maintain. With a16z’s investment support, I’m pretty sure we will hear more about Tachyon soon.