KFS arrives with an impressive set of features for an alpha release:
- Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.
- Availability - Chunks are replicated so that data remains available when a chunkserver fails.
- Re-balancing - Periodically, the meta-server may rebalance chunks across chunkservers to even out disk space utilization among nodes.
- Data integrity - To guard against disk corruption, data blocks are checksummed, and the checksum is verified on every read. Whenever there is a checksum mismatch, the corrupted chunk is recovered through re-replication.
- Client-side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreachable, it fails over to another chunkserver and continues the read. This fail-over is transparent to the application.
- Language support - The KFS client library can be accessed from C++, Java, and Python.
- FUSE support on Linux - Mounting KFS via FUSE lets existing Linux utilities (such as ls) work with KFS.
- Leases - The KFS client library caches data to improve performance and uses leases to keep those caches consistent.
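The data-integrity feature can be sketched as per-block checksums that are recomputed and compared on every read. This is a minimal illustration, not KFS code: the 64 KB block size, the use of Adler-32 via Python's `zlib.adler32`, and the names `compute_checksums`, `verified_read`, and `ChecksumMismatch` are all assumptions made for the example.

```python
import zlib
from typing import List

# Assumption for illustration: checksums cover fixed 64 KB blocks and use Adler-32.
BLOCK_SIZE = 64 * 1024


class ChecksumMismatch(Exception):
    """Raised when a stored block no longer matches its recorded checksum."""

    def __init__(self, block_index: int):
        super().__init__(f"checksum mismatch in block {block_index}")
        self.block_index = block_index


def compute_checksums(data: bytes) -> List[int]:
    """Return one Adler-32 checksum per BLOCK_SIZE block (computed at write time)."""
    return [zlib.adler32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def verified_read(data: bytes, checksums: List[int], offset: int, length: int) -> bytes:
    """Return data[offset:offset + length] after verifying every block the read touches."""
    end = min(offset + length, len(data))
    if end <= offset:
        return b""
    first_block = offset // BLOCK_SIZE
    last_block = (end - 1) // BLOCK_SIZE
    for b in range(first_block, last_block + 1):
        block = data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.adler32(block) != checksums[b]:
            # A real system would now re-replicate this chunk from a good copy.
            raise ChecksumMismatch(b)
    return data[offset:end]
```

Only the blocks a read actually touches are verified, so a corrupted block is caught the first time anyone reads it; the re-replication step that follows a `ChecksumMismatch` is left out of the sketch.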
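Client-side fail-over during a read can be sketched as a loop that reads in pieces and, when a chunkserver turns out to be unreachable, resumes at the same offset on the next replica. `read_with_failover` and the `read_from(server, offset, length)` callback are hypothetical stand-ins for the client library's internals, not the actual KFS client API, and unreachability is modeled here as a `ConnectionError`.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for the per-chunkserver read RPC: (server, offset, length) -> bytes.
ReadFn = Callable[[str, int, int], bytes]


def read_with_failover(replicas: Sequence[str], read_from: ReadFn,
                       offset: int, length: int, piece: int = 4096) -> bytes:
    """Read `length` bytes starting at `offset`, failing over between replicas."""
    out = bytearray()
    current = 0  # index of the replica currently being read from
    while len(out) < length:
        if current >= len(replicas):
            raise ConnectionError("all replicas are unreachable")
        want = min(piece, length - len(out))
        try:
            data = read_from(replicas[current], offset + len(out), want)
        except ConnectionError:
            current += 1  # fail over: resume at the same offset on the next replica
            continue
        if not data:
            break  # reached the end of the chunk
        out += data
    return bytes(out)
```

Because the loop tracks how many bytes it has already collected, the caller gets one contiguous result and never sees the switch between servers, which is the sense in which the fail-over is transparent to the application.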
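Lease-based cache consistency can be illustrated with a client cache that serves an entry only while its lease is unexpired and refetches (acquiring a fresh lease) otherwise. This is a generic time-bounded lease sketch with assumed names (`LeasedCache`, and a `fetch` callback that returns the data plus a lease duration); it does not describe how KFS actually grants, renews, or revokes leases.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Lease:
    data: bytes
    expires_at: float  # clock time after which the cached copy must not be trusted


class LeasedCache:
    """Serve cached chunk data only while the lease on it is still valid."""

    def __init__(self, fetch: Callable[[str], Tuple[bytes, float]],
                 clock: Callable[[], float] = time.monotonic):
        # Hypothetical fetch: returns (data, lease_duration_seconds) from the server.
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Lease] = {}

    def read(self, chunk_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(chunk_id)
        if entry is not None and now < entry.expires_at:
            return entry.data  # lease still valid: the cached copy is safe to use
        # Lease missing or expired: refetch and record the new lease.
        data, duration = self._fetch(chunk_id)
        self._entries[chunk_id] = Lease(data, now + duration)
        return data

    def invalidate(self, chunk_id: str) -> None:
        """Drop a cached entry, e.g. after this client modifies the chunk."""
        self._entries.pop(chunk_id, None)
```

Injecting the clock keeps the sketch deterministic to test; the design choice it illustrates is that a cached copy is trusted only for a bounded time, so stale data cannot be served indefinitely.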

Every startup that scales beyond a single machine needs platform technology on which to build its application and run its cluster. If enough folks adopt the code and contribute, the hope is that KFS could become something like the gcc/Linux/Perl of the cluster storage layer.