Distributed File Storage System
A miniature distributed file storage system (like a simplified Dropbox) with file chunking, replication across multiple storage nodes, and fault-tolerant retrieval.
How to build it — step by step
- 1. Architecture Design: Implement a master node (metadata) and storage nodes (file chunks); use consistent hashing to decide which nodes store each chunk
- 2. File Chunking: Split files into 4 MB chunks and compute a SHA-256 hash of each chunk for deduplication
- 3. Replication: Store each chunk on 3 nodes; use quorum writes (a write succeeds once 2 of the 3 replicas acknowledge it) for consistency
- 4. Fault Tolerance: Detect failed nodes via heartbeats; trigger re-replication to restore a replication factor of 3
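The placement rule from step 1 can be sketched as a consistent-hash ring. This is a minimal Python sketch that assumes the master computes placement from the ring. The class name `HashRing`, the 100 virtual nodes per storage node, and the use of the first 8 bytes of SHA-256 as the ring position are illustrative choices, not part of the original design. A chunk's replicas are the first 3 distinct nodes met when walking clockwise from the chunk's position.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent-hash ring mapping chunk IDs to storage nodes (illustrative sketch).

    Each node is placed at `vnodes` points on the ring so load spreads evenly
    and only a small share of chunks moves when a node joins or leaves.
    """

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key: str) -> int:
        # Assumption: first 8 bytes of SHA-256 used as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def nodes_for(self, chunk_id: str, count: int = 3) -> List[str]:
        """Return `count` distinct nodes, walking clockwise from the chunk's position."""
        if count > len({node for _, node in self._ring}):
            raise ValueError("not enough storage nodes for the requested replicas")
        replicas: List[str] = []
        idx = bisect.bisect(self._ring, (self._position(chunk_id), ""))
        while len(replicas) < count:
            _, node = self._ring[idx % len(self._ring)]  # wrap around the ring
            if node not in replicas:
                replicas.append(node)
            idx += 1
        return replicas
```

Only the ring positions owned by a departing node change hands. As a result, removing a node that does not hold a given chunk leaves that chunk's placement untouched, which keeps data movement small when the cluster changes.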
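The fixed-size chunking and per-chunk hashing from step 2 fit in a few lines. This sketch assumes chunks are read from any binary file-like object. `iter_chunks` is a hypothetical helper name, and "4 MB" is interpreted here as 4 MiB (4 × 1024 × 1024 bytes).

```python
import hashlib
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: 4 MB taken to mean 4 MiB


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (sha256_hex, data) for each fixed-size chunk; the last chunk may be shorter."""
    while True:
        data = stream.read(chunk_size)
        if not data:  # end of stream
            break
        # The hex digest serves as the chunk's content address for deduplication.
        yield hashlib.sha256(data).hexdigest(), data
```

Calling `iter_chunks(open(path, "rb"))` streams a file without loading it whole into memory. Two chunks with identical bytes get identical digests, which is what deduplication keys on.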
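The quorum write from step 3 can be sketched with a thread pool: send the chunk to every replica in parallel and report success as soon as 2 acknowledge. In this sketch each replica is modelled as a plain Python callable that raises on failure. In the real system it would be an RPC (for example over gRPC) to a storage node. The names `quorum_write`, `QuorumError`, and `PutChunk` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence

# Hypothetical replica interface: store (chunk_id, data) or raise on failure.
# In a real deployment this would be an RPC to a storage node.
PutChunk = Callable[[str, bytes], None]


class QuorumError(Exception):
    """Raised when too few replicas acknowledge a write."""


def quorum_write(replicas: Sequence[PutChunk], chunk_id: str, data: bytes,
                 write_quorum: int = 2) -> int:
    """Send a chunk to all replicas in parallel; return once `write_quorum` acknowledge.

    Returns the number of acknowledgements counted when the quorum was reached.
    """
    if not 0 < write_quorum <= len(replicas):
        raise ValueError("write_quorum must be between 1 and the number of replicas")
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(put, chunk_id, data) for put in replicas]
    acks = failures = 0
    try:
        for fut in as_completed(futures):
            if fut.exception() is None:
                acks += 1
                if acks >= write_quorum:
                    return acks
            else:
                failures += 1
                # Stop early once the remaining replicas can no longer reach the quorum.
                if len(replicas) - failures < write_quorum:
                    raise QuorumError(f"{chunk_id}: {acks} acks, {failures} failures")
    finally:
        # Do not wait for slow replicas; already-submitted calls finish in the background.
        pool.shutdown(wait=False)
    raise QuorumError(f"{chunk_id}: quorum not reached")
```

A write quorum on its own does not make every read see the latest write. The usual quorum rule pairs it with a read quorum R such that R + W > N (here W = 2 and N = 3, so R = 2), so every read set overlaps every successful write set. Because chunks are addressed by their hash and never modified, a replica that missed a write can be repaired later by copying the chunk.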
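Step 4 has two parts: a failure detector on the master and a planner that restores the replication factor. The sketch below assumes the master keeps a `chunk_id -> set of nodes` map and marks a node as failed after a fixed heartbeat timeout (10 seconds here, an arbitrary choice). `HeartbeatMonitor` and `plan_rereplication` are hypothetical names, and the clock is injectable so the logic can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, List, Set, Tuple


class HeartbeatMonitor:
    """Master-side failure detector: a node counts as failed when no heartbeat
    has arrived within `timeout` seconds."""

    def __init__(self, timeout: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable for testing
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str) -> None:
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> Set[str]:
        now = self.clock()
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


def plan_rereplication(
    locations: Dict[str, Set[str]],  # chunk_id -> nodes holding a replica
    failed: Set[str],
    live_nodes: List[str],
    factor: int = 3,
) -> Dict[str, Tuple[str, List[str]]]:
    """For each under-replicated chunk, pick a surviving source replica and
    up to enough new target nodes to restore `factor` copies."""
    plan: Dict[str, Tuple[str, List[str]]] = {}
    for chunk_id, holders in locations.items():
        survivors = sorted(holders - failed)
        missing = factor - len(survivors)
        if missing <= 0:
            continue
        if not survivors:
            raise RuntimeError(f"all replicas of {chunk_id} are lost")
        # Candidate targets: live nodes that do not already hold this chunk.
        targets = [n for n in live_nodes if n not in holders and n not in failed]
        plan[chunk_id] = (survivors[0], targets[:missing])
    return plan
```

For each under-replicated chunk, the plan names the surviving replica to copy from and the nodes to copy to. The master would then issue those copy commands to the storage nodes. If fewer spare nodes are available than copies are missing, the target list is simply shorter.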
Key features to implement
- ✓ File upload/download with chunk-based transfer
- ✓ Automatic replication with a configurable replication factor
- ✓ Deduplication: identical content is stored only once
- ✓ Node failure detection and automatic recovery
- ✓ CLI and REST API interfaces
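Chunk-based upload/download and deduplication can be illustrated with a content-addressed store. Each chunk is stored under its SHA-256 digest, and a file is just a manifest listing its chunk hashes in order. `DedupStore` is a hypothetical single-process, in-memory stand-in for the master plus storage nodes, not a networked implementation. It shows why re-uploading identical content stores no new chunks.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """In-memory sketch: chunks keyed by SHA-256, files stored as manifests."""

    def __init__(self, chunk_size: int = 4 * 1024 * 1024):
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}         # digest -> chunk bytes (stored once)
        self.manifests: Dict[str, List[str]] = {}  # file name -> ordered chunk digests

    def upload(self, name: str, data: bytes) -> int:
        """Store a file and return how many of its chunks were new."""
        manifest: List[str] = []
        new = 0
        for i in range(0, len(data), self.chunk_size):
            piece = data[i:i + self.chunk_size]
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.chunks:  # skip content that is already stored
                self.chunks[digest] = piece
                new += 1
            manifest.append(digest)
        self.manifests[name] = manifest
        return new

    def download(self, name: str) -> bytes:
        """Reassemble a file from its manifest, verifying each chunk's hash."""
        parts: List[bytes] = []
        for digest in self.manifests[name]:
            piece = self.chunks[digest]
            if hashlib.sha256(piece).hexdigest() != digest:
                raise ValueError(f"corrupt chunk {digest}")
            parts.append(piece)
        return b"".join(parts)
```

In a distributed version, one common design is for the client to send the manifest's hashes to the master first. The master replies with the hashes it has not seen, and only those chunks are transferred.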
💡 Unique twist to stand out
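Implement content-defined chunking (CDC) with Rabin fingerprinting. Chunk boundaries are chosen from the content itself rather than at fixed offsets. Inserting or deleting bytes near the start of a file therefore changes only the chunks around the edit instead of shifting every later chunk. Unchanged chunks keep their hashes and need not be uploaded again, which enables incremental backup.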
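A minimal sketch of content-defined chunking with a Rabin fingerprint follows. The fingerprint of the last 48 bytes is treated as a polynomial over GF(2), reduced modulo an irreducible polynomial, and updated one byte at a time. A chunk boundary is declared wherever its low bits match a fixed pattern. The boundary decision depends only on nearby bytes, so an insertion near the start moves boundaries only locally. The window size, mask width, minimum and maximum chunk sizes, and the modulus x^128 + x^7 + x^2 + x + 1 (the irreducible polynomial that defines the AES-GCM field) are assumptions for illustration. A pure-Python loop like this is also far too slow for production use.

```python
from typing import List

# Assumed modulus: x^128 + x^7 + x^2 + x + 1, known to be irreducible over GF(2).
POLY = (1 << 128) | 0x87


def poly_mod(a: int, p: int = POLY) -> int:
    """Remainder of polynomial a modulo p with GF(2) coefficients (XOR arithmetic)."""
    dp = p.bit_length()
    while a.bit_length() >= dp:
        a ^= p << (a.bit_length() - dp)
    return a


class RabinChunker:
    """Content-defined chunker using a rolling Rabin fingerprint (illustrative sketch)."""

    def __init__(self, window: int = 48, mask_bits: int = 13,
                 min_size: int = 2048, max_size: int = 65536):
        self.window = window
        self.mask = (1 << mask_bits) - 1
        self.min_size, self.max_size = min_size, max_size
        # out_table[b] = b(x) * x^(8*(window-1)) mod P: the contribution of the
        # oldest byte in the window, which must be removed when the window slides.
        shift = 8 * (window - 1)
        self.out_table = [poly_mod(b << shift) for b in range(256)]

    def split(self, data: bytes) -> List[bytes]:
        """Cut `data` into content-defined chunks."""
        chunks: List[bytes] = []
        start, fp = 0, 0
        for i, byte in enumerate(data):
            if i >= self.window:
                # Remove the byte leaving the window (subtraction is XOR in GF(2)).
                fp ^= self.out_table[data[i - self.window]]
            # Append the new byte: fp = (fp * x^8 + byte) mod P.
            fp = poly_mod((fp << 8) | byte)
            size = i + 1 - start
            at_pattern = (fp & self.mask) == self.mask
            if (size >= self.min_size and at_pattern) or size >= self.max_size:
                chunks.append(data[start:i + 1])
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])  # trailing partial chunk
        return chunks
```

With `mask_bits=13`, each position qualifies as a boundary with probability about 1/8192, so candidate boundaries are about 8 KiB apart on average. Raising it to 22 bits would put them roughly 4 MiB apart, in line with the fixed chunk size in step 2, with `min_size` and `max_size` scaled to match.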
🎓 What you'll learn
Distributed systems concepts: replication, consistency, fault tolerance, gRPC microservices, and storage system internals.