View the Web Dashboard & Interface Here
This system is a high-performance distributed file system intended to simulate the functionality of Amazon S3, AWS's cloud object storage service, using one or more external hard drives as storage nodes. Instead of writing large files sequentially, it uses RAID 0 striping to split large files into chunks and a round-robin algorithm to distribute those chunks across the storage nodes.
- Writing across several drives is faster than a standard sequential write because each drive writes its own chunks at the same time. In the ideal case, write time drops by a factor equal to the number of drives.
- The system uses parallelism to cut write times: a DMA thread performs the writes, so the main program keeps running and can handle other tasks without being interrupted.
- MiniAWS is also scalable. Instead of storing a large file on a single storage unit (such as the computer's C or D drive), you can add another storage node easily; the file stays accessible, and the storage cost per drive is lower.
- Optimized and encrypted storage: as of a recent update, the original file is no longer physically stored. The system keeps the file's information as metadata and reassembles the file on a READ operation. Not keeping the original also removes unnecessary storage use:
(ex: an 8 GB file is chunked and the original is preserved => 16 GB of total storage used)
vs.
(ex: an 8 GB file is chunked, the original is deleted, and its info is stored as metadata => only 8 GB of total storage used)
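The storage comparison and the ideal write speedup can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope model: the function names and the 100 MB/s per-drive throughput are assumptions for illustration, and real drives rarely scale perfectly because of bus and controller limits.

```python
def storage_used_mb(file_mb: int, keep_original: bool) -> int:
    """Total space consumed for one uploaded file.

    Striping splits the data without duplicating it, so the chunks
    together take up exactly the size of the file.
    """
    return file_mb * 2 if keep_original else file_mb


def ideal_write_seconds(file_mb: int, drive_mb_per_s: int, num_drives: int) -> float:
    """Best-case write time when every drive writes its share in parallel."""
    return file_mb / (drive_mb_per_s * num_drives)


file_mb = 8000  # the 8 GB example above, in MB

print(storage_used_mb(file_mb, keep_original=True))   # 16000
print(storage_used_mb(file_mb, keep_original=False))  # 8000

# Hypothetical throughput of 100 MB/s per drive.
print(ideal_write_seconds(file_mb, 100, 1))  # 80.0
print(ideal_write_seconds(file_mb, 100, 4))  # 20.0
```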
You can read more about the metadata storage system below:
- Upload: The file is chunked, metadata (original name, size) is saved to a JSON store, and the local copy is securely deleted
- Download: The system queries the metadata, locates the chunks across the drives by referencing the Stripe Map, and reassembles the file under its original name and size
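As a rough illustration of the upload side of this flow, here is a minimal sketch of a JSON metadata store. The function names, the `metadata.json` file name, and the record layout are assumptions, not MiniAWS's actual code, and the delete here is a plain `unlink` rather than a secure wipe.

```python
import json
import tempfile
from pathlib import Path


def record_upload(store_path: Path, file_path: Path) -> dict:
    """Save the file's original name and size, then remove the local copy."""
    store = json.loads(store_path.read_text()) if store_path.exists() else {}
    entry = {"name": file_path.name, "size": file_path.stat().st_size}
    store[file_path.name] = entry
    store_path.write_text(json.dumps(store, indent=2))
    # Plain delete for the sketch; a secure delete would overwrite the bytes first.
    file_path.unlink()
    return entry


def lookup(store_path: Path, name: str) -> dict:
    """Return the stored metadata for a file, as a download would need."""
    return json.loads(store_path.read_text())[name]


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "report.bin"
    source.write_bytes(b"x" * 10)
    store = Path(tmp) / "metadata.json"

    record_upload(store, source)
    print(lookup(store, "report.bin"))  # {'name': 'report.bin', 'size': 10}
    print(source.exists())              # False
```

Storing the original size lets a download confirm that the reassembled file has the expected length.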
- Command: A file is received by `--read <fileName>`, which can be executed via this CLI or a Web Dashboard (linked above)
- Segmentation: The RAID Controller splits the file into `N` chunks of a predetermined chunk size (TODO: make chunk sizes customizable)
- Mapping: The Stripe Map assigns each chunk to a specific drive index (0-3)
- Writing: The RAID Controller enqueues each chunk and hands it off to the DMA Controller, which asynchronously writes these chunks to their assigned "Disk" folders.
- Reassembling: During retrieval, the system reverses the process, reading chunks in Stripe Map order to reconstruct the original binary.
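The steps above can be sketched end to end in a few lines. This is an illustrative approximation, not the project's implementation: the function names, the `Disk0` to `Disk3` folder naming, the chunk file naming, and the tiny 4-byte chunk size are all assumptions chosen to keep the demo readable. A single background thread with a queue stands in for the DMA Controller.

```python
import queue
import tempfile
import threading
from pathlib import Path

NUM_DRIVES = 4  # drive indices 0-3, as in the Mapping step
CHUNK_SIZE = 4  # hypothetical and deliberately tiny for the demo


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Segmentation: cut the data into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_stripe_map(num_chunks: int, num_drives: int = NUM_DRIVES) -> list[int]:
    """Mapping: round-robin, so chunk i goes to drive i % num_drives."""
    return [i % num_drives for i in range(num_chunks)]


def chunk_path(root: Path, name: str, index: int, drive: int) -> Path:
    """Assumed on-disk layout: <root>/Disk<drive>/<name>.chunk<index>."""
    return root / f"Disk{drive}" / f"{name}.chunk{index}"


def writer(jobs: queue.Queue) -> None:
    """Background worker standing in for the DMA Controller."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more chunks to write
            break
        path, payload = job
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)


def write_file(root: Path, name: str, data: bytes) -> list[int]:
    """Writing: enqueue every chunk and let the worker write it asynchronously."""
    chunks = split_chunks(data)
    stripe_map = build_stripe_map(len(chunks))
    jobs: queue.Queue = queue.Queue()
    worker = threading.Thread(target=writer, args=(jobs,))
    worker.start()
    for index, (drive, payload) in enumerate(zip(stripe_map, chunks)):
        jobs.put((chunk_path(root, name, index, drive), payload))
    jobs.put(None)
    # A real caller could do other work here; the demo just waits.
    worker.join()
    return stripe_map


def read_file(root: Path, name: str, stripe_map: list[int]) -> bytes:
    """Reassembling: read chunks back in Stripe Map order and concatenate them."""
    return b"".join(
        chunk_path(root, name, index, drive).read_bytes()
        for index, drive in enumerate(stripe_map)
    )


with tempfile.TemporaryDirectory() as tmp:
    original = b"hello striped world!"  # 20 bytes => 5 chunks of 4 bytes
    stripe_map = write_file(Path(tmp), "demo.bin", original)
    print(stripe_map)                                            # [0, 1, 2, 3, 0]
    print(read_file(Path(tmp), "demo.bin", stripe_map) == original)  # True
```

A single worker keeps the sketch short but writes chunks one after another. Running one worker per drive would be closer to the simultaneous per-drive writes described earlier.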
A demonstration of the output can be viewed in this repo under the MiniAWS directory.