
rutujak24/FileSync


FileSync++: Distributed File Synchronization with CRDTs

FileSync++ is a distributed file synchronization system written in C++ on top of gRPC. It works like a simplified Dropbox or Google Drive, with one addition: real-time collaborative editing based on Conflict-Free Replicated Data Types (CRDTs).

Key Features

1. Bidirectional File Sync

Automatically synchronizes files between the client and server.

  • Smart Sync: Only transfers files that are missing or changed.
  • Efficient: Compares SHA-256 hashes of file contents to detect changes.
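
The comparison step behind Smart Sync can be sketched as follows. This is a minimal illustration, not the repository's code: `SyncPlan` and `PlanSync` are hypothetical names, the maps are assumed to hold file name to SHA-256 hex digest, and the rule that the client copy wins when both sides differ is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: which files to send and which to fetch.
struct SyncPlan {
    std::vector<std::string> to_upload;
    std::vector<std::string> to_download;
};

// Compares two maps of file name -> SHA-256 hex digest.
// Assumption: when both sides hold different content, the client copy wins.
SyncPlan PlanSync(const std::map<std::string, std::string>& local,
                  const std::map<std::string, std::string>& remote) {
    SyncPlan plan;
    // Upload files the server lacks or holds with a different hash.
    for (const auto& entry : local) {
        auto it = remote.find(entry.first);
        if (it == remote.end() || it->second != entry.second) {
            plan.to_upload.push_back(entry.first);
        }
    }
    // Download files that exist only on the server.
    for (const auto& entry : remote) {
        if (local.find(entry.first) == local.end()) {
            plan.to_download.push_back(entry.first);
        }
    }
    return plan;
}
```

Files whose hashes match on both sides appear in neither list, so unchanged files are never transferred.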

2. Real-Time Collaborative Editing (CRDT)

Supports conflict-free text editing. Multiple users can edit the same file, and the system ensures that everyone sees the same final result without merge conflicts.

  • Algorithm: Implements RGA (Replicated Growable Array).
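  • Conflict-Free: Replicas that have applied the same set of edits converge to the same text, even when concurrent edits arrive in different orders (strong eventual consistency).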
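
To illustrate how RGA reaches the same result on every replica, here is a minimal insert-only sketch. It is not the repository's `CRDTManager`: the `ElemId` layout, the `RgaSketch` class, and the use of `{0, 0}` as the head marker are assumptions, deletions (tombstones) are omitted, and operations are assumed to be delivered in causal order with Lamport-style counters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical element identifier: a Lamport-style counter plus a replica id
// as tie-breaker. {0, 0} is reserved here to mean "the head of the document".
struct ElemId {
    std::uint64_t counter = 0;
    std::uint32_t replica = 0;
    bool operator==(const ElemId& o) const {
        return counter == o.counter && replica == o.replica;
    }
    bool operator<(const ElemId& o) const {
        return std::tie(counter, replica) < std::tie(o.counter, o.replica);
    }
};

class RgaSketch {
public:
    // Inserts `value` immediately after the element `parent`.
    // Returns false if `parent` has not been seen yet.
    bool ApplyInsert(ElemId id, ElemId parent, char value) {
        std::size_t pos = 0;
        if (!(parent == ElemId{})) {
            while (pos < elems_.size() && !(elems_[pos].id == parent)) ++pos;
            if (pos == elems_.size()) return false;
            ++pos;  // start just after the parent
        }
        // Skip elements with a larger id: they are concurrent inserts (and
        // their descendants) that every replica orders before this one.
        while (pos < elems_.size() && id < elems_[pos].id) ++pos;
        elems_.insert(elems_.begin() + static_cast<std::ptrdiff_t>(pos),
                      Elem{id, value});
        return true;
    }

    std::string GetText() const {
        std::string text;
        for (const auto& e : elems_) text.push_back(e.value);
        return text;
    }

private:
    struct Elem {
        ElemId id;
        char value;
    };
    std::vector<Elem> elems_;
};
```

Because the skip rule depends only on element ids, two replicas that receive the same concurrent inserts in opposite orders still produce the same text.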

3. Fault Tolerance (Replication)

Keeps a second copy of every uploaded file so that losing the primary copy does not lose data.

  • Primary-Backup Replication: Every uploaded file is saved to two separate storage locations (storage/primary and storage/backup).
  • Automatic Failover: If the primary file is lost, the server automatically retrieves it from the backup.
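
The dual-write and failover behaviour described above could look roughly like this. It is a sketch under assumptions rather than the server's actual storage engine: `ReplicatedStore`, `Put`, and `Get` are hypothetical names, and only the `primary` and `backup` directory names come from this README.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <iterator>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical store that writes every file to <root>/primary and <root>/backup.
class ReplicatedStore {
public:
    explicit ReplicatedStore(const fs::path& root)
        : primary_(root / "primary"), backup_(root / "backup") {
        fs::create_directories(primary_);
        fs::create_directories(backup_);
    }

    // Dual write: succeeds only if both copies were written.
    bool Put(const std::string& name, const std::string& data) const {
        return WriteFile(primary_ / name, data) && WriteFile(backup_ / name, data);
    }

    // Failover read: try the primary copy first, then the backup.
    std::optional<std::string> Get(const std::string& name) const {
        if (auto data = ReadFile(primary_ / name)) return data;
        return ReadFile(backup_ / name);
    }

private:
    static bool WriteFile(const fs::path& path, const std::string& data) {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        return static_cast<bool>(out.flush());
    }

    static std::optional<std::string> ReadFile(const fs::path& path) {
        std::ifstream in(path, std::ios::binary);
        if (!in) return std::nullopt;
        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }

    fs::path primary_;
    fs::path backup_;
};
```

Note that two directories only protect against disk failure if they are mounted on different physical devices.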

Architecture

High-Level Design (HLD)

graph TD
    User[User] -->|CLI Commands| Client[Client Node]
    Client <-->|"gRPC (File Transfer)"| Server[Server Node]
    Client <-->|"gRPC (CRDT Updates)"| Server
    
    subgraph Server Node
        Service[gRPC Service]
        DB[(SQLite Metadata)]
        CRDT[CRDT Engine]
    end
    
    Service --> DB
    Service --> CRDT
    Service -->|Write| Primary[Primary Storage]
    Service -->|Replicate| Backup[Backup Storage]

Database Schema (SQLite)

The system uses two main tables to track files and storage chunks.

Table: files

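| Column    | Type    | Description                           |
|-----------|---------|---------------------------------------|
| name      | TEXT    | Unique file name (Primary Key)        |
| hash      | TEXT    | SHA-256 hash of the file content      |
| size      | INTEGER | Size in bytes                         |
| timestamp | INTEGER | Last modification time                |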

Table: chunks

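| Column      | Type    | Description                                |
|-------------|---------|--------------------------------------------|
| file_name   | TEXT    | Foreign key to files                       |
| chunk_index | INTEGER | Sequence number of the chunk               |
| node_id     | TEXT    | Storage node identifier (e.g., "primary")  |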
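
For reference, the two tables above could be created with DDL along these lines, held in a C++ string so it can be passed to `sqlite3_exec`. The column names and types follow the tables; the `NOT NULL` constraints, the composite key on `chunks`, and the constant name `kSchemaSql` are assumptions, not taken from the repository.

```cpp
#include <string>

// DDL reconstructed from the schema tables. Constraints beyond the primary
// key on files.name and the foreign key on chunks.file_name are assumptions.
inline const std::string kSchemaSql = R"sql(
CREATE TABLE IF NOT EXISTS files (
    name      TEXT PRIMARY KEY,
    hash      TEXT NOT NULL,
    size      INTEGER NOT NULL,
    timestamp INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS chunks (
    file_name   TEXT NOT NULL REFERENCES files(name),
    chunk_index INTEGER NOT NULL,
    node_id     TEXT NOT NULL,
    PRIMARY KEY (file_name, chunk_index, node_id)
);
)sql";
```

SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection, so the reference from `chunks` to `files` is otherwise advisory.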

Low-Level Design (LLD)

classDiagram
    class FileSyncClient {
        +UploadFile()
        +DownloadFile()
        +Sync()
        +EditFile()
    }
    class CRDTManager {
        +LocalInsert()
        +ApplyInsert()
        +GetText()
    }
    class FileSyncServiceImpl {
        +UploadFile()
        +DownloadFile()
        +ListFiles()
    }
    class DBManager {
        +AddFile()
        +GetFile()
        +AddChunk()
    }
    
    FileSyncClient --> CRDTManager : Uses
    FileSyncServiceImpl --> DBManager : Uses
    FileSyncServiceImpl --> CRDTManager : Uses

Component Details

  1. Client:

    • Command-line interface (CLI) for user interaction.
    • Handles file scanning, hashing, and uploading/downloading.
    • Manages local CRDT state for editing.
  2. Server:

    • gRPC service handling file transfers and metadata.
    • SQLite Database: Stores file metadata (names, hashes, sizes).
    • Storage Engine: Manages dual-write replication.

How to Build and Run

Prerequisites

  • C++17 Compiler
  • CMake
  • gRPC & Protobuf
  • SQLite3
  • OpenSSL

Build

mkdir build && cd build
cmake ..
make -j4

Run Server

./filesync_server

Run Client

# Interactive Mode (Recommended)
./filesync_client interactive

# Commands inside interactive mode:
> upload <file_path>
> download <file_name> <dest_path>
> sync
> edit <file_name> <index> <char>
> cat <file_name>
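
As an illustration of the `edit <file_name> <index> <char>` command shape, the sketch below parses one such line. `EditCommand` and `ParseEdit` are hypothetical names, and the client's real parser may differ. Because the sketch splits on whitespace, it cannot express a space as the character argument.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parsed form of: edit <file_name> <index> <char>
struct EditCommand {
    std::string file_name;
    std::size_t index;
    char ch;
};

// Returns std::nullopt unless the line is a well-formed edit command.
std::optional<EditCommand> ParseEdit(const std::string& line) {
    std::istringstream in(line);
    std::string verb, file, ch;
    long long index = 0;
    if (!(in >> verb >> file >> index >> ch)) return std::nullopt;
    if (verb != "edit" || index < 0 || ch.size() != 1) return std::nullopt;
    return EditCommand{file, static_cast<std::size_t>(index), ch[0]};
}
```

For example, `ParseEdit("edit notes.txt 3 x")` yields the file name `notes.txt`, index 3, and character `x`, while a negative index or a multi-character argument is rejected.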

Project Structure

  • src/client/: Client-side logic and CLI.
  • src/server/: Server-side logic and storage management.
  • src/common/: Shared utilities (CRDT manager, hashing).
  • src/db/: Database management (SQLite).
  • protos/: gRPC protocol definitions.
