XetSharp

C# client for downloading files from Hugging Face using the Xet protocol via Rust FFI.

Quick Start

using System.Net.Http.Json;
using System.Text.Json.Serialization;
using XetSharp;

var repo = "dawidope/testrepo";
using var http = new HttpClient();
using var cts = new CancellationTokenSource();

// 1. Get file list (recursive) — files with xetHash use Xet protocol
var tree = await http.GetFromJsonAsync<List<HfTreeItem>>(
    $"https://huggingface.co/api/models/{repo}/tree/main?recursive=true");

var xetFiles = tree!.Where(f => f.XetHash != null).ToList();

// 2. Get temporary CAS access token (no HF account needed for public repos)
var token = await http.GetFromJsonAsync<XetTokenResponse>(
    $"https://huggingface.co/api/models/{repo}/xet-read-token/main");

// 3. Download via Xet protocol
var options = new XetClientOptions
{
    Endpoint = token!.Endpoint,
    Token = token.AccessToken,
    TokenExpiry = token.Expiration,
    // Optional: tune memory usage (default ~6GB) and progress granularity
    // MaxConcurrentDownloads = 4,
    // DownloadBufferSize = 512_000_000,        // 512 MB base buffer
    // DownloadBufferLimit = 1_500_000_000,      // 1.5 GB hard cap
    // ReconstructionFetchSize = 32_000_000,     // 32 MB blocks for smoother progress
};

using var client = new XetClient(options);

var downloads = xetFiles.Select(f => new XetFileDownload
{
    Hash = f.XetHash!,
    FileSize = f.Size,
    DestinationPath = f.Path!,
}).ToList();

var progress = new Progress<XetProgress>(p =>
{
    double pct = p.TotalBytes > 0 ? (double)p.BytesCompleted / p.TotalBytes * 100 : 0;
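    // Divide by 1024.0 so the MB figure keeps its fractional part (integer division would truncate it)
    Console.Write($"\r[{pct:F1}%] {p.BytesCompleted / 1024.0 / 1024.0:F1} MB");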
});

try
{
    await client.DownloadAsync(downloads, progress, cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("\nDownload cancelled.");
}

// --- HF API models ---
record HfTreeItem
{
    [JsonPropertyName("type")]  public string? Type { get; init; }
    [JsonPropertyName("path")]  public string? Path { get; init; }
    [JsonPropertyName("size")]  public long Size { get; init; }
    [JsonPropertyName("xetHash")] public string? XetHash { get; init; }
}

record XetTokenResponse
{
    [JsonPropertyName("casUrl")]      public string? Endpoint { get; init; }
    [JsonPropertyName("accessToken")] public string? AccessToken { get; init; }
    [JsonPropertyName("exp")]         public long Expiration { get; init; }
}

Note: The accessToken above is a temporary CAS token returned by the HF API — it's not your personal HF token. Public repos work without authentication. For private repos, add your HF token as a Bearer header on the HTTP requests.
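
For a private repo, the two HF API calls from the Quick Start (the tree listing and the xet-read-token request) both need the Authorization header. A minimal sketch, assuming the personal token is handed in by the caller; the HfHttp helper name is illustrative and not part of XetSharp:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

static class HfHttp
{
    // Hypothetical helper: returns an HttpClient that sends the personal HF token
    // as a Bearer header on every request. Pass null or an empty string for
    // public repos, in which case no header is added.
    public static HttpClient Create(string? hfToken)
    {
        var http = new HttpClient();
        if (!string.IsNullOrWhiteSpace(hfToken))
        {
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", hfToken);
        }
        return http;
    }
}
```

One way to use it is `using var http = HfHttp.Create(Environment.GetEnvironmentVariable("HF_TOKEN"));`, where the HF_TOKEN variable name is an assumption. The value placed in XetClientOptions.Token is still the temporary CAS token returned by the API, not the personal token.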

Features

  • Parallel downloads: multiple files downloaded concurrently via the Xet protocol
  • Aggregated progress: IProgress<XetProgress> reports combined progress across all files, with a per-file breakdown
  • Cancellation: CancellationToken support; downloads abort within ~100ms
  • Token refresh: callback for expired HF tokens
  • Memory tuning: configurable buffer sizes and concurrency via XetClientOptions
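
The token refresh feature exists because the CAS token from the Quick Start carries an expiry (`exp`). The callback's exact signature is not shown in this README, so the sketch below covers only the decision of when a token should be treated as stale. It assumes `exp` is a Unix timestamp in seconds; the TokenExpiry helper and the safety margin are illustrative:

```csharp
using System;

static class TokenExpiry
{
    // Assumption: expUnixSeconds is a Unix timestamp in seconds.
    // Returns true when the token has expired or will expire within safetyMargin,
    // so a fresh one can be requested before a long download starts.
    public static bool NeedsRefresh(long expUnixSeconds, DateTimeOffset now, TimeSpan safetyMargin)
    {
        var expiresAt = DateTimeOffset.FromUnixTimeSeconds(expUnixSeconds);
        return now + safetyMargin >= expiresAt;
    }
}
```

A caller might check `TokenExpiry.NeedsRefresh(token.Expiration, DateTimeOffset.UtcNow, TimeSpan.FromMinutes(1))` and, when it returns true, request the xet-read-token endpoint again.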

Memory & Performance Tuning

xet-core uses aggressive buffering by default for maximum throughput. You can tune it via XetClientOptions:

Option                     Default  Effect
MaxConcurrentDownloads     8        Parallel file downloads
DownloadBufferSize         ~2GB     Base memory buffer
DownloadBufferPerFileSize  ~512MB   Additional buffer per file
DownloadBufferLimit        ~8GB     Hard memory cap
ReconstructionFetchSize    ~256MB   Fetch block size (smaller = smoother progress)
PrefetchBufferSize         ~1GB     Prefetch lookahead

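Default memory usage is roughly DownloadBufferSize + MaxConcurrentDownloads × DownloadBufferPerFileSize = 2GB + 8 × 512MB = 6GB. The settings below bring the base figure down to 512MB + 4 × 128MB ≈ 1GB, with a 1.5GB hard cap: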

var options = new XetClientOptions
{
    MaxConcurrentDownloads = 4,
    DownloadBufferSize = 512_000_000,
    DownloadBufferPerFileSize = 128_000_000,
    DownloadBufferLimit = 1_500_000_000,
    ReconstructionFetchSize = 32_000_000,
    PrefetchBufferSize = 64_000_000,
};

These options are process-global — the first XetClient instance sets them.
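
The arithmetic behind the default figure can be written as a small helper for comparing configurations before committing to one. This is a sketch only: XetMemoryEstimate is a hypothetical name, and clamping the result to the hard cap is an assumption about how DownloadBufferLimit interacts with the other values, not documented xet-core behaviour.

```csharp
using System;

static class XetMemoryEstimate
{
    // Hypothetical helper: base buffer plus one per-file buffer for each
    // concurrent download, clamped to the hard cap (the clamp is an assumption).
    public static long Bytes(long baseBuffer, long perFileBuffer, int maxConcurrent, long hardLimit)
    {
        long requested = baseBuffer + (long)maxConcurrent * perFileBuffer;
        return Math.Min(requested, hardLimit);
    }
}
```

For example, `XetMemoryEstimate.Bytes(512_000_000, 128_000_000, 4, 1_500_000_000)` evaluates the reduced configuration shown above.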

Building

Prerequisites

  • Rust toolchain (cargo), for the native library
  • .NET SDK (dotnet), for the C# solution
  • PowerShell, for build.ps1

Build everything (Rust + C#)

./build.ps1

Options:

./build.ps1                        # Full Release build
./build.ps1 -Configuration Debug   # Debug build
./build.ps1 -SkipRust              # C# only (reuse existing native DLL)

Manual build

# 1. Rust native library
cd native/hf_xet_ffi
cargo build --release

# 2. C# solution
dotnet build

Run example

cd XetSharp.Example
dotnet run -- dawidope/testrepo                                    # all files in the repo
dotnet run -- dawidope/testrepo model-00002-of-00002.safetensors   # a single file
dotnet run -- dawidope/testrepo "" C:\Downloads                    # all files, saved to C:\Downloads

Project Structure

XetSharp/
├── XetSharp.slnx                 # Solution
├── build.ps1                     # Build script (Rust + C#)
├── XetSharp/                     # C# library
│   ├── XetSharp.csproj
│   ├── XetClient.cs              # High-level API
│   ├── XetFileDownload.cs        # File info for download
│   ├── XetProgress.cs            # Progress data
│   └── Native/
│       ├── NativeMethods.cs      # P/Invoke declarations
│       └── NativeTypes.cs        # Marshaling structs
├── XetSharp.Example/             # Example console app
│   ├── XetSharp.Example.csproj
│   └── Program.cs
└── native/hf_xet_ffi/           # Rust FFI crate
    ├── Cargo.toml
    └── src/
        ├── lib.rs                # extern "C" exports
        ├── runtime.rs            # Tokio runtime
        ├── progress.rs           # Progress aggregation bridge
        ├── token.rs              # Token refresh bridge
        └── error.rs              # Error handling

Architecture

XetSharp is a thin wrapper — it provides:

  • XetClient.DownloadAsync() — download files via Xet protocol (chunked, dedup, parallel)
  • Progress reporting — via IProgress<XetProgress> with per-file aggregation
  • Cancellation — via CancellationToken with native FFI cancellation flag
  • Token refresh — callback for expired HF tokens

Everything else (HF API calls, file caching, HTTP downloads for small files) is your responsibility.
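
As an illustration of that split, the sketch below separates a tree listing into files that carry a xetHash and files that need an ordinary HTTP download, and builds a download URL for the latter. TreeEntry is a local stand-in for the HfTreeItem record from the Quick Start, HfFiles is a hypothetical helper, and the `resolve` URL pattern is the usual Hub download path, stated here as an assumption rather than something XetSharp provides:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in for the HfTreeItem record from the Quick Start.
record TreeEntry(string Type, string Path, long Size, string? XetHash);

static class HfFiles
{
    // Splits a tree listing into Xet-backed files and files that need a plain
    // HTTP download. Directory entries are skipped.
    public static (List<TreeEntry> Xet, List<TreeEntry> Plain) Partition(IEnumerable<TreeEntry> tree)
    {
        var files = tree.Where(e => e.Type == "file").ToList();
        return (files.Where(e => e.XetHash != null).ToList(),
                files.Where(e => e.XetHash == null).ToList());
    }

    // Builds a Hub "resolve" URL for a file, escaping each path segment
    // separately so the slashes between directories are preserved.
    public static string ResolveUrl(string repo, string revision, string path)
    {
        var escaped = string.Join("/", path.Split('/').Select(s => Uri.EscapeDataString(s)));
        return $"https://huggingface.co/{repo}/resolve/{revision}/{escaped}";
    }
}
```

The Xet list would feed XetClient.DownloadAsync() as in the Quick Start, while each entry in the Plain list could be fetched with HttpClient from its ResolveUrl.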