
feat: implement QR-to-login and unified logout #40

Open
mega123-art wants to merge 1 commit into main from feat/issue-32-qr-login-logout


@mega123-art

Contributor

Implements signature-based login using a shared WalletTransport interface and a WalletConnect v2 implementation, conforming CLI, VSCode, Web, and Android surfaces to this seam. Also unifies the logout policy to support soft logout by default and full logout via --full.

@zo-sol

zo-sol commented Jun 15, 2026

Collaborator

Design proposal: air-gapped QR signing (phone = signer)

Intent correction from discussion: what we want is QR communication, not secret-key transfer.
The wallet/key stays on the phone; the phone is the signer. The desktop only displays a request as a QR and reads the signed response back through its camera. The private key never transits any channel. This is the air-gapped QR-signing model (same idea as D'CENT-style air-gap, or WalletConnect without a relay).

This inverts the trust direction of the current PR.

Current PR vs. proposal

  • Current PR (key handover): desktop holds the key and hands a secret payload to the phone in a one-shot pairing. The phone ends up with key material. A leaked QR/deeplink ≈ full wallet compromise.
  • Proposal (signing channel): phone holds the key. Desktop emits a sign request; phone signs and returns only a signature. The channel can be reused for any number of requests, but each request is nonce-bound and single-use. The key is never copied.

Why not the alternatives (security rationale)

  • On-chain memo as the return channel — rejected. The login signature is a secret: it derives the AES-GCM key used to decrypt shared memory (see shared-memory design). ed25519 is deterministic, so the signature over a fixed message is effectively a password. Publishing it in a memo lets anyone derive the same key. (A purchase tx signature is public and fine — but the login signature must never be public. Don't conflate them.)
  • LAN relay server — rejected. Plain-HTTP on shared WiFi is sniffable; E2E-encrypting it (WalletConnect-style symmetric key in the QR) works but adds NAT/cert/always-on-server friction that conflicts with the local-first ethos.
  • Air-gapped bidirectional QR — chosen. The secret signature travels only as photons between the phone screen and the desktop camera. Nothing is networked, nothing is published.
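
The "signature is effectively a password" point can be checked directly. The following is a minimal sketch using Node's built-in crypto module. The message string, the HKDF parameters, and the function name are assumptions for illustration: the thread only says that the signature derives the AES-GCM key, not how.

```typescript
import { generateKeyPairSync, hkdfSync, sign } from "node:crypto";

// Hypothetical fixed login message; the real string is not given in this thread.
const LOGIN_MESSAGE = Buffer.from("agentnet-login-v1");

// Assumed derivation (HKDF-SHA256 over the signature bytes). This is an
// illustration, not the derivation used by the shared-memory design.
function deriveKeyFromSignature(signature: Buffer): Buffer {
  const salt = Buffer.alloc(32); // fixed all-zero salt, for illustration only
  const info = Buffer.from("agentnet/shared-memory");
  return Buffer.from(hkdfSync("sha256", signature, salt, info, 32));
}

// Throwaway key standing in for the phone wallet.
const { privateKey } = generateKeyPairSync("ed25519");

// Two independent signing calls over the same message yield identical bytes.
const sigA = sign(null, LOGIN_MESSAGE, privateKey);
const sigB = sign(null, LOGIN_MESSAGE, privateKey);

// Anyone who reads the signature (for example from a public memo) ends up
// with the same key as the wallet owner.
const ownerKey = deriveKeyFromSignature(sigA);
const observerKey = deriveKeyFromSignature(Buffer.from(sigB));

console.log(sigA.equals(sigB), ownerKey.equals(observerKey)); // prints: true true
```

Because the derived key depends only on the signature bytes, keeping the signature off every network and every ledger is the whole security property here.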

Flows

A) Skill / NFT purchase (signature is public — no return-QR needed)

  1. Desktop builds the unsigned transaction → renders as QR.
  2. Phone scans → shows "AgentNet wants to buy X" → user approves → signs → phone submits to RPC itself (phone has its own internet).
  3. Desktop watches the chain for confirmation → installs the skill.

B) Login / session decrypt (signature is secret — air-gap return)

  1. Desktop shows a nonce/challenge QR.
  2. Phone signs the fixed message → displays the signature as a QR.
  3. Desktop camera reads the signature QR → verifies (ed25519 against the known pubkey) → derives the AES-GCM key → decrypts shared memory.
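
Step 3 of Flow B could look roughly like the sketch below on the desktop side. It assumes the nonce travels in the QR envelope and is not part of the signed bytes (otherwise the derived key would change on every login), that the key is derived with HKDF-SHA256, and that shared memory is sealed as an AES-256-GCM blob. `unlockSharedMemory`, `seal`, `SealedBlob`, and the message string are hypothetical names, not taken from the PR.

```typescript
import {
  KeyObject,
  createCipheriv,
  createDecipheriv,
  generateKeyPairSync,
  hkdfSync,
  randomBytes,
  sign,
  verify,
} from "node:crypto";

// Hypothetical fixed login message (not specified in this thread).
const LOGIN_MESSAGE = Buffer.from("agentnet-login-v1");

interface SealedBlob {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Assumed KDF: HKDF-SHA256 over the signature bytes.
function deriveKey(signature: Buffer): Buffer {
  const info = Buffer.from("agentnet/shared-memory");
  return Buffer.from(hkdfSync("sha256", signature, Buffer.alloc(32), info, 32));
}

// Encrypts with AES-256-GCM; only used here to produce demo data.
function seal(plaintext: Buffer, key: Buffer): SealedBlob {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Verify against the known pubkey first, then derive the key, then decrypt.
function unlockSharedMemory(
  signature: Buffer,
  knownPub: KeyObject,
  blob: SealedBlob,
): Buffer {
  if (!verify(null, LOGIN_MESSAGE, knownPub, signature)) {
    throw new Error("signature does not match the known pubkey");
  }
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(signature), blob.iv);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]);
}

// Demo round trip with a throwaway key standing in for the phone wallet.
const phone = generateKeyPairSync("ed25519");
const signature = sign(null, LOGIN_MESSAGE, phone.privateKey);
const blob = seal(Buffer.from("shared memory"), deriveKey(signature));
const plain = unlockSharedMemory(signature, phone.publicKey, blob);
```

Verifying before deriving means a misread or forged QR is rejected up front instead of surfacing later as a GCM authentication failure.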

Desktop QR read — the hard part (already verified)

Instead of fighting the webview camera sandbox, the extension host (Node) spawns a native scanner and returns the decoded string to the webview via postMessage, the same pattern as the engine spawn. The webview itself needs no camera permission.

Verified on macOS (real command output):

  • Camera capture: ffmpeg + AVFoundation captured a real frame (1280x720 JPEG). ✓
  • QR decode: native Vision framework (VNDetectBarcodesRequest), zero pip/brew installs. CoreImage→Vision generate/decode roundtrip matched. ✓

Scanner requirements:

  • Continuous scan loop up to ~5s (not one-shot) — keep grabbing frames until a QR decodes, then exit immediately; hard timeout; kill the camera on exit.
  • No junk files — decode frames in-memory (native AVCaptureSession + Vision, zero disk writes) or, if a tool needs a file, mktemp -d + trap cleanup.
  • Cross-platform: native Vision is macOS-only. Linux/Windows need a fallback (zbar or a wasm/JS decoder). ffmpeg capture itself is cross-platform (avfoundation / v4l2 / dshow).
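
The scanner requirements above can be sketched as a small loop with an injected camera and decoder. `FrameSource`, `scanForQr`, and `ffmpegInputFormat` are hypothetical names; the real scanner would wrap AVCaptureSession + Vision, or ffmpeg plus a fallback decoder. One caveat: this sketch checks the deadline between frames only, so a real implementation would also need to race a stalled `grab()` against the timeout.

```typescript
// Maps Node's process.platform to the ffmpeg capture input format listed above.
function ffmpegInputFormat(platform: string): string {
  switch (platform) {
    case "darwin":
      return "avfoundation";
    case "linux":
      return "v4l2";
    case "win32":
      return "dshow";
    default:
      throw new Error(`no known capture backend for ${platform}`);
  }
}

// Hypothetical camera abstraction: frames stay in memory, nothing hits disk.
interface FrameSource {
  grab(): Promise<Uint8Array>; // one captured frame
  close(): void; // releases the camera
}

// Keeps grabbing frames until one decodes or the deadline passes.
async function scanForQr(
  source: FrameSource,
  decode: (frame: Uint8Array) => string | null,
  timeoutMs: number = 5000,
  now: () => number = Date.now,
): Promise<string | null> {
  const deadline = now() + timeoutMs;
  try {
    while (now() < deadline) {
      const text = decode(await source.grab());
      if (text !== null) return text; // exit immediately on the first decode
    }
    return null; // timed out without a decode
  } finally {
    source.close(); // camera is released on success, timeout, or error
  }
}
```

Injecting the clock and the frame source keeps the loop testable without a physical camera, which matters given that the continuous loop is listed as not yet verified.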

QR payload formats (draft)

  • Request (desktop→phone): agentnet://sign?v=1&type=tx|msg&nonce=<n>&data=<base64>&origin=<app>
  • Response (phone→desktop): agentnet://sig?v=1&nonce=<n>&sig=<base58>&pub=<base58>
  • Purchase data = serialized unsigned tx (≤~1232 bytes → base64 ~1.6k chars → dense single QR, or 2–3 frames if large). Login data = fixed challenge + nonce.
  • Optional: also emit standard Solana Pay Transaction Request URLs so external mobile wallets (Phantom/Solflare) work — open question below.
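
As an illustration of the draft formats above, here is a minimal encode/parse sketch. The function and type names are hypothetical (they are not the contents of qr.ts), and `sig` / `pub` are treated as opaque base58 strings rather than decoded. `URLSearchParams` percent-encodes the `+`, `/`, and `=` characters that standard base64 produces, so `data` survives the round trip.

```typescript
type SignType = "tx" | "msg";

interface SignRequest {
  v: 1;
  type: SignType;
  nonce: string;
  data: Buffer; // serialized unsigned tx, or the login challenge
  origin: string;
}

interface SignResponse {
  v: 1;
  nonce: string;
  sig: string; // base58, left opaque in this sketch
  pub: string; // base58, left opaque in this sketch
}

const REQUEST_PREFIX = "agentnet://sign?";
const RESPONSE_PREFIX = "agentnet://sig?";

// Desktop side: build the request URI that gets rendered as a QR.
function encodeSignRequest(req: SignRequest): string {
  const q = new URLSearchParams({
    v: String(req.v),
    type: req.type,
    nonce: req.nonce,
    data: req.data.toString("base64"),
    origin: req.origin,
  });
  return REQUEST_PREFIX + q.toString();
}

// Phone side: parse a scanned request URI.
function decodeSignRequest(uri: string): SignRequest {
  if (!uri.startsWith(REQUEST_PREFIX)) throw new Error("not a sign request");
  const q = new URLSearchParams(uri.slice(REQUEST_PREFIX.length));
  const type = q.get("type");
  const nonce = q.get("nonce");
  const data = q.get("data");
  const origin = q.get("origin");
  if (q.get("v") !== "1") throw new Error("unsupported version");
  if (type !== "tx" && type !== "msg") throw new Error("bad type");
  if (!nonce || !data || !origin) throw new Error("missing field");
  return { v: 1, type, nonce, data: Buffer.from(data, "base64"), origin };
}

// Desktop side: parse the response QR and bind it to the pending nonce.
function parseSignResponse(uri: string, expectedNonce: string): SignResponse {
  if (!uri.startsWith(RESPONSE_PREFIX)) throw new Error("not a sign response");
  const q = new URLSearchParams(uri.slice(RESPONSE_PREFIX.length));
  const nonce = q.get("nonce");
  const sig = q.get("sig");
  const pub = q.get("pub");
  if (q.get("v") !== "1") throw new Error("unsupported version");
  if (!nonce || !sig || !pub) throw new Error("missing field");
  if (nonce !== expectedNonce) throw new Error("nonce mismatch");
  return { v: 1, nonce, sig, pub };
}

// Size check for the figure quoted above: 1232 raw bytes as base64.
const maxTxBase64Length = Buffer.alloc(1232).toString("base64").length; // 1644
```

Percent-encoding inflates the URI beyond the raw base64 length, so the QR density estimate should be taken from the final URI, not from the 1644-character base64 string alone.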

Code changes (same branch — this is a model swap, not an add-on)

  • pairing.ts: key-handover state machine → request/response signing channel (nonce-bound, single-use, expiring).
  • qr.ts: encode secret payload → encode sign-request / parse sign-response.
  • marketMessages.ts: add signRequest / signResult (coordinate with the getSkillDoc/skillDoc additions on the other in-flight branch to avoid a merge clash).
  • New: desktop QR display component + host-spawned scanner; phone SPA sign-approval UI + signature-QR display.
  • Purchase path split into build (desktop) / sign+submit (phone) / watch (desktop).
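
For the pairing.ts change, the "nonce-bound, single-use, expiring" property could be enforced with something as small as the sketch below. `SigningChannel`, its method names, and the 60-second TTL are assumptions for illustration, not the PR's actual state machine.

```typescript
import { randomBytes } from "node:crypto";

// Tracks outstanding sign requests by nonce. Each nonce is accepted at most
// once and only before it expires.
class SigningChannel {
  private readonly pending = new Map<string, number>(); // nonce -> expiry (ms)
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 60_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs; // hypothetical default lifetime
    this.now = now; // injectable clock, for tests
  }

  // Called when the desktop renders a request QR.
  open(): string {
    const nonce = randomBytes(16).toString("hex");
    this.pending.set(nonce, this.now() + this.ttlMs);
    return nonce;
  }

  // Called when a response QR is scanned. Returns true only for a known,
  // unexpired nonce, and forgets the nonce either way so it cannot be replayed.
  consume(nonce: string): boolean {
    const expiry = this.pending.get(nonce);
    this.pending.delete(nonce);
    return expiry !== undefined && this.now() <= expiry;
  }
}
```

Deleting the nonce before checking expiry means a replayed or late response never leaves state behind, whichever branch is taken.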

Open questions for review

  1. Cross-platform decode fallback — zbar vs. a wasm decoder?
  2. ffmpeg dependency — bundle vs. detect-and-prompt?
  3. macOS TCC camera-permission UX (first-run system dialog granted to VSCode).
  4. Support standard Solana Pay for external wallets, or AgentNet-app-only?
  5. Multi-frame QR threshold for large unsigned txs.
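
On question 5, whatever threshold is chosen, the split itself can stay simple. The sketch below uses a hypothetical `<index>/<total>:<chunk>` frame envelope that is not part of the draft payload formats, and takes the per-frame character budget as a parameter, since the right value depends on QR version, error-correction level, and camera quality.

```typescript
// Splits a payload into numbered frames of at most maxChars payload characters.
function splitIntoFrames(payload: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const total = Math.max(1, Math.ceil(payload.length / maxChars));
  const frames: string[] = [];
  for (let i = 0; i < total; i++) {
    const chunk = payload.slice(i * maxChars, (i + 1) * maxChars);
    frames.push(`${i + 1}/${total}:${chunk}`);
  }
  return frames;
}

// Reassembles frames scanned in any order, tolerating duplicates.
// Returns null until every frame of the set has been seen.
function joinFrames(frames: string[]): string | null {
  const parts = new Map<number, string>();
  let total = 0;
  for (const frame of frames) {
    const m = /^(\d+)\/(\d+):([\s\S]*)$/.exec(frame);
    if (!m) continue; // ignore anything that is not a frame
    total = Number(m[2]);
    parts.set(Number(m[1]), m[3] ?? "");
  }
  if (total === 0) return null;
  let out = "";
  for (let i = 1; i <= total; i++) {
    const chunk = parts.get(i);
    if (chunk === undefined) return null; // still missing a frame
    out += chunk;
  }
  return out;
}
```

Since the scanner is already a continuous loop, the phone can cycle through the frames on screen while the desktop keeps collecting until `joinFrames` returns a complete payload.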

Verification status (honest)

  • Verified: camera capture (ffmpeg), native QR decode (Vision), generate/decode roundtrip — all on this macOS machine.
  • Not yet verified: live end-to-end (a physical phone QR held to the camera), the continuous 5s scan loop, and any cross-platform path.

@zo-sol

zo-sol commented Jun 16, 2026

Collaborator

🧪 Skill test / dogfooding note: these diagrams were generated by an agent using a data-visualization (Mermaid) skill pulled from the public internet. Our own on-chain skill marketplace (search_skills / verify_skill / buy_skill) is currently down, so skill-shopping had nothing to buy. We are treating this as a live fallback test of "marketplace empty → fetch a skill from the net → use it." The example output is posted below.


Visual addendum to the air-gapped QR-signing proposal above — same content, just drawn out.

1. Trust direction: current PR (key handover) vs proposal (signing channel)

flowchart LR
    subgraph CURR["Current PR — key handover"]
        direction TB
        D1["Desktop holds key"] -->|"hands secret payload (one-shot)"| P1["Phone ends up with key material"]
        P1 -.->|"leaked QR / deeplink"| X1["⚠️ full wallet compromise"]
    end
    subgraph PROP["Proposal — air-gapped signing channel"]
        direction TB
        P2["Phone holds key · signer"] -->|"returns signature only (nonce-bound, single-use)"| D2["Desktop verifies"]
        D2 -.->|"key never copied"| OK["✅ air-gapped"]
    end
    CURR ==>|"trust direction inverted"| PROP

2. Flow A — Skill / NFT purchase (signature is public → no return-QR needed)

sequenceDiagram
    participant D as Desktop
    participant P as Phone
    participant C as Solana RPC
    Note over P: holds the key (signer)
    D->>D: build unsigned tx
    D->>P: render QR — agentnet://sign type=tx
    Note over P: user approves "buy X"
    P->>P: sign tx locally
    P->>C: submit signed tx (phone's own internet)
    D->>C: watch chain for confirmation
    C-->>D: confirmed
    D->>D: install skill
    Note over D,P: signature is PUBLIC → no return-QR needed

3. Flow B — Login / session decrypt (signature is secret → air-gap return)

sequenceDiagram
    participant D as Desktop
    participant Cam as Desktop Camera
    participant P as Phone
    Note over P: holds the key (signer)
    D->>P: show nonce / challenge QR
    P->>P: sign fixed message
    Note over P: signature = SECRET<br/>derives AES-GCM key for shared memory
    P-->>Cam: display signature as QR (photons only)
    Cam->>D: decoded string via postMessage
    D->>D: verify ed25519, derive key, decrypt memory
    Note over D,P: signature is SECRET → never networked, never published

On open question #1 (cross-platform decode): macOS = Vision; for Linux/Windows, zbar as the fallback looks lighter to bundle than a wasm decoder.
