deriveCodeSourceId produces invalid slug for github HTTPS remotes (contains ".", exceeds 32 chars) #1357

@dboone31

Summary

On /sync-gbrain, the code stage fails with Invalid source id because deriveCodeSourceId (in bin/gstack-gbrain-sync.ts) produces a slug that:

  1. Contains a literal . — gbrain's sources add validator rejects it. The error message says: "Must be 1-32 lowercase alnum chars with optional interior hyphens (e.g. wiki, yc-media)."
  2. Exceeds the 32-char limit for any non-trivial GitHub org/repo combination.

Memory and brain-sync stages still succeed, so the failure is non-fatal — but the per-repo code source never gets registered, which means gbrain code-def/code-refs/code-callers never work for that repo. The CLAUDE.md guidance block written by /sync-gbrain Step 4 ends up advertising tools that don't function against the cwd code corpus.
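The rule quoted in the error message can be restated as a small check. This is only a sketch: the regex is inferred from the message text rather than copied from gbrain's source, and isValidSourceId is a hypothetical name.

```typescript
// Hypothetical restatement of the rule in gbrain's error message:
// 1-32 chars, lowercase alphanumerics, hyphens allowed only in the interior.
const SOURCE_ID_RE = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;

function isValidSourceId(id: string): boolean {
  return SOURCE_ID_RE.test(id);
}

// The id from the repro breaks the rule three ways: it contains ".",
// it contains uppercase letters and "_", and it is longer than 32 chars.
const failing = "gstack-code-github.com-EXAMPLE_ORG-example-repo-name";
console.log(failing.length, isValidSourceId(failing)); // 52 false
console.log(isValidSourceId("yc-media")); // true
```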

Repro

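Any GitHub HTTPS remote reproduces the character failure, because the dot in github.com always reaches the slug. The length failure also appears whenever the slugified host/org/repo exceeds 20 chars (so gstack-code- + slug > 32). Example: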

$ git remote get-url origin
https://github.com/EXAMPLE_ORG/example-repo-name.git

$ /sync-gbrain
[gbrain-sync] mode=incremental engine=unknown
gstack-gbrain-sync (incremental):
  ERR   code         source registration failed: gbrain sources add gstack-code-github.com-EXAMPLE_ORG-example-repo-name failed:
                     Invalid source id "gstack-code-github.com-EXAMPLE_ORG-example-repo-name". Must be 1-32 lowercase alnum chars with optional interior hyphens (e.g. "wiki", "yc-media").
  OK    memory       ingest pass complete
  OK    brain-sync   curated artifacts pushed

(.gbrain-sync-state.json records the same failure under last_stages[0].summary.)

Root cause

Two compounding issues in bin/gstack-gbrain-sync.ts:160-175:

function deriveCodeSourceId(repoPath: string): string {
  const remote = canonicalizeRemote(originUrl());
  if (remote) {
    return `gstack-code-${remote.replace(/[\/\s]+/g, "-").replace(/-+/g, "-")}`;
  }
  // Fallback for repos without a remote.
  const base = repoPath.split("/").pop() || "repo";
  return `gstack-code-${base.toLowerCase().replace(/[^a-z0-9-]+/g, "-").replace(/-+/g, "-")}`;
}
  1. The remote-path branch only replaces / and whitespace. canonicalizeRemote returns github.com/... with the dot intact, and the dot survives into the slug. The fallback branch lowercases and uses [^a-z0-9-]+, which correctly strips dots. The two branches disagree on what counts as a legal char.
  2. Neither branch enforces gbrain's 32-char limit. Even after fixing the regex, the repro remote would slugify to gstack-code-github-com-example-org-example-repo-name, which is 52 chars and still fails. The doc comment at line 163 has the same problem: it claims github.com/garrytan/gstack becomes gstack-code-github-com-garrytan-gstack, which is 38 chars and also over the limit.
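The length arithmetic above can be checked with a short script. It is a sketch that mirrors only the remote branch of the current function; currentRemoteSlug is a hypothetical name, and the canonical remote is passed in directly on the assumption that canonicalizeRemote yields the host/org/repo form seen in the error output.

```typescript
// Mirrors the remote branch of the current deriveCodeSourceId.
// canonicalizeRemote is not reproduced here; its output is assumed
// to look like "github.com/org/repo".
function currentRemoteSlug(canonicalRemote: string): string {
  return `gstack-code-${canonicalRemote.replace(/[\/\s]+/g, "-").replace(/-+/g, "-")}`;
}

const LIMIT = 32; // gbrain's maximum source id length

for (const remote of [
  "github.com/garrytan/gstack",
  "github.com/EXAMPLE_ORG/example-repo-name",
]) {
  const id = currentRemoteSlug(remote);
  // Columns: id, length, contains a dot, over the limit
  console.log(id, id.length, id.includes("."), id.length > LIMIT);
}
```

Both ids keep the dot and come out at 38 and 52 chars respectively, so each one fails on both counts.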

Suggested fix

In deriveCodeSourceId:

  1. Lowercase the input and use the same [^a-z0-9-]+ replacement as the fallback branch in both code paths (so dots, underscores, etc. all become hyphens), then trim leading and trailing hyphens.
  2. After slugification, if gstack-code-${slug} exceeds 32 chars, truncate the slug and append a short hash of the full canonical remote (e.g. first 6 chars of a sha1) to keep IDs unique across orgs that share a repo basename. Reserve 12 chars for the gstack-code- prefix; that leaves 20 chars for ${slug-prefix}-${hash6}.

Sketch:

import { createHash } from "node:crypto";

function deriveCodeSourceId(repoPath: string): string {
  const remote = canonicalizeRemote(originUrl());
  // Lowercase first so uppercase org/repo names are kept rather than
  // replaced, then map every other illegal char to "-" and trim both ends.
  const raw = (remote || repoPath.split("/").pop() || "repo")
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/-+/g, "-")
    .replace(/^-|-$/g, "") || "repo";
  const PREFIX = "gstack-code-";
  const MAX = 32 - PREFIX.length;          // 20 chars left
  if (raw.length <= MAX) return PREFIX + raw;
  const hash = createHash("sha1").update(remote || repoPath).digest("hex").slice(0, 6);
  const head = raw.slice(0, MAX - 1 - hash.length); // leave room for "-${hash}"
  return PREFIX + head.replace(/-$/, "") + "-" + hash;
}

Tests worth adding

  • Long github HTTPS remote → valid (≤32 chars, alnum+hyphens, no leading/trailing hyphen).
  • Long github SSH remote (git@github.com:org/repo.git) → matches HTTPS counterpart.
  • Distinct orgs with same repo basename → distinct slugs (covered by the hash suffix).
  • Empty origin (local-only repo) → falls back to basename.
  • Update the doc comment example to a slug ≤32 chars so the example doesn't lie.
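A few of these cases could be exercised along the following lines. This is a sketch under assumptions: sourceIdFor is a hypothetical pure helper that takes the canonical remote as an argument so no git repo is needed, and VALID is inferred from the error message rather than taken from gbrain. The SSH-versus-HTTPS case is left out because it depends on canonicalizeRemote, which is not shown in this issue.

```typescript
import { strict as assert } from "node:assert";
import { createHash } from "node:crypto";

const PREFIX = "gstack-code-";
const MAX_ID = 32;
const VALID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/; // inferred rule

// Hypothetical helper: same idea as the suggested fix, minus the git lookup.
function sourceIdFor(canonicalRemote: string | null, repoPath: string): string {
  const seed = canonicalRemote || repoPath.split("/").pop() || "repo";
  const slug = seed
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/-+/g, "-")
    .replace(/^-|-$/g, "") || "repo";
  const room = MAX_ID - PREFIX.length; // 20 chars for slug plus hash
  if (slug.length <= room) return PREFIX + slug;
  const hash = createHash("sha1")
    .update(canonicalRemote || repoPath)
    .digest("hex")
    .slice(0, 6);
  const head = slug.slice(0, room - 1 - hash.length).replace(/-$/, "");
  return `${PREFIX}${head}-${hash}`;
}

// Long HTTPS-style remote: result must satisfy the inferred rule.
const a = sourceIdFor("github.com/EXAMPLE_ORG/example-repo-name", "/tmp/x");
assert.match(a, VALID);
assert.ok(a.length <= MAX_ID);

// Same repo basename and same truncated head: only the hash tells them apart.
const b = sourceIdFor("github.com/first-long-org/example-repo-name", "/tmp/x");
const c = sourceIdFor("github.com/first-other-org/example-repo-name", "/tmp/x");
assert.notEqual(b, c);

// No origin: fall back to the directory basename.
assert.equal(sourceIdFor(null, "/home/me/My_Repo"), "gstack-code-my-repo");
```

Keeping the slug logic in a pure function like this is one way to make the cases cheap to test; whether to refactor deriveCodeSourceId that way is a design choice for the maintainers.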

Environment

  • gstack: 1.26.4.0
  • gbrain: 0.18.2
  • macOS
