muck: accept URLs and Google Drive refs as sources (v1.1.0)#83

Draft
jdidion wants to merge 2 commits into main from muck-remote-sources

@jdidion jdidion commented May 9, 2026

Note: This PR was largely generated by Claude and has not yet been fully reviewed by me (the human). Feel free to defer your review until this warning has been removed.

Summary

  • /muck:voice --learn and /muck:voice --feedback now accept any mix of local paths, HTTP(S) URLs, gdrive://<id> refs, and Drive folder refs.
  • /muck:clean grows a new --voice <src> flag — an ephemeral reference voice applied to the reconstruct pass without touching voice-profile.yaml.
  • New plugins/muck/scripts/resolve-sources.py does the resolution: stdlib urllib + SSRF guard for HTTP(S), gws drive files export/get for Google-native and raw Drive files, gws drive files list for folders.
  • Marketplace + plugin.json bumped to 1.1.0; description corrected from "Four tools" to match plugin.json's "Five tools".
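
As a rough illustration of the dispatch step described above, the sketch below sorts a mixed list of references into local paths, HTTP(S) URLs, and gdrive://<id> refs. It is not the code in resolve-sources.py: the function name `classify_source` and the returned kind labels are hypothetical, and Drive folder refs are left out because their syntax is not spelled out in this description.

```python
from urllib.parse import urlparse


def classify_source(ref: str) -> tuple[str, str]:
    """Return (kind, value) for one source reference.

    kind is "http", "gdrive", or "local". Hypothetical helper for
    illustration only, not the resolver's actual API.
    """
    parsed = urlparse(ref)
    scheme = parsed.scheme.lower()
    if scheme in ("http", "https"):
        return ("http", ref)
    if scheme == "gdrive":
        # For gdrive://<id>, urlparse puts the ID in netloc (case preserved).
        file_id = parsed.netloc or parsed.path.lstrip("/")
        if not file_id:
            raise ValueError(f"gdrive ref has no file ID: {ref!r}")
        return ("gdrive", file_id)
    if "://" in ref:
        # Anything else with an explicit scheme (file://, ftp://, ...) is refused.
        raise ValueError(f"unsupported scheme in source: {ref!r}")
    # No scheme separator: treat it as a local path.
    return ("local", ref)


if __name__ == "__main__":
    for ref in ["notes/draft.md", "https://example.com/post.html", "gdrive://1AbC_dEf-123"]:
        print(classify_source(ref))
```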

Closes: muck-remote-sources handoff (pod 01KR726ZDBAF1AT9DNN6ZV4CAG).

Verification

  • ruff check clean on the new script.
  • Live HTTP fetch: paulgraham.com/foundervisa.html → 407 words of clean prose, passes through analyze-voice.py end-to-end.
  • Live Drive fetch: internal Google Doc exported to text → 4280 analyzer-visible words.
  • SSRF guards tested: rejects 127.0.0.1, 192.168.1.1, file:// schemes with clear errors and exit 2.
  • HTML extraction tested against a fixture with nav/header/footer/script/aside noise — all dropped, <article> content preserved.
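
For reviewers who want a mental model of the SSRF checks listed above, here is a minimal sketch of one way to write such a guard with the stdlib only. It is an assumption about the general approach, not a copy of resolve-sources.py: `check_url` and `_is_blocked` are hypothetical names, and the injectable `resolve` parameter exists only so the check can be exercised without network access. A caller could catch the `ValueError` and exit with status 2.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_blocked(addr) -> bool:
    """True for addresses that should never be fetched from a user-supplied URL."""
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError if url is not plain HTTP(S) to a public address."""
    parsed = urlparse(url)
    if parsed.scheme.lower() not in ("http", "https"):
        raise ValueError(f"unsupported scheme {parsed.scheme!r} in {url!r}")
    host = parsed.hostname
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    try:
        # IP literals are checked directly, with no DNS lookup.
        addrs = [ipaddress.ip_address(host)]
    except ValueError:
        # Hostnames: check every address the name resolves to.
        addrs = [ipaddress.ip_address(info[4][0]) for info in resolve(host, None)]
    for addr in addrs:
        if _is_blocked(addr):
            raise ValueError(f"refusing to fetch {url!r}: {addr} is not a public address")


if __name__ == "__main__":
    for candidate in ["http://127.0.0.1/", "http://192.168.1.1/", "file:///etc/passwd"]:
        try:
            check_url(candidate)
        except ValueError as exc:
            print(f"rejected: {exc}")
```

A check-then-fetch guard like this does not by itself cover redirects or DNS answers that change between the check and the request, so a real fetcher would need to re-validate at those points.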

Design notes

  • HTML cleaning uses stdlib html.parser: it prefers <article> or <main> content and falls back to the body minus chrome (nav, header, footer, script, aside). No new non-stdlib deps; muck still only requires PyYAML.
  • Drive goes through the gws CLI (subprocess-friendly) rather than MCP, so the resolver can run from a plain Python script. Requires gws on PATH and prior auth — fails fast with gws's own error message otherwise.
  • detect.py is unchanged. The ephemeral voice feeds into the LLM-driven reconstruct pass (pass 2), not the pattern-detection pass (pass 1), so no new flag was needed on the detector.
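
The HTML cleaning strategy in the first design note can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the extractor shipped in this PR: the class `ProseExtractor`, the function `extract_prose`, and the exact tag sets are hypothetical, and the sketch treats <head> as chrome so that titles and metadata stay out of the fallback text.

```python
from html.parser import HTMLParser

# Assumed tag sets for illustration; the real extractor may differ.
SKIP = {"head", "nav", "header", "footer", "script", "style", "aside"}
PREFERRED = {"article", "main"}
BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}


class ProseExtractor(HTMLParser):
    """Collect text twice: inside <article>/<main>, and from everything minus chrome."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.preferred_depth = 0
        self.preferred = []
        self.fallback = []

    def _emit(self, text):
        if self.skip_depth:
            return  # inside nav/header/footer/script/etc.
        self.fallback.append(text)
        if self.preferred_depth:
            self.preferred.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in PREFERRED:
            self.preferred_depth += 1
        elif tag in BLOCK:
            self._emit(" ")  # keep adjacent blocks from running together

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in PREFERRED and self.preferred_depth:
            self.preferred_depth -= 1
        elif tag in BLOCK:
            self._emit(" ")

    def handle_data(self, data):
        self._emit(data)


def extract_prose(html: str) -> str:
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    preferred = " ".join("".join(parser.preferred).split())
    # Fall back to body-minus-chrome when no <article>/<main> text was found.
    return preferred or " ".join("".join(parser.fallback).split())


if __name__ == "__main__":
    page = "<body><nav>Home</nav><article><p>Hello.</p></article><footer>Bye</footer></body>"
    print(extract_prose(page))
```

One consequence of this design is that a <header> nested inside <article> is dropped along with site-level headers, which may or may not be what the real extractor does.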

Follow-ups (not in this PR)

  • Regression tests for the HTML extractor against a few real blog templates (Hugo, WordPress, Medium).
  • Consider sharing the resolver with other skills if a second caller shows up (document-specialist, summarize-doc flows). Kept local to muck until then.

jdidion and others added 2 commits May 9, 2026 14:03
/muck:voice --learn, /muck:voice --feedback, and a new /muck:clean --voice
flag now take URLs and gdrive:// refs alongside local file paths. The new
scripts/resolve-sources.py resolves any mix of sources to local files:
stdlib urllib + SSRF guard for HTTP(S), `gws drive files export/get` for
Google-native and raw Drive files, `gws drive files list` for folders.

--voice <src> on /muck:clean is an ephemeral reference voice: resolved,
analyzed inline, applied to the reconstruct pass, never persisted to
voice-profile.yaml. It takes precedence over --preset and the on-disk
profile.

Closes: muck-remote-sources handoff (01KR726ZDBAF1AT9DNN6ZV4CAG).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to the v1.1.0 script addition: wire the skill bodies through
the new resolve-sources.py step so /muck:voice --learn, --feedback, and
/muck:clean --voice have documented usage for URL and Drive sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>