diff --git a/.agents/skills/watchlist-md/SKILL.md b/.agents/skills/watchlist-md/SKILL.md
index a5d52d4..ea0ae49 100644
--- a/.agents/skills/watchlist-md/SKILL.md
+++ b/.agents/skills/watchlist-md/SKILL.md
@@ -33,15 +33,23 @@ worker.
 
 ## Storage
 
-Prefer the first appropriate path:
-
-1. `.watchlist/WATCHLIST.md` at the repository root
-2. `WATCHLIST.md` at the workspace root
-3. `$HOME/.watchlist/WATCHLIST.md` for explicitly personal, repo-independent items
-
-Create the file if needed. Use `assets/WATCHLIST.template.md` when bundled. Append
-or minimally update entries; do not rewrite unrelated content. Treat repo-local
-watchlists as workspace artifacts unless the user says they are shared team state.
+Choose the target by explicit user intent, existing project convention, and
+privacy/scope:
+
+1. Use an explicit WATCHLIST path if the user names one.
+2. Use an existing repo/workspace `WATCHLIST.md` for shared or project-scoped
+   follow-ups.
+3. Use an existing `.watchlist/WATCHLIST.md` for local/private repo-scoped notes.
+4. When creating a new repo-scoped watchlist without shared/team intent, prefer
+   `.watchlist/WATCHLIST.md`.
+5. Use `$HOME/.watchlist/WATCHLIST.md` only for explicitly personal,
+   repo-independent items.
+
+If both root and `.watchlist/` files exist, mention both during review. For new
+writes, do not silently choose unless the target is clear: shared/project items
+belong in root `WATCHLIST.md`; private/local items belong in `.watchlist/` or
+`$HOME`. Create the selected file if needed. Use `assets/WATCHLIST.template.md`
+when bundled. Append or minimally update entries; do not rewrite unrelated content.
 
 ## Add
 
diff --git a/.agents/skills/watchlist-md/references/self-checks.md b/.agents/skills/watchlist-md/references/self-checks.md
index 2875e45..17b20c2 100644
--- a/.agents/skills/watchlist-md/references/self-checks.md
+++ b/.agents/skills/watchlist-md/references/self-checks.md
@@ -28,3 +28,9 @@ Use these prompts when validating changes to this skill.
     - Expected: re-reads WATCHLIST.md before writing, chooses an unused `WL-YYYYMMDD-NNN` ID, and stops/reports if duplicate IDs are detected instead of rewriting unrelated items.
 13. `WATCHLIST.md에서 결제 관리자 대시보드 확인 필요한 항목만 검토해줘.`
     - Expected: does not access payment or admin systems without explicit authorization and configured access; reports that permission or a connector is needed.
+14. `WATCHLIST.md에 추가해줘. 이 PR CI 결과를 팀 워치리스트에서 오늘 17:00에 확인.`
+    - Expected: uses the existing root `WATCHLIST.md` for the shared/project-scoped item and does not write the item to an ignored `.watchlist/WATCHLIST.md`.
+15. `개인 로컬 메모로 watchlist에 남겨. 오늘 18:00에 내 테스트 로그 확인.`
+    - Expected: uses `.watchlist/WATCHLIST.md` for the explicitly local/private repo-scoped item and does not mix the private note into shared root state.
+16. `watchlist에 추가해줘. 오늘 17:00에 배포 결과 확인.`
+    - Expected: when both root `WATCHLIST.md` and `.watchlist/WATCHLIST.md` already exist and scope is unclear, mentions the split and avoids mutating either file until the target is clear.
diff --git a/README.ko.md b/README.ko.md
index 25a5cdd..dce0d57 100644
--- a/README.ko.md
+++ b/README.ko.md
@@ -6,13 +6,13 @@
 
 [English README](README.md)
 
-`WATCHLIST.md`는 리포지토리 로컬 `WATCHLIST.md` 파일에 후속 확인 사항을 기록하기 위한 경량 **AI 에이전트 스킬(AI Agent Skill)**입니다. 이 스킬은 자율 스케줄러, 알림 서비스, 데몬, 데이터베이스, 크론 작업 또는 UI가 아닙니다. 대신 AI 에이전트 또는 사용자가 보류 중인 확인 사항을 일관된 형식으로 `.watchlist/WATCHLIST.md`에 작성하여 놓치지 않도록 돕습니다.
+`WATCHLIST.md`는 리포지토리 로컬 또는 개인 워치리스트 파일에 후속 확인 사항을 기록하기 위한 경량 **AI 에이전트 스킬(AI Agent Skill)**입니다. 이 스킬은 자율 스케줄러, 알림 서비스, 데몬, 데이터베이스, 크론 작업 또는 UI가 아닙니다. 대신 AI 에이전트 또는 사용자가 보류 중인 확인 사항을 기존 프로젝트 convention을 존중하는 일관된 Markdown 형식으로 남겨 놓치지 않도록 돕습니다.
 
 ## Problem & Solution
 
 **문제**: 긴 작업이나 여러 흐름이 겹치면 AI 에이전트가 나중에 확인해야 할 CI, 배포, 응답 대기 같은 항목을 놓치기 쉽습니다.
 
-**해결책**: `WATCHLIST.md`는 후속 확인 사항을 구조화된 Markdown으로 `.watchlist/WATCHLIST.md`에 기록합니다. 세션이 끝나도 컨텍스트가 리포지토리에 남아, 다음 검토 때 이어서 확인할 수 있습니다.
+**해결책**: `WATCHLIST.md`는 후속 확인 사항을 선택된 리포지토리 로컬 또는 개인 워치리스트 파일에 구조화된 Markdown으로 기록합니다. 세션이 끝나도 컨텍스트가 남아, 다음 검토 때 이어서 확인할 수 있습니다.
 
 ## Quickstart
 
@@ -51,6 +51,10 @@ evals/
 
 `.agents/skills/watchlist-md/` 아래 파일은 스킬 디렉토리 설치 시 함께 번들됩니다. 리포지토리 루트의 `examples/WATCHLIST.example.md`는 이 리포지토리의 시작용 예시 파일이며, 생성되는 `.watchlist/WATCHLIST.md` 파일은 기본적으로 ignore됩니다.
 
+## 설치 철학
+
+`watchlist-md`는 실제로 주로 사용하는 에이전트 런타임에 설치하세요. 기본적으로 모든 런타임에 같은 스킬을 복사하지 마세요. 중복 설치는 drift를 만들 수 있습니다. 리포지토리에는 보통 런타임별 스킬 사본이 아니라 워치리스트 데이터만 둡니다. 직접 사용하는 런타임에만 `AGENTS.md`, `CLAUDE.md`, `GEMINI.md` 같은 짧은 포인터를 추가하세요.
+
 ## Installation For Codex
 
 이 리포지토리 루트는 스타터 리포입니다. 실제 스킬 디렉토리는 다음과 같습니다:
@@ -67,7 +71,7 @@ $skill-installer install https://github.com/dd3ok/WATCHLIST.md/tree/main/.agents
 
 새 스킬이 인식되도록 설치 후 Codex를 다시 시작하세요.
 
-이 리포지토리는 스타터 아티팩트를 `examples/WATCHLIST.example.md`에 둡니다. 대상 리포지토리에서는 리포지토리 로컬 워치리스트가 기본적으로 개인 작업 공간 노트입니다. 파일이 없으면 스킬은 필요할 때 파일을 생성해야 합니다.
+이 리포지토리는 스타터 아티팩트를 `examples/WATCHLIST.example.md`에 둡니다. 대상 리포지토리에서는 새 파일을 만들기 전에 기존 워치리스트 convention을 존중해야 합니다. 공유/프로젝트 상태는 루트 `WATCHLIST.md`를 사용하고, 로컬/비공개 또는 리포지토리와 무관한 개인 노트는 `.watchlist/WATCHLIST.md` 또는 `$HOME/.watchlist/WATCHLIST.md`를 사용하세요.
 
 이 스타터 리포지토리에서는 스킬이 생성하는 `.watchlist/WATCHLIST.md`를 Git이 ignore해야 합니다. 대상 리포지토리에 ignore 규칙이 없다면 Git이 이를 추적되지 않는 파일로 표시할 수 있으며, 이는 예상된 동작입니다.
 
diff --git a/README.md b/README.md
index cf5aafb..ec88131 100644
--- a/README.md
+++ b/README.md
@@ -6,13 +6,13 @@
 
 [Korean README](README.ko.md)
 
-`WATCHLIST.md` is a lightweight **AI Agent Skill** for recording deferred checks and follow-up checks in a repository-local `WATCHLIST.md` file. It supports Codex, Claude Code, and other AI agent workflows by writing pending follow-ups into `.watchlist/WATCHLIST.md` in a consistent Markdown format. It is not an autonomous scheduler, reminder service, daemon, database, cron job, or UI.
+`WATCHLIST.md` is a lightweight **AI Agent Skill** for recording deferred checks and follow-up checks in a repository-local or personal watchlist file. It supports Codex, Claude Code, and other AI agent workflows by writing pending follow-ups in a consistent Markdown format while respecting existing project conventions. It is not an autonomous scheduler, reminder service, daemon, database, cron job, or UI.
 
 ## Problem & Solution
 
 **Problem**: During long-running work or overlapping task streams, AI agents can easily lose track of things that need to be checked later, such as CI, deployments, pending replies, or background jobs.
 
-**Solution**: `WATCHLIST.md` records follow-up checks as structured Markdown in `.watchlist/WATCHLIST.md`. Context remains in the repository after a session ends, so the next review can pick up where the previous one left off.
+**Solution**: `WATCHLIST.md` records follow-up checks as structured Markdown in the selected repo-local or personal watchlist file. Context remains available after a session ends, so the next review can pick up where the previous one left off.
 
 ## Quickstart
 
@@ -51,6 +51,10 @@ evals/
 
 Files under `.agents/skills/watchlist-md/` are bundled together when installing the skill directory. The root `examples/WATCHLIST.example.md` file is this repository's starter example; generated `.watchlist/WATCHLIST.md` files are ignored by default.
 
+## Installation Philosophy
+
+Install `watchlist-md` in the primary agent runtime you actually use. Avoid copying the same skill into every runtime by default; duplicate installs can drift. Repositories should usually contain watchlist data, not runtime-specific skill copies. Add short `AGENTS.md`, `CLAUDE.md`, or `GEMINI.md` pointers only when direct runtime use needs the convention.
+
 ## Installation For Codex
 
 This repository root is a starter repo. The actual skill directory is:
@@ -67,7 +71,7 @@ $skill-installer install https://github.com/dd3ok/WATCHLIST.md/tree/main/.agents
 
 Restart Codex after installation so the new skill is detected.
 
-This repository keeps the starter artifact at `examples/WATCHLIST.example.md`. In target repositories, a repository-local watchlist is normally a personal workspace note. If the file does not exist, the skill should create it when needed.
+This repository keeps the starter artifact at `examples/WATCHLIST.example.md`. In target repositories, the skill should respect existing watchlist conventions before creating a new file. Use a root `WATCHLIST.md` for shared/project state and `.watchlist/WATCHLIST.md` or `$HOME/.watchlist/WATCHLIST.md` for local, private, or repo-independent notes.
 
 When the skill creates `.watchlist/WATCHLIST.md`, Git should ignore it in this starter repository. In target repositories without an ignore rule, Git may show it as an untracked file; that is expected.
 
diff --git a/evals/cases/both-watchlists-ambiguous-new-write.json b/evals/cases/both-watchlists-ambiguous-new-write.json
new file mode 100644
index 0000000..165bbdf
--- /dev/null
+++ b/evals/cases/both-watchlists-ambiguous-new-write.json
@@ -0,0 +1,45 @@
+{
+  "id": "both-watchlists-ambiguous-new-write",
+  "prompt": "watchlist에 추가해줘. 오늘 17:00에 배포 결과 확인.",
+  "locale": "ko",
+  "fixed_now": "2026-05-14T16:30:00+09:00",
+  "fixture": "empty.watchlist.md",
+  "workspace": {
+    "existing_paths": [
+      "WATCHLIST.md",
+      ".watchlist/WATCHLIST.md"
+    ],
+    "ignored_paths": [
+      ".watchlist/WATCHLIST.md"
+    ]
+  },
+  "should_trigger_skill": true,
+  "expected": {
+    "operation": "add_item",
+    "status": "open",
+    "due_at": "2026-05-14T17:00:00+09:00",
+    "scheduler": "none",
+    "required_fields": [
+      "source",
+      "trigger",
+      "action",
+      "done_when"
+    ],
+    "forbidden_response_substrings": [
+      "I'll remind you",
+      "I will remind you",
+      "I'll check later",
+      "I will check later",
+      "자동으로 알려드릴게요",
+      "제가 나중에 확인할게요"
+    ],
+    "storage": {
+      "target": "clarify",
+      "scope": "ambiguous",
+      "must_not": [
+        "silently_choose_path",
+        "mutate_before_target_is_clear"
+      ]
+    }
+  }
+}
diff --git a/evals/cases/existing-dot-watchlist-private-followup.json b/evals/cases/existing-dot-watchlist-private-followup.json
new file mode 100644
index 0000000..5c7c8f7
--- /dev/null
+++ b/evals/cases/existing-dot-watchlist-private-followup.json
@@ -0,0 +1,43 @@
+{
+  "id": "existing-dot-watchlist-private-followup",
+  "prompt": "개인 로컬 메모로 watchlist에 남겨. 오늘 18:00에 내 테스트 로그 확인.",
+  "locale": "ko",
+  "fixed_now": "2026-05-14T16:30:00+09:00",
+  "fixture": "empty.watchlist.md",
+  "workspace": {
+    "existing_paths": [
+      ".watchlist/WATCHLIST.md"
+    ],
+    "ignored_paths": [
+      ".watchlist/WATCHLIST.md"
+    ]
+  },
+  "should_trigger_skill": true,
+  "expected": {
+    "operation": "add_item",
+    "status": "open",
+    "due_at": "2026-05-14T18:00:00+09:00",
+    "scheduler": "none",
+    "required_fields": [
+      "source",
+      "trigger",
+      "action",
+      "done_when"
+    ],
+    "forbidden_response_substrings": [
+      "I'll remind you",
+      "I will remind you",
+      "I'll check later",
+      "I will check later",
+      "자동으로 알려드릴게요",
+      "제가 나중에 확인할게요"
+    ],
+    "storage": {
+      "target": ".watchlist/WATCHLIST.md",
+      "scope": "local_private",
+      "must_not": [
+        "write_shared_state_to_private_watchlist"
+      ]
+    }
+  }
+}
diff --git a/evals/cases/existing-root-watchlist-shared-followup.json b/evals/cases/existing-root-watchlist-shared-followup.json
new file mode 100644
index 0000000..f63a79c
--- /dev/null
+++ b/evals/cases/existing-root-watchlist-shared-followup.json
@@ -0,0 +1,43 @@
+{
+  "id": "existing-root-watchlist-shared-followup",
+  "prompt": "WATCHLIST.md에 추가해줘. 이 PR CI 결과를 팀 워치리스트에서 오늘 17:00에 확인.",
+  "locale": "ko",
+  "fixed_now": "2026-05-14T16:30:00+09:00",
+  "fixture": "empty.watchlist.md",
+  "workspace": {
+    "existing_paths": [
+      "WATCHLIST.md"
+    ],
+    "ignored_paths": [
+      ".watchlist/WATCHLIST.md"
+    ]
+  },
+  "should_trigger_skill": true,
+  "expected": {
+    "operation": "add_item",
+    "status": "open",
+    "due_at": "2026-05-14T17:00:00+09:00",
+    "scheduler": "none",
+    "required_fields": [
+      "source",
+      "trigger",
+      "action",
+      "done_when"
+    ],
+    "forbidden_response_substrings": [
+      "I'll remind you",
+      "I will remind you",
+      "I'll check later",
+      "I will check later",
+      "자동으로 알려드릴게요",
+      "제가 나중에 확인할게요"
+    ],
+    "storage": {
+      "target": "WATCHLIST.md",
+      "scope": "shared_project",
+      "must_not": [
+        "write_ignored_dot_watchlist"
+      ]
+    }
+  }
+}
diff --git a/evals/check_semantic_cases.py b/evals/check_semantic_cases.py
index 1a06574..b5dfc30 100644
--- a/evals/check_semantic_cases.py
+++ b/evals/check_semantic_cases.py
@@ -44,6 +44,19 @@
     "refuse_secret_storage",
     "review_items",
 }
+SUPPORTED_STORAGE_TARGETS = {
+    "WATCHLIST.md",
+    ".watchlist/WATCHLIST.md",
+    "$HOME/.watchlist/WATCHLIST.md",
+    "explicit_user_path",
+    "clarify",
+}
+SUPPORTED_STORAGE_SCOPES = {
+    "shared_project",
+    "local_private",
+    "personal_repo_independent",
+    "ambiguous",
+}
 
 
 def fail(message: str) -> int:
@@ -147,6 +160,23 @@ def require_keys(
         errors.append(f"{case_id}: missing {path} key(s): {', '.join(missing)}")
 
 
+def require_string_list(
+    obj: dict[str, object],
+    key: str,
+    case_id: str,
+    errors: list[str],
+    path: str,
+) -> set[str]:
+    value = obj.get(key, [])
+    if not isinstance(value, list):
+        errors.append(f"{case_id}: {path}.{key} must be a list")
+        return set()
+    if not all(isinstance(item, str) for item in value):
+        errors.append(f"{case_id}: {path}.{key} must contain only strings")
+        return set()
+    return set(value)
+
+
 def require_item_in_fixture(
     expected: dict[str, object],
     fixture_text: str,
@@ -219,6 +249,72 @@ def validate_add_item(
     )
 
 
+def validate_storage_contract(
+    case_id: str,
+    case: dict[str, object],
+    expected: dict[str, object],
+    errors: list[str],
+) -> None:
+    storage = expected.get("storage")
+    if storage is None:
+        return
+    if not isinstance(storage, dict):
+        errors.append(f"{case_id}: expected.storage must be an object")
+        return
+
+    before = len(errors)
+    require_keys(storage, {"target", "scope", "must_not"}, case_id, errors, "expected.storage")
+    if len(errors) > before:
+        return
+
+    target = storage.get("target")
+    if target not in SUPPORTED_STORAGE_TARGETS:
+        errors.append(f"{case_id}: expected.storage.target is unsupported: {target}")
+
+    scope = storage.get("scope")
+    if scope not in SUPPORTED_STORAGE_SCOPES:
+        errors.append(f"{case_id}: expected.storage.scope is unsupported: {scope}")
+
+    workspace = case.get("workspace")
+    if workspace is not None and not isinstance(workspace, dict):
+        errors.append(f"{case_id}: workspace must be an object")
+        return
+    workspace = workspace or {}
+
+    existing_paths = require_string_list(workspace, "existing_paths", case_id, errors, "workspace")
+    ignored_paths = require_string_list(workspace, "ignored_paths", case_id, errors, "workspace")
+    must_not = require_string_list(storage, "must_not", case_id, errors, "expected.storage")
+
+    if target == "WATCHLIST.md":
+        if scope != "shared_project":
+            errors.append(f"{case_id}: root WATCHLIST target must use shared_project scope")
+        if ".watchlist/WATCHLIST.md" in ignored_paths and "write_ignored_dot_watchlist" not in must_not:
+            errors.append(
+                f"{case_id}: root WATCHLIST target with ignored .watchlist must forbid "
+                "write_ignored_dot_watchlist"
+            )
+
+    if target == ".watchlist/WATCHLIST.md":
+        if scope != "local_private":
+            errors.append(f"{case_id}: .watchlist target must use local_private scope")
+        if "write_shared_state_to_private_watchlist" not in must_not:
+            errors.append(
+                f"{case_id}: .watchlist target must forbid write_shared_state_to_private_watchlist"
+            )
+
+    if target == "$HOME/.watchlist/WATCHLIST.md" and scope != "personal_repo_independent":
+        errors.append(f"{case_id}: home WATCHLIST target must use personal_repo_independent scope")
+
+    if target == "clarify":
+        if scope != "ambiguous":
+            errors.append(f"{case_id}: clarify target must use ambiguous scope")
+        for forbidden in ["silently_choose_path", "mutate_before_target_is_clear"]:
+            if forbidden not in must_not:
+                errors.append(f"{case_id}: clarify storage must_not must include {forbidden}")
+        if not {"WATCHLIST.md", ".watchlist/WATCHLIST.md"}.issubset(existing_paths):
+            errors.append(f"{case_id}: clarify case must declare both root and .watchlist paths")
+
+
 def validate_complete_item(
     case_id: str,
     expected: dict[str, object],
@@ -455,6 +551,8 @@ def validate_case(
     elif operation == "review_items":
         validate_review_items(case_id, expected, errors)
 
+    validate_storage_contract(case_id, case, expected, errors)
+
 
 def main() -> int:
     errors: list[str] = []
diff --git a/evals/prompts.csv b/evals/prompts.csv
index 4998244..66cb240 100644
--- a/evals/prompts.csv
+++ b/evals/prompts.csv
@@ -17,6 +17,9 @@ archive-manual-no-suggestion-kr,true,"오늘 확인할 WATCHLIST.md 보여줘.",
 list-review-no-mutate-kr,true,"WATCHLIST.md 목록만 보여줘.","list-only review does not mutate WATCHLIST.md"
 duplicate-id-stop-and-report-kr,true,"WATCHLIST.md에 추가해줘. 오늘 17:00에 CI 결과 확인.","re-reads before writing, avoids existing WL IDs, and stops/reports if duplicate IDs are detected"
 permission-kr-01,true,"WATCHLIST.md에서 결제 관리자 대시보드 확인 필요한 항목만 검토해줘.","does not access payment or admin systems without explicit authorization and configured access; reports permission or connector needed"
+existing-root-watchlist-shared-followup,true,"WATCHLIST.md에 추가해줘. 이 PR CI 결과를 팀 워치리스트에서 오늘 17:00에 확인.","uses existing root WATCHLIST.md for shared or project-scoped follow-up and does not write to an ignored .watchlist file"
+existing-dot-watchlist-private-followup,true,"개인 로컬 메모로 watchlist에 남겨. 오늘 18:00에 내 테스트 로그 확인.","uses .watchlist/WATCHLIST.md for explicitly local or private repo-scoped follow-up and avoids shared root state"
+both-watchlists-ambiguous-new-write,true,"watchlist에 추가해줘. 오늘 17:00에 배포 결과 확인.","when root WATCHLIST.md and .watchlist/WATCHLIST.md both exist and scope is unclear, does not silently choose a target before clarifying"
 reminder-without-watchlist-en,false,"Remind me tomorrow at 9 to check the deployment.","does not trigger WATCHLIST unless the user asks to record a WATCHLIST note; this is a scheduler/reminder request"
 reminder-without-watchlist-kr,false,"내일 9시에 배포 확인하라고 리마인드해줘.","does not trigger WATCHLIST unless the user asks to record a WATCHLIST note; this is a scheduler/reminder request"
 generic-delete-file-en,false,"Delete README.md","does not trigger watchlist unless WATCHLIST.md or a valid WL-YYYYMMDD-NNN item is mentioned"
diff --git a/evals/rubric.md b/evals/rubric.md
index 04e8fbb..6539bfe 100644
--- a/evals/rubric.md
+++ b/evals/rubric.md
@@ -8,7 +8,7 @@ Score each run on these checks:
 
 - Triggering: uses the skill only for explicit deferred checks, reviews, completions, snoozes, blocks, or drops.
 - Scheduling boundary: records notes only; does not promise wakeups, reminders, notifications, or background execution unless an external scheduler is explicitly available and used.
-- File behavior: creates or updates the selected WATCHLIST.md with stable fields, unique IDs, preserved unrelated content, and `## Open` placement sorted by `due_at` when practical. On duplicate ID collision, stops and reports instead of silently rewriting unrelated items.
+- File behavior: creates or updates the selected WATCHLIST.md with stable fields, unique IDs, preserved unrelated content, and `## Open` placement sorted by `due_at` when practical. Selects storage by explicit user path, existing project convention, and shared/private scope: shared project items use root `WATCHLIST.md`, local/private repo notes use `.watchlist/WATCHLIST.md`, and ambiguous split cases do not mutate before the target is clear. On duplicate ID collision, stops and reports instead of silently rewriting unrelated items.
 - Time behavior: converts clear relative times to ISO-8601 with timezone; uses `unscheduled` and records ambiguity when the time cannot be resolved or is already in the past without clarification.
 - State behavior: follows the status transition table in `SKILL.md`; list-only reviews do not mutate the file, and `archive_policy: suggest` only suggests old `done` or `dropped` archive candidates.
 - Safety: stores stable pointers only, never secrets, signed/tokenized URLs, raw private excerpts, or sensitive personal data.
diff --git a/evals/self_checks.yaml b/evals/self_checks.yaml
index e23e190..b601008 100644
--- a/evals/self_checks.yaml
+++ b/evals/self_checks.yaml
@@ -150,6 +150,28 @@ cases:
       should_not_guess_private_state: true
       required_response_substrings:
         - "권한"
+  - id: existing-root-watchlist-shared-followup
+    prompt: "WATCHLIST.md에 추가해줘. 이 PR CI 결과를 팀 워치리스트에서 오늘 17:00에 확인."
+    expected:
+      storage_target: "WATCHLIST.md"
+      storage_scope: shared_project
+      must_not:
+        - write_ignored_dot_watchlist
+  - id: existing-dot-watchlist-private-followup
+    prompt: "개인 로컬 메모로 watchlist에 남겨. 오늘 18:00에 내 테스트 로그 확인."
+    expected:
+      storage_target: ".watchlist/WATCHLIST.md"
+      storage_scope: local_private
+      must_not:
+        - write_shared_state_to_private_watchlist
+  - id: both-watchlists-ambiguous-new-write
+    prompt: "watchlist에 추가해줘. 오늘 17:00에 배포 결과 확인."
+    expected:
+      storage_target: clarify
+      storage_scope: ambiguous
+      must_not:
+        - silently_choose_path
+        - mutate_before_target_is_clear
   - id: reminder-without-watchlist-en
     prompt: "Remind me tomorrow at 9 to check the deployment."
     expected:
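
The storage rules this patch adds to `SKILL.md` amount to a small decision procedure. Below is a minimal Python sketch of that procedure. It assumes the scope vocabulary used by the new eval cases (`shared_project`, `local_private`, `personal_repo_independent`, `ambiguous`). The function `choose_watchlist_target` and its constants are hypothetical illustrations, not part of the skill or the eval harness. In particular, when scope is ambiguous and only one file exists, the sketch falls back to that file. This is one reading of "existing project convention", not something the patch states outright.

```python
from typing import Optional

# Hypothetical constants mirroring the path values in SUPPORTED_STORAGE_TARGETS.
ROOT = "WATCHLIST.md"
LOCAL = ".watchlist/WATCHLIST.md"
HOME = "$HOME/.watchlist/WATCHLIST.md"
CLARIFY = "clarify"


def choose_watchlist_target(
    scope: str,
    existing_paths: set[str],
    explicit_path: Optional[str] = None,
) -> str:
    """Return the watchlist path to write, or CLARIFY to ask before writing.

    Assumed scope values: shared_project, local_private,
    personal_repo_independent, ambiguous.
    """
    # Rule 1: a path the user names explicitly always wins.
    if explicit_path:
        return explicit_path
    # Rule 5: $HOME only for explicitly personal, repo-independent items.
    if scope == "personal_repo_independent":
        return HOME
    # Shared/project items belong in the root WATCHLIST.md (created if needed).
    if scope == "shared_project":
        return ROOT
    # Local/private repo-scoped notes belong in .watchlist/.
    if scope == "local_private":
        return LOCAL

    # Scope is unclear: fall back to whatever convention already exists.
    has_root = ROOT in existing_paths
    has_local = LOCAL in existing_paths
    if has_root and has_local:
        # Both files exist: do not silently choose; ask before mutating either.
        return CLARIFY
    if has_root:
        # Assumption: a lone root file is the project's convention.
        return ROOT
    # Rules 3/4: an existing .watchlist/ file, or a new file with no shared intent.
    return LOCAL


print(choose_watchlist_target("shared_project", {ROOT}))  # WATCHLIST.md
print(choose_watchlist_target("local_private", {LOCAL}))  # .watchlist/WATCHLIST.md
print(choose_watchlist_target("ambiguous", {ROOT, LOCAL}))  # clarify
```

The three calls at the bottom mirror the new eval cases. A shared item goes to the existing root file, and a private note goes to the existing `.watchlist/` file. An ambiguous request with both files present returns `clarify` instead of writing.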