From 11523570e4e9f53eeb319a200a9d550f0e54c507 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 9 Mar 2026 12:22:12 +0000 Subject: [PATCH 1/4] Initial plan From 990cf5e96af1892b939bdb2119b886ebb09076a4 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 9 Mar 2026 12:26:01 +0000 Subject: [PATCH 2/4] docs: adapt AGENTS.md and skills from tibblify (#8) Co-authored-by: jonthegeek <33983824+jonthegeek@users.noreply.github.com> --- .github/skills/implement-issue/SKILL.md | 57 +++++++ .github/skills/r-code/SKILL.md | 138 +++++++++++++++++ .github/skills/tdd-workflow/SKILL.md | 197 ++++++++++++++++++++++++ AGENTS.md | 59 ++++++- 4 files changed, 446 insertions(+), 5 deletions(-) create mode 100644 .github/skills/implement-issue/SKILL.md create mode 100644 .github/skills/r-code/SKILL.md create mode 100644 .github/skills/tdd-workflow/SKILL.md diff --git a/.github/skills/implement-issue/SKILL.md b/.github/skills/implement-issue/SKILL.md new file mode 100644 index 0000000..fe794f8 --- /dev/null +++ b/.github/skills/implement-issue/SKILL.md @@ -0,0 +1,57 @@ +--- +name: implement-issue +description: Implements a GitHub issue end-to-end. Use when asked to implement, work on, or fix a specific issue number. +compatibility: Requires the `gh` CLI and an authenticated GitHub session. +--- + +# Implement a GitHub issue + +This skill wraps the Standard Workflow defined in `AGENTS.md`. Run the steps +below before and after that workflow. + +## Before the standard workflow + +**A. Read the issue in full:** + +```bash +gh issue view {number} +``` + +If `gh` is not authenticated, stop and ask the user to authenticate before continuing. + +**B. Check/create the branch:** + +- If on `main`: `usethis::pr_init("fix-{number}-{description}")` +- Branch format: `fix-{number}-{description}` + - Parts separated by hyphens; `{description}` uses snake_case + - Example: `fix-279-ai_infrastructure` +- If a branch already exists for this issue, check it out instead + +## Run the Standard Workflow from AGENTS.md + +Steps 1–9 of the Standard Workflow are the core development loop. + +## After the standard workflow + +**C. Commit and push:** + +1. Review commits already on this branch (not on `main`) — these are all part + of the same PR and should inform the PR description: + ```bash + git log main..HEAD --oneline + ``` +2. Stage and commit all changes: + ```bash + git add -A + git commit -m "{short imperative summary}" + ``` +3. Push and open the PR: + ```bash + gh pr create --fill + ``` + Use `--title` and `--body` explicitly if `--fill` produces an inadequate + description. + +This step may be overridden — the user may ask you to stop before committing, +handle the push themselves, or complete only part of the workflow. Always follow +explicit user instructions over these defaults. diff --git a/.github/skills/r-code/SKILL.md b/.github/skills/r-code/SKILL.md new file mode 100644 index 0000000..b807fbd --- /dev/null +++ b/.github/skills/r-code/SKILL.md @@ -0,0 +1,138 @@ +--- +name: r-code +description: Guide for writing R code in datawrap. Use when writing new functions, designing APIs, or reviewing/modifying existing R code. +--- + +# R code in datawrap + +This skill covers how to design and write R functions in this package — +including signatures, API conventions, error handling, and common pitfalls. +For documenting functions, use the `document` skill. For tests, use the +`tdd-workflow` skill. + +## General API design patterns + +**Enum-like arguments** — declare choices as the default vector; resolve with +`arg_match0()` (exact match, no partial matching) at the top of the function: + +```r +my_function <- function(x, mode = c("fast", "safe")) { + mode <- arg_match0(mode, c("fast", "safe")) + # mode is now guaranteed to be "fast" or "safe" +} +``` + +**`NULL` as "not provided"** — use `NULL` as the default for optional +arguments where there is no sensible scalar fallback; check with `is.null()`: + +```r +my_function <- function(x, names_to = NULL) { + if (!is.null(names_to)) { ... } +} +``` + +**S3 object construction** — build as a named list, set class explicitly: + +```r +.my_object <- function(x, type) { + out <- list(value = x, type = type) + class(out) <- c(paste0("my_", type), "my_object") + out +} +``` + +**`.call` propagation in internal validators** — helpers that validate +arguments and may throw errors should accept and forward `.call`: + +```r +.check_something <- function(x, .call = caller_env()) { + if (bad(x)) cli::cli_abort("...", call = .call) +} + +my_exported_fn <- function(x, .call = caller_env()) { + .check_something(x, .call) +} +``` + +**Return tibbles, not data frames:** + +```r +my_function <- function(x) { + result |> tibble::as_tibble() +} +``` + +## Internal vs. exported functions + +Export a function when: +- Users will call it directly +- Other packages may want to extend it +- It is a stable, intentional part of the API + +Keep a function internal when: +- It is an implementation detail that may change +- It is only used within the package +- Exporting it would clutter the user-facing API + +Internal helpers use a dot prefix (e.g. `.my_helper()`). Mark them with +`@keywords internal` and omit `@export`. + +## Error handling + +```r +cli::cli_abort( + "Input {.arg x} cannot be empty.", + "i" = "Provide a non-empty vector.", + call = caller_env() +) +``` + +- Always pass `call = caller_env()` (or `call = .call` or `call = call`, if a + call is passed to the function) so errors point to the user's call frame, not + an internal helper. +- Use custom error classes for errors that callers may want to catch + programmatically: + ```r + cli::cli_abort( + "...", + class = c("my_specific_error", "datawrap_error"), + call = caller_env() + ) + ``` + +## Common package mistakes + +```r +# Never use library() inside package code +library(dplyr) # Wrong +dplyr::filter(...) # Right +# or `@importFrom dplyr filter` if used extensively + +# Never modify global state without restoring it +options(my_option = TRUE) # Wrong +withr::local_options(list(my_option = TRUE)) # Right + +# Use system.file() for package data, not hardcoded paths +read.csv("/home/user/data.csv") # Wrong +system.file("extdata", "data.csv", package = "datawrap") # Right +``` + +## Dependencies + +### Use existing imports first + +Packages that are already in `Imports` in @DESCRIPTION should be used in +preference to base R equivalents when relevant. + +For example: prefer `rlang::is_*()` predicates over `is.*()`, and +`withr::local_*()` over manual `on.exit()` state management. + +### When to add a new dependency + +Add a dependency when it provides significant functionality that would be +complex or brittle to reimplement — date parsing, web requests, complex string +manipulation. Stick with base R when the solution is straightforward and a +dependency would add overhead for little gain. + +datawrap's existing dependencies are intentional. **Adding a new dependency +requires explicit discussion with the developer.** diff --git a/.github/skills/tdd-workflow/SKILL.md b/.github/skills/tdd-workflow/SKILL.md new file mode 100644 index 0000000..28512d7 --- /dev/null +++ b/.github/skills/tdd-workflow/SKILL.md @@ -0,0 +1,197 @@ +--- +name: tdd-workflow +description: Test-driven development workflow for datawrap. Use when writing any R code (writing new features, fixing bugs, refactoring, or reviewing tests). +--- + +# TDD workflow for datawrap + +## Core principle + +Write a failing test first, then implement the minimal code to make it pass, +then refactor. Never write implementation code without a failing test driving it. + +## File naming + +Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`. Place new tests +next to similar existing ones. + +## Running tests + +```r +# Full suite +devtools::test(reporter = "check") + +# Single file +devtools::test(filter = "name", reporter = "check") +``` + +Testing functions load code automatically. You do not need to call `library()` or `devtools::load_all()` separately. + +## Coverage + +Goal: **100%** for every edited file. After editing `R/file_name.R`, verify: + +```r +covr_res <- devtools:::test_coverage_active_file("R/file_name.R") +which(vapply(covr_res, `[[`, numeric(1), "value") == 0) +``` + +Files excluded from the coverage requirement: +- `R/datawrap-package.R` + +## Test types + +### Unit tests + +Test individual functions in isolation: + +```r +test_that("create_dataset_dictionary() returns a data frame (#1)", { + result <- create_dataset_dictionary(mtcars) + expect_s3_class(result, "data.frame") +}) +``` + +### Integration tests + +Test end-to-end pipelines through the full function → output path: + +```r +test_that("create_dataset_dictionary() round-trips through describe_dataset() (#1)", { + dict <- create_dataset_dictionary(mtcars) + desc <- describe_dataset(dict) + expect_type(desc, "character") +}) +``` + +### Snapshot tests + +For complex outputs that are hard to specify with equality assertions: + +```r +test_that("print method is stable (#123)", { + expect_snapshot(print(create_dataset_dictionary(mtcars))) +}) + +# For errors, wrap expect_error() inside expect_snapshot() so both the error +# class and the message text are captured in the snapshot: +test_that("errors on invalid input (#456)", { + expect_snapshot({ + (expect_error( + create_dataset_dictionary("not a data frame"), + class = "datawrap_error" + )) + }) +}) +``` + +When snapshots change intentionally: + +```r +testthat::snapshot_review("test_name") +testthat::snapshot_accept("test_name") +``` + +Snapshots are stored in `tests/testthat/_snaps/`. + +## Test design principles + +- **Self-sufficient:** each test contains its own setup, execution, and + assertion. Tests must be runnable in isolation. +- **Duplication over factoring:** repeat setup code rather than extracting it. + Clarity beats DRY in tests. +- **One concept per test:** a failing test should tell you exactly what broke. +- **Issue reference in description:** the `desc` of every new `test_that()` call + should end with a parenthetical issue reference: + ```r + test_that("create_dataset_dictionary() returns correct columns (#1)", { ... }) + test_that("errors on empty input (#noissue)", { ... }) + ``` + +## testthat Edition 3 — deprecated patterns + +```r +# Deprecated → Modern +context("Data validation") # Remove — filename serves this purpose +expect_equivalent(x, y) # expect_equal(x, y, ignore_attr = TRUE) +with_mock(...) # local_mocked_bindings(...) +expect_is(x, "data.frame") # expect_s3_class(x, "data.frame") +``` + +## Essential expectations + +### Equality & identity + +```r +expect_equal(x, y) # with numeric tolerance +expect_equal(x, y, tolerance = 0.001) +expect_equal(x, y, ignore_attr = TRUE) +expect_identical(x, y) # exact match +``` + +### Conditions + +```r +expect_error(code) +expect_error(code, "pattern") +expect_error(code, class = "datawrap_error") +expect_warning(code) +expect_no_warning(code) +expect_message(code) +expect_no_message(code) +``` + +### Collections + +```r +expect_setequal(x, y) # same elements, any order +expect_in(element, set) +expect_named(x, c("a", "b")) +``` + +### Type & structure + +```r +expect_type(x, "double") +expect_s3_class(x, "tbl_df") +expect_length(x, 10) +``` + +### Logical + +```r +expect_true(x) +expect_false(x) +expect_null(x) +``` + +## `withr` patterns for temporary state + +```r +withr::local_options(list(datawrap.option = TRUE)) +withr::local_envvar(MY_VAR = "value") +withr::local_tempfile(lines = c("a", "b")) +``` + +## Fixtures + +Store static test data in `tests/testthat/fixtures/` and access via: + +```r +test_path("fixtures", "sample.rds") +``` + +## Mocking + +```r +local_mocked_bindings( + external_fn = function(...) "mocked_result" +) +result <- my_function_that_calls_external_fn() +``` + +## Common mistakes + +- **Do not modify tests to make them pass.** Fix the implementation. +- **Do not write tests that depend on other tests' state.** Each test must + be independently runnable. diff --git a/AGENTS.md b/AGENTS.md index 3760f6a..1781bb8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -18,6 +18,43 @@ --- +## Standard workflow + +For any feature, fix, or refactor: + +1. **Update packages**: `pak::pak()` +2. **Run existing tests** — confirm everything passes before touching code: + `devtools::test(reporter = "check")` + If any tests fail, stop and ask the user how to proceed. +3. **Plan** — identify which R files are affected; check whether new exported + functions are needed (→ r-code skill) +4. **Test first** — write a failing test, then implement (→ tdd-workflow skill): + `devtools::test(filter = "name", reporter = "check")` — should fail +5. **Implement** — minimal code to make the tests pass +6. **Refactor** — clean up while keeping all tests green +7. **Document** — for any new or changed exported functions, use the document skill +8. **Verify**: + ```r + devtools::test(reporter = "check") + covr_res <- devtools:::test_coverage_active_file("R/file_name.R") + which(vapply(covr_res, `[[`, numeric(1), "value") == 0) + ``` + Then run `air format .` + Once, just before wrapping up: `devtools::check(error_on = "warning")`. + Warnings and errors must be resolved. NOTEs should be resolved too. +9. **News** — add a bullet at the top of `NEWS.md` (under the development version + heading). Rules: + - Only for user-facing changes; skip purely internal changes. + - One line per bullet; end with a full stop. + - Write for users, not developers. Frame positively, present tense: + `` * `create_dataset_dictionary()` now accepts ... `` not `* Fixed a bug where ...` + - Put the function name (in backticks with `()`) near the start. + - Issue number and contributor go in parentheses before the final period: + `` * `create_dataset_dictionary()` now accepts ... (@{username}, #{issue_number}). `` + - Get username: `gh api user --jq .login` + +--- + ## GitHub Use the `gh` CLI to interact with GitHub rather than fetching web URLs. Common examples: @@ -51,15 +88,27 @@ with `@eval` in package dataset documentation. --- +## General + +- When running R from the console, use `--quiet --vanilla`. +- Always run `air format .` after generating code. +- Code comments should explain *why*, not *what*. Omit comments that restate the code. +- When writing or reviewing any code, load the relevant skills (usually `r-code`, `tdd-workflow`, and `document`). + +--- + ## Skills Skills in @.github/skills should be loaded when the user triggers them. -| Triggers | Path | -------------------------|------------------------------------------------| -| document functions | @.github/skills/document/SKILL.md | -| search / rewrite code | @.github/skills/search-code/SKILL.md | -| create github issues | @.github/skills/create-issue/SKILL.md | +| Triggers | Path | +|---------------------------------------------------|------------------------------------------------| +| document functions | @.github/skills/document/SKILL.md | +| search / rewrite code | @.github/skills/search-code/SKILL.md | +| create github issues | @.github/skills/create-issue/SKILL.md | +| implement issue / work on #NNN | @.github/skills/implement-issue/SKILL.md | +| writing R functions / API design / error handling | @.github/skills/r-code/SKILL.md | +| writing or reviewing tests | @.github/skills/tdd-workflow/SKILL.md | ## File Organization From dadcee3feac0516da6329d4ef5c4432cdfdde358 Mon Sep 17 00:00:00 2001 From: Jon Harmon Date: Mon, 9 Mar 2026 07:50:26 -0500 Subject: [PATCH 3/4] Tweaks from tibblify versions. --- .Rbuildignore | 1 + .github/skills/r-code/SKILL.md | 3 +- .github/skills/tdd-workflow/SKILL.md | 3 +- AGENTS.md | 49 ++-------------------------- 4 files changed, 8 insertions(+), 48 deletions(-) diff --git a/.Rbuildignore b/.Rbuildignore index 1f85bbf..252789d 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -12,3 +12,4 @@ ^\.vscode$ ^AGENTS\.md$ ^\.positai$ +^\.claude$ diff --git a/.github/skills/r-code/SKILL.md b/.github/skills/r-code/SKILL.md index b807fbd..e10e768 100644 --- a/.github/skills/r-code/SKILL.md +++ b/.github/skills/r-code/SKILL.md @@ -124,7 +124,8 @@ system.file("extdata", "data.csv", package = "datawrap") # Right Packages that are already in `Imports` in @DESCRIPTION should be used in preference to base R equivalents when relevant. -For example: prefer `rlang::is_*()` predicates over `is.*()`, and +For example: prefer `purrr::map()` over `lapply()`, `rlang::is_*()` predicates +over `is.*()`, `vctrs::vec_*()` over base length/NA checks, and `withr::local_*()` over manual `on.exit()` state management. ### When to add a new dependency diff --git a/.github/skills/tdd-workflow/SKILL.md b/.github/skills/tdd-workflow/SKILL.md index 28512d7..2c06093 100644 --- a/.github/skills/tdd-workflow/SKILL.md +++ b/.github/skills/tdd-workflow/SKILL.md @@ -33,11 +33,12 @@ Goal: **100%** for every edited file. After editing `R/file_name.R`, verify: ```r covr_res <- devtools:::test_coverage_active_file("R/file_name.R") -which(vapply(covr_res, `[[`, numeric(1), "value") == 0) +which(purrr::map_int(covr_res, "value") == 0) ``` Files excluded from the coverage requirement: - `R/datawrap-package.R` +- Files matching `R/import-standalone-*.R` ## Test types diff --git a/AGENTS.md b/AGENTS.md index 1781bb8..b9cce39 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -37,7 +37,7 @@ For any feature, fix, or refactor: ```r devtools::test(reporter = "check") covr_res <- devtools:::test_coverage_active_file("R/file_name.R") - which(vapply(covr_res, `[[`, numeric(1), "value") == 0) + which(purrr::map_int(covr_res, "value") == 0) ``` Then run `air format .` Once, just before wrapping up: `devtools::check(error_on = "warning")`. @@ -91,7 +91,7 @@ with `@eval` in package dataset documentation. ## General - When running R from the console, use `--quiet --vanilla`. -- Always run `air format .` after generating code. +- Always run `air format .` after generating R code. - Code comments should explain *why*, not *what*. Omit comments that restate the code. - When writing or reviewing any code, load the relevant skills (usually `r-code`, `tdd-workflow`, and `document`). @@ -99,7 +99,7 @@ with `@eval` in package dataset documentation. ## Skills -Skills in @.github/skills should be loaded when the user triggers them. +Load skills from @.github/skills when the user triggers them. | Triggers | Path | |---------------------------------------------------|------------------------------------------------| @@ -113,46 +113,3 @@ Skills in @.github/skills should be loaded when the user triggers them. ## File Organization Each exported function should be defined in its own file named `R/{function_name}.R`. For example, `create_dataset_dictionary()` belongs in `R/create_dataset_dictionary.R`. Any helper functions used exclusively by that exported function should also live in the same file. General-purpose helpers shared across multiple functions belong in `R/utils.R`. - -## Testing - -### Workflow: Red-Green-Refactor TDD - -Unless explicitly told otherwise, follow strict red-green-refactor TDD for all development: - -1. **Red** — Write a failing test that captures the desired behavior. Run it to confirm it fails for the right reason. -2. **Green** — Write the minimal implementation needed to make the test pass. Run the test to confirm it passes. -3. **Refactor** — Clean up code and tests while keeping all tests green. Run the full test suite after each refactor step. - -Never write implementation code before there is a failing test for it. - -### General rules - -- Before starting any coding task, run the relevant tests and check coverage so you know the baseline state. -- Always run `air format .` before running tests, after every R file edit. -- Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`. -- Use `devtools::test(reporter = "check")` to run all tests -- Use `devtools::test(filter = "name", reporter = "check")` to run tests for `R/{name}.R` -- All testing functions automatically load code; you don't need to. -- All new code should have an accompanying test. -- If there are existing tests, place new tests next to similar existing tests. -- Test descriptions should reference an issue that they are closed with "(#123)" (for issue 123), etc. If the test doesn't relate to a particular issue (for example, if it's testing an underlying piece of infrastructure), tag it with "(#noissue)". - -### Test coverage - -The goal is 100% file-level test coverage across all R source files. After editing a file, ensure that it still has 100% test coverage. - -To check coverage for a single file: - -```r -covr_res <- devtools:::test_coverage_active_file("R/file_name.R") -which(vapply(covr_res, `[[`, numeric(1), "value") == 0) -``` - -The following files are intentionally excluded from coverage requirements (no associated tests): - -- `R/datawrap-package.R` - -## Documentation - -After adding or changing functions, create or update their documentation. See @.github/skills/document/SKILL.md for details on documentation for this project. From 8544880fc0edd9021f513398ba067868850b7cde Mon Sep 17 00:00:00 2001 From: Jon Harmon Date: Mon, 9 Mar 2026 08:19:45 -0500 Subject: [PATCH 4/4] fix: copilot suggestions Use more realistic example. Remove irrelevant integration test example. --- .github/skills/tdd-workflow/SKILL.md | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/.github/skills/tdd-workflow/SKILL.md b/.github/skills/tdd-workflow/SKILL.md index 2c06093..b5b18d0 100644 --- a/.github/skills/tdd-workflow/SKILL.md +++ b/.github/skills/tdd-workflow/SKILL.md @@ -47,21 +47,9 @@ Files excluded from the coverage requirement: Test individual functions in isolation: ```r -test_that("create_dataset_dictionary() returns a data frame (#1)", { - result <- create_dataset_dictionary(mtcars) - expect_s3_class(result, "data.frame") -}) -``` - -### Integration tests - -Test end-to-end pipelines through the full function → output path: - -```r -test_that("create_dataset_dictionary() round-trips through describe_dataset() (#1)", { - dict <- create_dataset_dictionary(mtcars) - desc <- describe_dataset(dict) - expect_type(desc, "character") +test_that("create_dataset_dictionary returns a tibble (#2)", { + result <- create_dataset_dictionary(data.frame(x = 1L)) + expect_s3_class(result, "tbl_df") }) ```