Add settings to include or exclude types of code (tag support) (#232)
* feat: namespace tree-sitter capture tags for tag-based filtering
Rename all capture names in .scm query files to use a dot-separated
hierarchy (e.g., @identifier.function instead of @func_declaration)
so users can filter spell checking by tag via include_tags/exclude_tags
in codebook.toml. Add query tag reference README and document the
new config options in the main README.
* test: validate capture names in .scm files against allowed tag list
Adds a test that checks every capture name across all language queries
matches the allowed tag taxonomy (comment, string, identifier, etc.).
This prevents ad-hoc tag names from being introduced.
* feat: implement tag-based filtering for spell checking
Add include_tags/exclude_tags to ConfigSettings with prefix-based
matching. exclude_tags takes precedence over include_tags, matching
how ignore_paths takes precedence over include_paths.
The parser now extracts capture names from tree-sitter queries and
skips captures whose tags don't pass the filter. Text mode (no
tree-sitter) ignores tag filters since there are no captures.
* Allow build
* docs: add changelog entry for tag-based filtering
* Refactor settings
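The parser-side behavior described in the third bullet (skip captures whose tags fail the filter; apply no tag filter in text mode) can be sketched in std-only Rust. `Capture`, `Input`, and `fragments_to_check` are hypothetical names, not Codebook's parser API. The filter is passed in as a plain predicate so the sketch does not assume how `ConfigSettings` implements matching.

```rust
/// Hypothetical captured span: the tree-sitter capture name (its tag) and its text.
#[derive(Debug)]
struct Capture {
    tag: String,
    text: String,
}

/// Hypothetical parser input: tree-sitter captures, or plain text (text mode).
enum Input {
    Parsed(Vec<Capture>),
    PlainText(String),
}

/// Returns the text fragments that should be spell-checked.
/// Tag filters only apply to tree-sitter captures; text mode has no
/// captures (and so no tags), so its whole body is checked unconditionally.
fn fragments_to_check<'a, F>(input: &'a Input, tag_allowed: F) -> Vec<&'a str>
where
    F: Fn(&str) -> bool,
{
    match input {
        Input::Parsed(captures) => captures
            .iter()
            .filter(|c| tag_allowed(c.tag.as_str()))
            .map(|c| c.text.as_str())
            .collect(),
        Input::PlainText(text) => vec![text.as_str()],
    }
}

fn main() {
    // Stand-in predicate: only "comment" and its dotted children pass.
    let only_comments = |tag: &str| tag == "comment" || tag.starts_with("comment.");

    // The texts are intentionally misspelled, as a spell checker would see them.
    let parsed = Input::Parsed(vec![
        Capture { tag: "comment.line".into(), text: "a typpo here".into() },
        Capture { tag: "identifier.function".into(), text: "prse_file".into() },
    ]);
    let plain = Input::PlainText("plain text is always checked".into());

    println!("{:?}", fragments_to_check(&parsed, only_comments));
    println!("{:?}", fragments_to_check(&plain, only_comments));
}
```

Here the comment capture is kept and the function-name capture is dropped, while the plain-text input is returned whole even though the predicate would reject any tag.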
README.md
Lines changed: 34 additions & 62 deletions
@@ -293,6 +293,19 @@ ignore_patterns = [
 # Set to 2 to check words with 2 or more characters
 min_word_length = 3
 
+# Filter which parts of your code are spell-checked by tag.
+# Tags use a dot-separated hierarchy (e.g., "comment", "identifier.function").
+# Matching is prefix-based: "comment" matches "comment", "comment.line",
+# "comment.block", etc.
+#
+# Only check these tags (if set, everything else is excluded)
+# Default: [] (empty = check everything)
+include_tags = ["comment", "string"]
+#
+# Exclude these tags from checking (takes precedence over include_tags)
+# Default: []
+exclude_tags = ["string.heredoc"]
+
 # Whether to use global configuration (project config only)
 # Set to false to completely ignore global settings
 # Default: true
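A minimal Rust sketch of the matching rules these config comments describe: prefix-based matching, an empty `include_tags` meaning "check everything", and `exclude_tags` taking precedence. `TagFilter`, `tag_matches`, and `allows` are hypothetical names, not Codebook's `ConfigSettings` API. The sketch also assumes a prefix only matches at a dot boundary (so `"comment"` would not match a tag like `comments`); the comments only say matching is prefix-based, so whether Codebook enforces that boundary is an assumption.

```rust
/// Hypothetical stand-in for the include_tags / exclude_tags settings.
#[derive(Debug, Default)]
struct TagFilter {
    include_tags: Vec<String>,
    exclude_tags: Vec<String>,
}

/// True if `pattern` equals `tag` or is a prefix of it ending at a dot
/// boundary (assumed): "comment" matches "comment.line" but not "comments".
fn tag_matches(pattern: &str, tag: &str) -> bool {
    tag == pattern
        || (tag.starts_with(pattern) && tag.as_bytes().get(pattern.len()) == Some(&b'.'))
}

impl TagFilter {
    /// Decides whether text carrying `tag` should be spell-checked.
    fn allows(&self, tag: &str) -> bool {
        // exclude_tags takes precedence over include_tags.
        if self.exclude_tags.iter().any(|p| tag_matches(p, tag)) {
            return false;
        }
        // An empty include list means "check everything".
        self.include_tags.is_empty() || self.include_tags.iter().any(|p| tag_matches(p, tag))
    }
}

fn main() {
    // Same values as the example config above.
    let filter = TagFilter {
        include_tags: vec!["comment".into(), "string".into()],
        exclude_tags: vec!["string.heredoc".into()],
    };
    for tag in ["comment.line", "string", "string.heredoc", "identifier.function"] {
        println!("{tag} -> {}", filter.allows(tag));
    }
}
```

With that config, `comment.line` and `string` are checked, while `string.heredoc` (excluded) and `identifier.function` (not included) are skipped.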
@@ -355,6 +368,26 @@ ignore_patterns = [
 
 **Tip**: Include the identifier in your pattern. `'vim\.opt\.[a-z]+'` skips `showmode` in `vim.opt.showmode`, but `'vim\.opt\.'` alone won't (it only matches up to the dot).
 
+### Tag-Based Filtering
+
+Codebook categorizes every piece of text it checks using **tags** — dot-separated labels like `comment`, `string`, `identifier.function`, etc. You can use `include_tags` and `exclude_tags` to control which categories are spell-checked.
+
+Matching is **prefix-based**: `"comment"` matches `comment`, `comment.line`, `comment.block`, etc. `include_tags` narrows what is checked (allowlist), and `exclude_tags` removes from that set (blocklist, takes precedence). This works the same way as `include_paths`/`ignore_paths`.
+
+```toml
+# Only check comments and strings, ignore all identifiers
+include_tags = ["comment", "string"]
+
+# Check everything except variable and parameter names
+# Both can be combined: check comments and strings, but skip heredocs
+include_tags = ["comment", "string"]
+exclude_tags = ["string.heredoc"]
+```
+
+For the full list of available tags, see the [query tag reference](crates/codebook/src/queries/README.md).
+
 ### LSP Initialization Options
 
 Editors can pass `initializationOptions` when starting the Codebook LSP for LSP-specific options. Refer to your editor's documentation for how to apply these options. All values are optional, omit them for the default behavior:
@@ -451,68 +484,7 @@ For plain text dictionaries, use `TextRepo::new()` instead and add to `TEXT_DICT
 
 ## Adding New Programming Language Support
 
-Codebook uses Tree-sitter support additional programming languages. Here's how to add support for a new language:
-
-### 1. Create a Tree-sitter Query
-
-Each language needs a Tree-sitter query file that defines which parts of the code should be checked for spelling issues. The query needs to capture:
-
-- Identifiers (variable names, function names, class names, etc.)
-- String literals
-- Comments
-
-Create a new `.scm` file in `codebook/crates/codebook/src/queries/` named after your language (e.g., `java.scm`).
-
-### 2. Understand the Language's AST
-
-To write an effective query, you need to understand the Abstract Syntax Tree (AST) structure of your language. Use these tools:
-
-- [Tree-sitter Playground](https://tree-sitter.github.io/tree-sitter/7-playground.html): Interactively explore how Tree-sitter parses code
-- [Tree-sitter Visualizer](https://blopker.github.io/ts-visualizer/): Visualize the AST of your code in a more detailed way
-
-A good approach is to:
-
-1. Write sample code with identifiers, strings, and comments
-2. Paste it into the playground/visualizer
-3. Observe the node types used for each element
-4. Create capture patterns that target only definition nodes, not usages
-
-### 3. Update the Language Settings
-
-Add your language to `codebook/crates/codebook/src/queries.rs`:
-
-1. Add a new variant to the `LanguageType` enum
-2. Add a new entry to the `LANGUAGE_SETTINGS` array with:
-   - The language type
-   - File extensions for your language
-   - Language identifiers
-   - Path to your query file
-
-### 4. Add the Tree-sitter Grammar
-
-Make sure the appropriate Tree-sitter grammar is added as a dependency in `Cargo.toml` and update the `language()` function in `queries.rs` to return the correct language parser.
-
-### 5. Test Your Implementation
-
-Run the tests to ensure your query is valid:
-
-```bash
-cargo test -p codebook queries::tests::test_all_queries_are_valid
-```
-
-Additional language tests should go in `codebook/tests`. There are many example tests to copy.
-
-You can also test with real code files to verify that Codebook correctly identifies spelling issues in your language. Example files should go in `examples/` and contain at least one spelling error to pass integration tests.
-
-### Tips for Writing Effective Queries
-
-- Focus on capturing definitions, not usages
-- Include only nodes that contain user-defined text (not keywords)
-- Test with representative code samples
-- Start simple and add complexity as needed
-- Look at existing language queries for patterns
-
-If you've successfully added support for a new language, please consider contributing it back to Codebook with a pull request!
+See the [query development guide](crates/codebook/src/queries/README.md) for instructions on adding Tree-sitter queries for new languages, the tag naming convention, and tips for writing effective queries.
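As a rough illustration of the capture-name check described in the commit message (every capture name in the `.scm` queries must fit the allowed tag taxonomy), here is a std-only Rust sketch. `ALLOWED_ROOTS` is a hypothetical three-entry subset taken from the commit's "comment, string, identifier, etc."; checking only the first dot-separated segment (plus rejecting empty segments) is an assumption, and the real test may compare full tag names against the reference list instead.

```rust
use std::collections::BTreeSet;

/// Hypothetical subset of top-level tags; the real taxonomy is documented
/// in the query tag reference and is likely larger.
const ALLOWED_ROOTS: &[&str] = &["comment", "string", "identifier"];

/// Collects every `@capture.name` that appears in a tree-sitter query source.
fn capture_names(query: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    let mut rest = query;
    while let Some(at) = rest.find('@') {
        let after = &rest[at + 1..];
        // A capture name runs until the first char that is not alphanumeric, '_' or '.'.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '.'))
            .unwrap_or(after.len());
        if len > 0 {
            names.insert(after[..len].to_string());
        }
        rest = &after[len..];
    }
    names
}

/// Valid if the first dot-separated segment is an allowed root and no
/// segment is empty (rejects e.g. "identifier..function").
fn is_valid_tag(name: &str) -> bool {
    let root = name.split('.').next().unwrap_or("");
    ALLOWED_ROOTS.contains(&root) && name.split('.').all(|s| !s.is_empty())
}

/// Returns the capture names in `query` that fall outside the taxonomy.
fn invalid_captures(query: &str) -> Vec<String> {
    capture_names(query).into_iter().filter(|n| !is_valid_tag(n)).collect()
}

fn main() {
    let query = r#"
        (comment) @comment
        (function_declaration name: (identifier) @identifier.function)
        (string_literal) @func_declaration
    "#;
    println!("{:?}", invalid_captures(query));
}
```

Run against the sample query, only the old-style `func_declaration` capture is reported, which is the kind of ad-hoc name the commit's test is meant to block.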