
Add evals and improve SKILL.md with version routing and review workflow #4

Merged
CybotTM merged 1 commit into main from feature/evals-and-improvements
Apr 1, 2026


CybotTM (Member) commented Apr 1, 2026

Summary

  • Added 20 evals covering all skill capabilities: keyword lookup, recipes, deprecations, debugging, checklists, code review, version-specific guidance, security, performance, and migration
  • Improved SKILL.md (216 -> 386 words, under the 500-word limit) based on eval diagnostic findings

SKILL.md Improvements

Eval analysis revealed these gaps in the original SKILL.md, now addressed:

| Gap found | Fix applied |
|---|---|
| No version-specific routing (v12/v13/v14) | Added Version-Specific Guidance section |
| No structured review workflow | Added 6-step Review Workflow |
| No reference file routing | Added Reference Index table |
| No deprecation cross-referencing instruction | Rule 5 now explicitly requires deprecation checks |
| No security/performance reference triggers | Review Workflow steps 4-5 cover these |

A/B Analysis (20 evals)

| # | Eval name | A (original) | B (improved) | Delta |
|---|---|---|---|---|
| 1 | lookup_stdwrap | PASS | PASS | Same - both have lookup.sh rule |
| 2 | lookup_fluidtemplate | PASS | PASS | Same - basic lookup |
| 3 | pageview_vs_fluidtemplate | WEAK | PASS | B adds v13 PAGEVIEW guidance |
| 4 | recipe_page_setup | PASS | PASS | Same - recipe usage shown |
| 5 | recipe_menu_setup | PASS | PASS | Same - recipe usage shown |
| 6 | deprecations_v13 | PASS | PASS | Same - --deprecations in both |
| 7 | deprecations_v14 | WEAK | PASS | B adds v14 specifics |
| 8 | debug_page_not_configured | PASS | PASS | Same - --debug in both |
| 9 | debug_template_not_found | PASS | PASS | Same - --debug in both |
| 10 | checklist_typoscript | PASS | PASS | Same - --checklist in both |
| 11 | checklist_fluid | PASS | PASS | Same - --checklist in both |
| 12 | review_existing_typoscript | WEAK | PASS | B adds review workflow with cross-refs |
| 13 | lint_rules_lookup | PASS | PASS | Same - --lint-rules in both |
| 14 | site_sets_v13 | WEAK | PASS | B adds v13 Site Sets guidance |
| 15 | coa_vs_coa_int | WEAK | PASS | B references performance.md |
| 16 | copy_vs_reference | PASS | PASS | Same - patterns.md covers this |
| 17 | fluid_xss_prevention | WEAK | PASS | B references security.md |
| 18 | dataprocessor_usage | PASS | PASS | Same - rule 8 covers this |
| 19 | condition_syntax_v12 | PASS | PASS | Same - lookup handles it |
| 20 | migration_v12_to_v13 | WEAK | PASS | B references migration guides |

Summary: 7 evals improved from WEAK to PASS, 13 unchanged (already passing). Zero regressions.
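As an arithmetic cross-check of the summary line, the short Python sketch below re-derives the counts from the A/B table. The rows are copied from the table; the `RANK` ordering (WEAK below PASS) and the helper name `tally` are illustrative assumptions, not part of the skill's tooling.

```python
from collections import Counter

# (eval name, A = original SKILL.md, B = improved SKILL.md), copied from the table.
RESULTS = [
    ("lookup_stdwrap", "PASS", "PASS"),
    ("lookup_fluidtemplate", "PASS", "PASS"),
    ("pageview_vs_fluidtemplate", "WEAK", "PASS"),
    ("recipe_page_setup", "PASS", "PASS"),
    ("recipe_menu_setup", "PASS", "PASS"),
    ("deprecations_v13", "PASS", "PASS"),
    ("deprecations_v14", "WEAK", "PASS"),
    ("debug_page_not_configured", "PASS", "PASS"),
    ("debug_template_not_found", "PASS", "PASS"),
    ("checklist_typoscript", "PASS", "PASS"),
    ("checklist_fluid", "PASS", "PASS"),
    ("review_existing_typoscript", "WEAK", "PASS"),
    ("lint_rules_lookup", "PASS", "PASS"),
    ("site_sets_v13", "WEAK", "PASS"),
    ("coa_vs_coa_int", "WEAK", "PASS"),
    ("copy_vs_reference", "PASS", "PASS"),
    ("fluid_xss_prevention", "WEAK", "PASS"),
    ("dataprocessor_usage", "PASS", "PASS"),
    ("condition_syntax_v12", "PASS", "PASS"),
    ("migration_v12_to_v13", "WEAK", "PASS"),
]

# Assumed grade ordering: only WEAK and PASS appear in the table.
RANK = {"WEAK": 0, "PASS": 1}


def tally(results):
    """Count evals whose grade went up, stayed the same, or went down from A to B."""
    counts = Counter()
    for _name, before, after in results:
        delta = RANK[after] - RANK[before]
        if delta > 0:
            counts["improved"] += 1
        elif delta < 0:
            counts["regressed"] += 1
        else:
            counts["unchanged"] += 1
    return counts


counts = tally(RESULTS)
# Counter returns 0 for keys that were never incremented.
print(f"improved={counts['improved']} unchanged={counts['unchanged']} "
      f"regressed={counts['regressed']}")
# prints: improved=7 unchanged=13 regressed=0
```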

Test plan

  • Verify evals.json is valid JSON with 20 entries
  • Verify SKILL.md is under 500 words
  • Verify SKILL.md frontmatter is intact
  • Spot-check: running lookup.sh --recipe page-setup still works
  • Spot-check: running lookup.sh --deprecations still works
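The first three test-plan items lend themselves to automation. The sketch below is a hypothetical checker, not part of this PR. It assumes that evals.json holds a top-level JSON array of eval objects with the `name`, `prompt`, and `assertions` keys seen in the review excerpt, that SKILL.md opens with a `---`-delimited YAML frontmatter block carrying `name` and `description` (the Agent Skills convention), and that the 500-word limit applies to the body after the frontmatter, counted by whitespace splitting as `wc -w` does.

```python
import json
import re

# Matches a leading YAML frontmatter block delimited by "---" lines.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Keys taken from the eval excerpt in the review; assumed required for every eval.
REQUIRED_EVAL_KEYS = ("name", "prompt", "assertions")


def check_evals(evals_text: str, expected: int = 20) -> list:
    """Return a list of problems in evals.json content; empty means OK."""
    try:
        data = json.loads(evals_text)
    except json.JSONDecodeError as exc:
        return [f"evals.json is not valid JSON: {exc}"]
    # Assumption: the top level is a JSON array of eval objects.
    if not isinstance(data, list):
        return ["expected a top-level JSON array"]
    problems = []
    if len(data) != expected:
        problems.append(f"expected {expected} evals, found {len(data)}")
    for i, entry in enumerate(data):
        missing = [k for k in REQUIRED_EVAL_KEYS
                   if not isinstance(entry, dict) or k not in entry]
        if missing:
            problems.append(f"eval #{i} is missing {missing}")
    return problems


def check_skill(skill_text: str, limit: int = 500) -> list:
    """Return a list of problems in SKILL.md content; empty means OK."""
    problems = []
    match = FRONTMATTER.match(skill_text)
    if match is None:
        problems.append("frontmatter block is missing or malformed")
        body = skill_text
    else:
        body = skill_text[match.end():]
        # Assumption: frontmatter carries the Agent Skills name/description keys.
        for key in ("name", "description"):
            if not re.search(rf"^{key}:", match.group(1), re.MULTILINE):
                problems.append(f"frontmatter is missing '{key}:'")
    # Whitespace-split word count, roughly what `wc -w` reports.
    words = len(body.split())
    if words >= limit:
        problems.append(f"body has {words} words (must be under {limit})")
    return problems
```

To run it against the real files, pass their contents, for example `check_evals(open("evals.json").read())` and `check_skill(open("SKILL.md").read())`, adjusting the paths to wherever the files live in the repository; an empty list means the check passed.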

…ow, and reference index

Eval diagnostics revealed gaps in the original SKILL.md:
- No version-specific guidance (v12/v13/v14 routing)
- No structured review workflow for code reviews
- Missing reference index for security, performance, migration docs
- No explicit deprecation cross-referencing instructions

SKILL.md improvements (216 -> 386 words, under 500 limit):
- Added Version-Specific Guidance section
- Added Review Workflow with 6 concrete steps
- Added Reference Index table mapping needs to files
- Expanded rules to include deprecation checking

gemini-code-assist bot left a comment


Code Review

This pull request enhances the TYPO3 TypoScript reference documentation by expanding usage examples, adding version-specific guidance for TYPO3 v12 through v14, and defining a structured review workflow. Additionally, a new evaluation suite is introduced to verify the assistant's proficiency in TypoScript and Fluid tasks. Review feedback correctly identifies a factual inaccuracy regarding the deprecation of FLUIDTEMPLATE in v14 and recommends correcting a regex pattern in the evaluation assertions to align with the defined CLI flags.

scripts/lookup.sh --checklist typoscript
- **v12**: Use FLUIDTEMPLATE, sys_template static includes, constants.typoscript
- **v13**: Prefer PAGEVIEW for new page templates, introduce Site Sets, use settings.definitions.yaml
- **v14**: Site Sets mandatory, FLUIDTEMPLATE deprecated, @import replaces INCLUDE_TYPOSCRIPT

Priority: medium

The statement that FLUIDTEMPLATE is deprecated in v14 contradicts the provided reference files. According to references/review/migration-v12-to-v13.md (line 13) and references/review/deprecations.md (line 76), FLUIDTEMPLATE is NOT deprecated and continues to work in v13 and v14, although PAGEVIEW is recommended for new page templates.

Suggested change
- **v14**: Site Sets mandatory, FLUIDTEMPLATE deprecated, @import replaces INCLUDE_TYPOSCRIPT
- **v14**: Site Sets mandatory, @import replaces INCLUDE_TYPOSCRIPT, PAGEVIEW recommended

"name": "lint_rules_lookup",
"prompt": "What lint rules should I follow when writing TypoScript?",
"assertions": [
{ "type": "tool_use", "tool": "Bash", "pattern": "lookup\\.sh.*--lint" },

Priority: medium

The assertion pattern lookup\.sh.*--lint is imprecise. The actual flag defined in SKILL.md and implemented in lookup.sh is --lint-rules. Updating the pattern ensures the evaluation specifically verifies the correct flag usage.

Suggested change
{ "type": "tool_use", "tool": "Bash", "pattern": "lookup\\.sh.*--lint" },
{ "type": "tool_use", "tool": "Bash", "pattern": "lookup\\.sh.*--lint-rules" },
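To see why the tighter pattern matters, the sketch below decodes both assertion objects exactly as written in evals.json and runs them against a few sample commands. It assumes the eval harness applies `pattern` with regex search semantics to the Bash command string (the harness itself is not shown in this PR); the `--lint` and `--linter` commands are hypothetical mistakes, not real lookup.sh flags.

```python
import json
import re

# The two assertion objects from the review, verbatim. JSON string escaping
# turns "lookup\\.sh" into the regex lookup\.sh (a literal dot).
LOOSE = json.loads(r'{ "type": "tool_use", "tool": "Bash", "pattern": "lookup\\.sh.*--lint" }')
STRICT = json.loads(r'{ "type": "tool_use", "tool": "Bash", "pattern": "lookup\\.sh.*--lint-rules" }')


def matches(assertion: dict, command: str) -> bool:
    """Assumption: the harness searches the Bash command for the pattern."""
    return re.search(assertion["pattern"], command) is not None


COMMANDS = [
    "scripts/lookup.sh --lint-rules",          # the flag SKILL.md documents
    "scripts/lookup.sh --lint",                # hypothetical wrong flag
    "scripts/lookup.sh --linter typoscript",   # hypothetical wrong flag
    "scripts/lookup.sh --checklist typoscript",
]

for cmd in COMMANDS:
    print(f"{cmd:42} loose={matches(LOOSE, cmd)} strict={matches(STRICT, cmd)}")
```

The loose pattern accepts both hypothetical wrong flags, while the stricter one accepts only `--lint-rules`. It would still accept any flag that merely starts with `--lint-rules`; anchoring the end, for example with `(?=\s|$)`, would be a further, optional tightening.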

github-actions bot commented Apr 1, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA a8c846d.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

CybotTM merged commit 08e50bd into main on Apr 1, 2026
4 checks passed
CybotTM deleted the feature/evals-and-improvements branch on April 1, 2026 at 09:06
