Skip to content

Refactor example assembly #379

@kendrickb-nvidia

Description

@kendrickb-nvidia

Priority Level

Medium

Task Summary

Reduce brittleness and per model required customization by always building examples as strings and call the tokenizer directly. Do not concatenate token ids ourselves.

Other goals and requirements:

  • Reduce duplicate code in assembler.py
  • Examples after assembly should round trip tokenizer's decode and encode with zero changes (not true today)
  • Test that structured generation patterns match assembled examples

Technical Details & Implementation Plan

No response

Dependencies

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    taskDevelopment task

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions