Explore: domain-specific tuning beyond general writing style #45

Description

@NotYuSheng

Today the pipeline captures how you write across all chats uniformly. But a person writes differently per domain (terse technical messages, inside jokes with one friend, a formal work tone), and the README itself notes that stronger emulation needs domain-specific data. It is worth exploring whether conditioning on domain, or selectively weighting the dataset by domain, improves fidelity over a single blended adapter.

Directions:

- Tag or cluster conversations by domain (topic, contact, register) during ingestion.
- Apply per-domain dataset weighting or a curriculum.
- Compare domain-conditioned generation (system-prompt or token signal) against separate per-domain LoRAs.
- Measure with the style eval whether domain tuning beats the blended baseline.

Pairs with multi-LoRA personas (#23) and style embeddings (#28).
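
To make the first three directions concrete, here is a minimal sketch of tagging examples by domain, weighting them so each domain gets equal sampling mass, and prefixing a domain signal for conditioned generation. Everything in it is an assumption for illustration: the record shape, the `DOMAIN_BY_CONTACT` mapping, the `[domain: ...]` prefix format, and the helper names are hypothetical and not taken from the current pipeline.

```python
import random
from collections import Counter

# Hypothetical record shape: each example carries the contact it came from.
examples = [
    {"contact": "alice", "text": "lol ok see you at 8"},
    {"contact": "alice", "text": "bring the thing"},
    {"contact": "alice", "text": "hahaha no way"},
    {"contact": "boss", "text": "Please find the report attached."},
    {"contact": "devchat", "text": "rebase onto main, CI is red"},
]

# Assumed manual mapping from contact to domain; clustering could replace it.
DOMAIN_BY_CONTACT = {"alice": "casual", "boss": "work", "devchat": "technical"}


def tag_domain(example, default="general"):
    """Return a copy of the example with a 'domain' field attached."""
    domain = DOMAIN_BY_CONTACT.get(example["contact"], default)
    return {**example, "domain": domain}


def balanced_weights(tagged):
    """Inverse-frequency weights: every domain gets equal total sampling mass."""
    counts = Counter(ex["domain"] for ex in tagged)
    return [1.0 / (len(counts) * counts[ex["domain"]]) for ex in tagged]


def with_domain_prompt(example):
    """Prefix a plain-text domain signal (hypothetical format) to the text."""
    return f"[domain: {example['domain']}]\n{example['text']}"


tagged = [tag_domain(ex) for ex in examples]
weights = balanced_weights(tagged)

# Draw a small domain-balanced batch; the seed only makes the sketch repeatable.
rng = random.Random(0)
batch = rng.choices(tagged, weights=weights, k=4)
prompts = [with_domain_prompt(ex) for ex in batch]
```

Inverse-frequency weighting is only one option. It keeps a chatty contact from dominating the adapter, but it also upsamples small domains, so a curriculum or a capped weight may work better in practice. Whether any of this beats the blended baseline is exactly what the style eval would need to show.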

Part of the exploratory roadmap.

Metadata

Labels: exploration (Exploratory research direction)
