Today the pipeline captures how you write across all chats uniformly. But a person writes differently per domain — terse technical messages, inside jokes with one friend, formal work tone — and the README itself notes that stronger emulation needs domain-specific data. Worth exploring whether conditioning or selectively weighting the dataset on domain improves fidelity over a single blended adapter.
Directions: tag/cluster conversations by domain (topic, contact, register) during ingestion; per-domain dataset weighting or curriculum; domain-conditioned generation (system-prompt or token signal) vs. separate per-domain LoRAs; measure with the style eval whether domain tuning beats the blended baseline. Pairs with multi-LoRA personas (#23) and style embeddings (#28).
Part of the exploratory roadmap.
Today the pipeline captures how you write across all chats uniformly. But a person writes differently per domain — terse technical messages, inside jokes with one friend, formal work tone — and the README itself notes that stronger emulation needs domain-specific data. Worth exploring whether conditioning or selectively weighting the dataset on domain improves fidelity over a single blended adapter.
Directions: tag/cluster conversations by domain (topic, contact, register) during ingestion; per-domain dataset weighting or curriculum; domain-conditioned generation (system-prompt or token signal) vs. separate per-domain LoRAs; measure with the style eval whether domain tuning beats the blended baseline. Pairs with multi-LoRA personas (#23) and style embeddings (#28).
Part of the exploratory roadmap.