Explore: interpretability — "what did it learn about me?"

Probe the fine-tuned model for the traits/facts it encoded about you — a fascinating learning angle and a safety check at once.

Directions: probing classifiers; activation analysis; surface (and let users correct) what the model thinks is "you". Part of the exploratory roadmap.