Skip to content

Explore: interpretability — "what did it learn about me?" #38

Description

@NotYuSheng

Probe the fine-tuned model for the traits/facts it encoded about you — a fascinating learning angle and a safety check at once.

Directions: probing classifiers; activation analysis; surface (and let users correct) what the model thinks is "you". Part of the exploratory roadmap.

Metadata

Metadata

Assignees

No one assigned

    Labels

    explorationExploratory research direction

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions