Current Claude Opus 4.5 degradation is not being reflected #1

@wheelerz

Description

There are a few possibilities. Among them: the degradation is unevenly distributed (affecting some accounts/keys and not others), or they are gaming your tests while degrading elsewhere.

What is the solution to this? A randomized, distributed test generated from first principles, rather than static tests that are publicly available as open source? I haven't looked to see how you are currently handling this adversarial possibility, if at all.
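
To make the suggestion concrete, here is a minimal sketch of what I mean by tests generated from first principles, assuming a Python harness. The names `make_task`, `make_suite`, and `score` are hypothetical and not taken from this repo, and the arithmetic task is only a stand-in for whatever the real benchmark measures. The point is that the expected answer is computed locally from a fresh seed, so there is no static test set to recognize or special-case.

```python
import random
import secrets


def make_task(rng):
    """Build one task whose expected answer is computed locally, not stored."""
    a, b, c = (rng.randint(100, 999) for _ in range(3))
    prompt = f"Compute ({a} * {b}) + {c}. Reply with the integer only."
    return prompt, a * b + c


def make_suite(seed, n=20):
    """Derive a whole suite deterministically from a single seed."""
    rng = random.Random(seed)
    return [make_task(rng) for _ in range(n)]


def score(suite, answers):
    """Fraction of answers that exactly match the locally computed result."""
    correct = sum(
        1
        for (_, expected), got in zip(suite, answers)
        if got.strip() == str(expected)
    )
    return correct / len(suite)


# Draw an unpredictable seed per run and log it, so the run stays reproducible
# even though the prompts were never published ahead of time.
seed = secrets.randbits(64)
suite = make_suite(seed)
```

Running the same generator from several accounts/keys, each with its own seed, would also help with the uneven-distribution case, since scores could then be compared across keys.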
