feat(blueprints): Add v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml #22

Open
frantj wants to merge 3 commits into weval-org:main from frantj:proposal/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios-1774983184568

Conversation

@frantj commented Mar 31, 2026

This benchmark evaluates how well large language models (LLMs) integrate Women, Peace and Security (WPS) principles when advising on conflict and peace operations. This version of the benchmark (v5) contains 24 scenarios across three prompt tiers, scored against 7 criteria and 3 negative indicators.

@claude (bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@weval-bot (bot) commented Mar 31, 2026

Evaluation started!

  • blueprints/users/frantj/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml - View Status
    ⚠️ Blueprint trimmed to fit PR evaluation limits (full evaluation runs after merge)

Note: 1 blueprint exceeded PR evaluation limits and was automatically trimmed:

  • Limited to 10 prompts, 5 models (CORE), 2 temps, 2 systems
  • Full evaluation with all prompts/models will run automatically after merge

Results will be posted here when complete.


Commit: 198a956

@weval-bot (bot) commented Mar 31, 2026

Evaluation complete for blueprints/users/frantj/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml

The blueprint has been successfully evaluated against all configured models.
