feat(blueprints): Add v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml by frantj · Pull Request #22 · weval-org/configs

frantj · 2026-03-31T19:21:39Z

This benchmark evaluates how well large language models (LLMs) integrate Women, Peace and Security (WPS) principles when advising on conflict and peace operations. This version of the benchmark (v5) contains 24 scenarios across three prompt tiers, scored against 7 criteria and 3 negative indicators.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

weval-bot · 2026-03-31T19:21:43Z

⚡ Evaluation started!

✅ blueprints/users/frantj/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml - View Status
⚠️ Blueprint trimmed to fit PR evaluation limits (full evaluation runs after merge)

Note: 1 blueprint exceeded PR evaluation limits and was automatically trimmed:

Limited to 10 prompts, 5 models (CORE), 2 temps, 2 systems
Full evaluation with all prompts/models will run automatically after merge
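
To put the trimmed limits in perspective, here is a minimal sketch of the arithmetic. It assumes the evaluation runs every combination of prompt, model, temperature, and system prompt, which this thread does not confirm. Under that assumption the PR-time run is capped at 10 × 5 × 2 × 2 = 200 responses. The names `PrLimits`, `max_responses`, and `trim` are hypothetical and are not part of weval. The sketch also assumes one prompt per scenario and that trimming keeps the first ten.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class PrLimits:
    """PR evaluation caps as quoted by weval-bot (field names are hypothetical)."""
    prompts: int = 10
    models: int = 5
    temperatures: int = 2
    systems: int = 2


def max_responses(limits: PrLimits) -> int:
    # Upper bound, ASSUMING a full cross product of all four dimensions.
    return prod((limits.prompts, limits.models, limits.temperatures, limits.systems))


def trim(items: list, cap: int) -> list:
    # ASSUMPTION: trimming keeps the first `cap` items; the real selection rule is unknown.
    return items[:cap]


# The blueprint describes 24 scenarios; ASSUMING one prompt per scenario.
scenarios = [f"scenario-{i}" for i in range(1, 25)]
trimmed = trim(scenarios, PrLimits().prompts)

print(len(scenarios), len(trimmed), max_responses(PrLimits()))
```

On this reading, 14 of the 24 scenarios would be evaluated only after merge.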

Results will be posted here when complete.

Commit: 198a956

weval-bot · 2026-03-31T19:24:40Z

Evaluation complete for blueprints/users/frantj/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml

View evaluation status | View full analysis

The blueprint has been successfully evaluated against all configured models.

frantj added 3 commits February 10, 2026 09:21

feat: initialize user blueprint directory

4f3fe3d

Merge branch 'weval-org:main' into main

ab0e978

feat(blueprints): create blueprints/users/frantj/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios.yml on new branch

198a956

claude Bot reviewed Mar 31, 2026

frantj mentioned this pull request Mar 31, 2026

feat(blueprints): Add v4-evaluating-ai-performance-in-women-peace-security-scenarios.yml #21

Open

frantj wants to merge 3 commits into weval-org:main from frantj:proposal/v5-for-weval-evaluating-ai-performance-in-women-peace-security-scenarios-1774983184568
