feat(blueprints): Add v5.1-evaluating-ai-performance-in-women-peace-security-scenarios.yml by frantj · Pull Request #24 · weval-org/configs

frantj · 2026-04-06T15:40:54Z

v5.1 - compliance release due to deprecation of negative criteria.

This benchmark evaluates how well large language models (LLMs) integrate Women, Peace and Security (WPS) principles when advising on conflict and peace operations. It contains 24 scenarios across three prompt tiers, scored against 7 positive criteria and 2 negative criteria (converted to negative 'should' statements for platform compliance).

weval-bot · 2026-04-06T15:40:58Z

⚡ Evaluation started!

✅ blueprints/users/frantj/v5.1-evaluating-ai-performance-in-women-peace-security-scenarios.yml - View Status
⚠️ Blueprint trimmed to fit PR evaluation limits (full evaluation runs after merge)

Note: 1 blueprint exceeded PR evaluation limits and was automatically trimmed:

Limited to 10 prompts, 5 models (CORE), 2 temps, 2 systems
Full evaluation with all prompts/models will run automatically after merge

Results will be posted here when complete.

Commit: a7072d5
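
The PR evaluation limits quoted in the bot comment above (10 prompts, 5 models, 2 temps, 2 systems) can be illustrated with a minimal sketch. This is not weval's implementation: the `PrLimits` class, the `trim_blueprint` and `max_responses` helpers, the dictionary keys, and the "keep the first N entries" rule are all assumptions made for illustration, as is the idea that a run is a full cross product of prompts, models, temperatures, and system prompts. Only the 24-scenario count and the four caps come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrLimits:
    """Caps quoted in the bot comment; the field names are hypothetical."""
    prompts: int = 10
    models: int = 5
    temperatures: int = 2
    systems: int = 2


def trim_blueprint(blueprint: dict, limits: PrLimits = PrLimits()) -> dict:
    """Return a copy of the blueprint with each list cut to its cap.

    Assumption: the trimmer simply keeps the first N entries of each list.
    """
    caps = {
        "prompts": limits.prompts,
        "models": limits.models,
        "temperatures": limits.temperatures,
        "systems": limits.systems,
    }
    trimmed = dict(blueprint)
    for key, cap in caps.items():
        trimmed[key] = list(blueprint.get(key, []))[:cap]
    return trimmed


def max_responses(blueprint: dict) -> int:
    """Upper bound on generated responses, assuming a full cross product."""
    total = 1
    for key in ("prompts", "models", "temperatures", "systems"):
        # An empty or missing list is treated as a single default setting.
        total *= max(len(blueprint.get(key, [])), 1)
    return total


# Hypothetical blueprint: 24 scenarios as in the PR description;
# the model, temperature, and system-prompt counts are invented.
full = {
    "prompts": [f"scenario-{i}" for i in range(1, 25)],
    "models": [f"model-{i}" for i in range(1, 9)],
    "temperatures": [0.0, 0.5, 1.0],
    "systems": [None, "system-a", "system-b"],
}
trimmed = trim_blueprint(full)
print(len(trimmed["prompts"]), max_responses(trimmed))  # 10 200
```

Under these assumptions the trimmed PR run would cover 10 of the 24 scenarios and generate at most 200 responses, which is consistent with the full evaluation being deferred until after merge.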

weval-bot · 2026-04-06T15:44:51Z

Evaluation complete for blueprints/users/frantj/v5.1-evaluating-ai-performance-in-women-peace-security-scenarios.yml

View evaluation status | View full analysis

The blueprint has been successfully evaluated against all configured models.

frantj added 5 commits February 10, 2026 09:21

feat: initialize user blueprint directory

4f3fe3d

Merge branch 'weval-org:main' into main

ab0e978

feat(blueprints): create blueprints/users/frantj/evaluating-ai-perfor…

95480a7

…mance-in-women-peace-security-scenarios.yml on new branch

feat: rename 'blueprints/users/frantj/evaluating-ai-performance-in-wo…

cd895f5

…men-peace-security-scenarios.yml' to 'blueprints/users/frantj/v5.1-evaluating-ai-performance-in-women-peace-security-scenarios.yml'

feat: remove old file after rename to 'blueprints/users/frantj/v5.1-e…

a7072d5

…valuating-ai-performance-in-women-peace-security-scenarios.yml'

frantj wants to merge 5 commits into weval-org:main from frantj:proposal/evaluating-ai-performance-in-women-peace-security-scenarios-1775231959253