Commit 6d27f8b

refactor to benchmarks and adding test decorator
1 parent 8d465bc commit 6d27f8b

12 files changed: +179 -26 lines changed

Lines changed: 15 additions & 13 deletions
@@ -1,13 +1,19 @@
-# IFEval Reward Function
+# IFEval Benchmark
 
 Evaluates how well model responses follow instruction constraints. Returns a partial credit score (0.0 to 1.0).
 
-## Quick Start
+## Usage
+
+### As eval-protocol benchmark test
+
+```bash
+pytest eval_protocol/benchmarks/ifeval/test_ifeval.py -v
+```
+
+### Standalone scoring function
 
 ```python
-import sys
-sys.path.insert(0, '/path/to/eval_protocol/rewards/ifeval')
-from reward import ifeval_partial_credit_reward
+from eval_protocol.benchmarks.ifeval import ifeval_partial_credit_reward
 
 response = "Hello world! This is my response."
 ground_truth = {
@@ -36,15 +42,11 @@ NLTK resources are downloaded automatically on first use.
 ## File Sources
 
 **Copied from `open-instruct/open_instruct/IFEvalG/`:**
-- `ifeval_instructions.py` (from `instructions.py`)
-- `ifeval_registry.py` (from `instructions_registry.py`)
-- `ifeval_util.py` (from `instructions_util.py`)
+- `ifeval_instructions.py`, `ifeval_registry.py`, `ifeval_util.py`
 
 **Copied from `IFBench/` (commit 8e6a9be, 2025-01):**
-- `ifbench_instructions.py` (from `instructions.py`)
-- `ifbench_registry.py` (from `instructions_registry.py`)
-- `ifbench_util.py` (from `instructions_util.py`)
+- `ifbench_instructions.py`, `ifbench_registry.py`, `ifbench_util.py`
 
 **New code:**
-- `reward.py` - main reward function
-- `__init__.py` - package exports
+- `reward.py` - scoring function
+- `test_ifeval.py` - eval-protocol benchmark test
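
Reviewer note: the README describes a partial credit score between 0.0 and 1.0, but the body of `ifeval_partial_credit_reward` is not part of this diff. The sketch below is only one plausible reading, assuming partial credit means the fraction of constraints a response satisfies. The `Checker` type, the `partial_credit_score` function, and the three example constraints are hypothetical and are not taken from `reward.py` or the copied instruction registries.

```python
from typing import Callable, Sequence

# Hypothetical checker type: takes the response text and returns True when
# the constraint is satisfied. Not part of the eval_protocol API.
Checker = Callable[[str], bool]


def partial_credit_score(response: str, checkers: Sequence[Checker]) -> float:
    """Return the fraction of constraints the response satisfies (0.0 to 1.0)."""
    if not checkers:
        return 0.0  # assumption: no constraints means no credit
    passed = sum(1 for check in checkers if check(response))
    return passed / len(checkers)


# Illustrative constraints only; real IFEval instructions are richer.
checkers = [
    lambda text: len(text.split()) >= 5,      # at least five words
    lambda text: text.endswith("."),          # ends with a period
    lambda text: "goodbye" in text.lower(),   # mentions "goodbye"
]

score = partial_credit_score("Hello world! This is my response.", checkers)
print(score)  # two of the three constraints hold
```

Scoring per constraint rather than all-or-nothing gives a graded signal for responses that follow most, but not all, of the instructions.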
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+"""IFEval benchmark for evaluating instruction-following capabilities.
+
+Usage:
+    from eval_protocol.benchmarks.ifeval import ifeval_partial_credit_reward
+
+    score = ifeval_partial_credit_reward(response, ground_truth)
+"""
+
+from .reward import ifeval_partial_credit_reward
+
+__all__ = ["ifeval_partial_credit_reward"]

eval_protocol/benchmarks/ifeval/data/ifbench_test_sample.jsonl

Lines changed: 50 additions & 0 deletions
Large diffs are not rendered by default.

eval_protocol/rewards/ifeval/ifbench_instructions.py renamed to eval_protocol/benchmarks/ifeval/ifbench_instructions.py

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
