Merged
24 commits
9a180ee
fix(skill-eval): trigger-eval no longer mis-records executing skills …
azalio Jun 5, 2026
20f70d7
fix(skill-eval): optimizer corrupted block-scalar SKILL.md descriptio…
azalio Jun 5, 2026
e81123e
feat(skill-eval): --model/--runs flags + model-tier trigger experiment
azalio Jun 5, 2026
52eebbb
docs(skill-eval): firm-up model-tier results revise the pilot conclusion
azalio Jun 5, 2026
6285660
feat(whole-skill): model-lever knobs + a discriminating outcome fixture
azalio Jun 5, 2026
c1fd351
docs(whole-skill): model lever on OUTCOME is absorbed by the gate (se…
azalio Jun 5, 2026
5e7c604
feat(whole-skill): capture serial retry_count in the outcome harness
azalio Jun 5, 2026
6ca941f
feat(whole-skill): weakly-gated fixture + hidden-suite scoring (model…
azalio Jun 5, 2026
6f52189
test(whole-skill): vague-contract fixture to isolate the model lever
azalio Jun 5, 2026
0d9a452
docs(whole-skill): outcome is invariant to model/gate/contract within…
azalio Jun 5, 2026
418b3ad
test(whole-skill): capability-discriminating calc fixture (expression…
azalio Jun 5, 2026
a8d3916
test(whole-skill): vague-contract calc fixture (decisive model-thresh…
azalio Jun 5, 2026
da73759
docs(whole-skill): capability probe — haiku matches bigger models on …
azalio Jun 5, 2026
84881f1
fix(whole-skill): acceptEdits in outcome harness + use orchestrator-m…
azalio Jun 5, 2026
16bd462
docs(whole-skill): CORRECTED model-outcome result (two confounds remo…
azalio Jun 6, 2026
188751d
docs(whole-skill): DECISIVE — actor model matters for code (direct-in…
azalio Jun 6, 2026
d78acd5
polish(actor): cut 247 lines of worked examples (A/B-validated, -23%)
azalio Jun 6, 2026
97fb3a9
docs(whole-skill): A/B agent polish — actor examples cuttable, monito…
azalio Jun 6, 2026
a17c2d0
polish(synthesizer): cut 95 lines of Step-7 strategy worked-examples …
azalio Jun 6, 2026
9e6d373
docs(whole-skill): agent-polish validated rule (2 wins, 2 rejects) + …
azalio Jun 6, 2026
834ba8e
polish(task-decomposer): cut 151-line REFERENCE EXAMPLES (A/B-validat…
azalio Jun 6, 2026
946b779
polish(predictor): cut 417-line <examples> block (A/B-validated, -21%)
azalio Jun 6, 2026
3fb92c2
polish(reflector): cut 139-line COMPLETE EXAMPLES (A/B-validated)
azalio Jun 6, 2026
166efb5
docs(whole-skill): comprehensive agent-polish tally (5 wins, 1049 lin…
azalio Jun 6, 2026
249 changes: 0 additions & 249 deletions .claude/agents/actor.md
@@ -844,252 +844,3 @@ Default:
```

</Actor_Decision_Protocol>


<Actor_Reference_Examples>

## Example 1: New Feature (Backend API)

**Subtask**: "Create user registration endpoint"

**Approach**: POST /api/register with email/password validation, PBKDF2-SHA256 password hashing (via Werkzeug), and a JWT in the response.

**Code Changes**:

```python
# ===== File: api/auth.py =====
# `app`, `db`, and the `User` model come from the existing application (not shown).
from flask import request, jsonify
from werkzeug.security import generate_password_hash
from sqlalchemy.exc import IntegrityError
import jwt
import re

EMAIL_REGEX = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

@app.route('/api/register', methods=['POST'])
def register():
    """Register new user with email and password."""
    try:
        # silent=True returns None (instead of raising) for a missing or non-JSON body
        data = request.get_json(silent=True)
        if not data:
            return jsonify({'error': 'Request body required'}), 400

        email = data.get('email', '').strip().lower()
        password = data.get('password', '')

        # Validation (fail fast)
        if not email or not password:
            return jsonify({'error': 'Email and password required'}), 400
        if not EMAIL_REGEX.match(email):
            return jsonify({'error': 'Invalid email format'}), 400
        if len(password) < 8:
            return jsonify({'error': 'Password must be at least 8 characters'}), 400

        # Create user with hashed password (the plaintext is never stored)
        hashed = generate_password_hash(password, method='pbkdf2:sha256')
        user = User(email=email, password_hash=hashed)
        db.session.add(user)
        # Raises IntegrityError on a duplicate email (assumes a unique constraint on email)
        db.session.commit()

        # Generate JWT for immediate authentication
        token = jwt.encode(
            {'user_id': user.id, 'email': user.email},
            app.config['SECRET_KEY'],
            algorithm='HS256'
        )
        return jsonify({'token': token, 'user_id': user.id}), 201

    except IntegrityError:
        db.session.rollback()
        return jsonify({'error': 'Email already registered'}), 409
    except Exception:
        db.session.rollback()
        app.logger.exception("Registration error")
        return jsonify({'error': 'Internal server error'}), 500
```

**Trade-offs**:
- **Decision**: pbkdf2:sha256 for hashing
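- **Alternatives**: bcrypt (harder to accelerate on GPUs), Argon2id (memory-hard; OWASP's current first choice)
- **Rationale**: Built into Werkzeug with no extra dependency; a good security/performance balance for the current scale and threat model
- **Trade-off**: Weaker than bcrypt or Argon2id against GPU cracking at comparable cost, but adequate for the expected user base; the upgrade path is documented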
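To make the cost/upgrade trade-off concrete, here is a standard-library sketch of what PBKDF2-SHA256 hashing and verification involve. It is not Werkzeug's implementation: `hash_password`, `verify_password`, the `$`-separated storage layout, and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

# Illustrative cost factor (assumption); tune it to your own latency budget.
ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    """Derive a PBKDF2-HMAC-SHA256 hash using a random 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store algorithm, cost, salt and digest together so the cost can be raised later.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and cost, then compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Because the iteration count travels with each hash, raising `ITERATIONS` later only affects new hashes; older ones can be rehashed on the user's next successful login.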

**Testing Considerations**:
1. test_register_valid_input_returns_201
2. test_register_duplicate_email_returns_409
3. test_register_invalid_email_returns_400
4. test_register_short_password_returns_400
5. test_register_missing_fields_returns_400
6. test_register_sql_injection_safe
7. test_register_token_decodes_correctly
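Tests 3 to 5 exercise validation only, so one option is to pull the endpoint's fail-fast checks into a pure function and unit-test them without an HTTP client. A minimal sketch, where `validate_registration` is a hypothetical helper that runs the same checks in the same order and returns the same error strings as the endpoint above:

```python
import re
from typing import Optional

# Same pattern as EMAIL_REGEX in api/auth.py.
EMAIL_REGEX = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
MIN_PASSWORD_LENGTH = 8


def validate_registration(data: Optional[dict]) -> Optional[str]:
    """Return the first validation error for a payload, or None if it is valid."""
    if not data:
        return "Request body required"
    email = data.get("email", "").strip().lower()
    password = data.get("password", "")
    if not email or not password:
        return "Email and password required"
    if not EMAIL_REGEX.match(email):
        return "Invalid email format"
    if len(password) < MIN_PASSWORD_LENGTH:
        return "Password must be at least 8 characters"
    return None
```

The endpoint would then map a non-`None` result to a 400 response, leaving only the database and JWT paths for HTTP-level tests.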


---

## Example 2: Modification to Existing Code

**Subtask**: "Add rate limiting to existing API endpoint"

**Approach**: Sliding-window log rate limiting with a Redis sorted set per key, reusing the existing Redis connection. Limit: 100 requests/min per IP, per endpoint.

**Code Changes**:

```python
# ===== File: api/middleware.py =====
# ... (existing imports unchanged; assumed to provide Flask's request/jsonify and redis_client) ...
from functools import wraps
import time
import uuid

# ADDED: Rate limiter class (after line 45)
class RateLimiter:
    """Sliding-window log rate limiter using one Redis sorted set per key."""

    def __init__(self, redis_client, limit=100, window=60):
        self.redis = redis_client
        self.limit = limit
        self.window = window

    def is_allowed(self, key: str) -> tuple[bool, int]:
        """Check if request allowed. Returns (allowed, remaining)."""
        pipe = self.redis.pipeline()  # redis-py pipelines are transactional (MULTI/EXEC) by default
        now = time.time()
        window_start = now - self.window

        pipe.zremrangebyscore(key, 0, window_start)  # drop entries older than the window
        # Unique member so concurrent requests with the same timestamp are all counted
        pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
        pipe.zcard(key)  # requests in the window, including this one
        pipe.expire(key, self.window)  # let idle keys expire

        _, _, count, _ = pipe.execute()
        remaining = max(0, self.limit - count)
        return count <= self.limit, remaining


def rate_limit(limit=100, window=60):
    """Decorator for rate-limited endpoints."""
    limiter = RateLimiter(redis_client, limit, window)

    def decorator(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            key = f"ratelimit:{request.remote_addr}:{f.__name__}"
            allowed, remaining = limiter.is_allowed(key)

            if not allowed:
                return jsonify({
                    'error': 'Rate limit exceeded',
                    'retry_after': window
                }), 429

            response = f(*args, **kwargs)
            # Add rate limit headers (only when the view returned a Response object)
            if hasattr(response, 'headers'):
                response.headers['X-RateLimit-Remaining'] = str(remaining)
            return response
        return wrapped
    return decorator

# ===== File: api/endpoints.py =====
# MODIFIED: Added rate limiting decorator (line 45)

@app.route('/api/data', methods=['GET'])
@rate_limit(limit=100, window=60)  # ← ADDED
def get_data():
    return jsonify(fetch_data())
```

**Trade-offs**:
- **Decision**: Redis-based (not in-memory)
- **Alternatives**: In-memory dict, database counter
- **Rationale**: Shared across app instances and survives application restarts
- **Trade-off**: Adds a Redis dependency, but Redis is already in the stack

**Testing Considerations**:
1. test_rate_limit_under_limit_passes
2. test_rate_limit_exceeded_returns_429
3. test_rate_limit_window_resets
4. test_rate_limit_per_ip_isolation
5. test_rate_limit_headers_present
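The in-memory alternative from the trade-offs can be sketched with the same sliding-window-log logic, which is also handy for unit-testing window behavior (tests 1 to 4) without Redis. `InMemorySlidingWindowLimiter` and its injectable `clock` are hypothetical; unlike the Redis version, its state is per process and is lost on restart.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class InMemorySlidingWindowLimiter:
    """Single-process sliding-window log; the clock is injectable so tests need not sleep."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._log: Dict[str, Deque[float]] = defaultdict(deque)

    def is_allowed(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, remaining), mirroring RateLimiter.is_allowed."""
        now = self.clock()
        log = self._log[key]
        # Same cutoff as zremrangebyscore(key, 0, now - window): drop scores <= window start.
        while log and log[0] <= now - self.window:
            log.popleft()
        # Like the Redis version, record the request before counting it.
        log.append(now)
        count = len(log)
        return count <= self.limit, max(0, self.limit - count)
```

As in the Redis code, rejected requests are logged too, so a client that keeps sending over the limit stays blocked until it backs off for a full window.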


---

## Example 3: Blocked Task

**Subtask**: "Add caching to user service"

**Output**:
```yaml
status: BLOCKED
reason: "Implementing caching requires modifying core/database.py outside {{allowed_scope}} (api/*)"

attempted_approaches:
  - approach: "Decorator-based caching in api/ layer"
    problem: "Cache invalidation requires hooks in database write methods"
  - approach: "Read-through caching only"
    problem: "Stale data unacceptable per requirements"

proposed_solutions:
  1:
    solution: "Expand {{allowed_scope}} to include core/database.py"
    pros: "Clean implementation, proper invalidation"
    cons: "Broader change scope"
  2:
    solution: "Reduce subtask to read-only caching (accepts staleness)"
    pros: "Stays in scope"
    cons: "Tech debt, data consistency issues"

recommendation: "Option 1 - clean solution worth scope expansion"
```

---

## Example 4: MCP Tool Failure Handling

**Subtask**: "Implement WebSocket reconnection logic"


**Code Changes**:
```typescript
// ===== File: lib/websocket.ts =====

export class ReconnectingWebSocket {
  private ws: WebSocket | null = null;
  private reconnectAttempts = 0;
  private maxAttempts = 5;
  private baseDelay = 1000; // ms; retry delays are 1s, 2s, 4s, 8s, 16s

  connect(url: string): void {
    this.ws = new WebSocket(url);

    this.ws.onclose = () => {
      if (this.reconnectAttempts < this.maxAttempts) {
        // Exponential backoff: baseDelay * 2^attempt
        const delay = this.baseDelay * Math.pow(2, this.reconnectAttempts);
        this.reconnectAttempts++;
        console.log(`Reconnecting in ${delay}ms (attempt ${this.reconnectAttempts})`);
        setTimeout(() => this.connect(url), delay);
      } else {
        console.error('Max reconnection attempts reached');
        this.onMaxRetriesExceeded?.();
      }
    };

    this.ws.onopen = () => {
      this.reconnectAttempts = 0; // Reset on successful connection
    };
  }

  // Optional callback invoked once all reconnection attempts are exhausted
  onMaxRetriesExceeded?: () => void;
}
```

**Trade-offs**:
- **Decision**: Standard exponential backoff pattern
- **Mitigation**: Added comprehensive tests, runtime version check
- **Risk**: May use outdated API - flagged for manual review

**Testing Considerations**:
1. test_reconnect_success_resets_counter
2. test_reconnect_exponential_backoff_timing
3. test_reconnect_max_attempts_triggers_callback
4. test_reconnect_handles_immediate_disconnect
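The timing that `test_reconnect_exponential_backoff_timing` would check can be computed directly. A Python sketch (the function name is hypothetical) of the delay schedule implied by `baseDelay = 1000` and `maxAttempts = 5`:

```python
from typing import List


def backoff_delays(base_delay_ms: int = 1000, max_attempts: int = 5) -> List[int]:
    """Delay (ms) before each reconnect attempt: base_delay_ms * 2**attempt."""
    return [base_delay_ms * 2 ** attempt for attempt in range(max_attempts)]
```

With the defaults, the client gives up after about 31 seconds of cumulative backoff (1 + 2 + 4 + 8 + 16 s), not counting the time spent in each connection attempt.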

</Actor_Reference_Examples>