Commit 94324f2

chore: switch from sonnet to haiku in maintainers processing [CM-1049] (#3915)
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
1 parent 2ac368b commit 94324f2

2 files changed

Lines changed: 16 additions & 6 deletions

services/apps/git_integration/src/crowdgit/services/maintainer/bedrock.py

Lines changed: 11 additions & 5 deletions
@@ -78,7 +78,7 @@ async def invoke_bedrock(
 }
 )

-modelId = "us.anthropic.claude-sonnet-4-20250514-v1:0"
+modelId = "us.anthropic.claude-haiku-4-5-20251001-v1:0"
 accept = "application/json"
 contentType = "application/json"

@@ -107,14 +107,20 @@ async def invoke_bedrock(
 response_body = json.loads(body_bytes.decode("utf-8"))
 raw_text = response_body["content"][0]["text"].replace('"""', "").strip()

-# Expect pure JSON - no markdown handling
+# Strip markdown code fences if present (Haiku sometimes ignores the system prompt)
+if raw_text.startswith("```"):
+    raw_text = raw_text.split("\n", 1)[-1]
+if raw_text.endswith("```"):
+    raw_text = raw_text.rsplit("```", 1)[0]
+raw_text = raw_text.strip()
+
 output = json.loads(raw_text)

-# Calculate cost
+# Calculate cost (Claude Haiku 4.5 on AWS Bedrock: $1.00/$5.00 per 1M tokens)
 input_tokens = response_body["usage"]["input_tokens"]
 output_tokens = response_body["usage"]["output_tokens"]
-input_cost = (input_tokens / 1000) * 0.003
-output_cost = (output_tokens / 1000) * 0.015
+input_cost = (input_tokens / 1_000_000) * 1.00
+output_cost = (output_tokens / 1_000_000) * 5.00
 total_cost = input_cost + output_cost

 # Validate output with the provided model if it exists
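
Review note: the fence stripping and cost arithmetic added in this hunk can be exercised in isolation. The following is a minimal sketch under stated assumptions, not the code in `bedrock.py`: the helper names `strip_code_fences` and `estimate_cost`, the `FENCE` constant, and the two rate constants are hypothetical, and the rates are simply the ones quoted in the diff comment ($1.00 input / $5.00 output per 1M tokens).

```python
import json

# Three backticks, built at runtime so the example never contains a literal fence.
FENCE = "`" * 3

# Rates quoted in the diff comment, in USD per 1M tokens (assumed, not verified here).
INPUT_USD_PER_MTOK = 1.00
OUTPUT_USD_PER_MTOK = 5.00


def strip_code_fences(raw_text: str) -> str:
    """Remove a leading and a trailing markdown code fence, if present."""
    text = raw_text.strip()
    if text.startswith(FENCE):
        # Drop the whole opening fence line, including a language tag such as "json".
        text = text.split("\n", 1)[-1]
    if text.endswith(FENCE):
        # Keep everything before the last fence.
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call at the assumed per-1M-token rates."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MTOK
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    fenced = FENCE + 'json\n{"maintainers": []}\n' + FENCE
    print(json.loads(strip_code_fences(fenced)))
    print(estimate_cost(1_000_000, 1_000_000))
```

For comparison, the removed lines priced tokens per 1K at 0.003 and 0.015, which is equivalent to $3.00 and $15.00 per 1M tokens. Note that a response consisting of a fence with nothing inside it reduces to an empty string, so `json.loads` would still raise on it.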

services/apps/git_integration/src/crowdgit/services/maintainer/maintainer_service.py

Lines changed: 5 additions & 1 deletion
@@ -218,12 +218,16 @@ def get_extraction_prompt(self, filename: str, content_to_analyze: str) -> str:
     - The person's role, with a maximum of two words (e.g., "Lead Reviewer", "Core Maintainer").
     - The role must be about project governance, not a generic job title like "Software Engineer".
     - Do not include filler words like "repository", "project", or "active".
+    - **If the content does not assign an explicit individual role to each person** (e.g. a flat list with no per-person labels), set the title to the capitalized form of `normalized_title` (i.e. "Maintainer" or "Contributor"). Every person in the same response MUST receive the same derived title.
 4. `normalized_title`:
-    - Must be exactly "maintainer" or "contributor". If the role is ambiguous, use the `<filename>` as the primary hint. For example, a file named `MAINTAINERS` or `CODEOWNERS` implies "maintainer", while `CONTRIBUTORS` implies "contributor".
+    - Must be exactly "maintainer" or "contributor". If the role is ambiguous, use the `{filename}` as the primary hint:
+        - Filenames containing `MAINTAINERS`, `CODEOWNERS`, `OWNERS`, or `REVIEWERS` → "maintainer"
+        - All other filenames (AUTHORS, CONTRIBUTORS, CREDITS, COMMITTERS, etc.) → "contributor"
 5. `email`:
     - Extract the person's email address from the content. Look for patterns like `FullName <email@domain>`, `email@domain`, or email addresses in various formats.
     - The email must be a valid email address format (containing @ and a domain).
     - If no valid email can be found for the individual, use the string "unknown".
+    - **You MUST include every person found in the content regardless of whether their email is known. Never omit a person because their email is missing.**

 ---
 Filename: {filename}