feat: improve maintainers analysis prompt [CM-1049] #3919
Conversation
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Please add a Jira issue key to your PR title.
2 similar comments
Pull request overview
Updates the LLM extraction prompt used by the maintainer analysis service to broaden extraction beyond just "maintainers" and improve identification and mapping guidance.
Changes:
- Expands the prompt scope to scan the entire file and include every listed individual, across all roles and sections.
- Clarifies role normalization (e.g., reviewers → maintainer) and strengthens the "don't filter" instructions.
- Expands GitHub username detection patterns (markdown profile links and noreply email variants).
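The expanded username-detection patterns could be sketched as regexes along these lines. This is a hypothetical illustration: the actual prompt describes the patterns in natural language for the LLM, and none of these names come from the service's code.

```python
import re

# Markdown profile link, e.g. [Alice](https://github.com/alice)
MD_PROFILE_LINK = re.compile(r"\[([^\]]+)\]\(https://github\.com/([A-Za-z0-9-]+)/?\)")

# noreply email variants, e.g. alice@users.noreply.github.com
# and 12345+alice@users.noreply.github.com
NOREPLY_EMAIL = re.compile(r"(?:\d+\+)?([A-Za-z0-9-]+)@users\.noreply\.github\.com")

def extract_usernames(text: str) -> list[str]:
    """Collect GitHub usernames from profile links and noreply emails."""
    names = [m.group(2) for m in MD_PROFILE_LINK.finditer(text)]
    names += [m.group(1) for m in NOREPLY_EMAIL.finditer(text)]
    # De-duplicate while preserving first-seen order
    return list(dict.fromkeys(names))

sample = "Maintainer: [Alice](https://github.com/alice) <12345+alice@users.noreply.github.com>"
print(extract_usernames(sample))  # → ['alice']
```

Both patterns resolve to the same handle here, so the de-duplication step keeps a single entry.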
Prompt excerpt (from the diff; both the old and new variants of the "not found" rule appear):

- Your final output MUST be a single JSON object.
- If maintainers are found, the JSON format must be: `{{"info": [list_of_maintainer_objects]}}`
- If no individual maintainers are found, or only teams/groups are mentioned, the JSON format must be: `{{"error": "not_found"}}`
- If no individual maintainers are found, the JSON format must be: `{{"error": "not_found"}}`

Your task is to extract every person listed in the file content provided below, regardless of which section they appear in. Follow these rules precisely:

- **Primary Directive**: First, check if the content itself contains a legend or instructions on how to parse it (e.g., "M: Maintainer, R: Reviewer"). If it does, use that legend to guide your extraction.
- **Scope**: Process the entire file. Do not stop after the first section. Every section (Maintainers, Contributors, Authors, Previous Maintainers, Reviewers, etc.) must be scanned and all listed individuals extracted.
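A consumer of this JSON contract might parse the model's reply along these lines. This is a hypothetical sketch under the contract stated in the prompt; the function and variable names are assumptions, not the service's actual code.

```python
import json

def parse_maintainer_reply(raw: str) -> list[dict]:
    """Parse an LLM reply under the prompt's JSON contract.

    Returns the list of extracted person objects, or an empty list
    when the model reports {"error": "not_found"}.
    """
    reply = json.loads(raw)
    if reply.get("error") == "not_found":
        return []
    # "info" holds the list of extracted person objects
    return reply.get("info", [])

print(parse_maintainer_reply('{"error": "not_found"}'))  # → []
print(parse_maintainer_reply('{"info": [{"github_username": "alice"}]}'))
```

Treating `not_found` as an empty list keeps downstream code on a single happy path instead of branching on the error sentinel everywhere.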
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
services/apps/git_integration/src/crowdgit/services/maintainer/maintainer_service.py
This pull request updates the maintainer extraction prompt logic to ensure more comprehensive and accurate extraction of individual contributors from repository files. The changes expand the scope of extraction, clarify role mapping, and improve the instructions for identifying GitHub usernames and emails.
Enhancements to extraction scope and instructions:
- The prompt now directs the model to process the entire file and extract every listed individual, rather than stopping at a single "Maintainers" section.
Improvements to role mapping and identification:
- Reviewers are explicitly mapped to `normalized_title: "maintainer"`, and GitHub username detection now covers markdown profile links and additional `users.noreply.github.com` email formats.
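The role mapping could be sketched as a small lookup table. Note this is purely illustrative: the real normalization is performed by the LLM following the prompt, and every role name here beyond the reviewer → maintainer case the PR calls out is an assumption.

```python
# Hypothetical normalization table; only "reviewer" -> "maintainer" is
# stated by the PR, the other entries are illustrative assumptions.
ROLE_NORMALIZATION = {
    "maintainer": "maintainer",
    "reviewer": "maintainer",    # reviewers are mapped to maintainer per the PR
    "contributor": "contributor",
    "author": "contributor",     # assumption for illustration
}

def normalize_role(raw_role: str) -> str:
    """Normalize a raw role string to a canonical title."""
    key = raw_role.strip().lower()
    # Unknown roles pass through lower-cased rather than being dropped
    return ROLE_NORMALIZATION.get(key, key)

print(normalize_role("Reviewer"))  # → maintainer
```

Passing unknown roles through unchanged mirrors the prompt's "don't filter" instruction: nobody is discarded just because their section header wasn't anticipated.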
Note
Medium Risk
Changes the LLM prompt that drives maintainer ingestion, which can materially alter who gets extracted and persisted (including role classification) and may increase false positives and database churn. There are no runtime logic changes beyond the prompt text, but output variance can affect downstream maintainer updates.
Overview
Improves the maintainer-extraction LLM prompt to scan the entire file and return every listed individual (across maintainers/contributors/reviewers/etc.) instead of implicitly focusing on a single section.
Clarifies the parsing rules by expanding GitHub username detection (markdown links and more `users.noreply.github.com` email formats) and explicitly mapping reviewers to `normalized_title: "maintainer"`, while keeping the same JSON output contract and `not_found` behavior.
Written by Cursor Bugbot for commit 3f21068.