fix(desktop): reduce group chat task false positives #7538
Greptile Summary
This PR tightens the LLM task-extraction prompt to reduce false positives in public and group channels (Discord, Slack, Teams, etc.) by adding an explicit
Confidence Score: 4/5
Safe to merge: the change is additive prompt text with no Swift logic modifications, and the main risk is minor ambiguity in the wording of the new section. The new section introduces one condition ("It is a direct message (DM) thread") that is logically redundant under its own heading, which could create subtle model confusion. One of the four test assertions also checks a string that already existed in the prompt before this PR, so that assertion does not guard the new text. The new CRITICAL FOR PUBLIC/GROUP CHANNELS block in TaskAssistantSettings.swift (condition 2) and the redundant assertion in TaskAssistantPromptTests.swift (line 11) deserve a second look before merging.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Screenshot captured] --> B{Conversation visible?}
B -- No --> Z[no_task_found]
B -- Yes --> C{Latest exchange pattern?}
C -- Pattern 1: User agreed --> D[extract_task]
C -- Pattern 2: Unaddressed request --> E{Public / Group channel?}
C -- No actionable pattern --> Z
E -- No, it is a DM --> D
E -- Yes --> F{Direct involvement evidence?}
F -- "@mentions user by name/handle" --> D
F -- "User already replied in thread" --> D
F -- "Cannot determine" --> Z
F -- "Merely observing" --> Z
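The flowchart describes the behavior the prompt asks the model to follow; the PR itself changes no Swift logic. Purely to make the branches easier to check, here is a minimal sketch of the same decision tree as a Swift function. Every type and name in it (`ExchangePattern`, `ChannelKind`, `InvolvementEvidence`, `Screenshot`, `decide`) is hypothetical and does not come from the PR.

```swift
// Hypothetical model of the flowchart; none of these names exist in the PR.
enum ExchangePattern {
    case userAgreed          // Pattern 1: user agreed
    case unaddressedRequest  // Pattern 2: unaddressed request
    case none                // no actionable pattern
}

enum ChannelKind {
    case directMessage
    case publicOrGroup
}

enum InvolvementEvidence {
    case mentionsUser        // message @mentions the user by name/handle
    case userRepliedInThread // user already replied in the thread
    case merelyObserving
    case cannotDetermine
}

enum Decision: Equatable {
    case extractTask
    case noTaskFound
}

struct Screenshot {
    var conversationVisible: Bool
    var pattern: ExchangePattern
    var channel: ChannelKind
    var evidence: InvolvementEvidence
}

func decide(_ shot: Screenshot) -> Decision {
    // No visible conversation: nothing to extract.
    guard shot.conversationVisible else { return .noTaskFound }

    switch shot.pattern {
    case .none:
        return .noTaskFound
    case .userAgreed:
        return .extractTask
    case .unaddressedRequest:
        // A DM bypasses the public/group-channel checks entirely.
        if shot.channel == .directMessage { return .extractTask }
        switch shot.evidence {
        case .mentionsUser, .userRepliedInThread:
            return .extractTask
        case .merelyObserving, .cannotDetermine:
            // Ambiguity resolves to no task, which is the point of the PR.
            return .noTaskFound
        }
    }
}
```

Note that the DM check sits above the evidence check in this sketch, which matches the flowchart's `E` node.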
Reviews (1): Last reviewed commit: "fix(desktop): reduce group-channel task ..."
XCTAssertTrue(prompt.contains("CRITICAL FOR PUBLIC/GROUP CHANNELS"))
XCTAssertTrue(prompt.contains("visible evidence shows the user is directly involved"))
XCTAssertTrue(prompt.contains("call no_task_found"))
Redundant assertion doesn't cover the new section
"call no_task_found" already appears in the pre-existing MANDATORY WORKFLOW step 2 ("→ call no_task_found immediately"), so this assertion passes even if the entire new CRITICAL FOR PUBLIC/GROUP CHANNELS block is deleted from the prompt. The other three assertions (lines 9, 10, 12) are unique to the new section and do provide real regression coverage; this one adds nothing. Consider replacing it with a substring that is unique to the new rule, such as "merely observing a public channel", or simply remove it since "community at large" is already asserted on line 12.
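The redundancy described in this comment can be checked mechanically: a substring guards the new section only if it disappears when the section is removed. The sketch below assumes a hypothetical helper, `guardsSection`, and uses abbreviated stand-in strings for the prompt (`basePrompt` and `newSection`); the real prompt in TaskAssistantSettings.swift is longer and is not reproduced here.

```swift
import Foundation

// Hypothetical helper: true only if `needle` occurs in the full prompt
// but no longer occurs once `section` is removed from it.
func guardsSection(_ needle: String, fullPrompt: String, section: String) -> Bool {
    let withoutSection = fullPrompt.replacingOccurrences(of: section, with: "")
    return fullPrompt.contains(needle) && !withoutSection.contains(needle)
}

// Stand-in for the pre-existing prompt text (abbreviated; an assumption).
let basePrompt = """
MANDATORY WORKFLOW
2. (abbreviated) → call no_task_found immediately

"""

// Stand-in for the section added by this PR, shortened to two sentences.
let newSection = """
CRITICAL FOR PUBLIC/GROUP CHANNELS:
If the user is merely observing a public channel, or if you cannot tell whether the request is directed at them, call no_task_found.
Do NOT extract tasks from broad bug reports, feature requests, or questions posted to the community at large.
"""

let fullPrompt = basePrompt + newSection

// "call no_task_found" survives removal of the section, so it guards nothing.
let redundant = guardsSection("call no_task_found", fullPrompt: fullPrompt, section: newSection)
// This phrase exists only inside the new section, so it is a real guard.
let unique = guardsSection("merely observing a public channel", fullPrompt: fullPrompt, section: newSection)
print(redundant, unique) // false true
```

A check like this could be run once while choosing assertion strings; it probably does not need to live in the test suite itself.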
CRITICAL FOR PUBLIC/GROUP CHANNELS:
In Discord, Slack, Teams, community chats, and other public/group channels, extract ONLY when the visible evidence shows the user is directly involved:
- The message explicitly @mentions the user by name or handle
- It is a direct message (DM) thread, not a public or community channel
- The user has already replied in the same thread and is an active participant
If the user is merely observing a public channel, or if you cannot tell whether the request is directed at them, call no_task_found.
Do NOT extract tasks from broad bug reports, feature requests, or questions posted to the community at large.
Condition 2 is logically redundant within the section scope
The section header is CRITICAL FOR PUBLIC/GROUP CHANNELS, so a DM thread is already outside the scope this section addresses. Listing "It is a direct message (DM) thread, not a public or community channel" as one of the extraction-allowed conditions may confuse the model: it reads as if you need to first be in a public/group channel AND simultaneously be in a DM thread. In practice this condition belongs in the parent decision tree (before the section is even reached) rather than as a peer of the @mention and active-participant conditions. Removing or relocating it would reduce ambiguity, especially since an LLM reading the bullet list may treat the three items as an OR set whose second member contradicts the section heading.
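One possible way to act on this suggestion is to move the DM case out of the bullet list and state it before the section begins. The sketch below is a hypothetical rewording, not text from the PR; the constant names `dmPreface` and `groupChannelRules` are invented for illustration, and the wording of the DM sentence is an assumption.

```swift
import Foundation

// Hypothetical: the DM case stated up front, outside the group-channel section.
let dmPreface = """
If the conversation is a direct message (DM) thread, the public/group channel rules below do not apply.

"""

// The section from the PR with the DM bullet removed; other wording unchanged.
let groupChannelRules = """
CRITICAL FOR PUBLIC/GROUP CHANNELS:
In Discord, Slack, Teams, community chats, and other public/group channels, extract ONLY when the visible evidence shows the user is directly involved:
- The message explicitly @mentions the user by name or handle
- The user has already replied in the same thread and is an active participant
If the user is merely observing a public channel, or if you cannot tell whether the request is directed at them, call no_task_found.
Do NOT extract tasks from broad bug reports, feature requests, or questions posted to the community at large.
"""

let revisedPromptFragment = dmPreface + groupChannelRules
```

With this split, both remaining bullets describe evidence found inside a public or group channel, so the list no longer contains an item that contradicts its own heading. The substrings asserted in TaskAssistantPromptTests.swift and quoted in the review still appear in the revised fragment.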
kodjima33 left a comment
thanks for tightening the group-chat task prompt + the regression test