feat: answer-span extraction + POST /v1/answer endpoint (Phase 1.1 + 3.3) by hallelx2 · Pull Request #16 · hallelx2/vectorless-engine

hallelx2 · 2026-05-27T00:42:02Z

Summary

Lands Phase 1.1 (per-section answer-span extraction) and Phase 3.3 (POST /v1/answer) of the engine plan. Turns the whitepaper's "the answer has a verbatim quote with a page number" claim into two demonstrable features.

pkg/retrieval/span.go — SpanExtractor.Extract(ctx, content, query) runs one LLM call asking for the shortest verbatim span that answers the query, returns AnswerSpan{Start, End, Text} with byte offsets back into the content. Two-pass locator (exact substring → whitespace-normalised) handles real PDFs with weird linebreaks/spacing. Sentinel offsets (-1, -1) flag paraphrased quotes.
/v1/query opt-in: when retrieval.answer_span.enabled: true, every returned section gets an extra answer_span field. Bounded concurrency via semaphore. Failures are logged and dropped — span extraction is best-effort.
POST /v1/answer (new): single round-trip runs retrieval + per-section span extraction + a synthesis LLM call, returns {answer, citations: [{section_id, title, page_start, page_end, quote, quote_start, quote_end}], strategy, model, usage, elapsed_ms}. The most regulator-defensible endpoint — every claim carries a replayable provenance trail.

What's wired

Deps gains LLM, LLMModel, AnswerSpan, Answer fields. cmd/engine/main.go populates them from config (adds a modelFor(LLMConfig) helper).
pkg/config/config.go: AnswerSpanBlock (enabled, model, max_concurrency, max_quote_len) + AnswerBlock (model, max_sections, max_answer_tokens). Five new VLE_RETRIEVAL_ANSWER_* env overrides.
config.example.yaml: documented retrieval blocks with defaults.
openapi.yaml: /v1/answer endpoint + AnswerRequest / AnswerResponse / AnswerCitation / AnswerSpan schemas. answer_span added to QuerySection.

Test plan

go build ./...
go vet ./...
go test ./... — all green
Span extractor unit tests cover: verbatim match → resolved offsets, not-found → nil span, whitespace-normalised location, hallucinated quote → sentinel offsets, empty inputs → no LLM call, code-fence + leading-prose tolerant parser, locateQuote edge cases

Risk envelope

Opt-in via config — default-disabled. Existing /v1/query callers see no behaviour change.
/v1/answer returns 501 when no LLM client is configured (defensive — never hangs).
/v1/answer is additive; existing endpoints untouched.

Open consequences

For now /v1/answer always extracts spans (it grounds the citations); /v1/query honours the opt-in flag.
The synthesis prompt sends section content (capped at 4000 chars/section) alongside spans, so the model isn't blind when extraction returns nothing. Future optimisation: spans-only mode for cost-sensitive deployments.

Summary by Sourcery

Add configurable answer-span extraction to retrieval responses and introduce a new quote-grounded /v1/answer endpoint that runs retrieval, span extraction, and synthesis in a single LLM-backed call.

New Features:

Introduce per-section answer-span extraction that enriches /v1/query sections with an optional answer_span field containing a verbatim quote and byte offsets.
Add the /v1/answer endpoint that returns a synthesised, quote-grounded answer with structured citations, usage metadata, and elapsed time.

Enhancements:

Wire a shared LLM client and default model into the API server dependencies for non-retrieval LLM calls, with configuration blocks for answer-span and answer synthesis behaviour.
Extend OpenAPI, configuration defaults, and example config to document and control answer-span extraction and the /v1/answer endpoint.
Factor shared section-with-content handling in the query path and add concurrent span-extraction orchestration with bounded concurrency.

Tests:

Add unit tests for the span extractor, including verbatim and whitespace-normalised matches, not-found behaviour, hallucinated quotes, empty-input handling, and robust response parsing.

Phase 1.1 + 3.3 of the engine plan: turns the whitepaper's "answer has a verbatim quote with a page number" claim into two demonstrable, regulator-defensible features. pkg/retrieval/span.go - SpanExtractor.Extract(ctx, content, query) runs one LLM call, asks for the SHORTEST verbatim quote from the section that answers the query, returns AnswerSpan{Start, End, Text} with byte offsets back into content. - Two-pass locator: exact substring first, then whitespace- normalised match for content with weird linebreaks/spacing. - When the model paraphrases despite the verbatim rule, offsets are sentinel (-1, -1) but Text is preserved so callers can flag. - Schema-validated JSON output ({found, quote}) with the same tolerant parser pattern as ParseSelection (code fences, leading prose). internal/api/server.go - Deps gains LLM (llmgate.Client), LLMModel (default fallback), AnswerSpan + Answer config blocks. cmd/engine/main.go wires them. - /v1/query: when retrieval.answer_span.enabled, every returned section gets an `answer_span` field. Bounded concurrency (config) via a semaphore; failures are logged and dropped. - POST /v1/answer (NEW): single round-trip runs retrieval + per-section span extraction + a synthesis LLM call, returns {answer, citations:[{section_id, title, page_start, page_end, quote, quote_start, quote_end}], strategy, model, usage, elapsed_ms}. The model is instructed to cite by section_id. pkg/config/config.go - RetrievalConfig.AnswerSpan + .Answer blocks; defaults match behaviour: AnswerSpan disabled, Answer.MaxSections=5, Answer.MaxAnswerTokens=1024. - VLE_RETRIEVAL_ANSWER_SPAN_ENABLED / _MODEL / _MAX_CONCURRENCY and VLE_RETRIEVAL_ANSWER_MODEL / _MAX_SECTIONS env overrides follow the existing pattern. openapi.yaml - /v1/answer endpoint + AnswerRequest / AnswerResponse / AnswerCitation / AnswerSpan schemas. answer_span added as an optional field on QuerySection (omitempty in JSON). Tests: - pkg/retrieval/span_test.go covers the verbatim path, the not-found path, whitespace-normalised location, the hallucinated-quote sentinel-offset path, empty inputs, code-fence stripping, leading-prose stripping, and locateQuote edge cases. All pass. - go build ./..., go vet ./..., go test ./... — all green.

coderabbitai · 2026-05-27T00:42:08Z

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 11 minutes and 35 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83b995cb-81c2-400f-9286-72a962dea5e3

📥 Commits

Reviewing files that changed from the base of the PR and between 1100390 and 39e677f.

📒 Files selected for processing (7)

cmd/engine/main.go
config.example.yaml
internal/api/server.go
openapi.yaml
pkg/config/config.go
pkg/retrieval/span.go
pkg/retrieval/span_test.go

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/answer-span-and-answer-api

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai · 2026-05-27T00:42:14Z

Reviewer's Guide

Implements configurable answer-span extraction for /v1/query sections and adds a new /v1/answer endpoint that performs retrieval, per-section span extraction, and answer synthesis via an LLM with grounded citations and usage accounting.

Sequence diagram for POST /v1/answer end-to-end flow

sequenceDiagram
    actor Client
    participant APIServer as APIServer_handleAnswer
    participant DB as DB_LoadTree
    participant Strategy as RetrievalStrategy
    participant Storage as Storage_Get
    participant SpanExtractor as SpanExtractor
    participant LLM as LLM_Client

    Client->>APIServer: POST /v1/answer (AnswerRequest)
    APIServer-->>Client: [if no LLM] 501
    APIServer-->>Client: [if no strategy] 503

    APIServer->>DB: LoadTree(document_id)
    DB-->>APIServer: Tree or error

    APIServer->>Strategy: Select / SelectWithCost(query, budget)
    Strategy-->>APIServer: section_ids, retrievalUsage

    loop sections
        APIServer->>Storage: Get(content_ref)
        Storage-->>APIServer: sectionContent
    end

    APIServer->>SpanExtractor: Extract(ctx, content, query)
    SpanExtractor->>LLM: Complete(span extraction)
    LLM-->>SpanExtractor: span quote + usage
    SpanExtractor-->>APIServer: AnswerSpan

    APIServer->>LLM: Complete(synthesis prompt)
    LLM-->>APIServer: answerText + synthUsage

    APIServer-->>Client: AnswerResponse{answer, citations, usage}

Sequence diagram for /v1/query with optional answer-span extraction

sequenceDiagram
    actor Client
    participant APIServer as APIServer_handleQuery
    participant Strategy as RetrievalStrategy
    participant DB as DB_LoadTree
    participant Storage as Storage_Get
    participant SpanExtractor as SpanExtractor
    participant LLM as LLM_Client

    Client->>APIServer: POST /v1/query
    APIServer->>DB: LoadTree(document_id)
    DB-->>APIServer: Tree

    APIServer->>Strategy: Select(query, budget)
    Strategy-->>APIServer: section_ids

    loop sections
        APIServer->>Storage: Get(content_ref)
        Storage-->>APIServer: sectionContent
    end

    alt answer_span enabled and LLM configured
        APIServer->>SpanExtractor: Extract(ctx, content, query)
        SpanExtractor->>LLM: Complete(span extraction)
        LLM-->>SpanExtractor: AnswerSpan
        SpanExtractor-->>APIServer: AnswerSpan
    else
        APIServer-->>APIServer: [skip span extraction]
    end

    APIServer-->>Client: QueryResponse{sections[answer_span?]}

File-Level Changes

Change	Details	Files
Add answer-span extraction component and integrate it into retrieval flow and /v1/query responses.	Introduce retrieval.AnswerSpan type and SpanExtractor with LLM JSON-mode call to produce shortest verbatim quote plus byte offsets, including whitespace-normalised locator and robust response parser. Implement locateQuote, collapseWS, and mapNormToOriginal helpers to resolve quote positions in real-world PDF-like text, with sentinel offsets when the quote cannot be matched verbatim. Add unit tests for the span extractor, locator, and parser covering verbatim matches, no-evidence cases, whitespace-normalised matches, hallucinated quotes, empty-input behaviour, and fenced/leading-prose JSON outputs.	`pkg/retrieval/span.go` `pkg/retrieval/span_test.go`
Wire optional answer-span extraction into /v1/query and add a shared sectionWithContent representation.	Refactor handleQuery to materialise sections as sectionWithContent structs, optionally run span extraction concurrently with a bounded semaphore, then convert back to API maps including optional answer_span. Add sectionWithContentToMap helper to centralise section-to-JSON shaping (including page range, candidate questions, and answer_span fields). Introduce runSpansConcurrent helper that fans out extraction calls with MaxConcurrency, skips empty content, respects context cancellation, and logs non-fatal failures.	`internal/api/server.go`
Introduce new /v1/answer endpoint that runs retrieval, span extraction, and answer synthesis with grounded citations and usage accounting.	Add handleAnswer to validate requests, load the document tree, run retrieval with ContextBudget (supporting CostStrategy usage accounting), cap sections using configurable defaults, load section content, and always run span extraction. Implement synthesiseAnswer to call the shared LLM client with a grounded prompt that includes section metadata, spans, and truncated content, and return answer text plus token/cost usage. Shape AnswerResponse with answer, citations (including quote and byte offsets when available), strategy name, effective model, usage summary, and elapsed_ms; return 501 when LLM is not configured and 503 when no retrieval strategy is available.	`internal/api/server.go`
Extend API surface and configuration to support answer-span and /v1/answer features and wire LLM dependencies into the server.	Augment Deps with LLM client, default LLMModel, and AnswerSpan/Answer config blocks; wire them from engine config in cmd/engine/main.go and add a modelFor helper to derive the default chat model by driver. Expand RetrievalConfig with AnswerSpanBlock and AnswerBlock, set safe defaults in config.Default, and add environment overrides for enabling span extraction and tuning models, concurrency, and max sections. Extend OpenAPI spec with /v1/answer path, AnswerRequest/AnswerResponse/AnswerCitation/AnswerSpan schemas, and add answer_span to QuerySection; document new retrieval.answer_span and retrieval.answer blocks in config.example.yaml.	`internal/api/server.go` `cmd/engine/main.go` `pkg/config/config.go` `openapi.yaml` `config.example.yaml`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 4 issues, and left some high level feedback:

In applyEnvOverrides, you add env vars for VLE_RETRIEVAL_ANSWER_SPAN_* and VLE_RETRIEVAL_ANSWER_* but skip overrides for AnswerSpan.MaxQuoteLen and Answer.MaxAnswerTokens, which makes those settings configurable only via YAML; consider adding env hooks for these to keep configuration surfaces consistent.
Both spanExtractor and handleAnswer can end up constructing LLM requests with an empty model if no config or request model is provided; it may be more robust to validate the resolved model name early and return a clear 4xx/5xx error rather than relying on downstream llmgate failures.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `applyEnvOverrides`, you add env vars for `VLE_RETRIEVAL_ANSWER_SPAN_*` and `VLE_RETRIEVAL_ANSWER_*` but skip overrides for `AnswerSpan.MaxQuoteLen` and `Answer.MaxAnswerTokens`, which makes those settings configurable only via YAML; consider adding env hooks for these to keep configuration surfaces consistent.
- Both `spanExtractor` and `handleAnswer` can end up constructing LLM requests with an empty `model` if no config or request model is provided; it may be more robust to validate the resolved model name early and return a clear 4xx/5xx error rather than relying on downstream llmgate failures.

## Individual Comments

### Comment 1
<location path="internal/api/server.go" line_range="630-639" />
<code_context>
+	totalUsage := retrieval.Usage{}
</code_context>
<issue_to_address>
**issue (bug_risk):** LLM usage from span extraction is not included in the reported usage totals

In `handleAnswer`, `totalUsage` only aggregates `retrievalUsage` and `synthUsage`, while the `Usage` returned from span extraction (`runSpansConcurrent` → `SpanExtractor.Extract`) is dropped. This causes the `/v1/answer` `usage` block to under-report tokens, cost, and `llm_calls`. Please propagate the span-extraction `Usage` out of `runSpansConcurrent` (e.g., via an accumulator or channel) and fold it into `totalUsage` for accurate accounting.
</issue_to_address>

### Comment 2
<location path="pkg/retrieval/span.go" line_range="71-72" />
<code_context>
+		maxQuote = 400
+	}
+
+	user := fmt.Sprintf(
+		"Section content:\n---\n%s\n---\n\nUser query:\n%s\n\nReturn a JSON object with `found` (boolean) and `quote` (string, verbatim from the section, ≤ %d characters).",
+		sectionContent, query, maxQuote,
+	)
</code_context>
<issue_to_address>
**suggestion (performance):** Prompt injects full section content without regard to token budget; consider bounding or truncating by token count

`SpanExtractor.Extract` passes the full `sectionContent` into the prompt and only bounds the quote length, not the context size. For very large sections this can cause excessive prompt tokens, driving up cost and latency. Since you already have context-budgeting logic elsewhere, consider adding a similar guard here (e.g., truncate `sectionContent` by an approximate token or character limit) so span extraction remains bounded for large sections.

Suggested implementation:

```golang
	maxQuote := e.MaxQuoteLen
	if maxQuote <= 0 {
		maxQuote = 400
	}

	// Bound the section content so span extraction prompts remain within a reasonable budget.
	const maxSectionChars = 8000
	if len(sectionContent) > maxSectionChars {
		sectionContent = sectionContent[:maxSectionChars]
	}

	user := fmt.Sprintf(
		"Section content:\n---\n%s\n---\n\nUser query:\n%s\n\nReturn a JSON object with `found` (boolean) and `quote` (string, verbatim from the section, ≤ %d characters).",
		sectionContent, query, maxQuote,
	)

```

If your codebase already has a shared token/character budgeting helper (e.g., for retrieval or summarization prompts), you may want to:
1. Replace the hard-coded `maxSectionChars` with a call to that helper, or derive it from a configuration value on `SpanExtractor` (such as a context budget field if one exists).
2. Tune `maxSectionChars` or the helper to target a specific token budget for this model (e.g., estimating 3–4 chars per token).
</issue_to_address>

### Comment 3
<location path="pkg/retrieval/span.go" line_range="106-107" />
<code_context>
+	if !found || strings.TrimSpace(quote) == "" {
+		return nil, usage, nil
+	}
+	if len(quote) > maxQuote {
+		quote = quote[:maxQuote]
+	}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Truncating strings by byte length risks cutting in the middle of UTF-8 runes

Here and in `synthesiseAnswer` (4000-char truncation), `s[:N]` truncates by bytes, not runes. With non-ASCII input this can split UTF-8 code points and yield invalid UTF-8, making offsets and rendering unreliable. Please switch to rune-aware truncation (e.g., via `[]rune` or a helper that truncates by runes while keeping byte offsets explicit).

Suggested implementation:

```golang
	if !found || strings.TrimSpace(quote) == "" {
		return nil, usage, nil
	}
	quote = truncateRunes(quote, maxQuote)

```

```golang
func truncateRunes(s string, maxRunes int) string {
	if maxRunes <= 0 {
		return ""
	}

	runes := []rune(s)
	if len(runes) <= maxRunes {
		return s
	}

	return string(runes[:maxRunes])
}

import (
	"context"

```

1. In the `synthesiseAnswer` function (the 4000-char truncation mentioned in your comment), replace any `if len(x) > N { x = x[:N] }` or equivalent with `x = truncateRunes(x, N)` to make truncation rune-aware there as well.
2. If `maxQuote` or the 4000 limit are intended as byte limits rather than character limits, adjust the helper (or add an additional helper) to enforce both a rune count and a byte-length ceiling while still avoiding cutting UTF-8 runes in half (e.g., by walking runes until the next rune would exceed the byte budget).
</issue_to_address>

### Comment 4
<location path="pkg/retrieval/span_test.go" line_range="116-125" />
<code_context>
+	}
+}
+
+func TestSpanExtractor_EmptyInput(t *testing.T) {
+	m := &spanMockLLM{reply: `{"found":true,"quote":"x"}`}
+	e := NewSpanExtractor(m, "gemini-2.5-flash")
+	if span, _, _ := e.Extract(context.Background(), "", "Q"); span != nil {
+		t.Errorf("empty content should yield nil span without an LLM call")
+	}
+	if span, _, _ := e.Extract(context.Background(), "content", ""); span != nil {
+		t.Errorf("empty query should yield nil span without an LLM call")
+	}
+}
+
+func TestParseSpanResponse_CodeFence(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** Assert that the LLM is not called on empty input to fully exercise the optimization

Right now the test only verifies that the span is nil. Since the behavior also includes *skipping* the LLM call, please also assert `m.calls == 0` after each invocation to lock in that optimization and avoid future regressions that reintroduce unnecessary LLM traffic for empty inputs.

```suggestion
func TestSpanExtractor_EmptyInput(t *testing.T) {
	m := &spanMockLLM{reply: `{"found":true,"quote":"x"}`}
	e := NewSpanExtractor(m, "gemini-2.5-flash")

	if span, _, _ := e.Extract(context.Background(), "", "Q"); span != nil {
		t.Errorf("empty content should yield nil span without an LLM call")
	}
	if m.calls != 0 {
		t.Errorf("expected no LLM calls for empty content, got %d", m.calls)
	}

	if span, _, _ := e.Extract(context.Background(), "content", ""); span != nil {
		t.Errorf("empty query should yield nil span without an LLM call")
	}
	if m.calls != 0 {
		t.Errorf("expected no LLM calls for empty query, got %d", m.calls)
	}
}
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-05-27T00:44:39Z

+	totalUsage := retrieval.Usage{}
+
+	var ids []tree.SectionID
+	var retrievalUsage retrieval.Usage
+	if cs, ok := d.Strategy.(retrieval.CostStrategy); ok {
+		res, err := cs.SelectWithCost(r.Context(), t, body.Query, budget)
+		if err != nil {
+			writeErr(w, http.StatusInternalServerError, "retrieval failed: "+err.Error())
+			return
+		}


issue (bug_risk): LLM usage from span extraction is not included in the reported usage totals

In handleAnswer, totalUsage only aggregates retrievalUsage and synthUsage, while the Usage returned from span extraction (runSpansConcurrent → SpanExtractor.Extract) is dropped. This causes the /v1/answer usage block to under-report tokens, cost, and llm_calls. Please propagate the span-extraction Usage out of runSpansConcurrent (e.g., via an accumulator or channel) and fold it into totalUsage for accurate accounting.

sourcery-ai · 2026-05-27T00:44:39Z

+	user := fmt.Sprintf(
+		"Section content:\n---\n%s\n---\n\nUser query:\n%s\n\nReturn a JSON object with `found` (boolean) and `quote` (string, verbatim from the section, ≤ %d characters).",


suggestion (performance): Prompt injects full section content without regard to token budget; consider bounding or truncating by token count

SpanExtractor.Extract passes the full sectionContent into the prompt and only bounds the quote length, not the context size. For very large sections this can cause excessive prompt tokens, driving up cost and latency. Since you already have context-budgeting logic elsewhere, consider adding a similar guard here (e.g., truncate sectionContent by an approximate token or character limit) so span extraction remains bounded for large sections.

Suggested implementation:

maxQuote := e.MaxQuoteLen if maxQuote <= 0 { maxQuote = 400 } // Bound the section content so span extraction prompts remain within a reasonable budget. const maxSectionChars = 8000 if len(sectionContent) > maxSectionChars { sectionContent = sectionContent[:maxSectionChars] } user := fmt.Sprintf( "Section content:\n---\n%s\n---\n\nUser query:\n%s\n\nReturn a JSON object with `found` (boolean) and `quote` (string, verbatim from the section, ≤ %d characters).", sectionContent, query, maxQuote, )

If your codebase already has a shared token/character budgeting helper (e.g., for retrieval or summarization prompts), you may want to:

Replace the hard-coded maxSectionChars with a call to that helper, or derive it from a configuration value on SpanExtractor (such as a context budget field if one exists).

Tune maxSectionChars or the helper to target a specific token budget for this model (e.g., estimating 3–4 chars per token).

sourcery-ai · 2026-05-27T00:44:39Z

+	if len(quote) > maxQuote {
+		quote = quote[:maxQuote]


suggestion (bug_risk): Truncating strings by byte length risks cutting in the middle of UTF-8 runes

Here and in synthesiseAnswer (4000-char truncation), s[:N] truncates by bytes, not runes. With non-ASCII input this can split UTF-8 code points and yield invalid UTF-8, making offsets and rendering unreliable. Please switch to rune-aware truncation (e.g., via []rune or a helper that truncates by runes while keeping byte offsets explicit).

Suggested implementation:

if !found || strings.TrimSpace(quote) == "" { return nil, usage, nil } quote = truncateRunes(quote, maxQuote)

func truncateRunes(s string, maxRunes int) string { if maxRunes <= 0 { return "" } runes := []rune(s) if len(runes) <= maxRunes { return s } return string(runes[:maxRunes]) } import ( "context"

In the synthesiseAnswer function (the 4000-char truncation mentioned in your comment), replace any if len(x) > N { x = x[:N] } or equivalent with x = truncateRunes(x, N) to make truncation rune-aware there as well.

If maxQuote or the 4000 limit are intended as byte limits rather than character limits, adjust the helper (or add an additional helper) to enforce both a rune count and a byte-length ceiling while still avoiding cutting UTF-8 runes in half (e.g., by walking runes until the next rune would exceed the byte budget).

sourcery-ai · 2026-05-27T00:44:39Z

+func TestSpanExtractor_EmptyInput(t *testing.T) {
+	m := &spanMockLLM{reply: `{"found":true,"quote":"x"}`}
+	e := NewSpanExtractor(m, "gemini-2.5-flash")
+	if span, _, _ := e.Extract(context.Background(), "", "Q"); span != nil {
+		t.Errorf("empty content should yield nil span without an LLM call")
+	}
+	if span, _, _ := e.Extract(context.Background(), "content", ""); span != nil {
+		t.Errorf("empty query should yield nil span without an LLM call")
+	}
+}


suggestion (testing): Assert that the LLM is not called on empty input to fully exercise the optimization

Right now the test only verifies that the span is nil. Since the behavior also includes skipping the LLM call, please also assert m.calls == 0 after each invocation to lock in that optimization and avoid future regressions that reintroduce unnecessary LLM traffic for empty inputs.

Suggested change

func TestSpanExtractor_EmptyInput(t *testing.T) {

m := &spanMockLLM{reply: `{"found":true,"quote":"x"}`}

e := NewSpanExtractor(m, "gemini-2.5-flash")

if span, _, _ := e.Extract(context.Background(), "", "Q"); span != nil {

t.Errorf("empty content should yield nil span without an LLM call")

}

if span, _, _ := e.Extract(context.Background(), "content", ""); span != nil {

t.Errorf("empty query should yield nil span without an LLM call")

}

}

func TestSpanExtractor_EmptyInput(t *testing.T) {

m := &spanMockLLM{reply: `{"found":true,"quote":"x"}`}

e := NewSpanExtractor(m, "gemini-2.5-flash")

if span, _, _ := e.Extract(context.Background(), "", "Q"); span != nil {

t.Errorf("empty content should yield nil span without an LLM call")

}

if m.calls != 0 {

t.Errorf("expected no LLM calls for empty content, got %d", m.calls)

}

if span, _, _ := e.Extract(context.Background(), "content", ""); span != nil {

t.Errorf("empty query should yield nil span without an LLM call")

}

if m.calls != 0 {

t.Errorf("expected no LLM calls for empty query, got %d", m.calls)

}

}

Copilot AI review requested due to automatic review settings May 27, 2026 00:42

Copilot started reviewing on behalf of hallelx2 May 27, 2026 00:42 View session

hallelx2 merged commit d92db83 into main May 27, 2026
6 of 9 checks passed

hallelx2 deleted the feat/answer-span-and-answer-api branch May 27, 2026 00:43

sourcery-ai Bot reviewed May 27, 2026

View reviewed changes

hallelx2 review requested due to automatic review settings May 27, 2026 01:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: answer-span extraction + POST /v1/answer endpoint (Phase 1.1 + 3.3)#16

feat: answer-span extraction + POST /v1/answer endpoint (Phase 1.1 + 3.3)#16
hallelx2 merged 1 commit into
mainfrom
feat/answer-span-and-answer-api

hallelx2 commented May 27, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026

Review limit reached

Uh oh!

sourcery-ai Bot commented May 27, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

sourcery-ai Bot May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		user := fmt.Sprintf(
		"Section content:\n---\n%s\n---\n\nUser query:\n%s\n\nReturn a JSON object with `found` (boolean) and `quote` (string, verbatim from the section, ≤ %d characters).",

Conversation

hallelx2 commented May 27, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's wired

Test plan

Risk envelope

Open consequences

Summary by Sourcery

Uh oh!

coderabbitai Bot commented May 27, 2026

Review limit reached

Uh oh!

sourcery-ai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for POST /v1/answer end-to-end flow

Sequence diagram for /v1/query with optional answer-span extraction

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hallelx2 commented May 27, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented May 27, 2026 •

edited

Loading