feat: add Langfuse metrics API for per-agent token/cost stats and trace tags by sks · Pull Request #47 · stackgenhq/genie

sks · 2026-03-06T00:52:32Z

Summary

Adds the ability to query Langfuse for per-agent token usage and cost statistics, and improves trace tagging for better observability.

Langfuse Metrics API

New GetAgentStats method on langfuse.Client interface
Queries the Langfuse Metrics v1 API (/api/public/metrics) using the observations view grouped by traceName
Returns AgentUsageStats per agent: TotalCost, InputTokens, OutputTokens, TotalTokens, Count
Supports filtering by agent name and configurable lookback duration
Uses v1 endpoint (not v2) since v2 is cloud-only and unavailable on self-hosted Langfuse
Handles v1 API quirks: string-encoded token counts, nullable costs, aggregation_measure key format
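The quirks listed above (string-encoded token counts and nullable costs) can be illustrated with a small parsing sketch. This is not the code from pkg/langfuse/metrics.go: the function signatures, the AgentName field, the response shape {"data": [...]}, and the key names such as sum_totalCost are assumptions based only on the "aggregation_measure" description in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// AgentUsageStats carries the fields listed in the PR summary.
// AgentName is an assumed extra field used to label each row.
type AgentUsageStats struct {
	AgentName    string
	TotalCost    float64
	InputTokens  int64
	OutputTokens int64
	TotalTokens  int64
	Count        int64
}

// numericFromMap reads row[key] as a float64. It accepts JSON numbers and
// string-encoded numbers; null, missing, or unparsable values become 0.
func numericFromMap(row map[string]any, key string) float64 {
	switch v := row[key].(type) {
	case float64:
		return v
	case string:
		f, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return 0
		}
		return f
	default: // nil (JSON null), missing key, or an unexpected type
		return 0
	}
}

// parseAgentStats decodes a body of the assumed shape {"data":[{...}]}
// into one AgentUsageStats per row. Key names are assumptions.
func parseAgentStats(body []byte) ([]AgentUsageStats, error) {
	var resp struct {
		Data []map[string]any `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode metrics response: %w", err)
	}
	stats := make([]AgentUsageStats, 0, len(resp.Data))
	for _, row := range resp.Data {
		name, _ := row["traceName"].(string)
		stats = append(stats, AgentUsageStats{
			AgentName:    name,
			TotalCost:    numericFromMap(row, "sum_totalCost"),
			InputTokens:  int64(numericFromMap(row, "sum_inputTokens")),
			OutputTokens: int64(numericFromMap(row, "sum_outputTokens")),
			TotalTokens:  int64(numericFromMap(row, "sum_totalTokens")),
			Count:        int64(numericFromMap(row, "count_count")),
		})
	}
	return stats, nil
}

func main() {
	// A made-up row: cost is null, token counts arrive as strings.
	body := []byte(`{"data":[{"traceName":"codeowner.chat","sum_totalCost":null,` +
		`"sum_inputTokens":"1200","sum_outputTokens":"300","sum_totalTokens":"1500","count_count":"4"}]}`)
	stats, err := parseAgentStats(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", stats[0])
}
```

Treating null and unparsable values as zero keeps aggregation simple, at the cost of hiding malformed rows; a stricter variant could return an error instead.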

Trace Tags

Added persona/agent name to langfuse.trace.tags in both AG-UI and messenger handler paths
AG-UI handler now creates a root span with langfuse.trace.name, langfuse.trace.input, and tags
Enables filtering traces by persona in the Langfuse dashboard

Vector Store Spans

Added parent spans for vectorstore.add and vectorstore.search operations
Embedding spans are now properly nested instead of appearing as orphaned root traces

Tests

12 new Ginkgo test specs covering: multi-agent response, agent name filtering, null costs, empty results, error status codes, invalid JSON, parseAgentStats, numericFromMap, and global function behavior
All 54 specs pass (existing + new)

Usage Example

// Get stats for all agents in the last 24 hours
stats, err := langfuse.GetAgentStats(ctx, langfuse.GetAgentStatsRequest{
    Duration: 24 * time.Hour,
})

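// Get stats for a specific agent over the last 7 days.
// Assign with = here: stats and err already exist if both snippets share a scope.
stats, err = langfuse.GetAgentStats(ctx, langfuse.GetAgentStatsRequest{
    Duration:  7 * 24 * time.Hour,
    AgentName: "codeowner.chat",
})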

Copilot

Pull request overview

Adds Langfuse “metrics v1” querying to retrieve per-agent token/cost usage stats, and improves trace/span structure and tagging to make Langfuse observability more useful (persona filtering + proper span nesting for vectorstore ops).

Changes:

Add Client.GetAgentStats (+ global wrapper) and implement Langfuse Metrics v1 /api/public/metrics query/parse logic.
Add Ginkgo coverage for metrics querying/parsing and global behavior.
Improve Langfuse trace tagging (persona/agent) and add parent spans for vectorstore add/search to avoid orphan spans.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

File	Description
pkg/memory/vector/store.go	Adds parent spans for `vectorstore.add` / `vectorstore.search` and operation attributes.
pkg/langfuse/metrics.go	Implements Langfuse Metrics v1 query + parsing helpers and new request/response types.
pkg/langfuse/metrics_test.go	Adds test coverage for `GetAgentStats`, parsing, error cases, and global wrapper behavior.
pkg/langfuse/client.go	Extends `Client` interface and adds package-level `GetAgentStats` + noop implementation.
pkg/langfuse/langfusefakes/fake_client.go	Updates generated fake to include `GetAgentStats`.
pkg/expert/modelprovider/model.go	Removes debug logging for token tailoring enabled.
pkg/app/app.go	Adds persona tag to messenger traces and introduces AG-UI root span with Langfuse trace attributes/tags.

pkg/langfuse/metrics.go

pkg/langfuse/client.go

pkg/app/app.go

pkg/langfuse/metrics.go

…ce tags

- Add GetAgentStats to langfuse.Client interface for querying per-agent token usage and cost statistics via the Langfuse Metrics v1 API
- New metrics.go with AgentUsageStats, GetAgentStatsRequest types and implementation using observations view grouped by traceName
- Handle v1 API quirks: string-encoded token counts, nullable costs, aggregation_measure key format
- Comprehensive test coverage (12 new specs) including null costs, string parsing, filtering, error handling
- Regenerate counterfeiter fakes for updated Client interface
- Add persona/agent name to langfuse.trace.tags in both AG-UI and messenger handler paths for dashboard filtering
- Add vectorstore.add and vectorstore.search parent spans for cleaner Langfuse trace nesting

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

examples/devops-copilot/.genie.toml

Use OTel Baggage to propagate langfuse.trace.tags to all spans in the trace, including trpc-agent-go internal spans. The Langfuse exporter's baggageBatchSpanProcessor copies baggage members onto every span at start time, ensuring tags appear even when the library's own spans become the trace root. Also keep the explicit StringSlice attribute on our handler spans as the primary/correct format (string[] vs comma-separated string).

codecov · 2026-03-06T01:36:58Z

Codecov Report

❌ Patch coverage is 60.81633% with 96 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pkg/orchestrator/halguard_text_gen.go	3.44%	27 Missing and 1 partial ⚠️
pkg/reactree/create_agent.go	0.00%	26 Missing ⚠️
pkg/app/app.go	41.37%	17 Missing ⚠️
pkg/langfuse/metrics.go	80.95%	9 Missing and 7 partials ⚠️
pkg/expert/modelprovider/single.go	0.00%	4 Missing ⚠️
pkg/orchestrator/orchestrator.go	57.14%	2 Missing and 1 partial ⚠️
pkg/halguard/guard_impl.go	77.77%	1 Missing and 1 partial ⚠️

…g config

- Fix NewClient() not setting config field on client struct, causing GetAgentStats to generate URLs with no host (https:///api/...)
- Use actual agent name from context as Langfuse trace root span name in orchestrator.Chat() instead of hardcoded 'codeowner.chat'
- Update adaptive loop span in tree_v2 to use agent-prefixed name (e.g. 'devops-copilot.adaptive_loop') instead of 'reactree.adaptive_loop'
- Add trace-cost-calculator example that queries Langfuse API for per-agent usage stats

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

pkg/app/app.go

pkg/reactree/tree_v2.go

docs/js/chat.js

pkg/langfuse/metrics.go

pkg/langfuse/client.go

pkg/app/app.go

- Remove langfuse.trace.tags from vectorstore.add and vectorstore.search spans; these trace-level attributes caused Langfuse to promote child spans into independent root traces
- Remove langfuse.trace.input and langfuse.trace.tags from the adaptive loop span in tree_v2 (same promotion issue)
- Use regular span attributes (vectorstore.agent, reactree.agent) instead for observability without trace splitting
- Restore agent name prefix on adaptive loop span name

Halguard's generateText() calls model.GenerateContent() directly, bypassing the trpc-agent-go runner which normally instruments LLM calls. This meant halguard pre-check and post-check model calls were invisible in Langfuse traces.

- Add OTel span to generateText following 'chat {model}' naming convention
- Record gen_ai semantic attributes (request model, response model, tokens)
- Spans appear as children of halguard.PreCheck/PostCheck spans

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

pkg/app/app.go

docs/js/chat.js

examples/devops-copilot/.genie.toml

- Fix metricsQuery doc comment: says v1 (not v2) endpoint
- Validate req.Duration > 0 in GetAgentStats to prevent zero/negative time window queries
- Fix GetAgentStats package-level docstring: says nil,nil (not empty slice) when no client is configured
- Add pii.Redact to messenger trace input for consistent PII handling (AG-UI path already redacted)
- Merge baggage instead of replacing: withLangfuseTraceBaggage now preserves existing upstream baggage values
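
The Duration check and the nil,nil behavior described in this commit could look roughly like the sketch below. It is a minimal illustration under assumptions, not the code in pkg/langfuse: defaultClient, ErrInvalidDuration, timeWindow, and stubClient are hypothetical names, and the order of the two checks in the wrapper is a guess.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// GetAgentStatsRequest uses the two fields shown in the PR's usage example.
type GetAgentStatsRequest struct {
	Duration  time.Duration
	AgentName string
}

// AgentUsageStats is trimmed to two fields for brevity.
type AgentUsageStats struct {
	AgentName   string
	TotalTokens int64
}

// Client is a one-method stand-in for the langfuse.Client interface.
type Client interface {
	GetAgentStats(ctx context.Context, req GetAgentStatsRequest) ([]AgentUsageStats, error)
}

// ErrInvalidDuration is a hypothetical sentinel error.
var ErrInvalidDuration = errors.New("duration must be positive")

// defaultClient is a hypothetical package-level client; nil means "not configured".
var defaultClient Client

// timeWindow turns a lookback duration into a [from, to] range ending at now.
// Zero or negative durations are rejected before any query is built.
func timeWindow(now time.Time, d time.Duration) (from, to time.Time, err error) {
	if d <= 0 {
		return time.Time{}, time.Time{}, fmt.Errorf("%w: got %s", ErrInvalidDuration, d)
	}
	return now.Add(-d), now, nil
}

// GetAgentStats is the package-level wrapper. Without a configured client it
// returns nil, nil (not an empty slice), matching the corrected docstring.
func GetAgentStats(ctx context.Context, req GetAgentStatsRequest) ([]AgentUsageStats, error) {
	if defaultClient == nil {
		return nil, nil
	}
	if _, _, err := timeWindow(time.Now(), req.Duration); err != nil {
		return nil, err
	}
	return defaultClient.GetAgentStats(ctx, req)
}

// stubClient is a fake used only for this demonstration.
type stubClient struct{}

func (stubClient) GetAgentStats(_ context.Context, req GetAgentStatsRequest) ([]AgentUsageStats, error) {
	return []AgentUsageStats{{AgentName: req.AgentName, TotalTokens: 1500}}, nil
}

func main() {
	ctx := context.Background()

	// No client configured: both return values are nil.
	stats, err := GetAgentStats(ctx, GetAgentStatsRequest{Duration: 24 * time.Hour})
	fmt.Println(stats == nil, err == nil)

	// Client configured, but the zero Duration is rejected.
	defaultClient = stubClient{}
	_, err = GetAgentStats(ctx, GetAgentStatsRequest{})
	fmt.Println(errors.Is(err, ErrInvalidDuration))
}
```

Callers that range over the result do not need a nil check, since ranging over a nil slice is a no-op in Go.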

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

pkg/halguard/guard_impl.go

pkg/orchestrator/halguard_text_gen.go

pkg/reactree/tree_v2.go

Two-layer fix for sub-agents that output commands as text instead of executing them via run_shell, or refuse with 'I don't know' when tools are available.

Layer 1 - Stronger system prompt (subagent_instruction.go):
- Added TOOL USE IS MANDATORY directive
- Prohibit 'I don't know' / 'I don't have access' / 'I cannot' refusals
- Require executing embedded scripts via run_shell, not echoing as markdown
- Explicit instruction that tools exist to gather real data

Layer 2 - Runtime guard (create_agent.go):
- Zero-tool-use guard detects when toolCallCount==0 with tools available
- Annotates output with warning and sets status='tool_use_failure'
- Gives parent agent clear signal to re-spawn the sub-agent

Tests:
- buildSubAgentInstruction: tool-use enforcement, allowlist, core directives
- Zero-tool-use guard: table-driven condition tests, output annotation format
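
The Layer 2 runtime guard can be sketched as follows. The predicate matches the condition quoted in the review of create_agent_pvt_test.go in this PR and the status string comes from the commit message; the subAgentResult type, the applyGuard function, and the warning wording are hypothetical, since the real code in create_agent.go is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// statusToolUseFailure is the status value named in the commit message.
const statusToolUseFailure = "tool_use_failure"

// guardInput mirrors the fields used by the test predicate in this PR.
type guardInput struct {
	toolCallCount    int
	result           string
	status           string
	timedOut         bool
	numSelectedTools int
}

// subAgentResult is a hypothetical stand-in for the sub-agent's response.
type subAgentResult struct {
	Output string
	Status string
}

// shouldFire reports whether the zero-tool-use guard applies: the sub-agent
// produced output, did not error or time out, had tools, and called none.
func shouldFire(gi guardInput) bool {
	return gi.toolCallCount == 0 && gi.result != "" && gi.status != "error" &&
		!gi.timedOut && gi.numSelectedTools > 0
}

// applyGuard annotates the output and sets the failure status when the guard
// fires; otherwise it passes the result through unchanged.
func applyGuard(gi guardInput, toolNames []string) subAgentResult {
	res := subAgentResult{Output: gi.result, Status: gi.status}
	if !shouldFire(gi) {
		return res
	}
	res.Status = statusToolUseFailure
	// The warning text below is illustrative, not the PR's actual wording.
	res.Output = fmt.Sprintf(
		"WARNING: sub-agent made no tool calls although these tools were available: %s\n\n%s",
		strings.Join(toolNames, ", "), gi.result)
	return res
}

func main() {
	tools := []string{"run_shell"}
	echoed := guardInput{result: "kubectl get pods", status: "success", numSelectedTools: len(tools)}
	res := applyGuard(echoed, tools)
	fmt.Println(res.Status)
	fmt.Println(res.Output)
}
```

Keeping the predicate in its own function lets table-driven tests exercise the same code path the production guard uses, rather than a re-implementation of it.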

Root spans in AGUI and messenger paths used otel.Tracer() which resolves to the global OTel noop provider. The Langfuse exporter registers its own TracerProvider on trace.Tracer but never calls otel.SetTracerProvider(), so the root spans were never exported. All downstream spans (orchestrator, tree, vectorstore) became orphaned root-level Langfuse traces instead of children. Switch both paths to trace.Tracer and inline SetAttributes into Start() via oteltrace.WithAttributes for brevity.

Address PR review comments:

- Set langfuse.trace.name to a.displayName() (agent name) instead of 'agui chat' / platform label so GetAgentStats can filter by agent. Channel/platform info is already in tags.
- Log warnings in withLangfuseTraceBaggage when baggage creation fails instead of silently returning the original context.

Copilot

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

pkg/halguard/guard_impl.go

docs/js/chat.js

pkg/reactree/create_agent.go

pkg/app/app.go

…ontext

- halguard.New: panic on nil textGenerator to fail fast instead of allowing a runtime nil-pointer dereference during PostCheck.
- create_agent: use toolNameList (from selectedTools) instead of scopedRegistry.ToolNames() in the zero-tool-use guard message so the reported tools match what the sub-agent actually had.
- app.go: set orchestratorcontext.WithAgent in the messenger path so downstream spans use the correct agent name instead of DefaultAgentName.

When a browser tab is present, bridgeBrowserTab creates a new context derived from the chromedp tab context which has no OTel span context. This caused child spans (adaptive_loop, invoke_agent, vectorstore.add) to get new traceIDs and appear as separate top-level traces in Langfuse instead of being nested under the parent trace. Fix: inject the parent's OTel span and baggage into the bridged context so that all downstream spans maintain the correct trace hierarchy and inherit Langfuse attributes (userId, sessionId, tags, metadata). Add tests verifying span context, baggage, per-request values, and deadline propagation through bridgeBrowserTab.

Copilot

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-06T05:34:02Z

pkg/orchestrator/halguard_text_gen.go

+// singleModelProvider adapts a single model.Model into the
+// modelprovider.ModelProvider interface so that expert.ExpertBio.ToExpert can
+// use a specific, pre-selected model rather than task-type-based routing.
+// This is necessary because halguard selects diverse models for cross-model
+// verification and needs to call each one individually.
+type singleModelProvider struct {
+	key   string
+	model model.Model
+}
+
+// GetModel always returns the single wrapped model regardless of task type.
+func (p *singleModelProvider) GetModel(_ context.Context, _ modelprovider.TaskType) (modelprovider.ModelMap, error) {
+	return modelprovider.ModelMap{p.key: p.model}, nil
+}


This file defines a local singleModelProvider, but the PR also adds expert/modelprovider.NewSingleModelProvider(...) with the same purpose. Keeping both duplicates logic and invites divergence (e.g., interface changes). Prefer using the shared helper here (and drop the local type), or remove the new helper file if it’s not intended for reuse.

examples/trace-cost-calculator/main.go

Copilot · 2026-03-06T05:34:02Z

pkg/reactree/subagent_instruction.go

+	instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
+		"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
+		"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
+		"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
+		"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
+		"You have tools for a reason: USE THEM to gather real data, then summarize the results. " +
+		"\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +


buildSubAgentInstruction unconditionally instructs the sub-agent to "call run_shell" when the goal contains a shell command/script. When toolNames does not include run_shell (or when toolNames is nil/empty), this directive conflicts with the allowlist and can cause the sub-agent to attempt an unavailable tool. Suggest gating the run_shell-specific text on toolNames containing run_shell, or rewriting it to reference "an available shell tool" / "if run_shell is available".

Suggested change

-	instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
-		"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
-		"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
-		"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
-		"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
-		"You have tools for a reason: USE THEM to gather real data, then summarize the results. " +
-		"\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +
+	// Detect whether run_shell is actually available so we don't instruct
+	// the sub-agent to call a tool that is not in its allowlist.
+	hasRunShell := false
+	for _, name := range toolNames {
+		if name == "run_shell" {
+			hasRunShell = true
+			break
+		}
+	}
+	if hasRunShell {
+		instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
+			"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
+			"If your goal describes a task and you have tools like run_shell, use them to execute the task. " +
+			"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate tool. " +
+			"If your goal contains a shell script or command to run, call run_shell to EXECUTE it — do NOT echo or display the script as markdown. " +
+			"You have tools for a reason: USE THEM to gather real data, then summarize the results. "
+	} else {
+		instruction += "\nTOOL USE IS MANDATORY: You have tools — you MUST use them. " +
+			"NEVER say 'I don't know', 'I don't have access', or 'I cannot' when you have tools that can gather the information. " +
+			"If your goal describes a task and you have tools suitable for executing commands or scripts, use them to execute the task. " +
+			"NEVER output commands, scripts, or code as text in your response — always execute them via the appropriate available tool. " +
+			"If your goal contains a shell script or command to run, call an appropriate available shell-execution tool to EXECUTE it — do NOT echo or display the script as markdown. " +
+			"You have tools for a reason: USE THEM to gather real data, then summarize the results. "
+	}
+	instruction += "\nHITL REJECTION: If a tool call is rejected by the user with feedback suggesting a different tool or approach, " +

Copilot · 2026-03-06T05:34:03Z

pkg/reactree/create_agent_pvt_test.go

+	shouldFire := func(gi guardInput) bool {
+		return gi.toolCallCount == 0 && gi.result != "" && gi.status != "error" && !gi.timedOut && gi.numSelectedTools > 0
+	}
+
+	DescribeTable("fires or skips based on conditions",
+		func(gi guardInput, expectFire bool) {
+			Expect(shouldFire(gi)).To(Equal(expectFire))
+		},


The added "zero-tool-use guard" specs mostly re-implement the guard predicate and string formatting rather than exercising the actual createAgentTool.executeInner behavior. This won’t catch regressions like the guard not firing due to event parsing changes, toolNameList mismatches, or status being overwritten later. Consider adding an integration-style unit test that runs executeInner with a stubbed runner/event stream to assert Status becomes tool_use_failure and Output is annotated only when toolCallCount == 0 and tools are available.

…_failure

Sub-agents (especially Gemini models) were echoing shell scripts as text instead of calling run_shell. Two fixes:

1. Restructure subagent instruction: move MANDATORY tool-use rule to the very first sentence (LLMs attend most to prompt start). Add dedicated SCRIPT EXECUTION section targeting the exact failure mode of embedded bash scripts being echoed as markdown.
2. Auto-retry on tool_use_failure: when zero-tool-use guard fires, retry once with a reinforced prompt ('[RETRY] You MUST call run_shell...') before returning failure to the parent agent. Uses -retry agent name suffix for tracing.

Tests: fix 2 existing tests for new wording, add 5 new specs for SCRIPT EXECUTION directive and auto-retry logic (149 specs pass).
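
The retry-once behavior described in the second fix might be structured like the sketch below. Only the "[RETRY] " prefix, the "-retry" name suffix, and the tool_use_failure status come from the commit message; the full reinforced prompt is elided there and is not reproduced. The runFunc type, the result struct, and runWithRetry are hypothetical names for illustration.

```go
package main

import "fmt"

const (
	statusToolUseFailure = "tool_use_failure"
	retryPrefix          = "[RETRY] " // only this prefix is quoted in the commit message
	retrySuffix          = "-retry"   // agent name suffix used for tracing
)

// result is a hypothetical stand-in for a sub-agent's response.
type result struct {
	Output string
	Status string
}

// runFunc is a hypothetical stand-in for executing a sub-agent once.
type runFunc func(agentName, goal string) result

// runWithRetry runs the sub-agent and, if the zero-tool-use guard marked the
// first attempt as a failure, retries exactly once with a reinforced goal.
// Whatever the second attempt returns is final.
func runWithRetry(run runFunc, agentName, goal string) result {
	first := run(agentName, goal)
	if first.Status != statusToolUseFailure {
		return first
	}
	return run(agentName+retrySuffix, retryPrefix+goal)
}

func main() {
	calls := 0
	// A fake sub-agent that echoes the command on the first attempt
	// and succeeds on the retry.
	flaky := func(agentName, goal string) result {
		calls++
		if calls == 1 {
			return result{Output: "ls -la", Status: statusToolUseFailure}
		}
		return result{Output: "executed by " + agentName, Status: "success"}
	}

	res := runWithRetry(flaky, "devops-copilot", "list files")
	fmt.Println(calls, res.Status, res.Output)
}
```

Capping the loop at one retry bounds the extra token cost and still surfaces a persistent failure to the parent agent.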

sks requested a review from a team as a code owner March 6, 2026 00:52

Copilot AI review requested due to automatic review settings March 6, 2026 00:52

Copilot started reviewing on behalf of sks March 6, 2026 00:53

Copilot AI reviewed Mar 6, 2026

sks force-pushed the feature/langfuse-metrics-and-trace-tags branch from 0853770 to 0e69215 Compare March 6, 2026 01:03

Copilot AI review requested due to automatic review settings March 6, 2026 01:08

sks force-pushed the feature/langfuse-metrics-and-trace-tags branch from 0e69215 to b0d77ea Compare March 6, 2026 01:08

Copilot started reviewing on behalf of sks March 6, 2026 01:09

PII redact

185eec0

Copilot AI reviewed Mar 6, 2026

examples/devops-copilot/.genie.toml (resolved)

Copilot AI review requested due to automatic review settings March 6, 2026 01:40

Copilot started reviewing on behalf of sks March 6, 2026 01:41

Copilot AI reviewed Mar 6, 2026

sks added 2 commits March 5, 2026 17:50

Copilot AI review requested due to automatic review settings March 6, 2026 01:58

Copilot started reviewing on behalf of sks March 6, 2026 01:59

Copilot AI reviewed Mar 6, 2026

pkg/app/app.go (outdated, resolved)
pkg/app/app.go (outdated, resolved)
pkg/app/app.go (outdated, resolved)
docs/js/chat.js (resolved)
examples/devops-copilot/.genie.toml (resolved)

sks and others added 2 commits March 5, 2026 18:06

fix the halguard trace

3ff1141

Copilot AI review requested due to automatic review settings March 6, 2026 02:36

Copilot started reviewing on behalf of sks March 6, 2026 02:37

Copilot AI reviewed Mar 6, 2026

pkg/halguard/guard_impl.go (resolved)
pkg/halguard/guard_impl.go (resolved)
pkg/orchestrator/halguard_text_gen.go (resolved)
pkg/reactree/tree_v2.go (resolved)

sks and others added 2 commits March 5, 2026 18:45

use the correct span

945895a

Copilot AI review requested due to automatic review settings March 6, 2026 02:48

Copilot started reviewing on behalf of sks March 6, 2026 02:48

sks added 2 commits March 5, 2026 18:50

Copilot AI reviewed Mar 6, 2026

pkg/halguard/guard_impl.go (resolved)
docs/js/chat.js (resolved)
docs/js/chat.js (resolved)
pkg/reactree/create_agent.go (outdated, resolved)
pkg/app/app.go (resolved)
pkg/app/app.go (resolved)

sks added 2 commits March 5, 2026 19:39

Copilot AI review requested due to automatic review settings March 6, 2026 05:26

Copilot started reviewing on behalf of sks March 6, 2026 05:26

Copilot AI reviewed Mar 6, 2026

sks merged commit 8e02fa6 into main Mar 6, 2026
4 checks passed

sks deleted the feature/langfuse-metrics-and-trace-tags branch March 6, 2026 05:39
