EP: content citation and attribution telemetry #185

@jalexspringer

Summary

org.openattribution.telemetry is a vendor extension that tracks which content influenced AI agent outcomes in commerce conversations. It embeds an attribution object in UCP checkout sessions, recording which content was retrieved and cited during the conversation that led to a purchase.

This starts as a vendor extension under the org.openattribution.* namespace, per UCP's governance model for new use cases.

Motivation

Content creators — reviewers, guide authors, comparison sites — invest significantly in producing the expert content that AI shopping agents rely on to generate recommendations. This content is the raw material of AI-assisted commerce: without reviews, there are no credible recommendations; without guides, there is no informed comparison.

Yet creators currently have zero visibility into whether their work influences purchases. This is a market failure:

  1. No measurement exists. There is no standardised way to track which content was retrieved and cited in the conversation that led to a purchase.
  2. Attribution is impossible. Commerce outcomes (purchases, cart additions) cannot be linked to specific content pieces, so creators cannot demonstrate ROI to partners or advertisers.
  3. Privacy is unaddressed. Without a purpose-built signal, ad-hoc tracking solutions will emerge that compromise user privacy rather than preserving it.
  4. Multi-session journeys are invisible. Research, comparison, and purchase often happen across separate conversations over days. Without temporal signals on content retrieval, these journeys are invisible to attribution.

Without measurement, the rational response from creators is to restrict AI access to their content — threatening the content supply chain that the entire AI commerce ecosystem depends on. Attribution telemetry is the feedback loop that makes the ecosystem sustainable.

Value to UCP

  • First-mover advantage. UCP becomes the first commerce protocol with native content attribution, differentiating it from competitors.
  • Merchant insight. Merchants gain content partnership ROI data they cannot get from any other source, enabling data-driven content investment.
  • Zero-risk adoption. The extension is additive and non-blocking — it cannot interfere with existing checkout flows. Merchants and agents can adopt incrementally.

Goals

  • Define an optional attribution extension that embeds content citation data in UCP checkout sessions
  • Support privacy-preserving data sharing with lightweight conversation context (turn_count, topics)
  • Enable multi-conversation attribution by accumulating content from prior conversations into the final checkout's content_retrieved array — timestamps on entries provide the temporal signal
  • Support negative attribution via the contradiction citation type (content that was retrieved but that the agent disagreed with)
  • Provide citation quality signals (citation_type, excerpt_tokens, position, content_hash) for weighted attribution
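
The multi-conversation goal above can be sketched as follows. This is a minimal illustration, not the reference SDK: `parse_ts` and `accumulate_retrieved` are hypothetical helper names, the example URLs are placeholders, and dropping exact duplicates (same URL and timestamp) is an assumed policy that the proposal does not specify.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; map a trailing 'Z' to +00:00 for older Pythons."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def accumulate_retrieved(prior_conversations, current):
    """Merge content_retrieved entries from earlier conversations into the final list.

    Entries are sorted by timestamp so the temporal signal is preserved.
    Dropping exact duplicates is an assumption, not part of the proposal.
    """
    merged = [e for conv in prior_conversations for e in conv] + list(current)
    seen, result = set(), []
    for entry in sorted(merged, key=lambda e: parse_ts(e["timestamp"])):
        key = (entry["content_url"], entry["timestamp"])
        if key not in seen:
            seen.add(key)
            result.append(entry)
    return result


# Placeholder data: one research conversation, then the purchase conversation.
research = [{"content_url": "https://example.com/guide", "timestamp": "2026-01-12T09:00:00Z"}]
purchase = [{"content_url": "https://example.com/review", "timestamp": "2026-01-15T10:30:01Z"}]
content_retrieved = accumulate_retrieved([research], purchase)
```

The agent keeps the accumulated list on its side and sends only the merged array at checkout, so no prior session identifiers need to appear in the payload.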

Non-Goals

  • Defining specific attribution algorithms (left to implementers)
  • Mandating payment structures or compensation models
  • Requiring specific privacy policies (left to agreements between parties)

Detailed Design

Capability Declaration

{
  "capabilities": [
    {
      "name": "org.openattribution.telemetry",
      "version": "2026-02-17",
      "spec": "https://github.com/openattribution-org/telemetry/blob/main/ucp/EXTENSION.md",
      "schema": "https://raw.githubusercontent.com/openattribution-org/telemetry/main/ucp/extension-schema.json",
      "extends": "dev.ucp.shopping.checkout"
    }
  ]
}

Checkout Extension

Adds an attribution object to UCP checkout sessions:

{
  "id": "chk_123",
  "line_items": ["..."],
  "attribution": {
    "content_scope": "electronics-reviews",
    "content_retrieved": [
      {
        "content_url": "https://www.wirecutter.com/reviews/best-wireless-headphones",
        "timestamp": "2026-01-15T10:30:01Z"
      }
    ],
    "content_cited": [
      {
        "content_url": "https://www.wirecutter.com/reviews/best-wireless-headphones",
        "timestamp": "2026-01-15T10:30:05Z",
        "citation_type": "paraphrase",
        "excerpt_tokens": 85,
        "position": "primary"
      }
    ],
    "conversation_summary": {
      "turn_count": 3,
      "topics": ["headphones", "noise-cancelling"]
    }
  }
}

For complete field definitions, citation types, and implementation notes, see EXTENSION.md.
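
As a rough sketch of how a merchant might read this object, the function below flattens `content_cited` into one row per citation. `summarize_citations` is a hypothetical name, and the code assumes the quality fields may be absent, which is why it reads them with `.get`. EXTENSION.md remains the authority on which fields are required.

```python
def summarize_citations(checkout: dict) -> list:
    """Return one row per cited item, or an empty list when no attribution is present."""
    attribution = checkout.get("attribution") or {}
    retrieved_urls = {e["content_url"] for e in attribution.get("content_retrieved", [])}
    rows = []
    for cite in attribution.get("content_cited", []):
        rows.append({
            "content_url": cite["content_url"],
            "citation_type": cite.get("citation_type"),
            "position": cite.get("position"),
            "excerpt_tokens": cite.get("excerpt_tokens"),
            # True when the cited URL also appears in content_retrieved.
            "was_retrieved": cite["content_url"] in retrieved_urls,
        })
    return rows


url = "https://www.wirecutter.com/reviews/best-wireless-headphones"
checkout = {
    "id": "chk_123",
    "attribution": {
        "content_scope": "electronics-reviews",
        "content_retrieved": [{"content_url": url, "timestamp": "2026-01-15T10:30:01Z"}],
        "content_cited": [{
            "content_url": url,
            "timestamp": "2026-01-15T10:30:05Z",
            "citation_type": "paraphrase",
            "excerpt_tokens": 85,
            "position": "primary",
        }],
    },
}
rows = summarize_citations(checkout)
```

A checkout without an `attribution` key yields an empty list, so the same code path works whether or not the agent supports the extension.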

Standalone Capability

For use cases beyond checkout — browse sessions, multi-agent attribution chains, conversation analytics — the extension also defines a standalone REST/MCP capability with session lifecycle endpoints. See org.openattribution.telemetry.yaml for the full specification.

The two approaches share the same underlying schema and are interoperable: a session tracked via standalone endpoints can link to a UCP checkout via checkout_id in the session outcome.

Negotiation

  • When both agent and merchant declare org.openattribution.telemetry: full bidirectional attribution flow
  • When only one party supports it: graceful degradation. Checkout proceeds normally; attribution is additive, not blocking.
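
The two negotiation cases can be sketched like this. The function names and the `agent_profile` / `merchant_profile` arguments are hypothetical, and the check matches on capability name only. How versions are compared is not described here, so the sketch leaves it out.

```python
EXTENSION = "org.openattribution.telemetry"


def supports_attribution(profile: dict) -> bool:
    """True when the profile's capabilities list declares the extension by name."""
    return any(c.get("name") == EXTENSION for c in profile.get("capabilities", []))


def build_checkout(checkout: dict, attribution: dict,
                   agent_profile: dict, merchant_profile: dict) -> dict:
    """Attach attribution only when both sides declare the extension.

    Otherwise the checkout is returned unchanged, so the purchase is never blocked.
    """
    result = dict(checkout)
    if supports_attribution(agent_profile) and supports_attribution(merchant_profile):
        result["attribution"] = attribution
    return result


with_ext = {"capabilities": [{"name": EXTENSION, "version": "2026-02-17"}]}
without_ext = {"capabilities": [{"name": "dev.ucp.shopping.checkout"}]}
base = {"id": "chk_123", "line_items": ["..."]}
attribution = {"content_retrieved": [
    {"content_url": "https://example.com/review", "timestamp": "2026-01-15T10:30:01Z"}
]}

full = build_checkout(base, attribution, with_ext, with_ext)
degraded = build_checkout(base, attribution, with_ext, without_ext)
```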

Risks and Mitigations

Privacy risk: Conversation data could leak through attribution signals.

  • Mitigation: conversation_summary is limited to turn_count and topics — no raw text, no subjective classification. content_scope must be opaque (not PII). external_id must be hashed, not raw PII.
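
A minimal sketch of these two constraints on the agent side, under stated assumptions: the proposal requires `external_id` to be hashed but does not name an algorithm here, so salted SHA-256 is an illustrative choice, and `hash_external_id` and `minimal_summary` are hypothetical helpers.

```python
import hashlib

# The only conversation_summary fields the proposal allows.
ALLOWED_SUMMARY_FIELDS = ("turn_count", "topics")


def hash_external_id(raw_id: str, salt: str) -> str:
    """Hash an identifier before it leaves the agent (assumed: salted SHA-256, hex)."""
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()


def minimal_summary(conversation: dict) -> dict:
    """Keep only turn_count and topics; raw text and anything else is dropped."""
    return {k: conversation[k] for k in ALLOWED_SUMMARY_FIELDS if k in conversation}


conversation = {
    "turn_count": 3,
    "topics": ["headphones", "noise-cancelling"],
    "transcript": "user: which headphones should I buy ...",  # must never be sent
}
summary = minimal_summary(conversation)
external_id = hash_external_id("user@example.com", salt="per-deployment-secret")
```

Low-entropy identifiers such as email addresses can be guessed and re-hashed, so a deployment may prefer a keyed construction such as HMAC over a plain salted hash.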

Adoption risk: Vendor extension may not gain traction.

  • Mitigation: Open-source reference implementation (Apache 2.0), Python SDK, and reference server lower the barrier.

Schema evolution risk: Breaking changes could fragment implementations.

  • Mitigation: A schema version field (0.4) and date-based capability versions (2026-02-17) identify each revision. The deprecation policy requires 6 months' notice. All nested objects allow additional properties for forward compatibility.

Fraud Mitigation

Signal integrity risk: Agents could report false citations or inflated quality signals.

Mitigations:

  • content_hash (SHA-256) provides an integrity audit trail — agents hash whatever content they processed, useful for dispute resolution when attribution involves commission payments
  • Citation quality signals (citation_type, excerpt_tokens, position) are agent-reported metadata, not trusted assertions — consumers apply their own confidence weighting
  • Cross-reference content_retrieved timestamps against content_cited timestamps: an agent cannot legitimately cite content before retrieving it
  • Rate limiting and anomaly detection at the consumer level (unusual citation volume, repeated patterns, suspiciously high excerpt_tokens)
  • The extension provides attribution signals, not attribution decisions — the consuming platform is responsible for fraud detection in its attribution model
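
Two of these checks are simple enough to sketch on the consumer side. The helper names are hypothetical, the hex-digest encoding of `content_hash` is an assumption, and flagging a citation whose URL never appears in `content_retrieved` is an assumed extension of the timestamp rule rather than something the proposal states.

```python
import hashlib
from datetime import datetime


def content_hash(content: bytes) -> str:
    """SHA-256 of the content the agent processed (hex encoding is assumed)."""
    return hashlib.sha256(content).hexdigest()


def _ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def suspicious_citations(attribution: dict) -> list:
    """Return citations dated before the first retrieval of the same URL.

    Citations of URLs that were never retrieved are flagged too (an assumption).
    """
    first_retrieved = {}
    for entry in attribution.get("content_retrieved", []):
        url, t = entry["content_url"], _ts(entry["timestamp"])
        if url not in first_retrieved or t < first_retrieved[url]:
            first_retrieved[url] = t
    flagged = []
    for cite in attribution.get("content_cited", []):
        earliest = first_retrieved.get(cite["content_url"])
        if earliest is None or _ts(cite["timestamp"]) < earliest:
            flagged.append(cite)
    return flagged


url = "https://www.wirecutter.com/reviews/best-wireless-headphones"
honest = {
    "content_retrieved": [{"content_url": url, "timestamp": "2026-01-15T10:30:01Z"}],
    "content_cited": [{"content_url": url, "timestamp": "2026-01-15T10:30:05Z"}],
}
backdated = {
    "content_retrieved": [{"content_url": url, "timestamp": "2026-01-15T10:30:05Z"}],
    "content_cited": [{"content_url": url, "timestamp": "2026-01-15T10:30:01Z"}],
}
```

Here `suspicious_citations(honest)` returns an empty list, while `suspicious_citations(backdated)` returns the single backdated citation. A flagged entry is a signal for the consumer's own fraud model, not a verdict.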

Test Plan

Unit tests:

  • Schema validation for all models (session, event, outcome, conversation turn)
  • content_retrieved required with minItems: 1 enforcement
  • Citation type and position enum validation
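
The last two unit-test items could be exercised with a validator along these lines. This is a hand-rolled sketch rather than the published JSON Schema: `validate_attribution` is a hypothetical function, and the enum sets passed in contain only values that appear in this proposal. A real test would load the full enums from the schema.

```python
def validate_attribution(attribution: dict, citation_types: set, positions: set) -> list:
    """Return a list of error strings; an empty list means the object passed."""
    errors = []
    retrieved = attribution.get("content_retrieved")
    # content_retrieved is required with minItems: 1.
    if not isinstance(retrieved, list) or len(retrieved) < 1:
        errors.append("content_retrieved is required and needs at least one entry")
    for i, cite in enumerate(attribution.get("content_cited", [])):
        ctype = cite.get("citation_type")
        if ctype is not None and ctype not in citation_types:
            errors.append(f"content_cited[{i}].citation_type is not allowed: {ctype!r}")
        position = cite.get("position")
        if position is not None and position not in positions:
            errors.append(f"content_cited[{i}].position is not allowed: {position!r}")
    return errors


# Partial enums: only the values named in this proposal.
CITATION_TYPES = {"paraphrase", "contradiction"}
POSITIONS = {"primary"}

valid = {
    "content_retrieved": [{"content_url": "https://example.com/review",
                           "timestamp": "2026-01-15T10:30:01Z"}],
    "content_cited": [{"content_url": "https://example.com/review",
                       "timestamp": "2026-01-15T10:30:05Z",
                       "citation_type": "paraphrase", "position": "primary"}],
}
empty_retrieved = {"content_retrieved": []}
bad_enum = {
    "content_retrieved": valid["content_retrieved"],
    "content_cited": [{"content_url": "https://example.com/review",
                       "citation_type": "made-up", "position": "primary"}],
}
```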

Integration tests:

  • Checkout extension: attribution object in checkout request/response
  • Multi-conversation attribution via accumulated content_retrieved entries with timestamps
  • Capability negotiation: graceful degradation when only one party supports extension

End-to-end tests:

  • Shopping conversation with content retrieval, citation, and purchase
  • Privacy level enforcement across the full flow

Graduation Criteria

Working Draft to Candidate:

  • Schema validation passing against all published examples
  • Reference implementation with ≥80% test coverage
  • At least two independent implementations (one agent, one merchant)
  • 3-month feedback period with no unresolved blocking issues
  • TC majority vote to advance

Candidate to Stable:

  • At least three production deployments processing real attribution data
  • Interoperability testing across at least 2 agents × 2 merchants
  • Published migration guide for version upgrades
  • TC majority vote to advance

Implementation History

  • 2026-02-11: Initial vendor extension specification drafted
  • 2026-02-15: Reference implementation published (Python SDK, FastAPI server, JSON Schemas)
  • 2026-02-17: Checkout extension updated — removed prior_session_ids from checkout payload (privacy concern; agents accumulate content into content_retrieved instead), stripped conversation_summary to turn_count + topics only, added content_retrieved as required with minItems: 1.
