Build per-source contact databases for contextual memory #103

@ElioNeto

Issue imported from tinyhumansai/openhuman#1366
Created at: unknown


Summary

Build a contact database for each connected source, starting with Gmail contacts through the Composio API, so the agent can identify the people who appear in memories and source data.

Problem

The agent can ingest source data, but it needs richer contact context to understand people mentioned in messages, emails, meetings, and tasks. Without a per-source contact model, the agent may see names, emails, phone numbers, Slack handles, or WhatsApp numbers as disconnected strings instead of people with roles, relationships, and source provenance.

This hurts tasks like summarizing conversations, drafting replies, finding relevant history, preparing meeting notes, and reasoning about who a user is referring to.

Constraints:

  • Contacts should be source-aware and preserve provenance.
  • Contact records must handle duplicate people across sources without prematurely merging uncertain identities.
  • Sensitive contact data should be stored and exposed only as needed for agent tasks.
  • Gmail contacts via the Composio API should be an early target, with room for Slack, WhatsApp, calendar, and email participants.
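
The duplicate-handling constraint above could be met by recording link candidates between per-source contacts rather than collapsing them into one record. The following TypeScript sketch shows one way to do that. All names (`SourceContact`, `IdentityLinkCandidate`, `proposeLinks`) are hypothetical and not taken from the codebase, and the matching rules (a shared email is a strong signal, an identical display name only a weak one) are illustrative assumptions.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface SourceContact {
  source: string;            // e.g. "gmail", "slack", "whatsapp"
  sourceContactId: string;   // stable identifier inside that source
  displayName: string | null;
  emails: string[];
}

interface IdentityLinkCandidate {
  from: string;              // "source:sourceContactId"
  to: string;
  confidence: "strong" | "weak";
  reason: string;
}

function contactKey(c: SourceContact): string {
  return `${c.source}:${c.sourceContactId}`;
}

function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

function normalizeName(value: string | null): string {
  return (value ?? "").trim().toLowerCase().replace(/\s+/g, " ");
}

// Proposes cross-source links without merging anything. The original
// per-source records stay untouched; callers decide what to do with a link.
function proposeLinks(contacts: SourceContact[]): IdentityLinkCandidate[] {
  const candidates: IdentityLinkCandidate[] = [];
  contacts.forEach((a, i) => {
    const aEmails = new Set(a.emails.map(normalizeEmail));
    const aName = normalizeName(a.displayName);
    contacts.slice(i + 1).forEach((b) => {
      if (a.source === b.source) return; // only link across sources
      const sharesEmail = b.emails.map(normalizeEmail).some((e) => aEmails.has(e));
      if (sharesEmail) {
        candidates.push({
          from: contactKey(a),
          to: contactKey(b),
          confidence: "strong",
          reason: "shared email address",
        });
        return;
      }
      if (aName !== "" && aName === normalizeName(b.displayName)) {
        // Names collide often, so this stays a weak hint and never a merge.
        candidates.push({
          from: contactKey(a),
          to: contactKey(b),
          confidence: "weak",
          reason: "same display name",
        });
      }
    });
  });
  return candidates;
}
```

Keeping links as separate records means a wrong guess can be dropped later without having to split a merged contact apart.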

Solution (optional)

Add a contacts domain or extend the memory/source ingestion pipeline with per-source contact extraction, storage, lookup, and identity-linking primitives. Start with Gmail contacts from the Composio API and define a normalized internal contact shape that can represent source-specific identifiers and metadata.
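
As a rough illustration of the normalized contact shape and per-source storage proposed above, here is a minimal in-memory TypeScript sketch. Everything in it is an assumption made for the example: `RawSourceContact` is a made-up input shape (not the actual Composio response format), and `ContactRecord`, `Provenance`, `normalizeContact`, and `PerSourceContactStore` are hypothetical names. The real implementation might live in Rust and use persistent storage.

```typescript
// Hypothetical raw input; the real provider payload will differ.
interface RawSourceContact {
  id: string;
  name?: string;
  emails?: string[];
  phones?: string[];
}

interface Provenance {
  source: string;           // e.g. "gmail"
  sourceContactId: string;  // stable identifier within the source
  fetchedAt: string;        // ISO 8601 timestamp of the sync
}

interface ContactRecord {
  provenance: Provenance;
  displayName: string | null;
  emails: string[];
  phones: string[];
}

function unique(values: string[]): string[] {
  return Array.from(new Set(values));
}

// Keeps digits and a leading "+"; a simplification, not full E.164 parsing.
function normalizePhone(phone: string): string {
  const trimmed = phone.trim();
  const digits = trimmed.replace(/\D/g, "");
  return trimmed.startsWith("+") ? `+${digits}` : digits;
}

function normalizeContact(
  raw: RawSourceContact,
  source: string,
  fetchedAt: string,
): ContactRecord {
  const displayName = (raw.name ?? "").trim();
  return {
    provenance: { source, sourceContactId: raw.id, fetchedAt },
    displayName: displayName === "" ? null : displayName,
    emails: unique(
      (raw.emails ?? []).map((e) => e.trim().toLowerCase()).filter((e) => e !== ""),
    ),
    phones: unique((raw.phones ?? []).map(normalizePhone).filter((p) => p !== "")),
  };
}

// Contacts are keyed by (source, sourceContactId), so re-syncing a source
// updates records in place and never touches another source's contacts.
class PerSourceContactStore {
  private readonly bySource = new Map<string, Map<string, ContactRecord>>();

  upsert(record: ContactRecord): void {
    const { source, sourceContactId } = record.provenance;
    let bucket = this.bySource.get(source);
    if (bucket === undefined) {
      bucket = new Map<string, ContactRecord>();
      this.bySource.set(source, bucket);
    }
    bucket.set(sourceContactId, record);
  }

  get(source: string, sourceContactId: string): ContactRecord | undefined {
    return this.bySource.get(source)?.get(sourceContactId);
  }

  list(source: string): ContactRecord[] {
    return Array.from(this.bySource.get(source)?.values() ?? []);
  }
}
```

Because provenance travels with every record, any contact shown to the agent can be traced back to its source and sync time.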

Acceptance criteria

  • Per-source storage — Contacts can be stored by source with stable source identifiers, names, handles/emails/phone numbers, and provenance metadata.
  • Gmail contact ingestion — Gmail contacts from the Composio API can be imported or synchronized into the contact database.
  • Contextual lookup — Agent memory/task retrieval can look up contact context for people referenced in source data.
  • Duplicate-safe modeling — Potential duplicate contacts across sources are represented without unsafe automatic merges.
  • Privacy controls — Contact data exposure to agent prompts/tools is minimized to what the task needs and avoids leaking unrelated contacts.
  • Regression safety — Unit or integration coverage verifies contact ingestion, lookup, provenance, and duplicate/ambiguous identity handling.
  • Diff coverage ≥ 80% — the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/coverage.yml).
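
The contextual lookup and privacy control criteria above can be combined in a single retrieval step. The TypeScript sketch below returns only contacts that are actually referenced in a piece of source text, and only the fields the task asks for. It is a sketch under stated assumptions: `StoredContact`, `ContactContext`, and `lookupContactContext` are hypothetical names, and matching on email addresses alone is a simplification. Resolving names, handles, or phone numbers would need additional logic.

```typescript
// Hypothetical stored shape for illustration.
interface StoredContact {
  source: string;
  sourceContactId: string;
  displayName: string | null;
  emails: string[];
  phones: string[];
}

type ExposableField = "displayName" | "emails" | "phones";

// What the agent receives: provenance is always present, everything else
// only if the caller explicitly requested it.
interface ContactContext {
  source: string;
  sourceContactId: string;
  displayName?: string | null;
  emails?: string[];
  phones?: string[];
}

const EMAIL_PATTERN = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;

function extractEmails(text: string): string[] {
  const matches = text.match(EMAIL_PATTERN) ?? [];
  return Array.from(new Set(matches.map((m) => m.toLowerCase())));
}

function lookupContactContext(
  text: string,
  contacts: StoredContact[],
  fields: ExposableField[],
): ContactContext[] {
  const mentioned = new Set(extractEmails(text));
  const allowed = new Set<ExposableField>(fields);
  return contacts
    // Unreferenced contacts are never returned, whatever fields are allowed.
    .filter((c) => c.emails.some((e) => mentioned.has(e.toLowerCase())))
    .map((c) => {
      const context: ContactContext = {
        source: c.source,
        sourceContactId: c.sourceContactId,
      };
      if (allowed.has("displayName")) context.displayName = c.displayName;
      if (allowed.has("emails")) {
        // Expose only the addresses that appeared in the text.
        context.emails = c.emails.filter((e) => mentioned.has(e.toLowerCase()));
      }
      if (allowed.has("phones")) context.phones = c.phones.slice();
      return context;
    });
}
```

A summarization task might request only `["displayName"]`, while a reply-drafting task might also request `"emails"`. In both cases contacts not mentioned in the text stay out of the prompt.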

Related

  • Memory tree and source ingestion workstream.
  • Gmail / Composio API contact context.

Metadata

Labels

  • composio: Composio-backed provider integrations, sync, and provider adapters.
  • feature: Net-new user-facing capability or product behavior.
  • memory: Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/.
