Knowledge graph construction and reasoning library with proposition-based architecture and Prolog inference.
Domain-Integrated Context Engineering
DICE (Domain-Integrated Context Engineering) extends context engineering by emphasizing the importance of a domain model to structure context, and by considering LLM outputs as well as inputs.
Despite their seductive ability to work with natural language, LLMs become safer to use the more we add structure to inputs and outputs. DICE helps LLMs converse in the established language of our business and applications.
Domain objects are not mere structs. They not only provide typing, but define focused behaviour. In an agentic system, behaviour can be exposed to manually authored code and selectively exposed to LLMs as tools.
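For illustration only (the types here are hypothetical, not part of DICE), a domain object pairs typed data with focused behaviour that can be called from ordinary code or surfaced to an LLM as a tool:
import java.math.BigDecimal
// Hypothetical domain object: typed data plus focused behaviour.
// A method like overdue() can be invoked from manually authored code,
// or selectively exposed to an LLM as a tool in an agentic system.
data class Invoice(
    val id: String,
    val amount: BigDecimal,
    val daysOutstanding: Int,
) {
    fun overdue(): Boolean = daysOutstanding > 30
}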
| Benefit | Description |
|---|---|
| Structured Context | Use code to fill the context window—less delicate, more scientific |
| System Integration | Precisely integrate with existing systems using domain objects |
| Reuse | Domain models capture business understanding across agents |
| Persistence | Structured query via SQL, Cypher, Prolog—not just vector search |
| Testability | Structure and encapsulation facilitate testing |
| Observability | Debuggers and tracing tools understand typed objects |
DICE uses a proposition-based architecture inspired by the General User Models (GUM) research from Stanford/Microsoft. Like GUM, it constructs confidence-weighted propositions that capture knowledge and preferences through a pipeline of Propose, Retrieve, and Revise operations.
Natural language propositions are the system of record. They accumulate evidence and project to multiple typed views for different use cases.
flowchart TB
subgraph Input["📄 Input"]
TEXT["Text / Chunks"]
end
subgraph Pipeline["🔄 Proposition Pipeline"]
EXTRACT["LLM Extraction"]
end
subgraph SOR["📚 System of Record"]
PROPS[("Propositions<br/>confidence + decay")]
end
subgraph Projections["🎯 Materialized Views"]
VEC["🔍 Vector<br/>Semantic Retrieval"]
NEO["🕸️ Neo4j<br/>Graph Traversal"]
PRO["🧠 Prolog<br/>Inference & Rules"]
MEM["💭 Memory<br/>Agent Context"]
ORA["💬 Oracle<br/>Natural Language QA"]
end
TEXT --> EXTRACT
EXTRACT --> PROPS
PROPS --> VEC
PROPS --> NEO
PROPS --> PRO
PROPS --> MEM
PROPS --> ORA
style Input fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Pipeline fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style SOR fill:#e8dcf4,stroke:#9f77cd,color:#1e1e1e
style Projections fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
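To make the system of record concrete, here is a simplified sketch of the fields this document attributes to a proposition; it is illustrative, not the library's actual class definition:
// Illustrative only: a simplified proposition shape based on fields
// referenced throughout this document (see com.embabel.dice.proposition
// for the real Proposition class).
data class PropositionSketch(
    val id: String,
    val text: String,                           // natural language fact: the system of record
    val confidence: Double,                     // evidence-weighted confidence in [0, 1]
    val decay: Double,                          // how quickly relevance fades over time
    val level: Int = 0,                         // 0 = raw observation, 1+ = derived
    val sourceIds: List<String> = emptyList(),  // provenance for derived propositions
)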
Impromptu is a classical music exploration chatbot that uses DICE to build a knowledge graph from conversations. It demonstrates production usage of:
- PropositionPipeline for extraction
- IncrementalAnalyzer for streaming conversation analysis
- EscalatingEntityResolver with AgenticCandidateSearcher for LLM-driven entity resolution
- Spring Boot integration with async processing
@Bean
PropositionPipeline propositionPipeline(
PropositionExtractor propositionExtractor,
PropositionReviser propositionReviser,
PropositionRepository propositionRepository) {
return PropositionPipeline
.withExtractor(propositionExtractor)
.withRevision(propositionReviser, propositionRepository);
}
@Bean
LlmPropositionExtractor llmPropositionExtractor(AiBuilder aiBuilder, ...) {
return LlmPropositionExtractor
.withLlm(llmOptions)
.withAi(ai)
.withPropositionRepository(propositionRepository)
.withSchemaAdherence(SchemaAdherence.DEFAULT)
.withTemplate("dice/extract_impromptu_user_propositions");
}
@Async
@Transactional
@EventListener
public void onConversationExchange(ConversationAnalysisRequestEvent event) {
// Build context with user-specific entity resolver
var context = SourceAnalysisContext
.withContextId(event.user.currentContext())
.withEntityResolver(entityResolverForUser(event.user))
.withSchema(dataDictionary)
.withRelations(relations)
.withKnownEntities(KnownEntity.asCurrentUser(event.user));
// Wrap conversation and analyze incrementally
var source = new ConversationSource(event.conversation);
var result = analyzer.analyze(source, context);
// Persist propositions and resolved entities
result.persist(propositionRepository, entityRepository);
}
The proposition lifecycle proceeds in five stages:
- Extraction: LLM extracts typed propositions from text with confidence and decay scores
- Entity Resolution: Mentions resolve to canonical entity IDs
- Evidence Accumulation: Multiple observations reinforce or contradict propositions
- Revision: Merge identical, reinforce similar, contradict conflicting propositions
- Promotion: High-confidence propositions project to typed backends
flowchart TB
subgraph Extraction["1️⃣ Extraction"]
Text["📄 Source Text"] --> LLM["🤖 LLM Extractor"]
LLM --> Props["Propositions<br/>+ confidence<br/>+ decay"]
end
subgraph Resolution["2️⃣ Entity Resolution"]
Props --> ER["Entity Resolver"]
ER --> Resolved["Resolved Mentions<br/>→ canonical IDs"]
end
subgraph Revision["3️⃣ Revision"]
Resolved --> Similar["Find Similar<br/>(vector search)"]
Similar --> Classify["LLM Classify"]
Classify --> Identical["🔄 IDENTICAL<br/>merge, boost confidence"]
Classify --> SimilarR["🔗 SIMILAR<br/>reinforce existing"]
Classify --> Contra["⚡ CONTRADICTORY<br/>reduce old confidence"]
Classify --> General["📊 GENERALIZES<br/>abstracts existing"]
Classify --> Unrel["✨ UNRELATED<br/>add as new"]
end
subgraph Persist["4️⃣ Persistence"]
Identical --> Repo[("PropositionRepository")]
SimilarR --> Repo
Contra --> Repo
General --> Repo
Unrel --> Repo
end
style Extraction fill:#e8dcf4,stroke:#9f77cd,color:#1e1e1e
style Resolution fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Revision fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Persist fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
Mention filtering provides quality control for entity mentions extracted by the LLM. It prevents low-quality mentions (vague references, overly long spans, duplicates) from polluting your knowledge graph.
DICE provides compile-time checked validation rules that can be composed:
| Rule | Description | Example |
|---|---|---|
| `NotBlank` | Rejects empty/whitespace mentions | Filters `" "` |
| `NoVagueReferences()` | Rejects demonstratives | Filters "this company", "that person" |
| `LengthConstraint()` | Enforces length limits | `LengthConstraint(maxLength = 150)` |
| `MinWordCount()` | Requires minimum words | `MinWordCount(2)` for person names |
| `PatternConstraint()` | Regex validation | Custom patterns |
| `AllOf()` | All rules must pass | Combine rules with AND |
| `AnyOf()` | At least one must pass | Combine rules with OR |
Define validation rules directly in your schema using ValidatedPropertyDefinition:
import com.embabel.agent.core.*
import com.embabel.dice.common.validation.*
// Define entity types with type-safe validation rules
val companyType = DynamicType(
name = "Company",
description = "A business organization",
ownProperties = listOf(
ValidatedPropertyDefinition(
name = "name",
validationRules = listOf(
NotBlank,
NoVagueReferences(),
LengthConstraint(maxLength = 150)
)
)
),
parents = emptyList(),
creationPermitted = true
)
val personType = DynamicType(
name = "Person",
description = "A person",
ownProperties = listOf(
ValidatedPropertyDefinition(
name = "name",
validationRules = listOf(
NotBlank,
MinWordCount(2), // Require first + last name
LengthConstraint(maxLength = 80)
)
)
),
parents = emptyList(),
creationPermitted = true
)
// Create DataDictionary from your types
val schema = DataDictionary.fromDomainTypes("my-schema", listOf(companyType, personType))
Use SchemaValidatedMentionFilter to apply schema-driven validation:
import com.embabel.dice.common.filter.*
import com.embabel.dice.pipeline.PropositionPipeline
// Create schema-driven filter
val mentionFilter = SchemaValidatedMentionFilter(schema)
// Configure pipeline with mention filter
val pipeline = PropositionPipeline
.withExtractor(llmExtractor)
.withMentionFilter(mentionFilter)
.withRevision(reviser, repository)
// Process chunks - mentions are automatically filtered
val result = pipeline.process(chunks, context)
For filters that need proposition context (not just the mention span), use context-aware filters:
import com.embabel.dice.common.filter.*
// PropositionDuplicateFilter detects LLM field mapping errors
// (when the LLM copies the entire proposition as the mention span)
val mentionFilter = CompositeMentionFilter(listOf(
SchemaValidatedMentionFilter(schema), // Schema-driven validation
PropositionDuplicateFilter() // Catches LLM field mapping errors
))
// Configure pipeline
val pipeline = PropositionPipeline
.withExtractor(llmExtractor)
.withMentionFilter(mentionFilter)
Wrap any filter with ObservableMentionFilter for Micrometer metrics:
import io.micrometer.core.instrument.MeterRegistry
val observableFilter = ObservableMentionFilter(
delegate = mentionFilter,
meterRegistry = meterRegistry,
filterName = "company-validation" // Optional: custom metric tag
)
// Metrics recorded:
// - dice.mention.filter.total (counter)
// - dice.mention.filter.accepted (counter)
// - dice.mention.filter.rejected (counter, with rejection_reason tag)
Given a company schema with NoVagueReferences() and LengthConstraint(maxLength = 150):
✅ Accepted:
"OpenAI"- Valid company name"Microsoft Corporation"- Valid, descriptive"Goldman Sachs"- Valid multi-word name
❌ Rejected:
"this company"- Vague reference"that investment"- Vague reference"A".repeat(200)- Exceeds length limit" "- Blank (if usingNotBlank)
For use cases that need entity extraction without propositions, DICE provides a lightweight
EntityPipeline and EntityIncrementalAnalyzer. This is useful when you want to:
- Extract and resolve entities from conversations without creating propositions
- Build entity-only knowledge graphs
- Track entities mentioned in streaming data
flowchart TB
subgraph Input["📄 Input"]
TEXT["Text / Chunks"]
end
subgraph Pipeline["🔄 Entity Pipeline"]
EXTRACT["LlmEntityExtractor"]
RESOLVE["EntityResolver"]
end
subgraph Output["🎯 Output"]
ENT[("Resolved Entities<br/>new, existing, reference-only")]
end
TEXT --> EXTRACT
EXTRACT --> RESOLVE
RESOLVE --> ENT
style Input fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Pipeline fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Output fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
The EntityExtractor interface defines entity extraction from chunks:
interface EntityExtractor {
fun suggestEntities(chunk: Chunk, context: SourceAnalysisContext): SuggestedEntities
}
Use LlmEntityExtractor for LLM-based extraction:
val extractor = LlmEntityExtractor
.withLlm(llmOptions)
.withAi(ai)
// Optionally use a custom prompt template
val customExtractor = extractor.withTemplate("my_entity_prompt")
val entities = extractor.suggestEntities(chunk, context)
The EntityPipeline orchestrates extraction and resolution:
// Create pipeline
val pipeline = EntityPipeline.withExtractor(
LlmEntityExtractor.withLlm(llmOptions).withAi(ai)
)
// Process a single chunk
val result: ChunkEntityResult = pipeline.processChunk(chunk, context)
// Process multiple chunks (with cross-chunk entity resolution)
val results: EntityResults = pipeline.process(chunks, context)
// Persist extracted entities
results.persist(entityRepository)
The pipeline does NOT persist anything automatically—the caller controls persistence via the
persist() method or by accessing entitiesToPersist().
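For example, a caller can inspect the result before deciding to save, using only the methods shown in this section:
// Caller-controlled persistence: nothing is saved until persist() is called.
val results = pipeline.process(chunks, context)
if (results.entitiesToPersist().isNotEmpty()) {
    results.persist(entityRepository)
}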
For streaming/incremental entity extraction (e.g., from conversations), use EntityIncrementalAnalyzer:
val analyzer = EntityIncrementalAnalyzer(
pipeline = EntityPipeline.withExtractor(
LlmEntityExtractor.withLlm(llmOptions).withAi(ai)
),
historyStore = myHistoryStore,
formatter = MessageFormatter.INSTANCE,
config = WindowConfig(
windowSize = 10,
triggerThreshold = 3,
),
)
// Wrap conversation as incremental source
val source = ConversationSource(conversation)
// Analyze—returns null if trigger threshold not met
val result: ChunkEntityResult? = analyzer.analyze(source, context)
// Persist if we got results (also creates chunk-entity relationships)
result?.persist(entityRepository)
The persist() method saves entities and creates (Chunk)-[:HAS_ENTITY]->(Entity) relationships,
linking each extracted entity back to its source chunk for provenance tracking.
Key differences from PropositionIncrementalAnalyzer:
| Aspect | EntityIncrementalAnalyzer | PropositionIncrementalAnalyzer |
|---|---|---|
| Output | `ChunkEntityResult` | `ChunkPropositionResult` |
| Pipeline | `EntityPipeline` | `PropositionPipeline` |
| Creates | Entities only | Entities + Propositions |
| Use case | Entity tracking | Full knowledge extraction |
ChunkEntityResult and EntityResults implement EntityExtractionResult, providing access to:
// Individual chunk result
val chunkResult: ChunkEntityResult = pipeline.processChunk(chunk, context)
// Access entities by resolution type
chunkResult.newEntities() // Newly created entities
chunkResult.updatedEntities() // Matched to existing entities
chunkResult.referenceOnlyEntities() // Known entities (not modified)
chunkResult.resolvedEntities() // All resolved (excludes vetoed)
// Get entities that need persistence
chunkResult.entitiesToPersist() // new + updated
// Statistics
val stats = chunkResult.entityExtractionStats
println("${stats.newCount} new, ${stats.updatedCount} updated")
// Multi-chunk results (deduplicated)
val results: EntityResults = pipeline.process(chunks, context)
results.totalSuggested // Total suggested across all chunks
results.totalResolved // Unique resolved entities
Entity resolution is the process of mapping entity mentions in text to canonical entities in a knowledge graph. When an LLM extracts "Sherlock Holmes" from one document and "Holmes" from another, entity resolution determines whether these refer to the same entity and links them to a single canonical ID.
| Challenge | Without Resolution | With Resolution |
|---|---|---|
| Duplicate entities | "Alice", "Alice Smith", "Ms. Smith" → 3 entities | → 1 canonical entity |
| Cross-document linking | Entities isolated per document | Entities connected across corpus |
| System integration | Cannot link to existing databases | Ties into CRM, HR, product catalogs |
| Graph quality | Fragmented, redundant nodes | Clean, connected knowledge graph |
The EntityResolver interface returns one of four resolution types:
flowchart LR
subgraph Input["Suggested Entity"]
SE["'Sherlock Holmes'<br/>labels: [Person, Detective]"]
end
subgraph Resolver["Entity Resolver"]
R["Match against<br/>existing entities"]
end
subgraph Outcomes["Resolution Outcomes"]
NEW["✨ NewEntity<br/>No match found<br/>Create new entity"]
EXIST["🔗 ExistingEntity<br/>Match found<br/>Merge labels"]
REF["👤 ReferenceOnly<br/>Known entity<br/>Don't update"]
VETO["🚫 VetoedEntity<br/>Type not creatable<br/>No match found"]
end
SE --> R
R --> NEW
R --> EXIST
R --> REF
R --> VETO
style Input fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Resolver fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Outcomes fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
| Outcome | When | Result |
|---|---|---|
| NewEntity | No matching entity found | Create new entity with generated UUID |
| ExistingEntity | Match found in repository | Merge labels from suggested + existing |
| ReferenceOnlyEntity | Known entity (e.g., current user) | Reference existing, don't modify |
| VetoedEntity | Non-creatable type, no match | Entity rejected, not persisted |
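As a sketch, the four outcomes can be pictured as a sealed hierarchy. The field names mirror the when example later in this document, but the exact shapes are assumptions:
// Illustrative only: the four documented outcomes as a sealed hierarchy.
// DICE's actual resolution types live in com.embabel.dice.common.
sealed interface EntityResolutionSketch
data class NewEntity(val recommended: NamedEntityData) : EntityResolutionSketch        // create with generated UUID
data class ExistingEntity(val existing: NamedEntityData) : EntityResolutionSketch      // merge labels
data class ReferenceOnlyEntity(val existing: NamedEntityData) : EntityResolutionSketch // don't modify
data class VetoedEntity(val suggested: SuggestedEntity) : EntityResolutionSketch       // reject, don't persist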
sequenceDiagram
autonumber
participant P as PropositionPipeline
participant MR as MultiEntityResolver
participant KR as KnownEntityResolver
participant PR as PrimaryResolver<br/>(Repository/Agentic)
participant IM as InMemoryResolver
participant DB as Entity Repository
P->>MR: resolve(suggestedEntities)
loop For each suggested entity
MR->>KR: resolve(entity)
alt Entity in knownEntities list
KR-->>MR: ReferenceOnlyEntity
else Not a known entity
KR->>PR: delegate(entity)
alt Using NamedEntityDataRepositoryEntityResolver
PR->>DB: findById(id)
alt ID match
DB-->>PR: existing entity
PR-->>KR: ExistingEntity
else No ID match
PR->>DB: textSearch(name)
PR->>DB: vectorSearch(name)
alt Candidates found
PR->>PR: LLM bakeoff (optional)
PR-->>KR: ExistingEntity
else No candidates
PR-->>KR: NewEntity
end
end
else Using AgenticCandidateSearcher
PR->>PR: LLM crafts search queries
PR->>DB: ToolishRag search
PR->>PR: LLM iterates & selects
PR-->>KR: ExistingEntity or NewEntity
end
KR-->>MR: resolution
end
MR->>IM: cache resolution
Note over IM: Cross-chunk<br/>deduplication
end
MR-->>P: Resolutions
DICE provides several EntityResolver implementations that can be composed:
| Implementation | Purpose | Use Case |
|---|---|---|
| EscalatingEntityResolver | Recommended - Escalating searcher chain with early stopping | Production, optimized performance |
| InMemoryEntityResolver | Session-scoped deduplication | Cross-chunk entity recognition |
| ChainedEntityResolver | Chain resolvers with fallback | Combine strategies |
| KnownEntityResolver | Fast-path for pre-defined entities | Current user, system entities |
| AlwaysCreateEntityResolver | Always creates new entities | Testing, baseline comparison |
The recommended setup uses EscalatingEntityResolver with InMemoryEntityResolver for cross-chunk deduplication:
flowchart TB
subgraph Input["Suggested Entity"]
SE["name: 'Brahms'<br/>labels: [Composer]"]
end
subgraph InMem["InMemoryEntityResolver"]
IM["Session Cache<br/>(cross-chunk dedup)"]
end
subgraph Chain["EscalatingEntityResolver - Cheapest First"]
direction TB
S1["🔍 ByIdCandidateSearcher<br/><i>instant</i>"]
S2["🔍 ByExactNameCandidateSearcher<br/><i>instant</i>"]
S3["🔍 NormalizedNameCandidateSearcher<br/><i>fast</i>"]
S4["🔍 PartialNameCandidateSearcher<br/><i>fast</i>"]
S5["🔍 FuzzyNameCandidateSearcher<br/><i>fast</i>"]
S6["🔍 VectorCandidateSearcher<br/><i>moderate</i>"]
S7["🤖 AgenticCandidateSearcher<br/><i>expensive (optional)</i>"]
end
subgraph Output["Resolution"]
EX["ExistingEntity"]
NEW["NewEntity"]
end
SE --> IM
IM -->|"Cache hit"| EX
IM -->|"Cache miss"| S1
S1 -->|"No match"| S2
S2 -->|"No match"| S3
S3 -->|"No match"| S4
S4 -->|"No match"| S5
S5 -->|"No match"| S6
S6 -->|"No match"| S7
S7 -->|"No match"| NEW
S1 -->|"✓ Confident"| EX
S2 -->|"✓ Confident"| EX
S3 -->|"✓ Confident"| EX
S4 -->|"✓ Confident"| EX
S5 -->|"✓ Confident"| EX
S6 -->|"✓ Confident"| EX
S7 -->|"✓ Match"| EX
style Input fill:#e8f4fc,stroke:#4a90d9,color:#1e1e1e
style InMem fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Chain fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Output fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
Key design principles:
- Cheapest first - ID lookup and exact match before expensive vector/LLM searches
- Early stopping - Returns immediately when a confident match is found
- Exactly-one rule - Searchers only return confident when exactly 1 result matches
- Cross-chunk dedup - InMemoryEntityResolver prevents duplicates across chunks
InMemoryEntityResolver maintains an in-memory cache of resolved entities within a processing session, matching names by exact, normalized, partial, and fuzzy comparison:
val resolver = InMemoryEntityResolver(
config = InMemoryEntityResolver.Config(
maxDistanceRatio = 0.2, // Levenshtein distance threshold
minLengthForFuzzy = 4, // Minimum length for fuzzy matching
minPartLength = 4, // Minimum part length for partial matching
)
)
The CandidateSearcher interface defines a searcher that finds candidate entities:
interface CandidateSearcher {
fun search(suggested: SuggestedEntity, schema: DataDictionary): SearchResult
}
data class SearchResult(
val confident: NamedEntityData? = null, // Confident match (stop early)
val candidates: List<NamedEntityData> = emptyList(), // All candidates found
)
Built-in searchers (ordered cheapest-first):
| Searcher | Purpose |
|---|---|
| `ByIdCandidateSearcher` | ID lookup (instant) |
| `ByExactNameCandidateSearcher` | Exact name match |
| `NormalizedNameCandidateSearcher` | Normalized names (removes "Dr.", "Jr.", etc.) |
| `PartialNameCandidateSearcher` | Partial matching ("Brahms" → "Johannes Brahms") |
| `FuzzyNameCandidateSearcher` | Levenshtein distance matching |
| `VectorCandidateSearcher` | Embedding/vector similarity |
| `AgenticCandidateSearcher` | LLM-driven search (expensive) |
Use DefaultCandidateSearchers.create(repository) for the standard chain (without agentic).
Create custom searchers by implementing the interface:
class MyCustomSearcher(private val myDataSource: MyDataSource) : CandidateSearcher {
override fun search(suggested: SuggestedEntity, schema: DataDictionary): SearchResult {
val match = myDataSource.findExact(suggested.name)
return if (match != null) SearchResult.confident(match)
else SearchResult.empty()
}
}
EscalatingEntityResolver is a performance-optimized resolver that chains CandidateSearchers, stopping early when confident.
Each searcher performs its own search and returns candidates. If a searcher returns a confident match,
resolution stops. Otherwise, candidates accumulate for optional LLM arbitration.
flowchart LR
subgraph Searchers["CandidateSearcher Chain"]
S1["ByIdCandidateSearcher"]
S2["ByExactNameCandidateSearcher"]
S3["NormalizedNameCandidateSearcher"]
S4["PartialNameCandidateSearcher"]
S5["FuzzyNameCandidateSearcher"]
S6["VectorCandidateSearcher"]
end
subgraph Arbiter["Optional CandidateBakeoff"]
LLM["CandidateBakeoff<br/>Selects from candidates"]
end
S1 -->|"No match"| S2
S2 -->|"No match"| S3
S3 -->|"No match"| S4
S4 -->|"No match"| S5
S5 -->|"No match"| S6
S6 -->|"No match"| LLM
S1 -->|"✓ Confident"| Done["Stop & Return"]
S2 -->|"✓ Confident"| Done
S3 -->|"✓ Confident"| Done
S4 -->|"✓ Confident"| Done
S5 -->|"✓ Confident"| Done
S6 -->|"✓ Confident"| Done
LLM -->|"Match/None"| Done
style Searchers fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Arbiter fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Done fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
| Searcher | Strategy | Returns Confident When |
|---|---|---|
| ByIdCandidateSearcher | ID lookup | Exactly 1 ID match |
| ByExactNameCandidateSearcher | Exact name match | Exactly 1 exact match |
| NormalizedNameCandidateSearcher | Normalized names | Exactly 1 normalized match |
| PartialNameCandidateSearcher | Partial names | Exactly 1 partial match |
| FuzzyNameCandidateSearcher | Levenshtein distance | Exactly 1 fuzzy match |
| VectorCandidateSearcher | Embedding similarity | Score ≥ 0.95 (exactly 1) |
| AgenticCandidateSearcher | LLM-driven search | LLM selects match |
// Simple: use factory method with defaults
val resolver = EscalatingEntityResolver.create(
repository = entityRepository,
candidateBakeoff = LlmCandidateBakeoff(ai, llmOptions, PromptMode.COMPACT),
)
// Custom: compose your own searcher chain
val resolver = EscalatingEntityResolver(
searchers = DefaultCandidateSearchers.create(entityRepository),
candidateBakeoff = LlmCandidateBakeoff(ai, llmOptions),
contextCompressor = ContextCompressor.default(),
config = EscalatingEntityResolver.Config(heuristicOnly = false),
)
// Without vector search
val resolver = EscalatingEntityResolver.withoutVector(entityRepository)
// Add bakeoff to existing resolver
val resolverWithBakeoff = resolver.withCandidateBakeoff(LlmCandidateBakeoff(ai, llmOptions))
Context Compression reduces LLM token usage by extracting only relevant snippets:
// Full context (500 tokens):
// "Hello! How are you? I've been listening to music. I really love Brahms.
// His symphonies are incredible... [300 more tokens]"
// Compressed context (~50 tokens):
// "...I really love Brahms. His symphonies are incredible, especially..."
// Compressor options:
val compressor = WindowContextCompressor(windowChars = 100, maxSnippets = 3)
val compressor = SentenceContextCompressor(maxSentences = 3)
val compressor = AdaptiveContextCompressor() // Chooses strategy by length
MultiEntityResolver chains multiple resolvers with fallback logic:
val resolver = MultiEntityResolver(
resolvers = listOf(
knownEntityResolver, // Fast path: check known entities first
repositoryResolver, // Primary: search repository
InMemoryEntityResolver(...), // Fallback: session cache
)
)
// First ExistingEntity wins; otherwise first NewEntity
InMemoryEntityResolver uses a chain of match strategies. Each returns Match, NoMatch, or Inconclusive:
flowchart LR
subgraph Chain["Match Strategy Chain"]
L["LabelCompatibility<br/>Type hierarchy"]
E["ExactName<br/>Case-insensitive"]
N["NormalizedName<br/>Remove titles"]
P["PartialName<br/>'Holmes' = 'Sherlock Holmes'"]
F["FuzzyName<br/>Levenshtein distance"]
end
L -->|Inconclusive| E
E -->|Inconclusive| N
N -->|Inconclusive| P
P -->|Inconclusive| F
L -->|Match/NoMatch| Result["Final Result"]
E -->|Match/NoMatch| Result
N -->|Match/NoMatch| Result
P -->|Match/NoMatch| Result
F -->|Match/NoMatch| Result
style Chain fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
| Strategy | Description |
|---|---|
| CandidateBakeoff | Interface for selecting best match from candidates |
| LlmCandidateBakeoff | LLM selects best from multiple candidates (COMPACT: ~100 tokens, FULL: ~400 tokens) |
Entity resolution is integrated into the proposition pipeline via SourceAnalysisContext:
// Configure context with entity resolver
val context = SourceAnalysisContext
.withContextId("session-123")
.withEntityResolver(
MultiEntityResolver(
KnownEntityResolver(
knownEntities = listOf(KnownEntity.asCurrentUser(currentUser)),
delegate = repositoryResolver,
),
InMemoryEntityResolver(defaultMatchStrategies()),
)
)
.withSchema(dataDictionary)
.withKnownEntities(KnownEntity.asCurrentUser(currentUser))
// Process chunks—entities automatically resolved
val result = pipeline.process(chunks, context)
// Access resolution results
result.chunkResults.forEach { chunkResult ->
chunkResult.entityResolutions.resolutions.forEach { resolution ->
when (resolution) {
is NewEntity -> println("Created: ${resolution.recommended.name}")
is ExistingEntity -> println("Matched: ${resolution.existing.name}")
is ReferenceOnlyEntity -> println("Referenced: ${resolution.existing.name}")
is VetoedEntity -> println("Rejected: ${resolution.suggested.name}")
}
}
}
All DICE operations require a SourceAnalysisContext that carries configuration for source analysis:
| Property | Description |
|---|---|
| `schema` | DataDictionary defining valid entity and relationship types |
| `entityResolver` | Strategy for resolving entity mentions to canonical IDs |
| `contextId` | Identifies the source/purpose of the analysis (session, batch, etc.) |
| `knownEntities` | Optional list of pre-defined entities to assist disambiguation |
| `templateModel` | Optional model data passed to LLM prompt templates |
The ContextId is a Kotlin value class that tags all propositions extracted during
a processing run. ContextId is the primary scoping mechanism for all proposition queries
and should be considered the starting point when retrieving knowledge.
| Scoping Pattern | Description | Example |
|---|---|---|
| User-specific context | Each user has their own context | ContextId("user-alice-123") |
| Shared context | Multiple users share knowledge | ContextId("team-engineering") |
| Session context | Per-conversation knowledge | ContextId("session-abc") |
| Batch context | Processing run grouping | ContextId("batch-2025-01-09") |
Key design points:
- One user can have multiple contexts (personal, team, project-specific)
- One context can be shared between users (team knowledge, organizational facts)
- ContextId is independent of entity identity—an entity like "Alice" can appear in many contexts
- Query by contextId first, then refine with entity, confidence, or temporal filters
// Create context for a processing run
val context = SourceAnalysisContext(
schema = DataDictionary.fromClasses("myschema", Person::class.java, Company::class.java),
entityResolver = AlwaysCreateEntityResolver,
contextId = ContextId("user-session-123"),
)
// Process chunks with context
val result = pipeline.process(chunks, context)
Java Interop: Since ContextId is a Kotlin value class, Java code should use the strongly-typed builder pattern and access the context ID via getContextIdValue():
SourceAnalysisContext context = SourceAnalysisContext
        .withContextId("my-context")
        .withEntityResolver(AlwaysCreateEntityResolver.INSTANCE)
        .withSchema(DataDictionary.fromClasses("myschema", Person.class))
        .withKnownEntities(knownEntities)    // optional
        .withTemplateModel(templateModel);   // optional
PropositionQuery provides a composable, Java-friendly builder pattern for querying propositions.
It consolidates filtering, ordering, and limiting into a single specification object.
flowchart LR
subgraph Filters["Filters"]
CTX["contextId<br/>(primary scope)"]
ENT["entityId<br/>(mentions entity)"]
CONF["minEffectiveConfidence<br/>(with decay)"]
TIME["createdAfter/Before<br/>revisedAfter/Before"]
LVL["minLevel/maxLevel<br/>(abstraction)"]
STAT["status<br/>(ACTIVE, etc.)"]
end
subgraph Order["Ordering"]
ORD["orderBy<br/>EFFECTIVE_CONFIDENCE_DESC<br/>CREATED_DESC<br/>REVISED_DESC"]
end
subgraph Limit["Limiting"]
LIM["limit<br/>(max results)"]
end
Filters --> Order --> Limit --> Results[("Propositions")]
style Filters fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Order fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Limit fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
Kotlin usage (infix factory methods + direct construction):
// Query by context using infix notation (the primary scope)
val contextProps = repository.query(
PropositionQuery forContextId sessionContext
)
// Query with multiple filters using direct construction
val query = PropositionQuery(
contextId = sessionContext,
entityId = "alice-123",
minEffectiveConfidence = 0.5,
orderBy = PropositionQuery.OrderBy.EFFECTIVE_CONFIDENCE_DESC,
limit = 20,
)
val results = repository.query(query)
// Infix with entity
val entityProps = repository.query(
PropositionQuery mentioningEntity "alice-123"
)
Java usage (builder pattern via withers):
// Start with factory method, chain withers
PropositionQuery query = PropositionQuery.againstContext("session-123")
.withEntityId("alice-123")
.withMinEffectiveConfidence(0.5)
.orderedByEffectiveConfidence()
.withLimit(20);
List<Proposition> results = repository.query(query);
Factory methods (all are infix for Kotlin):
| Method | Description |
|---|---|
| `PropositionQuery.forContextId(contextId)` | Scoped to a ContextId |
| `PropositionQuery.againstContext(contextIdValue)` | Scoped to a context (Java-friendly, takes String) |
| `PropositionQuery.mentioningEntity(entityId)` | Propositions mentioning an entity |
Note: There is no create() method by design—always start with a scoped query to avoid accidentally fetching all propositions.
Effective confidence applies time-based decay to confidence scores, so older propositions with high decay rates rank lower than recent ones. This is useful for ranking memories by relevance rather than just raw confidence.
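The decay function itself is an implementation detail; as one plausible formulation (an assumption for illustration, not DICE's actual code), exponential decay over the proposition's age:
import java.time.Duration
import java.time.Instant
import kotlin.math.exp

// Illustrative only: one way effective confidence could combine raw
// confidence with time-based decay. DICE's actual formula may differ.
fun effectiveConfidence(
    confidence: Double,  // raw confidence in [0, 1]
    decay: Double,       // decay rate per day; 0.0 means the proposition never fades
    revisedAt: Instant,  // when the proposition was last revised
    now: Instant = Instant.now(),
): Double {
    val ageDays = Duration.between(revisedAt, now).toMillis() / 86_400_000.0
    return confidence * exp(-decay * ageDays)
}
Under this formulation, a proposition with confidence 0.9 and decay 0.1 would rank at roughly 0.33 after ten days, letting fresher observations win ties.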
The Relations class provides a builder-style API for defining relationship predicates with their
knowledge types. These predicates are used for classification and graph projection:
val relations = Relations.empty()
.withProcedural("likes", "expresses preference for")
.withProcedural("prefers", "indicates preference")
.withSemantic("works at", "is employed by")
.withSemantic("is located in", "geographical location")
.withEpisodic("met", "encountered")
.withEpisodic("visited", "went to")Predicates can also be defined on schema properties using @Semantics annotations:
data class Person(
val id: String,
val name: String,
@field:Semantics([With(key = Proposition.PREDICATE, value = "works at")])
val employer: Company? = null,
) : NamedEntity
Projectors transform propositions into specialized representations. Each projector creates a different "view" optimized for specific query patterns:
flowchart TB
subgraph Source["📝 Source of Truth"]
P[("Propositions<br/>with confidence & decay")]
end
subgraph Projectors["🔄 Projectors"]
GP["GraphProjector<br/>━━━━━━━━━━━━━━━<br/>RelationBasedGraphProjector<br/>LlmGraphProjector"]
PP["PrologProjector<br/>━━━━━━━━━━━━━━━<br/>Logical inference"]
MP["MemoryProjection<br/>━━━━━━━━━━━━━━━<br/>Agent context"]
CP["Custom Projector<br/>━━━━━━━━━━━━━━━<br/>Your representation"]
end
subgraph Targets["🎯 Materialized Views"]
Neo[("Neo4j<br/>Graph Traversal")]
Pro[("tuProlog<br/>Inference & Rules")]
Mem[("Agent Memory<br/>Semantic/Episodic/Procedural")]
Cus[("Your Backend")]
end
P --> GP --> Neo
P --> PP --> Pro
P --> MP --> Mem
P --> CP --> Cus
style Source fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Projectors fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Targets fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
The RelationBasedGraphProjector projects propositions to graph relationships by matching
predicates from the schema and Relations:
flowchart LR
subgraph Input
Prop["Proposition<br/>'Bob works at Acme'"]
end
subgraph Matching["Predicate Matching"]
Schema["1️⃣ Schema<br/>@Semantics predicate"]
Rel["2️⃣ Relations<br/>fallback predicates"]
end
subgraph Output
Graph["(:Person)-[:employer]->(:Company)"]
end
Prop --> Schema
Schema -->|"match: 'works at'"| Graph
Schema -->|"no match"| Rel
Rel -->|"match"| Graph
style Input fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Matching fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Output fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
Priority order:
- Schema relationships with @Semantics(predicate="...") → uses property name as relationship type
- Relations predicates → derives relationship type via UPPER_SNAKE_CASE
// Schema-driven: uses property name "employer"
// "Bob works at Acme" → (bob)-[:employer]->(acme)
// Relations fallback: derives from predicate
val relations = Relations.empty().withProcedural("likes")
// "Alice likes jazz" → (alice)-[:LIKES]->(jazz)
val projector = RelationBasedGraphProjector.from(relations)
val results = projector.projectAll(propositions, schema)
// Persist to graph database
val persister = NamedEntityDataRepositoryGraphRelationshipPersister(repository)
val persistResult = persister.persist(results)
The Prolog projector converts propositions to Prolog facts for logical inference:
+----------------+ +------------------+ +------------------+
| Propositions | --> | GraphProjector | --> | PrologProjector |
| | | (LLM classifies) | | (converts to |
| "Alice knows | | | | Prolog syntax) |
| Kubernetes" | | EXPERT_IN | | |
+----------------+ +------------------+ +------------------+
|
v
+----------------+
| tuProlog |
| Knowledge |
| Base |
+----------------+
Facts project to Prolog predicates:
- expert_in(Person, Technology) - expertise relationships
- friend_of(Person, Person) - social connections
- works_at(Person, Company) - employment
- reports_to(Person, Manager) - hierarchy
Rules are loaded from prolog/dice-rules.pl on the classpath:
% Transitive reporting chain
reports_to_chain(X, Y) :- reports_to(X, Y).
reports_to_chain(X, Y) :- reports_to(X, Z), reports_to_chain(Z, Y).
% Derived relationships
coworker(X, Y) :- works_at(X, Company), works_at(Y, Company), X \= Y.
% Expertise queries
can_consult(Person, Expert, Topic) :-
friend_of(Person, Expert),
expert_in(Expert, Topic).
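Given facts asserted from propositions (for illustration, assume friend_of(alice, bob) and expert_in(bob, kubernetes) have been projected), the rules above support queries such as:
% Who can alice consult, and on what topic?
?- can_consult(alice, Expert, Topic).
% Expert = bob, Topic = kubernetes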
The Memory class provides an LlmReference that gives agents access to their stored memories (propositions) within a context. It exposes three search tools via a MatryoshkaTool:
- searchByTopic: Vector similarity search for relevant memories
- searchRecent: Temporal ordering to recall recent memories
- searchByType: Find memories by knowledge type (facts, events, preferences)
All tools support optional type filtering:
- semantic: Facts about entities (e.g., "Alice works at Acme")
- episodic: Events that happened (e.g., "Alice met Bob yesterday")
- procedural: Preferences and habits (e.g., "Alice prefers morning meetings")
- working: Current session context
The context is baked in at construction time, ensuring the agent can only access memories within its authorized context. The description dynamically reflects how many memories are available.
// Kotlin
val memory = Memory.forContext(contextId)
.withRepository(propositionRepository)
.withProjector(DefaultMemoryProjector.withKnowledgeTypeClassifier(myClassifier))
.withMinConfidence(0.6)
ai.withReference(memory).respond(...)
// Java
LlmReference memory = Memory.forContext("user-session-123")
.withRepository(propositionRepository)
.withProjector(DefaultMemoryProjector.withKnowledgeTypeClassifier(myClassifier))
.withMinConfidence(0.6);
ai.withReference(memory).respond(...);
Configuration options:
- withProjector(MemoryProjector): Projector for memory types (default: DefaultMemoryProjector.DEFAULT with heuristic classifier)
- withMinConfidence(Double): Minimum effective confidence threshold (default 0.5)
- withDefaultLimit(Int): Maximum results per search (default 10)
Memory projection classifies propositions into cognitive memory types for agent context.
The design separates querying (via PropositionQuery) from classification (via MemoryProjector).
flowchart LR
subgraph Query["1. Query Propositions"]
PQ["PropositionQuery<br/>━━━━━━━━━━━━━━━<br/>contextId, entityId,<br/>confidence, temporal"]
REPO[("Repository")]
PQ --> REPO
end
subgraph Project["2. Project to Memory"]
PROJ["MemoryProjector<br/>━━━━━━━━━━━━━━━<br/>Classify by<br/>KnowledgeType"]
end
subgraph Result["3. MemoryProjection"]
SEM["🧠 semantic<br/>Facts"]
EPI["📅 episodic<br/>Events"]
PRO["⚙️ procedural<br/>Preferences"]
WRK["💭 working<br/>Session"]
end
REPO -->|"List<Proposition>"| PROJ
PROJ --> SEM
PROJ --> EPI
PROJ --> PRO
PROJ --> WRK
style Query fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Project fill:#fff3cd,stroke:#e9b306,color:#1e1e1e
style Result fill:#e8dcf4,stroke:#9f77cd,color:#1e1e1e
Complete example:
// 1. Query propositions (caller controls what to fetch)
val props = repository.query(
(PropositionQuery forContextId sessionContext)
.withEntityId("alice-123")
.withMinEffectiveConfidence(0.5)
.orderedByEffectiveConfidence()
.withLimit(50)
)
// 2. Project into memory types
val projector = DefaultMemoryProjector.DEFAULT
val memory = projector.project(props)
// 3. Use classified propositions
memory.semantic // factual knowledge
memory.episodic // event-based memories
memory.procedural // preferences and rules
memory.working // session context
// Access by type
val facts = memory[KnowledgeType.SEMANTIC]
Classification sources:
- Relations predicates: "likes" → PROCEDURAL, "works at" → SEMANTIC, "met" → EPISODIC
- Heuristic fallback: High decay → EPISODIC, High confidence + low decay → SEMANTIC
val relations = Relations.empty()
.withProcedural("likes", "prefers", "enjoys")
.withSemantic("works at", "is located in")
.withEpisodic("met", "visited", "attended")
val classifier = RelationBasedKnowledgeTypeClassifier.from(relations)
val projector = DefaultMemoryProjector.create(classifier)
val memory = projector.project(propositions)
Operations transform groups of propositions into new, derived propositions. Unlike projections (which convert to different representations), operations produce new propositions at higher abstraction levels.
flowchart LR
subgraph Input["Input"]
G1["PropositionGroup<br/>'Alice'"]
G2["PropositionGroup<br/>'Bob'"]
end
subgraph Operations["Operations"]
ABS["🔭 Abstract<br/>━━━━━━━━━━━━<br/>Synthesize insights"]
CON["⚖️ Contrast<br/>━━━━━━━━━━━━<br/>Find differences"]
CMP["🔗 Compose<br/>━━━━━━━━━━━━<br/>Chain relationships"]
end
subgraph Output["Output"]
P["Derived Propositions<br/>level > 0<br/>sourceIds populated"]
end
G1 --> ABS --> P
G1 --> CON
G2 --> CON --> P
G1 --> CMP --> P
style Input fill:#d4eeff,stroke:#63c0f5,color:#1e1e1e
style Operations fill:#e8dcf4,stroke:#9f77cd,color:#1e1e1e
style Output fill:#d4f5d4,stroke:#3fd73c,color:#1e1e1e
| Operation | Description | Example |
|---|---|---|
| Abstract | Synthesize higher-level insights from a group | "likes jazz, blues, classical" → "enjoys music" |
| Contrast | Identify differences between groups | Alice vs Bob → "opposite meeting preferences" |
| Compose | Chain transitive relationships (via Prolog) | "A→B, B→C" → "A indirectly relates to C" |
Generate higher-level propositions that capture the essence of a group:
val abstractor = LlmPropositionAbstractor.withLlm(llm).withAi(ai)
// Group propositions with a label
val bobGroup = PropositionGroup("Bob", repository.findByEntity("bob-123"))
// Generate abstractions
val abstractions = abstractor.abstract(bobGroup, targetCount = 2)
// "Bob values thoroughness and clarity in work processes"
// "Bob prefers structured communication"Identify and articulate differences between two groups:
val contraster = LlmPropositionContraster.withLlm(llm).withAi(ai)
val aliceGroup = PropositionGroup("Alice", aliceProps)
val bobGroup = PropositionGroup("Bob", bobProps)
val differences = contraster.contrast(aliceGroup, bobGroup, targetCount = 3)
// "Alice prefers morning meetings while Bob prefers afternoons"
// "Alice and Bob have different language preferences (Python vs Java)"Derived propositions track their abstraction level and provenance:
data class Proposition(
// ... other fields ...
val level: Int = 0, // 0 = raw, 1+ = derived
val sourceIds: List<String>, // IDs of source propositions
)
// Query by abstraction level
val rawObservations = repository.findByMinLevel(0)
val abstractions = repository.findByMinLevel(1)
The Oracle answers questions using LLM tool calling with Prolog reasoning:
| Tool | Description |
|---|---|
| `show_facts` | Display sample facts with human-readable names |
| `query_prolog` | Execute Prolog queries with variable bindings |
| `check_fact` | Verify if a specific fact is true |
| `list_entities` | Browse all known entities |
| `list_predicates` | Show available relationship types |
com.embabel.dice
├── agent/ # Agent integration
│ └── Memory # LlmReference for memory search tools
│
├── common/ # Shared types
│ ├── SourceAnalysisContext # Context for all DICE operations
│ ├── EntityResolver # Entity disambiguation interface
│ ├── KnownEntity # Pre-defined entity for hints
│ ├── Relation # Predicate with KnowledgeType
│ ├── Relations # Builder for relation collections
│ ├── KnowledgeType # SEMANTIC, EPISODIC, PROCEDURAL, WORKING
│ └── resolver/ # Entity resolution implementations
│ ├── CandidateSearcher # Interface for candidate search
│ ├── SearchResult # Confident match + candidates
│ ├── CandidateBakeoff # Interface for selecting best match
│ ├── LlmCandidateBakeoff # LLM-based candidate selection
│ ├── EscalatingEntityResolver # Chains searchers with optional bakeoff
│ ├── InMemoryEntityResolver # Session-level deduplication
│ ├── ChainedEntityResolver # Chains multiple resolvers
│ ├── KnownEntityResolver # Fast-path for pre-defined entities
│ └── searcher/ # Built-in searchers (cheapest-first)
│ ├── ByIdCandidateSearcher # ID lookup
│ ├── ByExactNameCandidateSearcher # Exact name match
│ ├── NormalizedNameCandidateSearcher # Normalized names
│ ├── PartialNameCandidateSearcher # Partial name match
│ ├── FuzzyNameCandidateSearcher # Levenshtein distance
│ ├── VectorCandidateSearcher # Embedding similarity
│ ├── AgenticCandidateSearcher # LLM-driven search
│ └── DefaultCandidateSearchers # Factory for defaults
│
├── proposition/ # Core types (source of truth)
│ ├── Proposition # Natural language fact with confidence/decay
│ ├── EntityMention # Entity reference within proposition
│ ├── PropositionQuery # Composable query specification
│ ├── Projector<T> # Generic projection interface
│ ├── PropositionRepository # Storage interface (with query() method)
│ ├── revision/ # Proposition revision
│ │ ├── PropositionReviser
│ │ ├── LlmPropositionReviser
│ │ └── RevisionResult # New, Merged, Reinforced, Contradicted, Generalized
│ └── extraction/
│ └── LlmPropositionExtractor
│
├── projection/ # Materialized views from propositions
│ ├── graph/ # Knowledge graph projection
│ │ ├── GraphProjector # Interface for graph projection
│ │ ├── RelationBasedGraphProjector # Predicate-based (no LLM)
│ │ ├── LlmGraphProjector # LLM-based classification
│ │ ├── ProjectionPolicy # Filter before projection
│ │ ├── GraphRelationshipPersister # Persistence interface
│ │ └── NamedEntityDataRepositoryGraphRelationshipPersister
│ │
│ ├── prolog/ # Prolog projection for inference
│ │ ├── PrologProjector
│ │ ├── PrologEngine # tuProlog wrapper
│ │ └── PrologSchema
│ │
│ └── memory/ # Agent memory projection
│ ├── MemoryProjector # Interface: project(propositions) -> MemoryProjection
│ ├── MemoryProjection # Result: semantic, episodic, procedural, working
│ ├── KnowledgeTypeClassifier # Interface for classification
│ ├── RelationBasedKnowledgeTypeClassifier
│ ├── HeuristicKnowledgeTypeClassifier
│ ├── DefaultMemoryProjector # Default implementation
│ └── MemoryRetriever # Similarity + entity + recency retrieval
│
├── query/oracle/ # Question answering
│ ├── Oracle
│ ├── ToolOracle
│ └── PrologTools
│
├── operations/ # Proposition transformations
│ ├── PropositionGroup # Labeled collection of propositions
│ ├── abstraction/ # Higher-level synthesis
│ │ ├── PropositionAbstractor
│ │ └── LlmPropositionAbstractor
│ └── contrast/ # Difference identification
│ ├── PropositionContraster
│ └── LlmPropositionContraster
│
├── entity/ # Entity extraction domain
│ ├── EntityExtractor # Entity extraction interface
│ ├── LlmEntityExtractor # LLM-based entity extraction
│ ├── EntityPipeline # Entity extraction + resolution pipeline
│ ├── ChunkEntityResult # Single chunk entity results
│ ├── EntityResults # Multi-chunk entity results
│ └── EntityIncrementalAnalyzer # Streaming entity extraction
│
├── pipeline/ # Proposition pipeline orchestration
│ └── PropositionPipeline # Full proposition extraction
│
├── incremental/ # Incremental/streaming infrastructure
│ ├── IncrementalAnalyzer<T,R> # Interface for incremental analysis
│ ├── AbstractIncrementalAnalyzer # Base implementation with windowing
│ ├── IncrementalSource<T> # Source abstraction
│ ├── IncrementalSourceFormatter<T> # Formats items to text
│ ├── ConversationSource # Conversation as incremental source
│ ├── MessageFormatter # Formats messages
│ ├── ChunkHistoryStore # Tracks processing history
│ ├── WindowConfig # Window and trigger configuration
│ └── proposition/ # Proposition-based incremental
│ └── PropositionIncrementalAnalyzer
│
├── text2graph/ # Legacy knowledge graph building
│ ├── KnowledgeGraphBuilder
│ ├── SourceAnalyzer
│ └── SourceAnalyzerEntityExtractor # Adapter wrapping SourceAnalyzer
DICE provides REST endpoints for extracting propositions and managing memory. All endpoints are scoped by contextId.
curl -X POST http://localhost:8080/api/v1/contexts/{contextId}/extract \
-H "Content-Type: application/json" \
-d '{
"text": "I love Brahms and his symphonies are incredible",
"sourceId": "conversation-123",
"knownEntities": [
{"id": "user-1", "name": "Alice", "type": "User", "role": "SUBJECT"}
]
}'
Response:
{
"chunkId": "chunk-abc",
"contextId": "user-session-123",
"propositions": [
{
"id": "prop-xyz",
"text": "User loves Brahms",
"mentions": [{"name": "Brahms", "type": "Composer", "role": "OBJECT"}],
"confidence": 0.95,
"action": "CREATED"
}
],
"entities": {"created": ["composer-brahms"], "resolved": [], "failed": []},
"revision": {"created": 1, "merged": 0, "reinforced": 0, "contradicted": 0, "generalized": 0}
}
Supports PDF, Word, Markdown, HTML, and other formats via Apache Tika:
curl -X POST http://localhost:8080/api/v1/contexts/{contextId}/extract/file \
-F "file=@document.pdf" \
-F "sourceId=doc-123"Response:
{
"sourceId": "doc-123",
"contextId": "user-session-123",
"filename": "document.pdf",
"chunksProcessed": 5,
"totalPropositions": 12,
"chunks": [
{"chunkId": "chunk-1", "propositionCount": 3, "preview": "Introduction to classical music..."}
],
"entities": {"created": ["composer-brahms"], "resolved": ["composer-wagner"], "failed": []},
"revision": {"created": 10, "merged": 2, "reinforced": 0, "contradicted": 0, "generalized": 0}
}
# Get all propositions for context
curl http://localhost:8080/api/v1/contexts/{contextId}/memory
# Filter by status and confidence
curl "http://localhost:8080/api/v1/contexts/{contextId}/memory?status=ACTIVE&minConfidence=0.8&limit=50"curl http://localhost:8080/api/v1/contexts/{contextId}/memory/{propositionId}curl -X POST http://localhost:8080/api/v1/contexts/{contextId}/memory \
-H "Content-Type: application/json" \
-d '{
"text": "User prefers morning meetings",
"mentions": [
{"name": "User", "type": "User", "role": "SUBJECT"}
],
"confidence": 0.9,
"reasoning": "Explicitly stated preference"
}'
# Delete a proposition
curl -X DELETE http://localhost:8080/api/v1/contexts/{contextId}/memory/{propositionId}
# Semantic search over memories
curl -X POST http://localhost:8080/api/v1/contexts/{contextId}/memory/search \
-H "Content-Type: application/json" \
-d '{
"query": "music preferences",
"topK": 10,
"similarityThreshold": 0.7,
"filters": {
"status": ["ACTIVE"],
"minConfidence": 0.5
}
}'
# Get propositions mentioning a specific entity
curl http://localhost:8080/api/v1/contexts/{contextId}/memory/entity/{entityType}/{entityId}
# Example
curl http://localhost:8080/api/v1/contexts/user-123/memory/entity/Composer/composer-brahms
Controllers auto-configure when required beans are present:
@Configuration
class DiceApiConfig {
@Bean
fun propositionRepository(embeddingService: EmbeddingService): PropositionRepository =
InMemoryPropositionRepository(embeddingService)
@Bean
fun propositionPipeline(
extractor: PropositionExtractor,
reviser: PropositionReviser,
repository: PropositionRepository,
): PropositionPipeline = PropositionPipeline
.withExtractor(extractor)
.withRevision(reviser, repository)
@Bean
fun entityResolver(): EntityResolver = AlwaysCreateEntityResolver
@Bean
fun schema(): DataDictionary = DataDictionary.fromClasses(
"ecommerce",
Customer::class.java,
Product::class.java,
)
}
- MemoryController loads when PropositionRepository is available
- PropositionPipelineController loads when PropositionPipeline is available (via @ConditionalOnBean)
DICE provides API key authentication for the REST endpoints. Enable it via configuration:
# application.yml
dice:
security:
api-key:
enabled: true
keys:
- sk-your-secret-key-here
- sk-another-key-for-different-client
Then include the API key in requests:
curl -H "X-API-Key: sk-your-secret-key-here" \
http://localhost:8080/api/v1/contexts/user-123/memory
Full configuration options:
dice:
security:
api-key:
enabled: true # Enable API key auth (default: false)
keys: # List of valid API keys
- sk-key-1
- sk-key-2
header-name: X-API-Key # Header name (default: X-API-Key)
path-patterns: # Paths to protect (default: /api/v1/**)
- /api/v1/**
For production, implement your own ApiKeyAuthenticator to validate keys against a database or secrets manager:
@Component
class DatabaseApiKeyAuthenticator(
private val apiKeyRepository: ApiKeyRepository,
) : ApiKeyAuthenticator {
override fun authenticate(apiKey: String): AuthResult {
val keyEntity = apiKeyRepository.findByKey(apiKey)
?: return AuthResult.Unauthorized("Invalid API key")
if (keyEntity.isExpired()) {
return AuthResult.Unauthorized("API key expired")
}
return AuthResult.Authorized(
principal = keyEntity.clientId,
metadata = mapOf("scopes" to keyEntity.scopes),
)
}
}
When you provide your own ApiKeyAuthenticator bean, it takes precedence over the in-memory implementation.
For more control (e.g., combining with other auth methods), integrate with Spring Security directly:
@Configuration
@EnableWebSecurity
class SecurityConfig {
@Bean
fun apiKeyAuthenticator(): ApiKeyAuthenticator =
InMemoryApiKeyAuthenticator.withKey("sk-your-secret-key")
@Bean
fun securityFilterChain(
http: HttpSecurity,
authenticator: ApiKeyAuthenticator,
): SecurityFilterChain {
val apiKeyFilter = ApiKeyAuthenticationFilter(
authenticator = authenticator,
pathPatterns = listOf("/api/v1/**"),
)
return http
.csrf { it.disable() }
.sessionManagement { it.sessionCreationPolicy(SessionCreationPolicy.STATELESS) }
.authorizeHttpRequests { auth ->
auth.requestMatchers("/api/v1/**").authenticated()
auth.requestMatchers("/actuator/health").permitAll()
auth.anyRequest().permitAll()
}
.addFilterBefore(apiKeyFilter, UsernamePasswordAuthenticationFilter::class.java)
.build()
}
}
Key points:
- Disable CSRF for stateless API
- Use STATELESS session management
- Add the ApiKeyAuthenticationFilter before Spring's UsernamePasswordAuthenticationFilter
- Configure path patterns to match your API routes
If your application has OAuth or form login configured elsewhere, exclude the DICE endpoints:
@Configuration
@EnableWebSecurity
class SecurityConfig {
@Bean
fun securityFilterChain(http: HttpSecurity): SecurityFilterChain {
return http
// API endpoints use API key auth
.securityMatcher("/api/v1/**")
.csrf { it.disable() }
.sessionManagement { it.sessionCreationPolicy(SessionCreationPolicy.STATELESS) }
.authorizeHttpRequests { it.anyRequest().authenticated() }
.addFilterBefore(apiKeyFilter(), UsernamePasswordAuthenticationFilter::class.java)
// Disable OAuth for these endpoints
.oauth2Login { it.disable() }
.formLogin { it.disable() }
.build()
}
}
Or use multiple SecurityFilterChain beans with different matchers:
@Bean
@Order(1)
fun apiSecurityFilterChain(http: HttpSecurity): SecurityFilterChain {
return http
.securityMatcher("/api/v1/**")
// API key auth config...
.build()
}
@Bean
@Order(2)
fun webSecurityFilterChain(http: HttpSecurity): SecurityFilterChain {
return http
.securityMatcher("/**")
// OAuth/form login config for web UI...
.build()
}
Add to your pom.xml:
<dependency>
<groupId>com.embabel</groupId>
<artifactId>dice</artifactId>
<version>0.1.0-SNAPSHOT</version>
</dependency>
- tuProlog (2p-kt): Pure Kotlin Prolog engine for inference
- Spring Framework: Dependency injection and web support (optional)
- Embabel Agent: AI/LLM integration framework
- Kotlin: Primary language
Johnson, R. (2025). Context Engineering Needs Domain Understanding. Medium. https://medium.com/@springrod/context-engineering-needs-domain-understanding-b4387e8e4bf8
Shaikh, O., Sapkota, S., Rizvi, S., Horvitz, E., Park, J.S., Yang, D., & Bernstein, M.S. (2025). Creating General User Models from Computer Use. arXiv:2505.10831. https://arxiv.org/abs/2505.10831
The proposition-based architecture is inspired by GUM's approach to building unified user models through confidence-weighted propositions. GUM's four-module pipeline (Propose, Retrieve, Revise, Audit) demonstrates 76% accuracy overall and 100% for high-confidence propositions.
GUM Pipeline DICE
──────────── ────
Propose ─────────────────► PropositionExtractor
Retrieve ─────────────────► PropositionRepository.findSimilar()
Revise ─────────────────► PropositionReviser
Audit ─────────────────► ProjectionPolicy
Apache License 2.0

