creating a design doc by djarecka · Pull Request #21 · sensein/brainkbdocs

djarecka · 2026-05-01T15:14:18Z

I have added the markdown created by @satra and Claude design. A slide deck was shared on Slack.

We should have a discussion on this document (and can create PRs against the design_doc branch), and at the end we can recreate the slides if we want.

gemini-code-assist

Code Review

This pull request introduces a comprehensive Markdown design document for the BrainKB architecture deck redesign, covering user stories, architecture levels, and technical contracts. Feedback addressed the use of a non-portable local absolute path and highlighted inconsistencies in the traceability matrix regarding architecture levels.

gemini-code-assist · 2026-05-01T15:16:04Z

+
+Status: review-ready planning artifact  
+Audience: engineering team  
+Source material: `/Users/satra/Downloads/brainkb-ui-arch.pptx` and companion web export  


The source material path is a local absolute path, which is not portable and may expose internal directory structures. It should be replaced with a generic reference or a relative path.

Suggested change

Source material: `/Users/satra/Downloads/brainkb-ui-arch.pptx` and companion web export

Source material: Slide deck brainkb-ui-arch.pptx and companion web export

I'm guessing that we don't have to include the slides?
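The portability concern raised in this thread could also be checked automatically before merging. The following is a minimal sketch, assuming a simple regex heuristic for home-directory paths; the `find_local_paths` helper and its patterns are illustrative and not part of the repository.

```python
import re

# Heuristic patterns for machine-specific absolute paths
# (an assumption for illustration, not an exhaustive list).
LOCAL_PATH_RE = re.compile(
    r"(?:/Users/|/home/)[^\s`'\"]+"      # macOS and Linux home directories
    r"|[A-Za-z]:\\Users\\[^\s`'\"]+"     # Windows user profiles
)


def find_local_paths(markdown_text):
    """Return (line_number, path) pairs for every local absolute path found."""
    hits = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for match in LOCAL_PATH_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "Status: review-ready planning artifact\n"
    "Source material: `/Users/satra/Downloads/brainkb-ui-arch.pptx` and companion web export\n"
)
for lineno, path in find_local_paths(sample):
    print(f"line {lineno}: {path}")
```

Run against the line flagged in the review, the sketch reports the path on line 2; the suggested replacement text produces no hits.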

gemini-code-assist · 2026-05-01T15:16:04Z

+| 01 Knowledge Review | L1, L3, L4 | Search, drill-down, provenance | Entity hydration, named graph aggregation, BG taxonomy/asset/claim traversal, conflict display, evidence scoring. |
+| 02 Hypothesis Generation | L1, L2, L3, L4 | Search, LLM-assisted query, memory promotion | Graph patterns, similarity retrieval, task memory, grounded suggestions, reviewable drafts. |
+| 03 Methods and Models | L0, L1, L4 | Search, drill-down | Tool/model entities, applicability schema, benchmarks, failure-mode provenance. |
+| 04 Resource Landscape | L0, L1, L4 | Search, federation | Resource lifecycle model, version edges, as-of queries, timeline/supersession UI. |
+| 05 Entity Exploration | L1, L2, L3, L4 | Search, drill-down, federation | Canonical identifiers, synonym search, taxonomy/cell/gene/file/paper entity pages, evidence badges, source links. |
+| 06 Curated Claim Ingest | L1, L2, L3, L4 | Curator, ingest, auth | Document parsing, NER/extraction drafts, analysis-graph evidence, review queue, schema validation, named graph write. |
+| 07 Automated Partner Release Ingest | L2, L3, L4 | Auth, ingest | Service credentials, idempotent atlas/package jobs, file manifests, validation reports, graph diff, atomic activation. |
+| 08 Cross-KB Federated Query | L0, L2, L3, L4 | Federation, search, cache lookup | Query planning, atlas/archive/publication/gene connectors, connector/result cache, source attribution, partial results, URI reconciliation. |
+| 09 Grounded Assistant | L1, L2, L3, L4 | LLM-assisted query, search, memory retrieval | pgvector retrieval, cache-aware graph hydration, scoped memory, citations to claims/assets/papers, provider boundary, fallback behavior. |
+| 10 Provenance Audit | L1, L3, L4 | Provenance, drill-down | Evidence nodes, PROV-O paths, schema/version metadata, supersession history. |


The Primary architecture levels column in the Traceability Matrix is inconsistent with the Architecture dependencies listed in the detailed Epic descriptions. For example, Epic 01 (line 496) lists L0-L4, but the matrix only shows L1, L3, L4. Synchronizing these will ensure the document remains a reliable source of truth for engineering planning.

Suggested change

| 01 Knowledge Review | L0, L1, L2, L3, L4 | Search, drill-down, provenance | Entity hydration, named graph aggregation, BG taxonomy/asset/claim traversal, conflict display, evidence scoring. |
| 02 Hypothesis Generation | L1, L2, L3, L4 | Search, LLM-assisted query, memory promotion | Graph patterns, similarity retrieval, task memory, grounded suggestions, reviewable drafts. |
| 03 Methods and Models | L0, L1, L2, L3, L4 | Search, drill-down | Tool/model entities, applicability schema, benchmarks, failure-mode provenance. |
| 04 Resource Landscape | L0, L1, L2, L3, L4 | Search, federation | Resource lifecycle model, version edges, as-of queries, timeline/supersession UI. |
| 05 Entity Exploration | L1, L2, L3, L4 | Search, drill-down, federation | Canonical identifiers, synonym search, taxonomy/cell/gene/file/paper entity pages, evidence badges, source links. |
| 06 Curated Claim Ingest | L1, L2, L3, L4 | Curator, ingest, auth | Document parsing, NER/extraction drafts, analysis-graph evidence, review queue, schema validation, named graph write. |
| 07 Automated Partner Release Ingest | L1, L2, L3, L4 | Auth, ingest | Service credentials, idempotent atlas/package jobs, file manifests, validation reports, graph diff, atomic activation. |
| 08 Cross-KB Federated Query | L0, L1, L2, L3, L4 | Federation, search, cache lookup | Query planning, atlas/archive/publication/gene connectors, connector/result cache, source attribution, partial results, URI reconciliation. |
| 09 Grounded Assistant | L1, L2, L3, L4 | LLM-assisted query, search, memory retrieval | pgvector retrieval, cache-aware graph hydration, scoped memory, citations to claims/assets/papers, provider boundary, fallback behavior. |
| 10 Provenance Audit | L1, L2, L3, L4 | Provenance, drill-down | Evidence nodes, PROV-O paths, schema/version metadata, supersession history. |
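The inconsistency described in this review comment is the kind of drift a small script could catch. Below is a minimal sketch, assuming the matrix rows are available as Markdown table text and the per-epic dependencies have been collected into a dictionary by hand; the function names and the `epic_dependencies` mapping are hypothetical, and only the Epic 01 value (`L0-L4`) comes from the review itself.

```python
import re


def expand_levels(spec):
    """Expand 'L0-L4' or 'L1, L3, L4' into a set of level numbers."""
    levels = set()
    for part in spec.split(","):
        nums = [int(n) for n in re.findall(r"L(\d+)", part)]
        if len(nums) == 2:  # a range such as L0-L4
            levels.update(range(nums[0], nums[1] + 1))
        else:
            levels.update(nums)
    return levels


def matrix_levels(table_text):
    """Map each epic name to the level set in the second column of its row."""
    result = {}
    for line in table_text.splitlines():
        # Tolerate a leading '+' from diff output, then split the row into cells.
        row = line.strip().lstrip("+").strip().strip("|")
        cells = [cell.strip() for cell in row.split("|")]
        if len(cells) >= 2 and re.match(r"\d+ ", cells[0]):
            result[cells[0]] = expand_levels(cells[1])
    return result


def find_mismatches(matrix, epic_dependencies):
    """Return {epic: (missing_levels, extra_levels)} for rows that disagree."""
    mismatches = {}
    for epic, spec in epic_dependencies.items():
        expected = expand_levels(spec)
        actual = matrix.get(epic, set())
        if actual != expected:
            mismatches[epic] = (sorted(expected - actual), sorted(actual - expected))
    return mismatches


matrix_text = (
    "+| 01 Knowledge Review | L1, L3, L4 | Search, drill-down, provenance | Entity hydration. |\n"
    "+| 10 Provenance Audit | L1, L3, L4 | Provenance, drill-down | Evidence nodes. |\n"
)
# Hypothetical mapping; only the Epic 01 value is quoted in the review.
epic_dependencies = {"01 Knowledge Review": "L0-L4"}

print(find_mismatches(matrix_levels(matrix_text), epic_dependencies))
```

For the Epic 01 row as originally submitted, the sketch reports levels 0 and 2 as missing from the matrix, which matches the correction in the suggested change.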

- Move original combined doc to design/original_source/ (reference only)
- Add brainkb-architecture.md and brainkb-review-deck-plan.md as the editable working files
- Add utils/split_design_doc.py and utils/merge_design_doc.py with CLI arguments
- Add design/README.md and utils/README.md documenting the workflow

Co-Authored-By: Dorota Jarecka <djarecka@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

djarecka · 2026-05-01T23:28:11Z

Decided to create #22 to separate the architecture from the slide instructions.

Split architecture design doc into focused working files

tekrajchhetri

It seems this PR includes everything from the slides. @djarecka, could you remove the content we discussed during our meeting, so that this md file contains only what is relevant to the architecture (same for the others)?

tekrajchhetri · 2026-05-06T19:07:18Z

@djarecka here's the figure from the slide.

@satra @djarecka some of my major comments regarding the architecture:

- kg-api and ingest-api seem to have similar functionality, and both point to the same database, Oxigraph.
- There is currently no service for user management, i.e., user details plus tracking of other user-related information, such as contributions and which role the user is performing (e.g., reviewer or curator).
- The Ingest API still connects to BioPortal, which is rate limited. It should connect to the local concept mapping service, with an option to connect to the external BioPortal service.
- The PDF parser is listed as an external service, and several parsers are listed. Are we going to use all of them?
- The description also talks about search, but no dedicated search component is visible in the architecture.
- For Celery, are we planning to use an external broker such as RabbitMQ?
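The local-first lookup proposed in the third point can be sketched as follows. This is only an illustration under stated assumptions: `make_resolver`, the lookup callables, and the example.org identifiers are hypothetical, and the external lookup is a stand-in function rather than a real BioPortal client.

```python
from typing import Callable, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def make_resolver(local_lookup: Lookup, external_lookup: Optional[Lookup] = None):
    """Build a resolver that asks the local mapping first, then an optional external one."""

    def resolve(term: str) -> Tuple[Optional[str], str]:
        key = term.strip().lower()
        concept = local_lookup(key)
        if concept is not None:
            return concept, "local"
        # The external service is optional and only consulted on a local miss,
        # which keeps rate-limited calls to a minimum.
        if external_lookup is not None:
            concept = external_lookup(key)
            if concept is not None:
                return concept, "external"
        return None, "unresolved"

    return resolve


# Hypothetical in-memory stand-ins for the two services.
local_table = {"neuron": "http://example.org/concept/neuron"}
external_calls = []


def fake_external(term):
    external_calls.append(term)  # record how often the external service is hit
    return {"astrocyte": "http://example.org/concept/astrocyte"}.get(term)


resolve = make_resolver(local_table.get, fake_external)
print(resolve("Neuron"))
print(resolve("astrocyte"))
print(resolve("unknown term"))
```

Passing the external lookup as an optional argument means a deployment without external access can simply omit it, and terms found locally never trigger an external call.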

From the architecture md:

- One major concern I see, especially in the schema part, is that it doesn't mention the assertion/evidence schema that we've created. That schema also needs to be extended for hypotheses.
- The modeling description also mentions adopting things like RDF-star. We should consider not using RDF-star, as RDF 1.2 does not allow it. That is also the reason why pyoxigraph has dropped RDF-star support; see https://pyoxigraph.readthedocs.io/en/stable/migration.html.
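If RDF-star is avoided as suggested above, one long-standing alternative for attaching evidence to a statement is standard RDF reification (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The sketch below only illustrates that pattern with plain Python and N-Triples output; it is not the assertion/evidence schema mentioned above, and every example.org IRI, including `hasEvidence`, is a hypothetical placeholder.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/brainkb/"  # hypothetical namespace for illustration


def reify(statement_iri, subject, predicate, obj):
    """Describe one triple with the standard RDF reification vocabulary."""
    return [
        (statement_iri, RDF + "type", RDF + "Statement"),
        (statement_iri, RDF + "subject", subject),
        (statement_iri, RDF + "predicate", predicate),
        (statement_iri, RDF + "object", obj),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Reify a claim, then attach evidence to the statement node itself.
triples = reify(EX + "assertion/1", EX + "cell/A", EX + "expresses", EX + "gene/B")
triples.append((EX + "assertion/1", EX + "hasEvidence", EX + "paper/42"))

print(to_ntriples(triples))
```

The statement node (`assertion/1`) carries the evidence link, so no quoted-triple syntax is needed. Named graphs are another option for the same purpose; which fits better depends on the schema the team has already defined.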

satra · 2026-05-06T19:36:14Z

@tekrajchhetri - it's important for us to align on use cases as much as the architecture. let's not get to implementation details before agreeing on the use cases and what supports them being met. obviously this doc has more than that. but i would recommend focusing on the use cases before the system architecture.

tekrajchhetri · 2026-05-06T19:48:09Z

@satra for me those use cases are fine. I would just change the order in which things are implemented to 3, 4, 1, 2.

adding the design doc from Satra and claude design discussion

6642b0c

gemini-code-assist Bot reviewed May 1, 2026


Merge pull request #22 from djarecka/design_doc_split

aaa2791

Split architecture design doc into focused working files

djarecka mentioned this pull request May 4, 2026

finalize design doc for BrainKB #23

Open

tekrajchhetri requested changes May 6, 2026

djarecka wants to merge 3 commits into main from design_doc

3 participants
