
creating a design doc#21

Open
djarecka wants to merge 3 commits into main from design_doc


@djarecka djarecka commented May 1, 2026

I have added the Markdown created by @satra and Claude design. A slide deck was shared on Slack.

We should have a discussion on this document (and can create PRs against the design_doc branch), and at the end we can recreate the slides if we want.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive Markdown design document for the BrainKB architecture deck redesign, covering user stories, architecture levels, and technical contracts. Feedback addressed the use of a non-portable local absolute path and highlighted inconsistencies in the traceability matrix regarding architecture levels.


Status: review-ready planning artifact
Audience: engineering team
Source material: `/Users/satra/Downloads/brainkb-ui-arch.pptx` and companion web export

medium

The source material path is a local absolute path, which is not portable and may expose internal directory structures. It should be replaced with a generic reference or a relative path.

Suggested change
Source material: Slide deck brainkb-ui-arch.pptx and companion web export

Contributor Author

I'm guessing that we don't have to include the slides?

Comment on lines +974 to +983
| 01 Knowledge Review | L1, L3, L4 | Search, drill-down, provenance | Entity hydration, named graph aggregation, BG taxonomy/asset/claim traversal, conflict display, evidence scoring. |
| 02 Hypothesis Generation | L1, L2, L3, L4 | Search, LLM-assisted query, memory promotion | Graph patterns, similarity retrieval, task memory, grounded suggestions, reviewable drafts. |
| 03 Methods and Models | L0, L1, L4 | Search, drill-down | Tool/model entities, applicability schema, benchmarks, failure-mode provenance. |
| 04 Resource Landscape | L0, L1, L4 | Search, federation | Resource lifecycle model, version edges, as-of queries, timeline/supersession UI. |
| 05 Entity Exploration | L1, L2, L3, L4 | Search, drill-down, federation | Canonical identifiers, synonym search, taxonomy/cell/gene/file/paper entity pages, evidence badges, source links. |
| 06 Curated Claim Ingest | L1, L2, L3, L4 | Curator, ingest, auth | Document parsing, NER/extraction drafts, analysis-graph evidence, review queue, schema validation, named graph write. |
| 07 Automated Partner Release Ingest | L2, L3, L4 | Auth, ingest | Service credentials, idempotent atlas/package jobs, file manifests, validation reports, graph diff, atomic activation. |
| 08 Cross-KB Federated Query | L0, L2, L3, L4 | Federation, search, cache lookup | Query planning, atlas/archive/publication/gene connectors, connector/result cache, source attribution, partial results, URI reconciliation. |
| 09 Grounded Assistant | L1, L2, L3, L4 | LLM-assisted query, search, memory retrieval | pgvector retrieval, cache-aware graph hydration, scoped memory, citations to claims/assets/papers, provider boundary, fallback behavior. |
| 10 Provenance Audit | L1, L3, L4 | Provenance, drill-down | Evidence nodes, PROV-O paths, schema/version metadata, supersession history. |

medium

The Primary architecture levels column in the Traceability Matrix is inconsistent with the Architecture dependencies listed in the detailed Epic descriptions. For example, Epic 01 (line 496) lists L0-L4, but the matrix only shows L1, L3, L4. Synchronizing these will ensure the document remains a reliable source of truth for engineering planning.

Suggested change
| 01 Knowledge Review | L0, L1, L2, L3, L4 | Search, drill-down, provenance | Entity hydration, named graph aggregation, BG taxonomy/asset/claim traversal, conflict display, evidence scoring. |
| 02 Hypothesis Generation | L1, L2, L3, L4 | Search, LLM-assisted query, memory promotion | Graph patterns, similarity retrieval, task memory, grounded suggestions, reviewable drafts. |
| 03 Methods and Models | L0, L1, L2, L3, L4 | Search, drill-down | Tool/model entities, applicability schema, benchmarks, failure-mode provenance. |
| 04 Resource Landscape | L0, L1, L2, L3, L4 | Search, federation | Resource lifecycle model, version edges, as-of queries, timeline/supersession UI. |
| 05 Entity Exploration | L1, L2, L3, L4 | Search, drill-down, federation | Canonical identifiers, synonym search, taxonomy/cell/gene/file/paper entity pages, evidence badges, source links. |
| 06 Curated Claim Ingest | L1, L2, L3, L4 | Curator, ingest, auth | Document parsing, NER/extraction drafts, analysis-graph evidence, review queue, schema validation, named graph write. |
| 07 Automated Partner Release Ingest | L1, L2, L3, L4 | Auth, ingest | Service credentials, idempotent atlas/package jobs, file manifests, validation reports, graph diff, atomic activation. |
| 08 Cross-KB Federated Query | L0, L1, L2, L3, L4 | Federation, search, cache lookup | Query planning, atlas/archive/publication/gene connectors, connector/result cache, source attribution, partial results, URI reconciliation. |
| 09 Grounded Assistant | L1, L2, L3, L4 | LLM-assisted query, search, memory retrieval | pgvector retrieval, cache-aware graph hydration, scoped memory, citations to claims/assets/papers, provider boundary, fallback behavior. |
| 10 Provenance Audit | L1, L2, L3, L4 | Provenance, drill-down | Evidence nodes, PROV-O paths, schema/version metadata, supersession history. |
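A check like the one this review comment asks for could be scripted so the matrix and the epic descriptions stay in sync. The following is a minimal sketch, not part of the PR: the helper names are hypothetical, the sample rows are abbreviated copies of the matrix rows quoted above, and `EPIC_DEPENDENCIES` is an assumed stand-in for whatever the detailed epic sections actually list (only the `L0-L4` value for Epic 01 comes from the comment itself).

```python
import re


def parse_levels(cell: str) -> set[str]:
    """Expand a cell such as 'L1, L3, L4' or 'L0-L4' into a set of level names."""
    levels: set[str] = set()
    for part in cell.split(","):
        part = part.strip()
        span = re.fullmatch(r"L(\d+)\s*-\s*L(\d+)", part)
        if span:
            lo, hi = int(span.group(1)), int(span.group(2))
            levels.update(f"L{i}" for i in range(lo, hi + 1))
        elif re.fullmatch(r"L\d+", part):
            levels.add(part)
    return levels


def matrix_levels(markdown: str) -> dict[str, set[str]]:
    """Map each epic name (first column) to its architecture levels (second column)."""
    result: dict[str, set[str]] = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a two-digit epic number, e.g. "01 Knowledge Review".
        if len(cells) >= 2 and re.match(r"\d{2} ", cells[0]):
            result[cells[0]] = parse_levels(cells[1])
    return result


def find_mismatches(matrix: dict[str, set[str]], epics: dict[str, set[str]]):
    """Return {epic: (missing_from_matrix, extra_in_matrix)} for inconsistent rows."""
    mismatches = {}
    for epic, expected in epics.items():
        actual = matrix.get(epic, set())
        if actual != expected:
            mismatches[epic] = (sorted(expected - actual), sorted(actual - expected))
    return mismatches


# Abbreviated sample rows; the last column is elided for brevity.
MATRIX = """\
| 01 Knowledge Review | L1, L3, L4 | Search, drill-down, provenance | ... |
| 02 Hypothesis Generation | L1, L2, L3, L4 | Search, LLM-assisted query, memory promotion | ... |
"""

# Assumed values standing in for the "Architecture dependencies" of each epic.
EPIC_DEPENDENCIES = {
    "01 Knowledge Review": parse_levels("L0-L4"),
    "02 Hypothesis Generation": parse_levels("L1, L2, L3, L4"),
}

mismatches = find_mismatches(matrix_levels(MATRIX), EPIC_DEPENDENCIES)
for epic, (missing, extra) in mismatches.items():
    print(f"{epic}: missing {missing}, unexpected {extra}")
```

With these sample rows, only Epic 01 is reported, because its matrix row omits L0 and L2. Running such a check from the `utils/` scripts or in CI would be one option, but that placement is a suggestion rather than something the PR defines.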

- Move original combined doc to design/original_source/ (reference only)
- Add brainkb-architecture.md and brainkb-review-deck-plan.md as the editable working files
- Add utils/split_design_doc.py and utils/merge_design_doc.py with CLI arguments
- Add design/README.md and utils/README.md documenting the workflow

Co-Authored-By: Dorota Jarecka <djarecka@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

djarecka commented May 1, 2026

Decided to create #22 to separate the architecture from the slide instructions.

Split architecture design doc into focused working files

@tekrajchhetri tekrajchhetri left a comment


It seems this PR includes everything from the slides. @djarecka could you remove the slide contents, as discussed during our meeting, so that this md file contains only what is relevant to the architecture (same for the others)?


tekrajchhetri commented May 6, 2026

@djarecka here's the figure from the slide.

[Image: figure from the slide]

@satra @djarecka some of my major comments regarding the architecture:

  1. kg-api and ingest-api seem to have similar functionality, and they point to the same database, Oxigraph.
  2. There is currently no service for user management, i.e., user details plus tracking of other user-related information, e.g., contributions and the role the user is performing (reviewer, curator, ...).
  3. The Ingest API still connects to BioPortal, which is rate-limited. It should connect to the local concept mapping service, with an option to connect to the external BioPortal service.
  4. The PDF parser is listed as an external service, and several parsers are listed. Are we going to use all of them?
  5. The description also talks about search, but no dedicated search component is visible in the architecture.
  6. For Celery, are we planning to use an external broker such as RabbitMQ?
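
Point 3 describes a local-first lookup with an optional external fallback. A minimal sketch of that pattern follows, under stated assumptions: `make_resolver`, the lookup callables, and the `EX:` identifiers are all hypothetical placeholders, and no real BioPortal or concept mapping service API is shown or implied.

```python
from typing import Callable, Optional

# A lookup takes a term and returns a concept identifier, or None if unmapped.
Lookup = Callable[[str], Optional[str]]


def make_resolver(local_lookup: Lookup, external_lookup: Optional[Lookup] = None) -> Lookup:
    """Build a resolver that tries the local service first.

    The external lookup is optional and is only called on a local miss,
    which keeps traffic to a rate-limited service low.
    """

    def resolve(term: str) -> Optional[str]:
        hit = local_lookup(term)
        if hit is not None or external_lookup is None:
            return hit
        return external_lookup(term)

    return resolve


# Hypothetical stand-ins for the two services.
LOCAL_MAPPINGS = {"neuron": "EX:0001"}
external_calls: list[str] = []


def fake_external(term: str) -> Optional[str]:
    """Pretend external service that records each call it receives."""
    external_calls.append(term)
    return {"astrocyte": "EX:0002"}.get(term)


resolver = make_resolver(lambda t: LOCAL_MAPPINGS.get(t.lower()), fake_external)
local_hit = resolver("Neuron")        # answered locally, no external call
fallback_hit = resolver("astrocyte")  # local miss, falls back to the external stub

local_only = make_resolver(LOCAL_MAPPINGS.get)
unmapped = local_only("astrocyte")    # no fallback configured
```

Making the external lookup an optional argument keeps the external dependency a deployment choice rather than a hard requirement, which is the behavior the comment asks for.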

From the architecture md:

  • One major concern I see, especially in the schema part, is that it doesn't mention the assertion/evidence schema that we've created. That schema also needs to be extended for hypotheses.
  • The description of the modeling also mentions adopting things like RDF-star. We should consider not using RDF-star, as RDF 1.2 does not allow it. That is also the reason pyoxigraph has dropped RDF-star support; see https://pyoxigraph.readthedocs.io/en/stable/migration.html.


satra commented May 6, 2026

@tekrajchhetri - it's important for us to align on use cases as much as the architecture. let's not get to implementation details before agreeing on the use cases and what supports them being met. obviously this doc has more than that. but i would recommend focusing on the use cases before the system architecture.

@tekrajchhetri

@satra for me those use cases are fine. I would just change the order in which things are implemented to 3, 4, 1, 2.
