Draft
51 commits
8644329
Add Python/uv tool to generate Translator component dependency diagrams
gaurav May 26, 2026
c409621
Add --google-sheet option to download CSV from a world-readable Googl…
gaurav May 26, 2026
490bfc6
Added an empty .env.default file.
gaurav May 26, 2026
a7e3882
Update diagram: new column names, node labels, edge styles, and legend
gaurav May 26, 2026
d7035b6
Update edge semantics: rename column, reverse arrow direction, horizo…
gaurav May 26, 2026
2ebc831
Remove owner cluster boxes from diagram
gaurav May 26, 2026
4635356
Update color scheme to reflect team roles
gaurav May 26, 2026
9a49bd6
Reverse Calls arrow direction and update legend labels
gaurav May 26, 2026
9b0f130
Add entry and exit terminal nodes to diagram
gaurav May 26, 2026
138380c
Add planned-but-not-implemented edge convention (~prefix)
gaurav May 26, 2026
2add573
Add README for translator-components-diagram tool
gaurav May 26, 2026
eeeb3e8
Reject inputs that would silently corrupt the diagram
gaurav May 26, 2026
e8d36f4
Tidy code-review nits
gaurav May 26, 2026
68adeed
Refactor to dataclass and improve diagram readability
gaurav May 26, 2026
d298253
Add pytest tests for pure functions
gaurav May 26, 2026
c877ea7
Move owner-to-color mapping into owner-colors.csv
gaurav May 26, 2026
3024742
Render ubiquitous components as per-caller clones; tighten layout
gaurav May 26, 2026
6df5ef7
Improved name.
gaurav May 26, 2026
7c3ea2f
Trim legend to owner colors and edge styles only
gaurav May 26, 2026
287efd0
Filter owner legend to only show owners present in the rendered diagram
gaurav May 26, 2026
4d1fddb
Updated documentation.
gaurav May 26, 2026
ec04a47
Support "Part of" column to group components into labeled layer clusters
gaurav May 29, 2026
28b6db3
Replace hardcoded terminal nodes with CSV-driven Externals column
gaurav May 29, 2026
f8472b0
Move group cluster label to bottom of the bounding box
gaurav May 29, 2026
2933075
Style group cluster label as a dark tab with white bold text
gaurav May 29, 2026
89ff7b9
Shrink group label tab and switch cluster border to solid gray fill
gaurav May 29, 2026
42f4830
Move owner legend to bottom rank (rank=max)
gaurav May 29, 2026
3bdf94a
Pin externals to top/bottom rows; nudge owner legend to bottom-right
gaurav May 29, 2026
cd8d10c
Remove rank pinning from owner legend; let layout engine place it freely
gaurav May 29, 2026
1f35586
Pin both legend clusters to rank=max (bottom of diagram)
gaurav May 29, 2026
0b0812b
Pin all edge-legend nodes to rank=max to prevent cluster stretching
gaurav May 29, 2026
9373700
Stack edge-legend pairs on separate rows; float legend cluster freely
gaurav May 29, 2026
968688a
Add external node shapes to the edge-style legend
gaurav May 29, 2026
acb0ee6
Improved Legend labels.
gaurav May 29, 2026
3c66021
Add Hide column support to suppress components from diagram output
gaurav Jun 2, 2026
cc9f037
Widen Results edge in legend for better label visibility
gaurav Jun 3, 2026
8d12397
Suppress dotted/dashed edges when a solid edge already exists between…
gaurav Jun 3, 2026
f25ce99
Increase legend edge minlen to 5 for clearer edge labels
gaurav Jun 3, 2026
039be30
Style planned edges in red so they stand out clearly
gaurav Jun 3, 2026
a243ef7
Change API call edges from dotted to dashed
gaurav Jun 3, 2026
39d671e
Add --no-concentrate flag to disable edge merging
gaurav Jun 3, 2026
ee7252c
Change --concentrate default to off (no-concentrate)
gaurav Jun 3, 2026
537f313
Add CLAUDE.md for translator-components-diagram
gaurav Jun 3, 2026
a4b7dc5
Show non-ITRB hosting location on component node labels
gaurav Jun 3, 2026
0cd7ca1
Update CLAUDE.md files with hosted-at data model and workflow notes
gaurav Jun 3, 2026
0cbe326
Add Layer field to Component data model
gaurav Jun 9, 2026
08ea3a3
Add --layer-column flag to generate per-layer sub-figures
gaurav Jun 9, 2026
deb9702
Split legends into separate PNGs by default
gaurav Jun 9, 2026
1908cf4
Include external nodes in per-layer sub-figures
gaurav Jun 9, 2026
2b94bd2
Removed spurious text.
gaurav Jun 9, 2026
cd33e57
Added Scripps logo as a hosting location.
gaurav Jun 14, 2026
15 changes: 15 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Generated output and input data files live in data/ directories
data/
**/data/

# Local environment files (contain secrets like Google Sheet IDs)
.env
**/.env

# IDE files
.idea/

# Python caches and OS metadata
__pycache__/
*.pyc
.DS_Store
14 changes: 14 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,14 @@
# Core-Components-Working-Group

This repository contains tooling for the Translator platform's Core Components Working Group.

## Subdirectories

| Directory | Purpose |
|-----------|---------|
| `translator-components-diagram/` | Generates Graphviz dependency diagrams from the Translator components Google Sheet. See its own `CLAUDE.md` for full details. |

## Workflow notes for Claude

- After making code changes in `translator-components-diagram/`, do **not** run `uv run generate-diagram` — the user will run the script themselves. Only run `uv run pytest` to check for test failures.
- The active branch for diagram work is `add-translator-components-diagrams-code`; PRs target `main`.
118 changes: 118 additions & 0 deletions translator-components-diagram/CLAUDE.md
@@ -0,0 +1,118 @@
# translator-components-diagram

Generates Graphviz dependency diagrams for Translator platform components from a Google Sheet CSV.

> **Note for Claude:** After making code changes, do not run `uv run generate-diagram` yourself — the user will run it. Only run `uv run pytest` to check for test failures.

## Quick start

```bash
# Download from Google Sheet and render (most common)
uv run generate-diagram --google-sheet

# From a local CSV
uv run generate-diagram --input data/components.csv

# Include all components (not just active refactor statuses)
uv run generate-diagram --google-sheet --all

# Left-to-right layout, PDF output
uv run generate-diagram --google-sheet --direction LR --format pdf

# Run tests
uv run pytest
```

## Script layout (`generate_diagram.py`)

| Lines | What's there |
|-------|-------------|
| 17–36 | Global constants: `DEFAULT_STATUSES`, `FALLBACK_COLORS`, color constants for planned/ghost/external nodes |
| 39–61 | `ColorAssigner` — maps owners to fill colors, falls back to rotating palette |
| 63–69 | `text_color_for` — picks black/white text for contrast against a fill hex |
| 72–103 | `Component` dataclass — one CSV row after parsing |
| 106–154 | CSV parsing utilities: `_parse_bool`, `parse_id_list`, `parse_externals` |
| 157–213 | Data loading: `load_owner_colors`, `load_components`, `index_by_id` |
| 216–258 | `validate` — duplicate ID detection, unknown reference checking |
| 261–284 | `write_json` — serializes all components to `components.json` |
| 290–711 | Graph construction: `_compute_*`, `_add_*`, `build_graph` (see table below) |
| 714–891 | CLI: `@click.option` decorators + `main` |
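A minimal sketch of the contrast rule that `text_color_for` is described as applying. The brightness formula, the threshold, the returned colour names, and the name `pick_text_color` are all assumptions made for illustration; the real function may differ.

```python
def pick_text_color(fill_hex: str) -> str:
    """Return "black" or "white", whichever should read better on fill_hex.

    Hypothetical stand-in for text_color_for; the formula is an assumption.
    """
    digits = fill_hex.lstrip("#")
    # Split "RRGGBB" into three integer channels in the range 0-255.
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    # Perceived brightness using the common 0.299/0.587/0.114 luma weights.
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    # Light fills get dark text and dark fills get light text.
    return "black" if brightness > 128 else "white"
```

For example, `pick_text_color("#FFFF00")` picks black text for a yellow fill, while `pick_text_color("#0000FF")` picks white text for a blue fill.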

### Graph construction helpers (290–711)

| Function | Lines | Purpose |
|----------|-------|---------|
| `_compute_active_set` | 290–296 | IDs to render based on refactor status filter |
| `_compute_ghost_ids` | 299–316 | IDs of excluded-but-referenced components (shown dimmed) |
| `_emit_component_node` | 319–341 | Renders one component node (used for primary nodes and ubiquitous clones) |
| `_compute_groups` | 343–355 | Groups nodes by `Part of` label |
| `_add_active_nodes` | 358–372 | Emits all non-grouped, non-ubiquitous active nodes |
| `_add_ghost_nodes` | 375–394 | Emits dimmed nodes for excluded-but-referenced components |
| `_add_group_clusters` | 397–440 | Wraps `Part of` groups in labeled dotted-border subgraphs |
| `_add_edges` | 443–500 | Emits all dependency edges (solid/dashed, implemented/planned) |
| `_ext_node_id` | 503–506 | Stable node ID from an external-entity name |
| `_add_external_nodes_and_edges` | 509–568 | Emits external source/sink nodes from the `Externals` column |
| `_owner_legend_html` | 571–593 | Builds HTML-table label for the owner-color legend |
| `_add_legend` | 596–658 | Assembles the full legend (owner swatches + edge style examples) |
| `build_graph` | 661–711 | Top-level assembler — calls all the above in order |

## Data model

CSV column → `Component` field:

| CSV column | Field | Notes |
|-----------|-------|-------|
| `id` | `id` | Unique identifier; case-insensitive for references |
| `Name` | `name` | Display name; falls back to `id` if blank |
| `Owner` | `owner` | Defaults to `"None"` if blank |
| `Component in ITRB` | `itrb` | Informational only |
| `Refactor status` | `refactor_status` | Drives active-set filtering |
| `Gets results from` | `depends_on` / `depends_on_planned` | Comma-separated IDs; `~` prefix = planned |
| `Calls` | `uses` / `uses_planned` | Comma-separated IDs; `~` prefix = planned |
| `Notes` | `notes` | Informational only |
| `Ubiquitous` | `ubiquitous` | TRUE/yes/1 → render as per-caller clones |
| `Hide` | `hide` | TRUE/yes/1 → suppress entirely (not even as ghost) |
| `Part of` | `part_of` | Groups node into a named cluster subgraph |
| `Hosted at` | `hosted_at` | Deployment location; `ITRB` is default (no label shown); others get a third label line, e.g. `Hosted at: RENCI 🌐` |
| `Externals` | `externals` | `<Source` = data in, `>Sink` = data out |
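The boolean and `Externals` conventions in the table above could be parsed roughly as follows. This is a sketch, not the tool's actual `_parse_bool` or `parse_externals`: the function names are hypothetical, and it assumes that the `Externals` cell is comma-separated and that tokens without a `<` or `>` marker are ignored.

```python
def parse_flag(value: str) -> bool:
    """Treat TRUE/yes/1 (any case, surrounding spaces allowed) as true."""
    return value.strip().lower() in {"true", "yes", "1"}


def split_externals(cell: str) -> tuple[list[str], list[str]]:
    """Split an Externals cell into (sources, sinks).

    Assumption: entries are comma-separated; "<Name" marks data flowing in,
    ">Name" marks data flowing out, and unmarked tokens are skipped.
    """
    sources: list[str] = []
    sinks: list[str] = []
    for token in cell.split(","):
        token = token.strip()
        name = token[1:].strip()
        if not name:
            continue  # empty token or a bare marker
        if token.startswith("<"):
            sources.append(name)
        elif token.startswith(">"):
            sinks.append(name)
    return sources, sinks
```

A cell such as `<External data sources, >User` would then yield one source and one sink.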

## Common change patterns

**Change owner node colors** → edit `owner-colors.csv` (no code change). Row order = legend order.

**Change ghost/external node colors** → constants `GHOST_FILL_COLOR`, `GHOST_BORDER_COLOR`, `GHOST_FONT_COLOR`, `EXTERNAL_FILL_COLOR` at lines 31–36.

**Change planned-edge color** → `PLANNED_EDGE_COLOR` constant at line 30.

**Change active refactor statuses** → `DEFAULT_STATUSES` list at line 17.

**Change node label format** → `_emit_component_node` (line 319). Active node labels are `display_name\nid` plus an optional third line for non-ITRB hosts. Emoji mapping lives in `HOSTED_AT_EMOJI` at line ~37.

**Change node shape or border style** → `_emit_component_node` (line 319) for active nodes; `_add_ghost_nodes` (line 375) for ghost nodes. The `is_new` bold border is set at line 339.

**Change edge styles** (solid/dashed/color) → `_add_edges` (line 443). Each of the four dependency lists (`depends_on`, `depends_on_planned`, `uses`, `uses_planned`) has its own `dot.edge(...)` call (lines 483–500).

**Change external node shapes** → `_add_external_nodes_and_edges` (line 509). Sources use `shape="cylinder"`, sinks use `shape="oval", peripheries="2"`.

**Change graph layout settings** (dpi, ranksep, splines) → `build_graph` `graph_attr` dict at line 673.

**Add a new CSV column** → three places:
1. `load_components` (line 175) — read from `row`
2. `Component` dataclass (line 72) — add the field
3. `write_json` (line 261) — add to the export dict

**Add a new CLI flag** → add `@click.option` before `main` (line 714) and add the parameter to the `main` signature.

**Change the legend** → `_add_legend` (line 596) for structure; `_owner_legend_html` (line 571) for the owner-color table HTML.

## Special features

**Ubiquitous components** (e.g. telemetry, logging): Set `Ubiquitous=TRUE` in the CSV. Instead of one central node, a per-caller clone is emitted inline next to each caller. No central node is created. Logic lives in `edge_target()` inside `_add_edges` (line 457). These components are excluded from `_add_active_nodes` and `_compute_ghost_ids`.
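The per-caller clone idea can be sketched without Graphviz by rewriting edge targets. The helper names and the `__for__` node-ID scheme below are hypothetical; the real `edge_target()` may build clone IDs differently.

```python
def clone_aware_target(caller_id: str, target_id: str, ubiquitous_ids: set[str]) -> str:
    """Return the node ID an edge from caller_id should point at.

    Ubiquitous targets get a caller-specific clone ID, so no central node
    is ever shared between callers (ID format is an assumption).
    """
    if target_id in ubiquitous_ids:
        return f"{target_id}__for__{caller_id}"
    return target_id


def dot_call_edges(calls: dict[str, list[str]], ubiquitous_ids: set[str]) -> list[str]:
    """Emit DOT edge statements (caller -> callee) with clones substituted."""
    lines = []
    for caller, targets in calls.items():
        for target in targets:
            node = clone_aware_target(caller, target, ubiquitous_ids)
            lines.append(f'"{caller}" -> "{node}";')
    return lines
```

With `jaeger` marked ubiquitous, two callers end up pointing at two distinct clone nodes rather than converging on one.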

**Ghost nodes**: When an active component references one that is filtered out (wrong refactor status), the excluded component appears dimmed with `(excluded)` in its label. Computed by `_compute_ghost_ids` (line 299).

**Planned edges** (`~id` in `Gets results from` or `Calls`): Parsed as `depends_on_planned` / `uses_planned` by `parse_id_list` (line 111). Rendered in red in `_add_edges` (lines 488–500). Solid red for "Gets results from", dashed red for "Calls".

**`--concentrate` flag**: Merges partially-parallel edges. Off by default because it can visually blend solid and dashed edges between nearby nodes.

**Google Sheet download**: Checks `Content-Type: text/csv` to catch the case where a private/missing sheet returns an HTML login page instead of CSV (line 826).
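A sketch of the `Content-Type` guard described above, kept free of network calls so it can be tested in isolation. The function names are hypothetical, and the URL pattern is the commonly used Google Sheets CSV export form; whether the tool builds exactly this URL is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build a CSV export URL for one tab of a Google Sheet (assumed form)."""
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


def is_csv_response(content_type: Optional[str]) -> bool:
    """True only when the response header's media type is text/csv.

    A private or missing sheet typically answers with an HTML page, so
    "text/html; charset=utf-8" (or a missing header) is rejected.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/csv"
```

The caller would pass the response's `Content-Type` header to `is_csv_response` and abort with a clear error instead of trying to parse an HTML login page as CSV.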
200 changes: 200 additions & 0 deletions translator-components-diagram/README.md
@@ -0,0 +1,200 @@
# Translator Components Diagram

A Python CLI tool that reads a spreadsheet of Translator platform components,
validates their dependency declarations, and produces Graphviz diagrams showing
how data flows through the system and which services call each other.

## Purpose

The Translator platform comprises many components maintained by different teams.
This tool makes the overall architecture visible by turning a human-maintained
Google Sheet into a shareable diagram. The default view filters to components
that are active in the current refactor ("Continues into Refactor" and
"New in Refactor"), so the diagram stays focused on what is currently relevant.

## Quick start

Requires Python ≥ 3.11 and [uv](https://docs.astral.sh/uv/).
The [Graphviz](https://graphviz.org/) system package must also be installed
(`brew install graphviz` on macOS).

```bash
cd translator-components-diagram
uv sync # first-time setup; creates .venv/

# Download latest data from the Google Sheet and regenerate
uv run generate-diagram --google-sheet

# Use a locally cached CSV instead
uv run generate-diagram

# Include all components, not just the refactor-active ones
uv run generate-diagram --all

# Also produce a PDF (useful for presentations)
uv run generate-diagram --google-sheet --format pdf
```

## Input data

### Google Sheet

The canonical source of truth is a world-readable Google Sheet. Its ID is
stored in `.env` (gitignored; never committed):

```
# translator-components-diagram/.env
GOOGLE_SHEET_ID=<paste the sheet ID here>
```

Run with `--google-sheet` to download the latest CSV export into `data/` and
use it immediately. The downloaded file is also gitignored.

### CSV format

The sheet must have these columns (order does not matter):

| Column | Description |
|---|---|
| `id` | Unique machine-readable identifier (kebab-case preferred) |
| `Name` | Human-readable display name shown in the diagram |
| `Owner` | Team that owns the component; controls node colour |
| `Component in ITRB` | ITRB category (informational only) |
| `Refactor status` | Lifecycle status — see filtering below |
| `Gets results from` | Comma-separated IDs this component receives data from |
| `Calls` | Comma-separated IDs this component makes optional API calls to |
| `Ubiquitous` | `TRUE` to render this component as a per-caller clone (see below) |
| `Notes` | Free-text notes (not used by the tool) |

#### Planned (not-yet-implemented) relationships

Prefix any ID in `Gets results from` or `Calls` with `~` to mark it as
planned but not yet implemented:

```
Gets results from: nodenorm-es, ~new-service
Calls: ars, ~future-api
```

Planned edges render in red; implemented edges render in black.
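The `~` convention amounts to splitting each cell into two lists. The sketch below illustrates the idea; `split_id_list` is a hypothetical name, and the tool's own `parse_id_list` may normalise or validate IDs differently.

```python
def split_id_list(cell: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated ID cell into (implemented, planned) lists.

    IDs prefixed with "~" are planned; everything else is implemented.
    """
    implemented: list[str] = []
    planned: list[str] = []
    for token in cell.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty cells and trailing commas
        if token.startswith("~"):
            name = token[1:].strip()
            if name:
                planned.append(name)
        else:
            implemented.append(token)
    return implemented, planned
```

Applied to the first example row, `split_id_list("nodenorm-es, ~new-service")` separates `nodenorm-es` (implemented) from `new-service` (planned).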

#### Ubiquitous components

Cross-cutting infrastructure that nearly every component depends on
(telemetry, name resolution, logging…) creates long converging edges in
the diagram that obscure the real data-flow structure. Marking such a
component `TRUE` in the `Ubiquitous` column renders it as a small copy
next to each caller instead of as a single central node — the underlying
data stays normalised, only the visual layout duplicates. Jaeger (OTel)
is the canonical example.

## Output files

All outputs go to `data/` (gitignored) by default.

| File | Always? | Description |
|---|---|---|
| `data/diagram.png` | yes | Main shareable diagram |
| `data/diagram.dot` | yes | Graphviz source — useful for debugging or tweaking |
| `data/components.json` | yes | All components parsed (all statuses, not filtered) |
| `data/diagram.pdf` | `--format pdf` | Vector format for presentations |
| `data/diagram.svg` | `--format svg` | Vector format for web embedding |

> The `.dot` and `.json` files are intended to eventually be committed to the
> repo so people can inspect the data without running the tool.

## Diagram conventions

### Node colours (by Owner)

Owner-to-colour mappings live in [`owner-colors.csv`](owner-colors.csv)
(two columns: `owner`, `color`). Edit that file to add a new owner,
re-order the legend, or change a colour — no Python edit required.

New owners not listed there receive fallback colours automatically.
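The fallback behaviour could look something like the sketch below. The class name, the palette values, and the rotation order are assumptions for illustration; the tool's `ColorAssigner` and `FALLBACK_COLORS` may differ.

```python
from itertools import cycle

# Hypothetical fallback palette; the real FALLBACK_COLORS may differ.
FALLBACK_PALETTE = ("#1f77b4", "#ff7f0e", "#2ca02c")


class OwnerColors:
    """Map owners to fill colours, rotating through a palette for unknowns."""

    def __init__(self, known: dict[str, str], fallback=FALLBACK_PALETTE):
        self._colors = dict(known)        # e.g. loaded from owner-colors.csv
        self._fallback = cycle(fallback)  # endless rotation over the palette

    def color_for(self, owner: str) -> str:
        # Remember the assignment so an owner keeps one colour per run.
        if owner not in self._colors:
            self._colors[owner] = next(self._fallback)
        return self._colors[owner]
```

Caching the assignment means an unlisted owner is coloured consistently across all of its nodes, even though the colour itself is arbitrary.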

### Node border weight

- **Bold border** — component is "New in Refactor"
- **Normal border** — component "Continues into Refactor"

### Edge types

| Style | Meaning |
|---|---|
| Solid black arrow B → A | B provides results to A ("Gets results from") |
| Solid red arrow B → A | Same, but planned / not yet implemented |
| Dashed black arrow A → B | A makes an optional API call to B ("Calls") |
| Dashed red arrow A → B | Same, but planned / not yet implemented |

Planned-edge red is distinct from the gray used for ghost-node borders,
so the two encodings don't blur together visually.

### Special nodes

- **External sources** (cylinder, gray): entry points declared with `<Name`
  in the `Externals` column, e.g. the upstream data stores that feed into
  `kgx-storage-pipeline`
- **External sinks** (double-border oval, gray): exit points declared with
  `>Name` in the `Externals` column, e.g. the human end-consumer who
  receives results from the UI

### Ghost nodes

Components that are referenced by an active component but are themselves
outside the current filter (e.g. "Removed after Refactor") appear as gray
dashed boxes labelled `(excluded)`. This keeps cross-boundary edges visible
without cluttering the main diagram.
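Selecting ghost nodes reduces to a set difference over references. The following is a simplified sketch with a hypothetical function name; it ignores any special handling the tool applies to hidden or ubiquitous components.

```python
def find_ghost_ids(references: dict[str, list[str]], active_ids: set[str]) -> set[str]:
    """Return IDs referenced by an active component but not active themselves.

    references maps each component ID to the IDs it lists in
    "Gets results from" or "Calls" (simplified: one merged list per ID).
    """
    ghosts: set[str] = set()
    for component_id in active_ids:
        for ref in references.get(component_id, []):
            if ref not in active_ids:
                ghosts.add(ref)
    return ghosts
```

References made only by filtered-out components never produce ghosts, which keeps the excluded part of the graph from leaking into the diagram.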

## All CLI options

```
uv run generate-diagram [OPTIONS]

--input PATH Local CSV file [default: data/components.csv]
--google-sheet Download CSV from Google Sheet (reads GOOGLE_SHEET_ID
from .env) instead of using --input
--sheet-gid INTEGER Google Sheet tab GID (0 = first tab) [default: 0]
--output-dir PATH Directory for output files [default: data]
--output-name TEXT Base filename for outputs [default: diagram]
--refactor-status TEXT Comma-separated Refactor status values to include
[default: "Continues into Refactor,New in Refactor"]
--all Include all components regardless of Refactor status
--format [pdf|svg] Additional output format beyond PNG (PNG is
always produced; can be repeated)
--direction [LR|TB] Graphviz layout direction [default: TB]
--help Show this message and exit.
```

## Repository layout

```
translator-components-diagram/
├── generate_diagram.py # The tool
├── owner-colors.csv # Owner → fill colour mapping (edit me)
├── tests/ # pytest suite for the pure functions
├── pyproject.toml # uv/hatchling project metadata and dependencies
├── uv.lock # Pinned dependency versions
├── .env # GOOGLE_SHEET_ID — gitignored, fill in locally
├── README.md # This file
└── data/ # Gitignored — all inputs and outputs go here
├── components.csv # Downloaded from Google Sheet
├── components.json # Parsed component data (all statuses)
├── diagram.dot # Graphviz source
└── diagram.png # Rendered diagram
```

## Possible future improvements

- **Commit `.dot` and `.json` to Git** — move these outputs outside `data/` so
they are version-controlled and reviewable without running the tool.
- **Interactive SVG or HTML output** — embed tooltips (owner, notes, status)
using Graphviz's `tooltip` attribute or a post-processing step with a library
like `d3-graphviz`.
- **Grouping / filtering by ITRB category** — the `Component in ITRB` column
is loaded but not currently used; it could drive an alternative colour scheme
or a `--group-by itrb` flag.
- **Cycle detection** — the validator checks for unknown IDs but does not yet
detect dependency cycles, which would be a useful integrity check.
- **Multiple sheet tabs** — `--sheet-gid` already supports non-default tabs;
a `--all-tabs` mode could merge or overlay multiple views.
- **Diff mode** — compare two runs of the tool (e.g. before and after a sprint)
and highlight added, removed, or changed components and edges.
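As an illustration of the cycle-detection idea listed above, a depth-first search over the dependency map is one possible approach. This is only a sketch of a feature the tool does not have yet; the function name and the input shape (component ID mapped to the IDs it depends on) are assumptions.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of IDs, or None if acyclic."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    path: list[str] = []  # IDs on the current DFS branch, in visit order

    def visit(node: str) -> Optional[list[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for neighbour in graph.get(node, []):
            status = state.get(neighbour, UNSEEN)
            if status == IN_PROGRESS:
                # Back edge: the cycle runs from neighbour to node and back.
                return path[path.index(neighbour):] + [neighbour]
            if status == UNSEEN:
                found = visit(neighbour)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None
```

Returning the offending path, rather than a bare boolean, would let a validator report which sheet rows need fixing. The recursion depth equals the longest dependency chain, which should be small for a component list of this size.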
2 changes: 2 additions & 0 deletions translator-components-diagram/env.default
@@ -0,0 +1,2 @@
# The Google Sheet ID to download the component information from
GOOGLE_SHEET_ID=