110 changes: 110 additions & 0 deletions .claude/skills/nimbus-interface/SKILL.md
@@ -0,0 +1,110 @@
---
name: nimbus-interface
description: Reference for the NimbusImage/Girder API used by all workers in this repository. Use when building, debugging, or testing NimbusImage workers — including image loading, annotation CRUD, property computation, multi-channel merging, coordinate conversions, local test environments, and infrastructure troubleshooting (e.g. HTTP 500 errors). Also use when writing test scripts that interact with the Nimbus API.
---

# NimbusImage Worker Development

## Quick Start

Determine the task type:
- **Building/modifying a worker** → See [references/api.md](references/api.md) for full API patterns
- **Debugging HTTP 500 errors** → Check prerequisites below
- **Writing local test scripts** → See local testing section below
- **Coordinate confusion** → See critical pitfalls below

## Infrastructure Prerequisites

The Girder server requires **MongoDB**. Without it, all endpoints return HTTP 500 (except `/system/version`). Debug with:
```bash
docker ps | grep mongo # Must be running
curl -s http://localhost:8080/api/v1/system/version # Works without MongoDB
```

Full stack: `girder`, `worker` (celery), `rabbitmq`, `memcached`, `mongodb`.
Compose file: `/home/arjun/UPennContrast/docker-compose.yaml`.

## Critical Pitfalls

### Coordinate swap (numpy vs annotations)
NumPy indexing is `[row, col]`, i.e. `[y, x]`. Annotations use `{'x': pixel_x, 'y': pixel_y}`.
```python
# skimage contour (row, col) → annotation:
coords = [{'x': float(col), 'y': float(row)} for row, col in contour]

# Use annotation_tools helpers to avoid manual swaps:
from annotation_utilities.annotation_tools import polygons_to_annotations, annotations_to_polygons
```
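A minimal, runnable illustration of the swap (pure NumPy, no Nimbus dependencies; the contour values are made up):

```python
import numpy as np

# A fake skimage-style contour: rows of (row, col) = (y, x)
contour = np.array([[10.0, 20.0], [10.0, 30.0], [15.0, 30.0], [15.0, 20.0]])

# Convert to annotation coordinates: note col -> 'x', row -> 'y'
coords = [{'x': float(col), 'y': float(row)} for row, col in contour]

# Round-trip back to (row, col) order for numpy indexing
back = np.array([[c['y'], c['x']] for c in coords])
assert np.array_equal(back, contour)
```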

### The 0.5 pixel offset
scikit-image uses pixel centers; Girder uses top-left corner:
```python
polygon = np.array([[c['y'] - 0.5, c['x'] - 0.5] for c in annotation['coordinates']])
rr, cc = draw.polygon(polygon[:, 0], polygon[:, 1], shape=image.shape)
```

### Tags interface returns a list, not a dict
```python
# CORRECT:
tags = params['workerInterface'].get('Training Tag', [])
# WRONG (crashes with AttributeError):
tags = params['workerInterface'].get('Training Tag', {}).get('tags', [])
```

### Multi-channel merge output dtype
`process_and_merge_channels` returns `float64` with values 0-255 (not 0-1). Convert for ML:
```python
rgb_uint8 = np.clip(merged, 0, 255).astype(np.uint8)
```

Typical shapes:
- `getRegion().squeeze()`: `(H, W)` uint16
- `get_images_for_all_channels`: each `(H, W, 1)` uint16
- `process_and_merge_channels`: `(H, W, 3)` float64, values 0-255

## Local Testing

### Avoid importing entrypoint.py
Worker entrypoints import heavy ML libraries (torch, sam2) at module level. Copy helper functions locally instead of importing the entrypoint.
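One way to keep test scripts importable is to defer heavy imports into the functions that need them — a sketch of the pattern (function names and bodies are illustrative, not from the workers):

```python
def run_inference(image):
    # Heavy ML imports stay inside the function, so merely importing
    # this module from a test script costs nothing.
    import torch  # only paid when inference actually runs
    raise NotImplementedError

def polygon_area(coords):
    # Lightweight helpers at module level are cheap to import anywhere.
    # Shoelace formula over [{'x': ..., 'y': ...}] coordinates.
    n = len(coords)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n
        s += coords[i]['x'] * coords[j]['y'] - coords[j]['x'] * coords[i]['y']
    return abs(s) / 2.0
```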

### Local venv dependencies
```bash
pip install girder-client tifffile
pip install -e /home/arjun/UPennContrast/devops/girder/annotation_client
pip install -e /home/arjun/ImageAnalysisProject/annotation_utilities
pip install -e /home/arjun/ImageAnalysisProject/worker_client
pip install numpy scipy scikit-image shapely matplotlib pillow numba
# ML deps (torch, sam2, etc.) only needed for inference, not API testing
```

### Authentication for test scripts
```python
import girder_client
gc = girder_client.GirderClient(apiUrl='http://localhost:8080/api/v1')
gc.authenticate('username', 'password')
token = gc.token # Use this token with annotation_client classes
```
Env vars: `NIMBUS_API_URL` (default `http://localhost:8080/api/v1`), `NIMBUS_TOKEN`.
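A small helper for resolving these in a test script (a sketch; the function name is not part of any API):

```python
import os

def nimbus_config(env=None):
    """Resolve API URL and token from the environment, with defaults."""
    env = os.environ if env is None else env
    url = env.get('NIMBUS_API_URL', 'http://localhost:8080/api/v1')
    token = env.get('NIMBUS_TOKEN')  # None -> fall back to gc.authenticate()
    return url, token
```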

### Test dataset
Dataset `69988c84b48d8121b565aba4`: 2 channels (Brightfield, YFP), 7 Z-slices, 4 timepoints, 6 XY positions, 1024x1022 uint16. 544 polygons tagged "YFP blob" at XY=0, Z=3, Time=0.

## Key Packages

| Package | Location |
|---------|----------|
| annotation_client | `/home/arjun/UPennContrast/devops/girder/annotation_client/` |
| annotation_utilities | `/home/arjun/ImageAnalysisProject/annotation_utilities/` |
| worker_client | `/home/arjun/ImageAnalysisProject/worker_client/` |
| Workers | `/home/arjun/ImageAnalysisProject/workers/` |

Key source files: `annotation_client/{annotations,tiles,workers}.py`, `annotation_utilities/{annotation_tools,batch_argument_parser}.py`

## Detailed API Reference

See [references/api.md](references/api.md) for complete API patterns including:
- Image access (single frame, subregion, multi-channel merge)
- Annotation CRUD (fetch, filter, create, delete)
- Property value computation and submission
- Writing images back to Girder
- Worker interface type table
204 changes: 204 additions & 0 deletions .claude/skills/nimbus-interface/references/api.md
@@ -0,0 +1,204 @@
# NimbusImage API Reference

## Table of Contents
- [Image Access](#image-access)
- [Annotations](#annotations)
- [Property Values](#property-values)
- [Writing Images to Girder](#writing-images-to-girder)
- [Worker Interface Types](#worker-interface-types)

---

## Image Access

### Setup
```python
import annotation_client.tiles as tiles
tileClient = tiles.UPennContrastDataset(apiUrl=apiUrl, token=token, datasetId=datasetId)
```

### Metadata
```python
idx = tileClient.tiles['IndexRange']
num_channels = idx.get('IndexC', 1)
num_z = idx.get('IndexZ', 1)
num_time = idx.get('IndexT', 1)
num_xy = idx.get('IndexXY', 1)
size_x = tileClient.tiles['sizeX']
size_y = tileClient.tiles['sizeY']
channel_names = tileClient.tiles.get('channels', [])
pixel_scale = tileClient.tiles.get('mm_x') # mm per pixel
```

### Single frame
```python
frame = tileClient.coordinatesToFrameIndex(XY, Z=z, T=time, channel=channel)
image = tileClient.getRegion(datasetId, frame=frame).squeeze()
# Returns (H, W) uint16
```

### Subregion
```python
image = tileClient.getRegion(datasetId, frame=frame,
left=x_min, top=y_min, right=x_max, bottom=y_max,
units="base_pixels").squeeze()
```
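When cropping around annotations it is easy to run past the image edge; a small clamp helper (illustrative, not part of the API) keeps the region request valid:

```python
def clamp_region(x_min, y_min, x_max, y_max, size_x, size_y, pad=0):
    """Expand a bounding box by `pad` pixels and clamp it to image bounds."""
    left = max(0, int(x_min) - pad)
    top = max(0, int(y_min) - pad)
    right = min(size_x, int(x_max) + pad)
    bottom = min(size_y, int(y_max) + pad)
    return left, top, right, bottom

# A 10 px padded box near the right edge of a 1024x1022 image:
region = clamp_region(1000, 500, 1030, 540, 1024, 1022, pad=10)
# region == (990, 490, 1024, 550)
```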

### Multi-channel merged RGB
```python
import annotation_utilities.annotation_tools as annotation_tools

images = annotation_tools.get_images_for_all_channels(tileClient, datasetId, XY, Z, Time)
# Each: (H, W, 1) uint16
layers = annotation_tools.get_layers(tileClient.client, datasetId)
merged = annotation_tools.process_and_merge_channels(images, layers)
# Returns: (H, W, 3) float64, values 0-255
```
Merge modes: `'lighten'` (max, default), `'add'` (sum), `'screen'`.
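Conceptually the modes combine per-layer RGB contributions like this (a pure-NumPy sketch of the blend math, not the actual implementation; whether `'add'` clips to 255 is an assumption):

```python
import numpy as np

def blend(stack, mode='lighten'):
    """stack: (N, H, W, 3) float arrays in 0-255, one per channel layer."""
    if mode == 'lighten':
        return np.max(stack, axis=0)                      # per-pixel max
    if mode == 'add':
        return np.clip(np.sum(stack, axis=0), 0, 255)     # clipping assumed
    if mode == 'screen':
        # screen blend: 255 * (1 - prod(1 - c/255))
        return 255.0 * (1.0 - np.prod(1.0 - stack / 255.0, axis=0))
    raise ValueError(mode)

a = np.full((1, 2, 2, 3), 100.0)
b = np.full((1, 2, 2, 3), 200.0)
stack = np.concatenate([a, b])
```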

---

## Annotations

### Client setup
```python
import annotation_client.annotations as annotations_client
annotationClient = annotations_client.UPennContrastAnnotationClient(apiUrl=apiUrl, token=token)
```

### Data structure
```python
{
'shape': 'polygon', # or 'point', 'line'
'coordinates': [{'x': float, 'y': float}, ...],
'location': {'XY': int, 'Z': int, 'Time': int},
'channel': int,
'datasetId': str,
'tags': ['tag1', 'tag2'],
}
```

### Fetch
```python
polygons = annotationClient.getAnnotationsByDatasetId(datasetId, shape='polygon')

# Filter by tags server-side (must JSON-serialize)
import json
polygons = annotationClient.getAnnotationsByDatasetId(
datasetId, shape='polygon', tags=json.dumps(['my_tag']))

ann = annotationClient.getAnnotationById(annotationId)
```

### Client-side filtering
```python
import annotation_utilities.annotation_tools as annotation_tools

filtered = annotation_tools.get_annotations_with_tags(annotations, tags, exclusive=False)
# exclusive=False: any matching tag; exclusive=True: exact tag set match

filtered = annotation_tools.filter_elements_T_XY_Z(annotations, time, xy, z)
```

### Create
```python
annotationClient.createAnnotation(annotation_dict)
annotationClient.createMultipleAnnotations(annotation_list) # preferred

# Using helpers (handles coordinate swap):
from annotation_utilities.annotation_tools import polygons_to_annotations
annotations = polygons_to_annotations(
shapely_polygons, datasetId, XY=0, Time=0, Z=0, tags=['my_tag'], channel=0)
```

### Delete
```python
annotationClient.deleteAnnotation(annotationId)
annotationClient.deleteMultipleAnnotations([id1, id2, ...])
```

---

## Property Values

### Setup
```python
import annotation_client.workers as workers
workerClient = workers.UPennContrastWorkerClient(datasetId, apiUrl, token, params)
```

### Get annotations for computation
```python
annotationList = workerClient.get_annotation_list_by_shape('polygon', limit=0)
annotationList = annotation_tools.get_annotations_with_tags(
annotationList,
params.get('tags', {}).get('tags', []),
params.get('tags', {}).get('exclusive', False))
```

### Submit values
```python
property_values = {}
for ann in annotationList:
property_values[ann['_id']] = {
'Area': float(area),
'MeanIntensity': float(mean),
}
workerClient.add_multiple_annotation_property_values({datasetId: property_values})
```
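A self-contained sketch of computing such values from a labeled mask with plain NumPy (no Nimbus dependencies; the tiny arrays are made up, and `Area`/`MeanIntensity` are in pixels and raw intensity units):

```python
import numpy as np

image = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [0,  0, 50]], dtype=np.uint16)
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [0, 0, 2]])

values = {}
for lbl in np.unique(labels):
    if lbl == 0:          # 0 is background
        continue
    mask = labels == lbl
    values[lbl] = {
        'Area': float(mask.sum()),
        'MeanIntensity': float(image[mask].mean()),
    }
```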

### Nested properties (per-Z, per-channel)
```python
property_values[ann['_id']] = {
'MeanIntensity': {'z001': 42.0, 'z002': 84.0},
}
```

### Pixel scale
```python
pixel_size = params['scales']['pixelSize'] # {'unit': 'mm', 'value': 0.000219}
z_step = params['scales']['zStep']
t_step = params['scales']['tStep']
```
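Converting pixel measurements to physical units then follows directly (the pixel size matches the example above; the area value is invented):

```python
pixel_size_mm = 0.000219        # params['scales']['pixelSize']['value'], unit 'mm'
area_px = 120.0                 # hypothetical polygon area in pixels

pixel_size_um = pixel_size_mm * 1000.0     # 0.219 um per pixel
area_um2 = area_px * pixel_size_um ** 2    # area scales with the square
```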

---

## Writing Images to Girder

```python
import large_image as li

sink = li.new()
for i, frame in enumerate(tileClient.tiles['frames']):
    # Map frame index keys like 'IndexZ' -> large_image kwargs like 'z'
    large_image_params = {k.lower()[5:]: v for k, v in frame.items()
                          if k.startswith('Index') and len(k) > 5}
    image = tileClient.getRegion(datasetId, frame=i).squeeze()
    processed = your_function(image)
    sink.addTile(processed, 0, 0, **large_image_params)

if 'channels' in tileClient.tiles:
    sink.channelNames = tileClient.tiles['channels']
sink.mm_x = tileClient.tiles['mm_x']
sink.mm_y = tileClient.tiles['mm_y']
sink.magnification = tileClient.tiles['magnification']

sink.write('/tmp/output.tiff')
gc = tileClient.client
item = gc.uploadFileToFolder(datasetId, '/tmp/output.tiff')
gc.addMetadataToItem(item['itemId'], {'tool': 'YourWorker'})
```

---

## Worker Interface Types

| Type | Returns | Example |
|------|---------|---------|
| `number` | `int`/`float` | `32`, `0.5` |
| `text` | `str` | `"1-3, 5-8"` |
| `select` | `str` | `"model_name.pt"` |
| `checkbox` | `bool` | `True` |
| `channel` | `int` | `0` |
| `channelCheckboxes` | `dict[str, bool]` | `{"0": True, "1": False}` |
| `tags` | `list[str]` | `["DAPI blob"]` |
| `layer` | `str` | `"layer_id"` |
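Reading a mixed interface in a worker then looks roughly like this (the field names and defaults are hypothetical):

```python
def parse_interface(params):
    iface = params['workerInterface']
    return {
        'diameter': float(iface.get('Diameter', 30)),       # number
        'model': iface.get('Model', 'default.pt'),          # select
        'channels': sorted(int(k) for k, v                  # channelCheckboxes
                           in iface.get('Channels', {}).items() if v),
        'tags': iface.get('Output Tag', []),                # tags: plain list
    }

cfg = parse_interface({'workerInterface': {
    'Diameter': 17,
    'Channels': {'0': True, '1': False, '2': True},
    'Output Tag': ['YFP blob'],
}})
```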
47 changes: 47 additions & 0 deletions CLAUDE.md
@@ -72,6 +72,53 @@ def compute(datasetId, apiUrl, token, params):

Interface types: `number`, `text`, `select`, `checkbox`, `channel`, `channelCheckboxes`, `tags`, `layer`, `notes`

### Interface Parameter Data Types (What `params['workerInterface']` Returns)

Each interface type returns a specific data type in `params['workerInterface']['FieldName']`:

| Interface Type | Returns | Example Value |
|----------------|---------|---------------|
| `number` | `int` or `float` | `32`, `0.5` |
| `text` | `str` | `"1-3, 5-8"`, `""` |
| `select` | `str` | `"sam2.1_hiera_small.pt"` |
| `checkbox` | `bool` | `True`, `False` |
| `channel` | `int` | `0` |
| `channelCheckboxes` | `dict` of `str` → `bool` | `{"0": True, "1": False, "2": True}` |
| `tags` | **`list` of `str`** | `["DAPI blob"]`, `["cell", "nucleus"]` |
| `layer` | `str` | `"layer_id"` |
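The `text` type is often used for range strings like `"1-3, 5-8"`; `batch_argument_parser` in `annotation_utilities` handles this in the real workers, but the parsing idea is roughly (an illustrative sketch, not the actual implementation):

```python
def parse_ranges(spec):
    """'1-3, 5-8' -> [1, 2, 3, 5, 6, 7, 8]; '' -> []"""
    out = []
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            lo, hi = part.split('-', 1)
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out
```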

**Common pitfall with `tags`**: The `tags` type returns a **plain list of strings**, NOT a dict. Do not call `.get('tags')` on the result.

```python
# CORRECT - tags returns a list directly:
training_tags = params['workerInterface'].get('Training Tag', [])
# training_tags = ["DAPI blob"]

# WRONG - will crash with AttributeError: 'list' object has no attribute 'get':
training_tags = params['workerInterface'].get('Training Tag', {}).get('tags', [])
```

**Note**: two different structures share the name `tags`. In annotation workers, the top-level `params['tags']` (the worker's output tags, NOT a workerInterface field) is a plain list of strings, e.g. `["DAPI blob"]`. In property workers, however, `params['tags']` is a dict of the form `{'tags': [...], 'exclusive': bool}`, used when filtering the list returned by `workerClient.get_annotation_list_by_shape()`.

**Validating tags** (recommended pattern from cellpose_train, piscis):
```python
tags = workerInterface.get('My Tag Field', [])
if not tags:
sendError("No tag selected", "Please select at least one tag.")
return
```

**Using tags to filter annotations**:
```python
# Pass the list directly to annotation_tools
filtered = annotation_tools.get_annotations_with_tags(
annotation_list, tags, exclusive=False)

# Or with Girder API (must JSON-serialize)
annotations = annotationClient.getAnnotationsByDatasetId(
datasetId, shape='polygon', tags=json.dumps(tags))
```

### Key APIs

**annotation_client** (installed from NimbusImage repo):
5 changes: 5 additions & 0 deletions build_machine_learning_workers.sh
@@ -39,6 +39,11 @@ docker build . -f ./workers/annotations/sam2_automatic_mask_generator/Dockerfile
# Command for M1:
# docker build . -f ./workers/annotations/sam2_automatic_mask_generator/Dockerfile_M1 -t annotations/sam2_automatic_mask_generator:latest $NO_CACHE

echo "Building SAM2 few-shot segmentation worker"
docker build . -f ./workers/annotations/sam2_fewshot_segmentation/Dockerfile -t annotations/sam2_fewshot_segmentation:latest $NO_CACHE
# Command for M1:
# docker build . -f ./workers/annotations/sam2_fewshot_segmentation/Dockerfile_M1 -t annotations/sam2_fewshot_segmentation:latest $NO_CACHE

echo "Building SAM2 propagate worker"
docker build . -f ./workers/annotations/sam2_propagate/$DOCKERFILE -t annotations/sam2_propagate_worker:latest $NO_CACHE
