Cut a photo into depth-ordered transparent PNG layers with Meta's
Segment Anything 2. Every output
PNG is the source image's exact dimensions, so layers stack cleanly under
CSS object-fit: cover; object-position: center.
PSD is for Photoshop. layercake is for CSS.
Built for layered hero composites on the web — the kind where text weaves between a foreground object and the subject, or where parallax layers separate fore/mid/background. Click a few points per layer, optionally draw a bounding box, pick edge quality, save. Source dimensions guaranteed.
| Tool | Click-prompted | N layers | Source-dim PNG stack | Local & free |
|---|---|---|---|---|
| Photoshop "Select Subject" | ✕ | ✕ | N/A | Paid |
| remove.bg / Photoroom / Clipdrop | ✕ | Fg/bg only | ✕ | Cloud |
| iOS "Lift Subject" | ✕ | Subject only | ✕ | Local |
| jhj0517/sam2-playground | ✓ | PSD | ✕ | Local |
| 10b.ai RGBA Layers | ✕ | ✓ | ✓ | Cloud, paid |
| layercake | ✓ | ✓ | ✓ | ✓ |
Requires Python 3.10+. Recommended via uv:
git clone https://github.com/<you>/layercake.git
cd layercake
uv venv --python 3.13 .venv
source .venv/bin/activate
uv pip install -r requirements.txt

First run downloads SAM 2 weights (~900 MB for sam2-hiera-large), cached
under ~/.cache/huggingface/. Device auto-detects MPS (Apple silicon),
CUDA, or CPU.
The Gradio UI includes an opt-in "Segment by concept" section powered by
SAM 3 (Nov 2025). Type a short
noun phrase (rope, face, hand, leaves) and SAM 3 segments every
instance of the concept, creating one layer per instance.
This is genuinely additive to SAM 2 — SAM 2 needs you to know where to click; SAM 3 takes the concept name and finds it for you. Especially useful for scenes where the same thing repeats (multiple rope strands, a crowd, a shelf of objects).
One-time setup (weights are gated):
- Visit huggingface.co/facebook/sam3 and click Agree and access repository.
- Create a read-scope token at huggingface.co/settings/tokens.
- Run export HF_TOKEN=hf_... in your shell, or run huggingface-cli login.
Then relaunch layercake. The first concept segmentation downloads the model
(~3 GB) and caches it under ~/.cache/huggingface/.
Reading the scores: SAM 3 multiplies each instance's confidence by an image-level "is this concept present at all" confidence, so final scores run lower than you'd expect — a clearly-visible concept can land at 0.1–0.3 when the presence head is conservative (thin structures like rope especially). If a concept you can plainly see returns nothing, lower the score threshold before rewording the phrase; the default of 0.4 suits prominent subjects, not fine ones.
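To make the arithmetic concrete, here is a minimal sketch of how a combined score could fall under the default threshold. The instance and presence numbers are hypothetical, and treating the threshold as an inclusive `>=` cutoff is an assumption, not something confirmed by layercake's code.

```python
def final_score(instance_conf, presence_conf):
    """Combine per-instance and image-level confidence by multiplication."""
    return instance_conf * presence_conf


def keep_instances(instances, presence_conf, threshold=0.4):
    """Return (name, score) pairs whose combined score meets the threshold.

    The inclusive comparison is an assumption made for this sketch.
    """
    scored = [(name, final_score(conf, presence_conf)) for name, conf in instances]
    return [(name, score) for name, score in scored if score >= threshold]


# Hypothetical numbers: confident rope detections, conservative presence head.
rope = [("rope-a", 0.90), ("rope-b", 0.80), ("rope-c", 0.40)]
presence = 0.30

default_kept = keep_instances(rope, presence)        # default threshold 0.4
lowered_kept = keep_instances(rope, presence, 0.15)  # lowered threshold

print([name for name, _ in default_kept])
print([name for name, _ in lowered_kept])
```

With these numbers the combined scores are roughly 0.27, 0.24, and 0.12, so nothing passes at 0.4 even though two instances were detected with high per-instance confidence.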
python layers_app.py

Opens at http://127.0.0.1:7860. Workflow:
- Upload a source image.
- Name a layer (order = depth; first layer = nearest) and hit Add layer.
- With the layer active, click the image. Modes:
  - include (positive click): "this pixel belongs to the layer."
  - exclude (negative click): "this pixel does NOT."
  - move: the 1st click picks up the nearest point, the 2nd drops it.
  - box: 2 clicks mark opposite corners of an axis-aligned bounding box.
  - erase: click near a point (or inside the box) to remove it.
- Repeat for each layer. Edit points directly in the dataframe (move by changing x/y, delete by removing rows).
- Hit Save layers.
Output directory gets:
- <name>.png per layer, all source-dim RGBA
- bg.png: the inverse of the union of all your layers
- points.json: the full spec, for replay via the CLI
- snippet.html: ready-to-paste HTML + CSS that stacks the layers
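If a build script wants to double-check the source-dimension guarantee, a small stdlib-only sketch like the one below could do it by reading each PNG's IHDR chunk. The helper names `png_size` and `mismatched_layers` are hypothetical and not part of layercake.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])


def mismatched_layers(out_dir, expected_size):
    """Return {filename: size} for every PNG whose size differs from expected_size."""
    bad = {}
    for png in sorted(Path(out_dir).glob("*.png")):
        size = png_size(png)
        if size != tuple(expected_size):
            bad[png.name] = size
    return bad
```

Calling `mismatched_layers("out", (width, height))` with the source image's dimensions should return an empty dict when every layer lines up.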
Same engine, no UI. Useful in build scripts.
python layers.py input.jpg \
--layers '[
{"name": "foreground", "points": [[1200, 300], [850, 250]]},
{"name": "subject", "points": [[420, 400]], "box": [300, 200, 600, 900]},
{"name": "midground", "points": [[1100, 800]], "labels": [1]}
]' \
--out out/ \
--edges matting \
--preview

Each layer entry supports:
- name (required)
- points: list of [x, y] in source-image pixel space
- labels: per-point include/exclude, 1 or 0 (default all 1)
- box: optional axis-aligned [x1, y1, x2, y2]
- concept: SAM 3 text prompt ("rope", "face"); requires SAM 3 access (see above). A concept-only layer expands to one layer per found instance, named {name}-1, {name}-2, … best-score-first, and the SAM 3 masks are used as-is. The expansion is capped at the 10 best instances after dropping near-duplicates: SAM 3 often returns several overlapping instances of the same object, so an instance mostly covered by better-scoring ones is skipped. A concept combined with points/box takes the best instance as a warm-start prior and refines it with your prompts through SAM 2 (scripted Level 2 chaining).
- concept_threshold: min score for concept instances (default 0.4; scores are instance × presence confidence, see "Reading the scores" above)
- concept_instance: pin the i-th best instance (1-based) instead of expanding; the layer keeps its own name. UI exports use this so a concept-created layer replays as the same instance.
python layers.py input.jpg \
--layers '[
{"name": "strand", "concept": "rope", "concept_threshold": 0.15},
{"name": "subject", "concept": "person", "points": [[420, 60]], "labels": [0]}
]' \
--out out/

The first entry becomes strand-1, strand-2, … per rope instance; the
second finds the best "person", then the negative click carves the region
around (420, 60) out of it. Concept-only layers carry no SAM 2 soft logits,
so --edges sam-soft falls back to their hard mask edges (feather and
matting work normally).
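The near-duplicate filtering described for concept-only layers can be pictured as a greedy, best-score-first pass. The sketch below is an illustration under stated assumptions, not layercake's actual code: masks are modeled as sets of (x, y) pixels, and the 0.5 coverage cutoff (`covered_fraction`) is a hypothetical value.

```python
def select_instances(instances, max_keep=10, covered_fraction=0.5):
    """Greedy best-score-first selection that skips mostly-covered masks.

    instances: list of (score, mask), where mask is a set of (x, y) pixels.
    covered_fraction is a hypothetical cutoff chosen for this sketch.
    """
    kept = []
    covered = set()  # union of all pixels claimed by kept instances
    for score, mask in sorted(instances, key=lambda item: item[0], reverse=True):
        if not mask:
            continue
        overlap = len(mask & covered) / len(mask)
        if overlap > covered_fraction:
            continue  # mostly hidden behind better-scoring instances
        kept.append((score, mask))
        covered |= mask
        if len(kept) == max_keep:
            break
    return kept


def layer_names(base, kept):
    """Name kept instances {base}-1, {base}-2, ... in best-score-first order."""
    return [f"{base}-{i}" for i, _ in enumerate(kept, start=1)]


# Toy example: the second strand is a near-duplicate of the first.
strand_a = {(x, 0) for x in range(10)}
strand_dup = {(x, 0) for x in range(9)}
strand_b = {(x, 5) for x in range(20, 30)}

kept = select_instances([(0.30, strand_dup), (0.35, strand_a), (0.20, strand_b)])
print(layer_names("strand", kept))
```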
--layers also accepts a path to a JSON file (e.g., points.json exported
from the UI). Concept-created UI layers export with their
concept/concept_instance provenance, so they replay through the CLI as
the same instances.
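For build scripts, one option is to generate the spec file from Python and pass its path to --layers. The sketch below assumes the file holds the same JSON list that the inline --layers argument takes; the helper names `write_layer_spec` and `build_command` are hypothetical, and the checks only mirror the field descriptions given earlier.

```python
import json
from pathlib import Path


def write_layer_spec(layers, path):
    """Check a few documented field constraints, then write the spec as JSON."""
    for layer in layers:
        name = layer.get("name")
        if not name:
            raise ValueError("every layer needs a name")
        points = layer.get("points", [])
        labels = layer.get("labels")
        if labels is not None:
            # labels are per-point, so the lengths must agree
            if len(labels) != len(points):
                raise ValueError(f"{name}: labels must match points one-to-one")
            if any(label not in (0, 1) for label in labels):
                raise ValueError(f"{name}: labels must be 0 or 1")
        box = layer.get("box")
        if box is not None and len(box) != 4:
            raise ValueError(f"{name}: box must be [x1, y1, x2, y2]")
    spec_path = Path(path)
    spec_path.write_text(json.dumps(layers, indent=2))
    return spec_path


def build_command(image, spec_path, out_dir, edges="feather"):
    """Assemble the CLI call as an argument list suitable for subprocess.run."""
    return ["python", "layers.py", str(image), "--layers", str(spec_path),
            "--out", str(out_dir), "--edges", edges]
```

A script would call `write_layer_spec(layers, "layers.json")` and then hand the result of `build_command("input.jpg", "layers.json", "out/")` to `subprocess.run(..., check=True)`.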
| Flag | Default | What |
|---|---|---|
| --model | sam2-hiera-large | HF id facebook/sam2-hiera-{large,base-plus,small,tiny} |
| --device | auto | auto/cpu/cuda/mps |
| --edges | feather | feather (Gaussian blur), sam-soft (SAM's sigmoid logits), or matting (pymatting closed-form) |
| --feather | 2 | Blur radius (px) for --edges feather |
| --matting-band | 8 | Unknown-region band width (px) for --edges matting. Auto-narrows for masks too thin to survive erosion at this width; a mask with no interior at all keeps its hard edges |
| --preview | off | Write a tinted _preview.png |
| --no-bg | off | Skip bg.png |
| --no-depth | off | Skip the Depth Anything V2 depth map |
| --no-css | off | Skip snippet.html |
| --infill | none | Fill the bg "hole" where layers sit: none, opencv (Navier-Stokes, instant), lama (LaMa model, ~200 MB, plausible on complex scenes) |
For depth-driven parallax speeds per layer, layercake also emits a
grayscale depth map via Depth Anything V2. It runs alongside the SAM 2
pass and adds a few seconds. On by default; skip with --no-depth.
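One way to turn the depth map into per-layer parallax speeds is to average the depth under each layer's mask and map it linearly to a speed factor. This is only a sketch: it assumes 8-bit depth values where brighter means nearer (check the emitted map before relying on that), and `mean_depth` and `parallax_speed` are hypothetical helpers operating on plain nested lists.

```python
def mean_depth(depth, mask):
    """Average depth over pixels where mask is truthy (row-major 2D lists)."""
    total = 0
    count = 0
    for depth_row, mask_row in zip(depth, mask):
        for d, m in zip(depth_row, mask_row):
            if m:
                total += d
                count += 1
    return total / count if count else 0.0


def parallax_speed(depth_value, min_speed=0.1, max_speed=1.0, max_depth=255):
    """Linearly map depth to a scroll-speed factor.

    Assumes brighter = nearer, so nearer layers move faster.
    """
    t = depth_value / max_depth
    return min_speed * (1 - t) + max_speed * t


# Toy 2x2 depth map: left column far (0), right column near (255).
depth = [[0, 255], [0, 255]]
far_mask = [[1, 0], [1, 0]]
near_mask = [[0, 1], [0, 1]]

far_speed = parallax_speed(mean_depth(depth, far_mask))
near_speed = parallax_speed(mean_depth(depth, near_mask))
print(far_speed, near_speed)
```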
feather (default) — post-processes the hard mask with a Gaussian blur. Fast, uniform, content-blind. Good for most cases; leaves halos on high-contrast edges and doesn't catch thin filaments.
matting — pymatting alpha matting. Builds a trimap (erode = known-fg, dilate-inverse = known-bg, thin band = unknown), then solves for soft alpha in the unknown band using actual image structure. Edge-aware; dramatically better on hair, fur, and fine filaments.
--matting-band controls the unknown-zone width: narrow (2-4) for tight
boundaries, medium (8-12) for soft edges and hair, wide (16-24) for wispy
structures or motion blur.
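The trimap construction can be sketched in pure Python on a small binary mask. This is an illustration rather than layercake's implementation: it uses a square structuring element and grows the unknown zone by `band` pixels on each side of the mask edge, and how layercake actually splits the band width is not stated here. The 1.0 / 0.5 / 0.0 encoding follows the usual foreground / unknown / background trimap convention.

```python
def erode(mask, radius):
    """Binary erosion with a square window; outside the image counts as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= yy < h and 0 <= xx < w and mask[yy][xx]
                     for yy in range(y - radius, y + radius + 1)
                     for xx in range(x - radius, x + radius + 1)))
             for x in range(w)] for y in range(h)]


def dilate(mask, radius):
    """Binary dilation with the same square window."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[yy][xx]
                     for yy in range(max(0, y - radius), min(h, y + radius + 1))
                     for xx in range(max(0, x - radius), min(w, x + radius + 1))))
             for x in range(w)] for y in range(h)]


def build_trimap(mask, band):
    """Return a trimap: 1.0 known fg, 0.0 known bg, 0.5 unknown band."""
    known_fg = erode(mask, band)   # survives erosion: certainly foreground
    near_fg = dilate(mask, band)   # outside the dilation: certainly background
    h, w = len(mask), len(mask[0])
    return [[1.0 if known_fg[y][x] else (0.5 if near_fg[y][x] else 0.0)
             for x in range(w)] for y in range(h)]


# 9x9 toy mask with a 5x5 foreground block in the middle.
mask = [[1 if 2 <= y <= 6 and 2 <= x <= 6 else 0 for x in range(9)] for y in range(9)]
trimap = build_trimap(mask, band=1)
for row in trimap:
    print(" ".join(f"{v:.1f}" for v in row))
```

A mask thinner than the band erodes away completely and leaves no known-foreground pixels, which is the situation the auto-narrowing of --matting-band is meant to handle.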
--matting-algo picks the solver: cf (default, closed-form — fastest
and highest quality on typical problems), lbdm (approximation for very
large unknown regions), knn (alternative kernel, sometimes useful on
fur).
By default bg.png has a transparent "hole" where your foreground layers
sit — perfect for CSS stacking since the foreground layer sits on top and
fills the hole visually.
If you want the bg as a standalone complete image (e.g., for parallax where
the foreground slides and the exposed bg should look plausible), use
--infill:
- --infill opencv: OpenCV Navier-Stokes inpainting. Near-instant, zero extra deps beyond opencv-python-headless. Works well on simple backdrops (walls, gradients, out-of-focus surfaces).
- --infill lama: LaMa model via simple-lama-inpainting. Plausible on complex scenes; first use downloads ~200 MB; inference is a few seconds (CPU). Large photos are inpainted at the model's native ~512px scale and only the fill is upscaled, because full-res LaMa on big holes degrades to flat mean-color mush.
Output bg.png becomes fully opaque (alpha=1 everywhere) and contains the
infilled backdrop.
snippet.html ships ready to paste:
<div class="hero">
<img src="images/bg.png" class="layer l-bg" alt="">
<h1 class="hero-wordmark">your wordmark</h1>
<img src="images/subject.png" class="layer l-subject" alt="">
<img src="images/foreground.png" class="layer l-foreground" alt="">
</div>

.hero { position: relative; isolation: isolate; overflow: hidden; }
.hero .layer { position: absolute; inset: 0; width: 100%; height: 100%;
object-fit: cover; object-position: center;
pointer-events: none; }
.hero .l-bg { z-index: 1; }
.hero .hero-wordmark { z-index: 2; position: relative; /* your styles */ }
.hero .l-subject { z-index: 3; }
.hero .l-foreground { z-index: 4; }

The wordmark sits between the background and the subject, so text weaves through the composite naturally.
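The z-index assignment follows from the depth order: layers are listed nearest-first, so the farthest layer gets the lowest z-index above the wordmark. A rough sketch of generating equivalent markup is below; `build_snippet` is a hypothetical helper, not the generator layercake ships, and it omits the shared .hero and .layer rules.

```python
from html import escape


def build_snippet(layer_names, image_dir="images", wordmark="your wordmark"):
    """Return (html, css) for layers given nearest-first, as in layercake's order."""
    html_lines = [
        '<div class="hero">',
        f'  <img src="{image_dir}/bg.png" class="layer l-bg" alt="">',
        f'  <h1 class="hero-wordmark">{escape(wordmark)}</h1>',
    ]
    css_lines = [
        ".hero .l-bg { z-index: 1; }",
        ".hero .hero-wordmark { z-index: 2; position: relative; }",
    ]
    # Walk farthest-to-nearest so z-index rises toward the viewer.
    for z, name in enumerate(reversed(layer_names), start=3):
        html_lines.append(
            f'  <img src="{image_dir}/{name}.png" class="layer l-{name}" alt="">'
        )
        css_lines.append(f".hero .l-{name} {{ z-index: {z}; }}")
    html_lines.append("</div>")
    return "\n".join(html_lines), "\n".join(css_lines)


html_out, css_out = build_snippet(["foreground", "subject"])
print(html_out)
print(css_out)
```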
Many small layers beat one big layer. SAM 2 hits diminishing returns
after ~10-15 points per blob. Split contiguous regions into separate layers
(hand-left, hand-right, …) with 1-3 points each rather than 30 points
on one layer.
Boxes for the easy 80%, points for the fiddly 20%. A box around a whole object is often enough; add positive points for missed bits and negative points to pry neighbors apart.
Click the things you care about; let bg.png catch the rest. You don't
need to enumerate every region.
Large model is worth it. The image encoder runs once per upload; all
clicks after that are cheap. sam2-hiera-large materially improves
boundaries on thin structures vs. smaller variants.
Built on top of:
- Segment Anything 2 (Meta)
- pymatting (closed-form alpha matting)
- Depth Anything V2
- Gradio for the UI
MIT — see LICENSE.