layercake

Cut a photo into depth-ordered transparent PNG layers with Meta's Segment Anything 2. Every output PNG is the source image's exact dimensions, so layers stack cleanly under CSS object-fit: cover; object-position: center.

PSD is for Photoshop. layercake is for CSS.

Built for layered hero composites on the web — the kind where text weaves between a foreground object and the subject, or where parallax layers separate fore/mid/background. Click a few points per layer, optionally draw a bounding box, pick edge quality, save. Source dimensions guaranteed.

Why this and not something else

Tool	Click-prompted	N layers	Source-dim PNG stack	Local & free
Photoshop "Select Subject"	✕	✕	N/A	Paid
remove.bg / Photoroom / Clipdrop	✕	Fg/bg only	✕	Cloud
iOS "Lift Subject"	✕	Subject only	✕	Local
jhj0517/sam2-playground	✓	PSD	✕	Local
10b.ai RGBA Layers	✕	✓	✓	Cloud, paid
layercake	✓	✓	✓	✓

Install

Requires Python 3.10+. Recommended via uv:

git clone https://github.com/<you>/layercake.git
cd layercake
uv venv --python 3.13 .venv
source .venv/bin/activate
uv pip install -r requirements.txt

The first run downloads the SAM 2 weights (~900 MB for sam2-hiera-large) and caches them under ~/.cache/huggingface/. The device is auto-detected: MPS (Apple silicon), CUDA, or CPU.

SAM 3 text prompts (optional)

The Gradio UI includes an opt-in "Segment by concept" section powered by SAM 3 (Nov 2025). Type a short noun phrase (rope, face, hand, leaves) and SAM 3 segments every instance of the concept, creating one layer per instance.

This complements SAM 2 rather than replacing it: SAM 2 needs you to know where to click, while SAM 3 takes the concept name and finds it for you. It is especially useful for scenes where the same thing repeats (multiple rope strands, a crowd, a shelf of objects).

One-time setup (weights are gated):

Visit huggingface.co/facebook/sam3 and click Agree and access repository.
Create a read-scope token at huggingface.co/settings/tokens.
Run export HF_TOKEN=hf_... in your shell, or run huggingface-cli login.

Then relaunch layercake. The first concept segmentation downloads the model (~3 GB) and caches it under ~/.cache/huggingface/.

Reading the scores: SAM 3 multiplies each instance's confidence by an image-level "is this concept present at all" confidence, so final scores run lower than you'd expect. A clearly visible concept can land at 0.1–0.3 when the presence head is conservative (thin structures like rope especially). If a concept you can plainly see returns nothing, lower the score threshold before rewording the phrase; the default of 0.4 suits prominent subjects, not fine ones.
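To make the arithmetic concrete, here is a minimal sketch of how a combined score behaves against the threshold. The function name, the example confidences, and the "greater than or equal" comparison are illustrative assumptions, not values or logic taken from SAM 3 or from layercake's code.

```python
def final_score(instance_conf: float, presence_conf: float) -> float:
    """Combined score as described above: instance confidence x presence confidence."""
    return instance_conf * presence_conf


# Hypothetical numbers: the instance head is confident about a rope strand,
# but the image-level presence head is conservative.
score = final_score(instance_conf=0.9, presence_conf=0.25)

for threshold in (0.4, 0.15):
    # Assumption: an instance is kept when its score reaches the threshold.
    verdict = "kept" if score >= threshold else "dropped"
    print(f"score={score:.3f} threshold={threshold}: {verdict}")
```

With these made-up inputs the strand is dropped at the default threshold and kept at 0.15, which is the situation the paragraph above describes.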

Interactive UI (recommended)

python layers_app.py

Opens at http://127.0.0.1:7860. Workflow:

Upload a source image.
Name a layer (order = depth; first layer = nearest) and hit Add layer.
With the layer active, click the image. Modes:
- include — positive click: "this pixel belongs to the layer."
- exclude — negative click: "this pixel does NOT."
- move — 1st click picks up the nearest point, 2nd drops it.
- box — 2 clicks mark opposite corners of an axis-aligned bounding box.
- erase — click near a point (or inside the box) to remove it.
Repeat for each layer. Edit points directly in the dataframe (move by changing x/y, delete by removing rows).
Hit Save layers.

Output directory gets:

<name>.png per layer, all source-dim RGBA
bg.png — the inverse of the union of all your layers
points.json — the full spec, for replay via the CLI
snippet.html — ready-to-paste HTML + CSS that stacks the layers
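If you want to sanity-check the source-dimension guarantee in a build script, a standard-library sketch like the following reads each PNG's header and reports any file whose size differs from the expected one. The helper names are hypothetical and not part of layercake; the only thing relied on is the PNG file layout (8-byte signature, then the IHDR chunk with width and height).

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # IHDR data starts at byte 16: width and height as big-endian uint32.
    return struct.unpack(">II", header[16:24])


def mismatched_layers(out_dir, expected_size):
    """List (filename, size) for every PNG in out_dir whose size differs."""
    expected = tuple(expected_size)
    found = []
    for path in sorted(Path(out_dir).glob("*.png")):
        size = png_size(path)
        if size != expected:
            found.append((path.name, size))
    return found
```

Calling mismatched_layers("out/", (width, height)) with the source image's dimensions should return an empty list if every layer matches.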

Headless CLI

Same engine, no UI. Useful in build scripts.

python layers.py input.jpg \
  --layers '[
    {"name": "foreground",  "points": [[1200, 300], [850, 250]]},
    {"name": "subject",     "points": [[420, 400]], "box": [300, 200, 600, 900]},
    {"name": "midground",   "points": [[1100, 800]], "labels": [1]}
  ]' \
  --out out/ \
  --edges matting \
  --preview

Each layer entry supports:

name (required)
points — list of [x, y] in source-image pixel space
labels — per-point include/exclude: 1 or 0 (default all 1)
box — optional axis-aligned [x1, y1, x2, y2]
concept — SAM 3 text prompt ("rope", "face"); requires SAM 3 access (see above). A concept-only layer expands to one layer per found instance, named {name}-1, {name}-2, … best-score-first, and the SAM 3 masks are used as-is. At most the 10 best instances are kept after near-duplicates are dropped: SAM 3 often returns several overlapping instances of the same object, so an instance mostly covered by better-scoring ones is skipped. A concept combined with points/box takes the best instance as a warm-start prior and refines it with your prompts through SAM 2 (scripted Level 2 chaining).
concept_threshold — min score for concept instances (default 0.4; scores are instance × presence confidence, see "Reading the scores" above)
concept_instance — pin the i-th best instance (1-based) instead of expanding; the layer keeps its own name. UI exports use this so a concept-created layer replays as the same instance.

python layers.py input.jpg \
  --layers '[
    {"name": "strand", "concept": "rope", "concept_threshold": 0.15},
    {"name": "subject", "concept": "person", "points": [[420, 60]], "labels": [0]}
  ]' \
  --out out/

The first entry becomes strand-1, strand-2, … per rope instance; the second finds the best "person", then the negative click carves the region around (420, 60) out of it. Concept-only layers carry no SAM 2 soft logits, so --edges sam-soft falls back to their hard mask edges (feather and matting work normally).
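The near-duplicate filtering described for concept layers can be pictured as a greedy pass over instances sorted by score. The sketch below is one plausible reading, not layercake's actual code: the function name is hypothetical, and the 0.5 cutoff for "mostly covered" is an assumption.

```python
import numpy as np


def pick_instances(masks, scores, max_instances=10, covered_fraction=0.5):
    """Greedy best-score-first selection that skips mostly-covered masks.

    masks  -- list of 2-D arrays with identical shape (truthy = inside the mask)
    scores -- one score per mask
    Returns the indices of the kept masks, best score first.
    """
    if not masks:
        return []
    masks = [np.asarray(m, dtype=bool) for m in masks]
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    union = np.zeros_like(masks[0])  # pixels claimed by kept instances so far
    kept = []
    for i in order:
        area = masks[i].sum()
        if area == 0:
            continue
        overlap = np.logical_and(masks[i], union).sum() / area
        if overlap > covered_fraction:
            continue  # assumed rule: mostly covered by better-scoring instances
        kept.append(i)
        union |= masks[i]
        if len(kept) == max_instances:
            break
    return kept
```

The kept indices map onto the {name}-1, {name}-2, … naming in order, for example [f"strand-{n}" for n, _ in enumerate(kept, start=1)].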

--layers also accepts a path to a JSON file (e.g., points.json exported from the UI). Concept-created UI layers export with their concept/concept_instance provenance, so they replay through the CLI as the same instances.
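For build scripts it can be convenient to assemble the layer spec in Python instead of hand-writing JSON. The following is a minimal sketch under stated assumptions: the helper names and the layers.json file name are hypothetical, and it assumes a JSON file holding the same list of entries as the inline --layers value is accepted. Only flags documented in this README are used.

```python
import json
import sys


def layer_entry(name, include=(), exclude=(), box=None):
    """Build one --layers entry from include/exclude clicks in source pixels."""
    points = [list(p) for p in include] + [list(p) for p in exclude]
    entry = {"name": name}
    if points:
        entry["points"] = points
    if exclude:
        # labels: 1 = include, 0 = exclude, one per point, in the same order
        entry["labels"] = [1] * len(include) + [0] * len(exclude)
    if box is not None:
        entry["box"] = list(box)  # [x1, y1, x2, y2]
    return entry


def layers_command(image, spec_path, out_dir, edges="feather"):
    """Argument list for layers.py, suitable for subprocess.run(cmd, check=True)."""
    return [sys.executable, "layers.py", str(image),
            "--layers", str(spec_path), "--out", str(out_dir), "--edges", edges]


spec = [
    layer_entry("foreground", include=[(1200, 300), (850, 250)]),
    layer_entry("subject", include=[(420, 400)], exclude=[(420, 60)],
                box=(300, 200, 600, 900)),
]
print(json.dumps(spec, indent=2))
print(layers_command("input.jpg", "layers.json", "out/", edges="matting"))
```

Write the printed JSON to the spec path before running the command. Labels are only emitted when there is at least one exclude click, since they default to all 1.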

Flags

Flag	Default	What
`--model`	`sam2-hiera-large`	HF id `facebook/sam2-hiera-{large,base-plus,small,tiny}`
`--device`	`auto`	`auto`/`cpu`/`cuda`/`mps`
`--edges`	`feather`	`feather` (Gaussian blur), `sam-soft` (SAM's sigmoid logits), or `matting` (pymatting closed-form)
`--feather`	`2`	Blur radius (px) for `--edges feather`
`--matting-band`	`8`	Unknown-region band width (px) for `--edges matting`. Auto-narrows for masks too thin to survive erosion at this width; a mask with no interior at all keeps its hard edges
`--matting-algo`	`cf`	Solver for `--edges matting`: `cf` (closed-form), `lbdm`, or `knn`
`--preview`	off	Write a tinted `_preview.png`
`--no-bg`	off	Skip `bg.png`
`--no-depth`	off	Skip the Depth Anything V2 depth map
`--no-css`	off	Skip `snippet.html`
`--infill`	`none`	Fill the bg "hole" where layers sit: `none`, `opencv` (Navier-Stokes, instant), `lama` (LaMa model, ~200 MB, plausible on complex scenes)

Depth map (optional)

For depth-driven parallax speeds per layer, layercake also emits a grayscale depth map via Depth Anything V2. It runs alongside the SAM 2 pass and adds a few seconds. It is on by default; skip it with --no-depth.
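One way to turn the depth map into per-layer parallax speeds is to average the depth under each layer's alpha and map that average to a speed factor. This is a sketch, not something layercake does for you: the function names and the speed range are hypothetical, and it assumes larger depth values mean nearer pixels, which you should verify against your own depth map before relying on it.

```python
import numpy as np


def layer_depths(depth, alphas):
    """Alpha-weighted mean depth per layer.

    depth  -- 2-D float array with the source image's height and width
    alphas -- dict of layer name -> 2-D alpha array in [0, 1], same shape
    """
    means = {}
    for name, alpha in alphas.items():
        weight = float(alpha.sum())
        means[name] = float((depth * alpha).sum() / weight) if weight else 0.0
    return means


def parallax_speeds(means, slowest=0.1, fastest=1.0):
    """Linearly map mean depth to a speed factor (assumed: larger value = nearer = faster)."""
    lo, hi = min(means.values()), max(means.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all layers share one depth
    return {name: slowest + (d - lo) / span * (fastest - slowest)
            for name, d in means.items()}
```

The resulting factors can be written into CSS custom properties or a scroll handler, whichever your parallax setup expects.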

Edge quality: feather vs. matting

feather (default) — post-processes the hard mask with a Gaussian blur. Fast, uniform, content-blind. Good for most cases; leaves halos on high-contrast edges and doesn't catch thin filaments.

matting — pymatting alpha matting. Builds a trimap (erode = known-fg, dilate-inverse = known-bg, thin band = unknown), then solves for soft alpha in the unknown band using actual image structure. Edge-aware; dramatically better on hair, fur, and fine filaments.

--matting-band controls the unknown-zone width: narrow (2-4) for tight boundaries, medium (8-12) for soft edges and hair, wide (16-24) for wispy structures or motion blur.

--matting-algo picks the solver: cf (default, closed-form — fastest and highest quality on typical problems), lbdm (approximation for very large unknown regions), knn (alternative kernel, sometimes useful on fur).
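The trimap construction and the band auto-narrowing described above can be sketched with NumPy alone. This is an illustration under assumptions, not layercake's implementation: the function names are hypothetical, the band is used directly as the erosion and dilation radius, and halving the band until the mask survives erosion is a guessed narrowing strategy. Real code would more likely use OpenCV or SciPy morphology, which is much faster on full-size photos.

```python
import numpy as np


def _windows(mask, radius):
    """All (2r+1)^2 shifted views of mask, padded with False at the border."""
    padded = np.pad(mask, radius, constant_values=False)
    h, w = mask.shape
    size = 2 * radius + 1
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]


def erode(mask, radius):
    """A pixel survives only if its whole square neighbourhood is inside the mask."""
    return np.logical_and.reduce(_windows(mask, radius))


def dilate(mask, radius):
    """A pixel is set if any pixel in its square neighbourhood is inside the mask."""
    return np.logical_or.reduce(_windows(mask, radius))


def build_trimap(mask, band=8):
    """Return (trimap, band_used); trimap is None when the mask has no interior.

    Trimap values: 1.0 = known foreground, 0.0 = known background, 0.5 = unknown.
    """
    mask = np.asarray(mask, dtype=bool)
    while band > 0 and not erode(mask, band).any():
        band //= 2  # assumed narrowing strategy for thin masks
    if band == 0:
        return None, 0  # nothing survives erosion: keep the hard edges
    trimap = np.full(mask.shape, 0.5)
    trimap[erode(mask, band)] = 1.0
    trimap[~dilate(mask, band)] = 0.0
    return trimap, band
```

A trimap in this 0 / 0.5 / 1 form is what a matting solver then consumes to estimate soft alpha inside the unknown band. Note that padding with False also erodes masks that touch the image border, which is a simplification.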

Background infill

By default bg.png has a transparent "hole" where your foreground layers sit — perfect for CSS stacking since the foreground layer sits on top and fills the hole visually.

If you want the bg as a standalone complete image (e.g., for parallax where the foreground slides and the exposed bg should look plausible), use --infill:

--infill opencv — OpenCV Navier-Stokes inpainting. Near-instant, zero extra deps beyond opencv-python-headless. Works well on simple backdrops (walls, gradients, out-of-focus surfaces).
--infill lama — LaMa model via simple-lama-inpainting. Plausible on complex scenes; first use downloads ~200 MB; inference is a few seconds (CPU). Large photos are inpainted at the model's native ~512px scale and only the fill is upscaled — full-res LaMa on big holes degrades to flat mean-color mush.

Output bg.png becomes fully opaque (alpha=1 everywhere) and contains the infilled backdrop.

CSS integration

snippet.html ships ready to paste:

<div class="hero">
  <img src="images/bg.png" class="layer l-bg" alt="">
  <h1 class="hero-wordmark">your wordmark</h1>
  <img src="images/subject.png" class="layer l-subject" alt="">
  <img src="images/foreground.png" class="layer l-foreground" alt="">
</div>

.hero { position: relative; isolation: isolate; overflow: hidden; }
.hero .layer { position: absolute; inset: 0; width: 100%; height: 100%;
               object-fit: cover; object-position: center;
               pointer-events: none; }
.hero .l-bg { z-index: 1; }
.hero .hero-wordmark { z-index: 2; position: relative; /* your styles */ }
.hero .l-subject { z-index: 3; }
.hero .l-foreground { z-index: 4; }

The wordmark sits between the background and the subject — text weaves through the composite naturally.
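If you generate the stacking rules yourself, for instance to move the wordmark to a different depth, the z-index assignment can be sketched as follows. The function names are hypothetical and this is not how snippet.html is actually produced; it only reproduces the class naming and ordering shown above, with layers listed nearest first as in the UI.

```python
def z_order(layers_near_to_far, wordmark_behind=None):
    """Map names to z-index values: bg is 1, the nearest layer is highest.

    wordmark_behind -- name of the layer the wordmark should sit directly behind,
                       or None to leave the wordmark out of the stack.
    """
    order = ["bg"]
    for name in reversed(layers_near_to_far):  # farthest layer first
        if name == wordmark_behind:
            order.append("hero-wordmark")
        order.append(name)
    return {name: z for z, name in enumerate(order, start=1)}


def css_rules(z_by_name):
    """Render one z-index rule per entry, using the .l-<name> class convention."""
    rules = []
    for name, z in z_by_name.items():
        if name == "hero-wordmark":
            rules.append(f".hero .hero-wordmark {{ z-index: {z}; position: relative; }}")
        else:
            rules.append(f".hero .l-{name} {{ z-index: {z}; }}")
    return "\n".join(rules)


print(css_rules(z_order(["foreground", "subject"], wordmark_behind="subject")))
```

Passing wordmark_behind="foreground" instead would place the text between the subject and the foreground object.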

Workflow tips

Many small layers beat one big layer. SAM 2 hits diminishing returns after ~10-15 points per blob. Split contiguous regions into separate layers (hand-left, hand-right, …) with 1-3 points each rather than 30 points on one layer.

Boxes for the easy 80%, points for the fiddly 20%. A box around a whole object is often enough; add positive points for missed bits and negative points to pry neighbors apart.

Click the things you care about; let bg.png catch the rest. You don't need to enumerate every region.

Large model is worth it. The image encoder runs once per upload; all clicks after that are cheap. sam2-hiera-large materially improves boundaries on thin structures vs. smaller variants.

Credits

Built on top of:

Segment Anything 2 (Meta)
pymatting (closed-form alpha matting)
Depth Anything V2
Gradio for the UI

License

MIT — see LICENSE.
