Text Transformation Prompt Stack

A modular system for constructing LLM prompts that transform raw audio into polished text. Designed for audio multimodal models (Gemini series) that perform single-pass dictation processing.

— Complete documentation of the constructed foundational prompt stack

The Problem

Traditional speech-to-text produces verbatim transcripts. You get:

"Um, so like, I was thinking we should—no wait, scratch that—we should probably, um, meet on Tuesday"

When what you meant was:

"We should meet on Tuesday."

This stack leverages audio multimodal LLMs to produce text that reflects what you meant to say, not merely what you said.

How It Works

The system concatenates instruction layers into comprehensive system prompts. Two stacks:

Foundational Stack (Always Applied)

Universal cleanup that's desirable for virtually all transcription:

Context: Establishes the transcription task
Exclusions: Background audio, filler words, repetitions
Corrections: Grammar, punctuation, spelling, paragraphs
Inference: Smart format detection
Personalization: User details for templates

Stylistic Stack (Context-Specific)

Customizes output format and tone:

Format: Email, documentation, to-do list, freeform
Tone: Formal, business-appropriate, casual, informal
Emotional: Heightened, neutral, reserved
Style: Concise, verbose, technical, conversational
Readability: Simple, intermediate, advanced

Quick Start

# List available stacks
python scripts/concatenate.py --list

# Generate a prompt
python scripts/concatenate.py business-email.yaml

# Save to file
python scripts/concatenate.py business-email.yaml -o prompt.txt

# Generate the foundational prompt
./scripts/generate-foundational.sh

Pre-Built Stacks

Stack	Use Case
`business-email.yaml`	Professional emails
`formal-email.yaml`	Official correspondence
`casual-note.yaml`	Personal messages
`technical-documentation.yaml`	Technical docs
`quick-todo.yaml`	To-do lists from voice notes

Creating Custom Stacks

Create a YAML file in stacks/:

name: My Custom Stack
description: What this stack does
layers:
  # Foundational (always include all)
  - layers/foundational/01-context/task-definition.md
  - layers/foundational/02-exclusions/background-audio.md
  - layers/foundational/02-exclusions/filler-words.md
  # ... rest of foundational

  # Stylistic (select as needed)
  - layers/stylistic/format-adherence/email.md
  - layers/stylistic/tone/business-appropriate.md
  - layers/stylistic/writing-style/concise.md

Then: python scripts/concatenate.py my-stack.yaml

Workflow

Record voice note or provide audio file
Select appropriate stack for your output format
Generate concatenated prompt: python scripts/concatenate.py stack.yaml
Submit to audio multimodal LLM with your audio
Receive formatted output

Programmatic Usage

import sys
sys.path.insert(0, 'scripts')
from concatenate import PromptStackConcatenator

concatenator = PromptStackConcatenator()
prompt = concatenator.concatenate_from_file("stacks/business-email.yaml")

# Use with your LLM
response = your_llm.complete(system=prompt, audio=audio_file)

Key Concept: Inferred Instructions

The model reasons about content that should be excluded without explicit markup:

Self-corrections: Keeps only the corrected version
Spelling instructions: "Zod, spelled Z-O-D" becomes just "Zod"
Meta-instructions: "scratch that" removes preceding content
Background noise: Side conversations excluded automatically

Author

Daniel Rosehill

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
archive		archive
examples		examples
exports		exports
generated/foundational		generated/foundational
images		images
layers		layers
scripts		scripts
stacks		stacks
test-audio		test-audio
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
layers.json		layers.json
ref.md		ref.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text Transformation Prompt Stack

The Problem

How It Works

Foundational Stack (Always Applied)

Stylistic Stack (Context-Specific)

Quick Start

Pre-Built Stacks

Creating Custom Stacks

Workflow

Programmatic Usage

Key Concept: Inferred Instructions

Author

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Text Transformation Prompt Stack

The Problem

How It Works

Foundational Stack (Always Applied)

Stylistic Stack (Context-Specific)

Quick Start

Pre-Built Stacks

Creating Custom Stacks

Workflow

Programmatic Usage

Key Concept: Inferred Instructions

Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages