A meta-AI system — you give it a project idea (e.g., "a quizzing platform"), and it outputs a full prompt architecture: system prompts, agent roles, module prompts, inter-agent communication logic, and context management strategies.
The output isn't an app — it's the prompt infrastructure that would power an app.
The engine is a multi-agent pipeline with 5 distinct layers:
Takes the raw idea and decomposes it into functional modules.
Example — Quizzing Platform identifies:
- User Understanding Module
- Question Generation Module
- Adaptive Difficulty Module
- Feedback Module
- Progress Tracking Module
It also infers the AI role in each — which modules need LLM reasoning vs. deterministic logic.
For each module identified, it generates a specialized system prompt.
This agent knows prompt engineering patterns:
| Pattern | When Used |
|---|---|
| Chain-of-Thought | Reasoning-heavy modules (feedback, explanation) |
| Few-Shot | Output format consistency (question gen, scoring) |
| Role Prompting | Persona-driven modules (tutor, code reviewer) |
| Output Constraints | Structured data output (JSON, markdown) |
| Socratic Technique | Learning and quizzing flows |
It picks the right technique per module based on what that module needs to do.
This is where the multi-agentic loop logic gets designed. It defines:
- How agents hand off context to each other
- What shared memory looks like
- What triggers agent switches
- How to avoid context pollution between agents
Example: The User Understanding Module feeds a structured user profile into the Question Generation Module's prompt context — dynamically, at runtime.
A red-teaming agent that reviews generated prompts for:
- Ambiguity and vagueness
- Hallucination risks
- Missing constraints or guardrails
- Edge case coverage
- Token efficiency
It iterates on the output of Layers 2 and 3 in a feedback loop until the prompts pass quality thresholds.
Formats everything into a usable artifact:
- JSON prompt config (for programmatic use)
- Markdown spec document (for human review)
- Starter code scaffolding with prompts embedded
User Input (project idea)
│
▼
┌─────────────────────┐
│ Analyzer Agent │ → Module Map
└─────────────────────┘
│
▼
┌─────────────────────┐
│ Architect Agent │ ×N modules → Draft Prompts
└─────────────────────┘
│
▼
┌─────────────────────┐
│ Critic Agent │ → Feedback + Quality Scores
└─────────────────────┘
│
▼
┌─────────────────────┐
│ Refiner Agent │ → Revised Prompts
└─────────────────────┘
│
▼
┌─────────────────────┐
│ Packager Agent │ → Final Output Artifact
└─────────────────────┘
Each agent has its own carefully engineered system prompt — which is meta: your engine is itself an example of what it produces.
There are no hardcoded templates. The engine infers everything from the project idea itself.
When you give the engine a project idea, the Analyzer Agent reads the idea and:
- Classifies the domain — not from a fixed list, but by reasoning about what kind of product this is, who uses it, and what AI needs to do inside it
- Identifies the AI touchpoints — which parts of the product actually need LLM reasoning vs. deterministic logic
- Infers the interaction model — is this conversational? batch? real-time? adaptive?
- Derives the module list — entirely from the idea, not from preset categories
The output of this step feeds directly into the Architect Agent — no templates are consulted, no archetypes are matched.
Hardcoded templates create a ceiling. If your idea doesn't fit a pre-existing category, the engine either forces it into the wrong mold or fails. Instead:
- A quizzing platform gets modules derived from what quizzing actually requires — the engine figures that out
- A legal document reviewer gets a completely different module set — derived fresh
- A multiplayer game NPC system gets something else entirely
The engine's Analyzer Agent is itself powered by a prompt that knows how to think about projects — not a lookup table of project types.
Given a project idea described in natural language, reason through:
1. What is the core user goal this product serves?
2. Where does AI add value that deterministic logic cannot?
3. What data flows in and out of each AI touchpoint?
4. What are the failure modes if an AI module performs poorly?
5. What prompt engineering techniques are best suited to each touchpoint?
Output a structured module map with justifications for every decision.
Do not match against known archetypes. Derive from first principles.
This means every project idea — no matter how novel — gets a purpose-built prompt architecture, not a recycled one.
These are the things that separate this from just "a GPT wrapper that writes prompts":
Prompts aren't static. The engine generates prompts with clearly defined {{variable}} slots and documents:
- What runtime data fills each slot
- When that data is injected
- What happens if the data is missing (fallback logic)
It doesn't just write individual prompts — it designs the sequence and conditions for chaining them.
If the user scores below 60%, which prompt fires? What context does it receive? What does it output, and where does that output go?
Each generated prompt includes:
- Estimated token footprint (system + context + output)
- Flags if a module's prompt design is likely too expensive for real-time use
- Suggestions for compression or caching
For every prompt generated, the engine also outputs how you'd test it:
- What good output looks like (positive examples)
- What failure looks like (negative examples)
- Suggested automated evals or human review criteria
{
"project": "Quizzing Platform",
"modules": [
{
"name": "User Understanding Module",
"agent_role": "Learner Profiler",
"technique": "Chain-of-Thought + Structured Output",
"system_prompt": "...",
"context_slots": ["{{user_history}}", "{{recent_score}}", "{{topic}}"],
"token_estimate": 420,
"triggers": ["session_start", "post_quiz_completion"],
"outputs_to": ["Question Generation Module"],
"eval_criteria": {
"good_output": "...",
"bad_output": "...",
"test_cases": [...]
}
}
],
"inter_agent_map": {
"shared_memory_schema": "...",
"handoff_conditions": [...]
}
}| Layer | Recommended Tool |
|---|---|
| Agent Orchestration | LangGraph or custom async loop |
| LLM Backend | Claude API (Sonnet for speed, Opus for depth) |
| Prompt Storage | JSON / YAML config files |
| UI | Chat interface → structured output view |
| Evaluation | Custom evals or PromptFoo |
| Output Format | Markdown + JSON artifact download |
Why LangGraph over LangChain? You need true conditional agent loops with state — not just sequential chains. The Critic → Refiner loop especially requires graph-style flow control.
| Phase | What to Build |
|---|---|
| Phase 1 | Analyzer + Architect agents. Input: idea. Output: draft prompt set. |
| Phase 2 | Critic + Refiner loop. Iterative quality improvement. |
| Phase 3 | Use case template library. Pattern matching for archetypes. |
| Phase 4 | Output packager. Downloadable artifacts, code scaffolding. |
| Phase 5 | UI layer. Chat-based input, structured visual output. |
The most powerful demonstration of this system is that the system itself is built using the very prompt architecture it produces.
Every agent inside your engine — the Analyzer, the Architect, the Critic — is powered by a carefully engineered prompt. Your engine is its own best example.
Built with advanced prompt engineering · Multi-agent loop architecture · Modular by design