The Prompt That Writes Prompts: A Zero-Waste Framework for AI Agent Pipelines

I hit a breakthrough last week that I need to share because it changed how I think about prompts entirely. It's not about what you ask the model to do. It's about what you stop the model from doing.

Here's what happened. I had a batch of blog posts that needed formatting and data upgrades. Nothing crazy. Just typographical cleanup and updated statistics. I wrote a prompt, fed it in, and got back something that stopped me cold. Not because it was clever. Because it was absolutely focused. Every token had a job. No "I'd be happy to help." No "Here's your output." No conversational wrapper at all. Just the work, structured exactly how I needed it, machine-parseable from the first character to the last.

I sat there looking at it and realized I had accidentally discovered something. The prompt worked because I had stopped treating the LLM like a conversation partner and started treating it like a compiler in a data pipeline. That one shift changed everything about the output quality.

And then I realized the next thing. If I could describe what made that prompt work, I could build a prompt that generates prompts like it. A meta-prompt. A prompt architect.

KEY TAKEAWAYS

Treat the LLM like a compiler, not a chat partner. Chat interfaces train us to be conversational. Agent pipelines reward being surgical. The difference in output quality isn't subtle.
Position before submission. Lock down constraints so the model has zero room to execute anything other than the schema you need. Every degree of freedom you leave open becomes a place where tokens get wasted.
The CAGE-P framework. Four components that make any prompt production-grade: System Role, Core Directive, Strict Constraints, Execution Protocol with machine-readable output schema.
Delimiters are infrastructure. Using === and --- as output markers isn't cosmetic. It makes your output parseable by scripts, regex, and orchestration tools without a human reading it first.
Build a meta-prompt. A prompt that writes prompts pays for itself in one use. Feed it rough thoughts. Get back a production-grade CAGE-P prompt. Your vault now has a prompt architect on standby.

The Mental Model Shift: Compiler, Not Conversation Partner

Here's the thing. Most people interact with LLMs through chat interfaces. ChatGPT. Claude.ai. The chat window. And chat windows train a specific behavior. You say something. The model responds. You react to the response. The model adjusts. It's a conversation.

Conversations are great for exploration. They're terrible for production pipelines. Because in a conversation, the model is incentivized to be helpful, personable, and complete. It adds "Sure, I can help with that!" to the top of every response. It wraps output in explanatory paragraphs. It checks in to see if you're satisfied. Every one of those conversational signals is a token you're paying for that produces zero value.

When you're routing prompts through an API chain or an orchestration pipeline, the economics reverse. You're paying per token. Every "I'd be happy to assist" costs you money. Every "Here is the output you requested" costs you money. Every "Let me know if you need anything else" costs you money. And if you're running this at scale, processing hundreds of pieces of content through multiple pipeline nodes, those wasted tokens compound into real dollars.
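To see how that compounds, here's a back-of-the-envelope sketch in Python. The function name and every number in the example call are hypothetical placeholders, not measurements. Plug in your own volumes and your provider's current per-token price.

```python
def fluff_cost_usd(items: int, nodes: int, fluff_tokens_per_call: int,
                   usd_per_million_output_tokens: float) -> float:
    """Dollar cost of conversational wrapper tokens across every pipeline call."""
    calls = items * nodes                       # one call per item per pipeline node
    wasted_tokens = calls * fluff_tokens_per_call
    return wasted_tokens * usd_per_million_output_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 500 posts, 4 nodes, 40 wrapper tokens per call,
    # $10 per million output tokens.
    print(f"${fluff_cost_usd(500, 4, 40, 10.0):.2f}")
```

The point isn't the specific figure. It's that wrapper cost scales linearly with items × nodes, so every node you add to the pipeline multiplies it again.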

The fix is a mental model shift. Stop thinking of the LLM as a conversational assistant and start thinking of it as a compiler. You give it source code (your content and instructions). It produces a compiled artifact (structured, machine-parseable output). There is no conversation. There is input and output. That's it.

A compiler doesn't say "I'd be happy to compile this for you." It compiles or it errors. That's the relationship you want with your pipeline agents.

The CAGE-P Framework: Four Components That Make Any Prompt Production-Grade

When I reverse-engineered why that prompt worked, it broke down into four components. I've been calling it CAGE-P because naming things makes them easier to remember and deploy.

Component	What It Does	Why It Matters
1. System Role	Defines the agent as a specific operational expert, not a general assistant	"You are a high-density Information Architect" produces fundamentally different output than "You are a helpful assistant." The role constrains the register, the depth, and the default behaviors.
2. Core Directive	A blunt, one to two sentence objective statement	No room for interpretation. The model knows exactly what success looks like. "Execute a typographical and data-currency upgrade on the provided posts" leaves no ambiguity about the task.
3. Strict Constraints	Three to five unbreakable rules that prevent the most common failure modes	Zero conversational fluff. No truncation. Preserve original voice. These constraints aren't suggestions. They're the guardrails that prevent the model from reverting to its chat-trained defaults.
4. Execution Protocol	A rigid output schema with machine-parseable delimiters	=== markers and --- separators mean your orchestration scripts can parse the output without a human. The output is infrastructure, not just text.
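If you assemble prompts in code, the four components map naturally onto a small data structure. This is a minimal sketch, assuming you keep the two universal constraints as a reusable list. `CagePPrompt` and `UNIVERSAL_CONSTRAINTS` are hypothetical names, and the `// SECTION` headers simply mirror the prompt style used throughout this post.

```python
from dataclasses import dataclass, field

# Reusable baseline rules that go into every production prompt.
UNIVERSAL_CONSTRAINTS = [
    'Zero conversational fluff. No intros, no filler text, no "Here is your output."',
    "Do not truncate or use ellipses. Output the full requested content.",
]


@dataclass
class CagePPrompt:
    """The four CAGE-P components, rendered into one system prompt string."""
    system_role: str
    core_directive: str
    strict_constraints: list[str] = field(default_factory=list)
    execution_protocol: str = ""

    def render(self) -> str:
        # Number the constraints so they can be referred to by index.
        constraints = "\n".join(
            f"{i}. {rule}" for i, rule in enumerate(self.strict_constraints, start=1)
        )
        return "\n\n".join([
            "// SYSTEM ROLE\n" + self.system_role,
            "// CORE DIRECTIVE\n" + self.core_directive,
            "// STRICT CONSTRAINTS\n" + constraints,
            "// EXECUTION PROTOCOL\n" + self.execution_protocol,
        ])


if __name__ == "__main__":
    prompt = CagePPrompt(
        system_role="You are a high-density Information Architect and Ghost-Editor.",
        core_directive="Execute a typographical and data-currency upgrade on the provided posts.",
        strict_constraints=UNIVERSAL_CONSTRAINTS + ["Preserve original voice."],
        execution_protocol="=== ITEM [Number]: [Title] ===",
    )
    print(prompt.render())
```

Keeping the constraints as a list makes the split explicit: the universal rules are shared infrastructure, and task-specific rules get appended per prompt.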

Let me break each component down with the exact prompt language that makes it work.

System Role: Name the Expert, Not the Assistant

The System Role isn't a vibe. It's a functional constraint. When you write "You are a helpful assistant," the model draws on every pattern in its training data associated with helpful assistants. That includes greeting the user. That includes asking clarifying questions. That includes wrapping output in conversational padding.

When you write "You are a high-density Information Architect and Ghost-Editor," the model draws on a completely different set of patterns. Architects produce blueprints. Ghost-editors preserve voice. The output register changes because the role changed.

// BAD SYSTEM ROLE
You are a helpful AI assistant that formats blog posts.

// GOOD SYSTEM ROLE
You are a high-density Information Architect and Ghost-Editor. Your task is to perform a visual, typographical, and data-currency upgrade on the provided content.

The bad one is friendly and vague. The good one is specific and operational. Every word earns its place. No adjectives that don't constrain behavior.

Core Directive: One Shot, No Ambiguity

The Core Directive is the mission statement. It tells the model exactly what to produce. Not "help me with." Not "assist in." A specific deliverable.

Here's the test. After reading your core directive, could someone describe exactly what the model should output without any additional context? If yes, it's a good directive. If they'd need to ask clarifying questions, rewrite it.

// WEAK DIRECTIVE
Help me improve my blog posts by making them look better and updating the information.

// STRONG DIRECTIVE
Return a strictly structured Markdown patch for each post with: voice profile, hindsight correction, data injection cue, and full refactored content with injected typographical upgrades.

The strong directive tells the model what to output, in what format, with what sections. No interpretation required.

Strict Constraints: The Rules That Prevent Reversion to Default

This is the section that separates production prompts from chat prompts. You need three to five unbreakable rules that explicitly prevent the model's most common failure modes.

There are two constraints I now put in every prompt I build for pipelines:

// UNIVERSAL CONSTRAINTS (include in every production prompt)
1. Zero conversational fluff. No intros, no filler text, no "Here is your output."
2. Do not truncate or use ellipses. Output the full requested content.

The first one kills the chat habit. The model has been trained on millions of conversations where politeness and framing were appropriate. You have to explicitly tell it that this isn't a conversation. The second one prevents the model from getting lazy on long outputs and summarizing sections with ellipses rather than producing the full content.

Beyond those two, add constraints specific to your task. For content work, I add:

// CONTENT-SPECIFIC CONSTRAINTS
3. Do NOT prune, truncate, or alter the core meaning or voice of the original text. Preserve the edge.
4. Temporal Anchor: Current date is [DATE]. Inject up-to-date data, trends, or stats.

Constraint three prevents the model from "improving" your voice into generic AI prose. Constraint four prevents outdated references. Each constraint addresses a specific, predictable failure mode.
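Constraints reduce failures; they don't guarantee compliance. So it's worth checking the output before anything downstream consumes it. Here's a minimal validator sketch for the two universal constraints. The phrase lists are illustrative guesses, not an exhaustive catalogue, and `constraint_violations` is a hypothetical name.

```python
import re

# Illustrative heuristics: chat-style openers at the very start of the output,
# and wrapper phrases anywhere in it. Extend these from failures you actually see.
OPENER = re.compile(r"(sure|certainly|of course|absolutely)\b", re.IGNORECASE)
WRAPPER = re.compile(
    r"here (is|are) your|i'?d be happy to|let me know if you need",
    re.IGNORECASE,
)
# Three dots or the single ellipsis character (U+2026) can signal truncation,
# but legitimate prose uses them too, so treat hits as flags for review.
ELLIPSIS = re.compile(r"\.\.\.|\u2026")


def constraint_violations(output: str) -> list[str]:
    """Return human-readable problems; an empty list means the output passed."""
    problems = []
    if OPENER.match(output.lstrip()):
        problems.append("fluff: conversational opener")
    if WRAPPER.search(output):
        problems.append("fluff: wrapper phrase")
    if ELLIPSIS.search(output):
        problems.append("possible truncation: ellipsis found")
    return problems
```

Run it on every response before the parser touches it, and send anything that fails to a retry or a review queue rather than straight into your files.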

Execution Protocol: Make Your Output Machine-Readable

This is the component most people skip, and it's the one that makes the biggest difference for pipelines. The Execution Protocol defines exactly how the output should be structured so that scripts, regex, and orchestration tools can parse it without a human reading it first.

The key is delimiter tokens that don't appear in natural language. I use `===` for section boundaries and `---` for subsection separators. These are trivial to parse with basic string operations.

// EXECUTION PROTOCOL (example for multi-item batch processing)
For each item, output the following template:

=== ITEM [Number]: [Title] ===
[Section A]: [Content]
---
[Section B]: [Content]
---
[Full Output]: [Complete processed content]
---

[Move immediately to next item until all are complete]

Three things make this work. The triple-equals delimiter is unambiguous. The template is explicit. And the final instruction "move immediately to next item" prevents the model from inserting conversational transitions between items.

The practical upshot: you can write a ten-line Python script that reads this output, splits on `===`, extracts each section by delimiter position, and writes the results to files. Zero human intervention between prompt execution and file output.
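Here's a minimal sketch of that script, assuming the model followed the template exactly and labelled the processed content `Full Output`. There's one small deviation from "split on `===`": it matches whole header lines instead, because each header contains two `===` markers. `parse_batch` and `write_items` are hypothetical names.

```python
import re
from pathlib import Path

# Matches a header line such as "=== ITEM 3: My Post Title ===".
HEADER = re.compile(r"^=== ITEM (\d+): (.+?) ===[ \t]*$", re.MULTILINE)
# A section separator is a line containing only '---'.
SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)


def parse_batch(raw: str) -> list[dict]:
    """Split model output into items, then split each item into labelled sections."""
    items = []
    headers = list(HEADER.finditer(raw))
    for i, match in enumerate(headers):
        # Each item's body runs until the next header (or the end of the output).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(raw)
        sections = {}
        for chunk in SEPARATOR.split(raw[match.end():end]):
            label, sep, content = chunk.strip().partition(":")  # split on first colon only
            if sep:
                sections[label.strip()] = content.strip()
        items.append({"number": int(match.group(1)),
                      "title": match.group(2),
                      "sections": sections})
    return items


def write_items(items: list[dict], out_dir: str = "out") -> None:
    """Write each item's 'Full Output' section to its own Markdown file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for item in items:
        path = out / f"item_{item['number']:03d}.md"
        path.write_text(item["sections"].get("Full Output", ""), encoding="utf-8")
```

Usage is two lines: `items = parse_batch(response_text)` followed by `write_items(items)`. One caveat: if a post's own content contains a bare `---` line, this split will cut it apart (more on that in the FAQ on delimiter collisions).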

The Meta-Prompt: A Prompt That Writes Prompts

Once I had the CAGE-P framework defined, the next step was obvious. Build a prompt that generates CAGE-P prompts from rough ideas. A prompt architect. Feed it your stream-of-consciousness goals, get back a production-grade prompt.

Here's the meta-prompt. Save it. Use it whenever you need to turn a rough idea into a pipeline-ready prompt.

// SYSTEM ROLE
You are an elite Prompt Architect and API Operations Engineer. Your objective is to take my raw, unstructured goals and compile them into a rigid, token-optimized System Prompt designed for zero-shot execution with machine-readable output.

// CORE DIRECTIVE
Analyze my input and generate a Master Prompt following the CAGE-P framework: System Role, Core Directive, Strict Constraints, and Execution Protocol with machine-parseable output schema.

// STRICT CONSTRAINTS
1. Zero conversational fluff. No intros, no filler, no "Here is your prompt."
2. Include these two baseline constraints in every generated prompt: "Zero conversational fluff. No intros, no filler." and "Do not truncate or use ellipses."
3. Use === and --- delimiters for the output schema.
4. If the task involves multiple items, explicitly command the model to loop through the template for each item in a single pass.

// EXECUTION PROTOCOL
=== GENERATED_PROMPT_START ===
### 1. [SYSTEM ROLE]
### 2. [CORE DIRECTIVE]
### 3. [STRICT CONSTRAINTS]
### 4. [EXECUTION PROTOCOL / OUTPUT SCHEMA]
=== GENERATED_PROMPT_END ===

// USER INPUT
[Insert your rough thoughts, goals, and desired output format here]
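Because the meta-prompt wraps its result in start and end markers, pulling the generated prompt out is a couple of string searches. This is a minimal sketch; `extract_generated_prompt` and `missing_sections` are hypothetical names. The section check is case-insensitive and ignores brackets, on the assumption that the model may or may not keep the `[...]` placeholders literally.

```python
START = "=== GENERATED_PROMPT_START ==="
END = "=== GENERATED_PROMPT_END ==="
# The four CAGE-P sections the meta-prompt asks for.
REQUIRED_SECTIONS = ["SYSTEM ROLE", "CORE DIRECTIVE", "STRICT CONSTRAINTS", "EXECUTION PROTOCOL"]


def extract_generated_prompt(raw: str) -> str:
    """Return the text between the start and end markers, or raise if either is missing."""
    start = raw.find(START)
    if start == -1:
        raise ValueError("start marker not found")
    body_start = start + len(START)
    end = raw.find(END, body_start)
    if end == -1:
        raise ValueError("end marker not found; the output may be truncated")
    return raw[body_start:end].strip()


def missing_sections(prompt: str) -> list[str]:
    """List any CAGE-P section names that do not appear in the generated prompt."""
    upper = prompt.upper()
    return [name for name in REQUIRED_SECTIONS if name not in upper]
```

A missing end marker is a useful signal in its own right: it usually means the response was cut off, which is exactly the truncation failure the constraints are meant to prevent. Save the result to your prompt library only when `missing_sections` comes back empty.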

Here's why this works. The meta-prompt forces structure on unstructured thinking. You don't have to remember the CAGE-P framework in the moment. You don't have to craft the constraints yourself. You brain-dump what you need, and the architect produces a prompt that would have taken you twenty minutes to write and would have missed half the constraints.

The meta-prompt pays for itself on the first use, and the return compounds from there: the prompt it generates will save you tokens on every subsequent execution by eliminating conversational waste from the output.

Why Delimiters Are Infrastructure, Not Cosmetic

Let me make this concrete because it's easy to gloss over delimiters as formatting details. They're not. They're the interface between your prompt and everything downstream.

Without delimiters, the output of your prompt is a blob of text. A human has to read it, understand where one section ends and another begins, and manually extract what they need. This is fine if you're generating one thing at a time. It's a bottleneck if you're running a pipeline.

With delimiters, the output is a data structure. Your orchestration script can parse it, route sections to the right downstream nodes, and write files without a human ever seeing the raw output. This is the difference between a tool you use and infrastructure you rely on.

Approach	Output Format	Integration Cost	Scale Limit
Chat-style prompt	Conversational text with embedded content	Human must read and extract	~5 items before human becomes the bottleneck
CAGE-P with delimiters	Machine-parseable sections with === markers	Script extracts in milliseconds	Limited by API rate, not human attention

Every piece of content you process through a pipeline is one more case where delimiters save you from manually hunting for section boundaries. Over a thousand pieces of content, that's hours of human attention recovered. The delimiters aren't cosmetic. They're leverage.
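"Route sections to the right downstream nodes" can be as simple as a lookup table from section label to handler. Here's a minimal sketch, assuming the output has already been parsed into dictionaries shaped like `{"number": 1, "title": "My Post", "sections": {"Full Output": "<post body>"}}`. `route_sections` and the handlers are hypothetical; in a real pipeline a handler might write a file, call another model, or push to a CMS.

```python
from typing import Callable

# A handler receives the item number and the section's content.
Handler = Callable[[int, str], None]


def route_sections(items: list[dict], handlers: dict[str, Handler]) -> list[str]:
    """Send each labelled section to its handler; return any sections nobody handled."""
    unrouted = []
    for item in items:
        for label, content in item["sections"].items():
            handler = handlers.get(label)
            if handler is None:
                unrouted.append(f"item {item['number']}: {label}")
            else:
                handler(item["number"], content)
    return unrouted


if __name__ == "__main__":
    published = {}
    items = [{"number": 1, "title": "My Post",
              "sections": {"Full Output": "<post body>", "Voice Profile": "wry"}}]
    leftovers = route_sections(items, {"Full Output": published.__setitem__})
    print(published, leftovers)
```

Returning the unrouted labels instead of silently dropping them means a schema drift (the model inventing a new section name) shows up as a visible leftover rather than lost content.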

The Batch Processing Insight: One Pass, All Items

There's one more piece that made the original prompt sing, and it's worth calling out separately. The instruction to process all items in a single pass rather than one at a time.

If you prompt the model to "process this blog post" and then feed the posts in one at a time, you get N separate inference calls. Each call has the full system prompt overhead. Each call has the conversational setup cost. And each call is an independent sample, so the way the model applies your constraints can drift from one item to the next.

If you instead say "process all N posts in a single execution loop, moving immediately from one to the next," you get one inference call with one system prompt overhead. The constraints stay active for the entire run. The delimiters separate the items within the output. And you pay for the system prompt tokens once instead of N times.

Method	Inference Calls	System Prompt Overhead	Constraint Consistency
One at a time	N	Paid N times	May drift between calls
Single pass, looped	1	Paid once	Locked for entire run

This matters at any scale. Even at ten items, you're saving 90% of your system prompt token cost. At a hundred items, the difference is the margin between profitable automation and burning budget on redundant overhead.
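The arithmetic behind those percentages fits in a few lines. This sketch counts only the system prompt portion of the input; the per-item content tokens are paid either way. The function name is hypothetical.

```python
def system_prompt_overhead(system_prompt_tokens: int, items: int) -> tuple[int, int, float]:
    """System-prompt tokens for N separate calls vs one batched call, plus the fraction saved."""
    if system_prompt_tokens < 1 or items < 1:
        raise ValueError("system_prompt_tokens and items must both be at least 1")
    one_at_a_time = system_prompt_tokens * items   # system prompt resent with every call
    single_pass = system_prompt_tokens             # system prompt sent once for the batch
    saved_fraction = 1 - single_pass / one_at_a_time  # simplifies to 1 - 1/items
    return one_at_a_time, single_pass, saved_fraction


if __name__ == "__main__":
    for n in (10, 100):
        print(n, system_prompt_overhead(800, n))  # 800 is a hypothetical prompt size
```

Notice that the saved fraction is 1 − 1/N no matter how long the system prompt is: 90% at ten items, 99% at a hundred. The prompt's length only changes how many absolute tokens that percentage represents.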

The instruction to loop is simple: "Move immediately to next item until all are complete." That's it. The model understands sequential processing. You just have to tell it that this is a batch job, not a single-item interaction.
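Here's one way to package a batch, sketched under a few assumptions. The `=== INPUT n: title ===` input marker and the function names are hypothetical, and the output check assumes the `=== ITEM n: title ===` headers from the Execution Protocol above. One practical caveat: a single response is capped by the model's maximum output length, so a large batch can end before the last item. Checking which item numbers actually came back catches that.

```python
import re


def build_batch_input(posts: list[tuple[str, str]]) -> str:
    """Number each (title, body) pair so the model can echo the numbers in its headers."""
    blocks = [
        f"=== INPUT {i}: {title} ===\n{body.strip()}"
        for i, (title, body) in enumerate(posts, start=1)
    ]
    blocks.append("Process every item above in a single pass. "
                  "Move immediately to next item until all are complete.")
    return "\n\n".join(blocks)


def missing_items(raw_output: str, expected: int) -> list[int]:
    """Item numbers the model never emitted a header for (e.g. it stopped early)."""
    seen = {int(n) for n in re.findall(r"^=== ITEM (\d+):", raw_output, flags=re.MULTILINE)}
    return [n for n in range(1, expected + 1) if n not in seen]
```

If `missing_items` returns anything, re-run just those posts as a smaller batch instead of repeating the whole job.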

Frequently Asked Questions

Does this only work for content tasks, or can I use CAGE-P for anything?

Any task where you need structured, machine-parseable output. Code generation. Data extraction. Report formatting. API response parsing. The framework is domain-agnostic. The System Role and Core Directive change based on the task. The Strict Constraints and Execution Protocol are the reusable infrastructure.

What if my task doesn't fit neatly into a template?

The template is a starting point, not a cage. If your task requires a different output structure, change the Execution Protocol. The framework survives template changes. What doesn't survive is removing the constraints. Zero fluff and no truncation are universal. Keep those and adapt the rest.

Won't the model ignore the constraints and get chatty anyway?

Sometimes. Frontier models are heavily trained on conversational data and the chat pattern is deeply embedded. The constraints reduce the failure rate dramatically but don't eliminate it entirely. Two things help: put the zero-fluff constraint first (primacy effect), and include it in the System Role itself ("You are a compiler-like system that outputs structured data with no conversational framing"). The deeper the constraint is embedded, the harder it is for the model to bypass.

How do I know my delimiter won't appear in the content itself?

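Use delimiters that are unlikely in natural language. `===` and `---` with surrounding newlines almost never appear in ordinary prose, but Markdown is not ordinary prose: a line of `---` is a horizontal rule (and the front-matter fence many blogs use), and a line of `===` can underline a heading. If you're generating code, or processing Markdown that uses those constructs, use a unique marker like `<<>>`. The key is that your parser can reliably split on the delimiter. Test it on a sample of your expected output format before running at scale.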
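A cheap way to run that test is to scan the input content before you send it. Here's a minimal sketch; the function names and the `@@@@` fallback are hypothetical, and the collision pattern is deliberately conservative, flagging any line that starts with `===` or consists of three or more dashes.

```python
import re

# Lines that could be mistaken for schema delimiters: anything starting with
# three or more '=' (headers, setext underlines) or a run of three or more '-'.
DELIMITER_LINE = re.compile(r"^(={3,}.*|-{3,})[ \t]*$", re.MULTILINE)


def delimiter_collisions(content: str) -> list[str]:
    """Return the lines in the input that look like schema delimiters."""
    return [m.group(0).strip() for m in DELIMITER_LINE.finditer(content)]


def pick_separator(content: str, candidates: tuple[str, ...] = ("---", "<<>>", "@@@@")) -> str:
    """Return the first candidate separator that never appears as a whole line in content."""
    lines = {line.strip() for line in content.splitlines()}
    for candidate in candidates:
        if candidate not in lines:
            return candidate
    raise ValueError("every candidate separator already appears in the content")
```

If `delimiter_collisions` flags anything in a post, swap the separator for that batch (and tell the parser), rather than hoping the model escapes it.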

Isn't this over-engineering? Most of my prompts work fine without this.

For one-off prompts in a chat interface, yes. Use whatever works. The CAGE-P framework is for prompts that feed into pipelines. Prompts that run unattended. Prompts whose output gets consumed by scripts, not humans. The moment your prompt output touches automation, the structure pays for itself. Until then, keep it simple.

Mikel Jorgensen

AI agent builder & founder of Chess Club Media. I write about what I learn — no fluff, no jargon, just working systems.