Prompts Are Product Specs: Why PMs Can't Delegate Prompt Iteration to Engineers
Somewhere between the first AI project kickoff and the first user complaint, a decision gets made quietly: the engineers will handle the prompts. It feels reasonable—prompts are text that talks to a model, and the engineers are the ones wiring up the model. But that decision is where a lot of AI products start going wrong.
Prompts are not implementation details. They are the closest thing an AI system has to a product specification—the document that tells the system what to do, how to behave, and what a good output looks like. When a PM delegates prompt ownership to an engineer, they are handing off the spec to someone whose job is to build the system, not define what the system should do. That inversion creates exactly the kind of misalignment that PMs exist to prevent.
What Prompts Actually Are (And Why That Changes Everything)
In a traditional software product, the product specification lives in a PRD, a design file, or a set of user stories. Engineers read those documents and write code that implements them. The spec and the implementation are separate artifacts maintained by different people for good reason: the person who understands user needs should define the desired behavior, and the person who understands the system should figure out how to produce it.
In an AI product, the prompt is the spec and the implementation simultaneously. When you write a system prompt for a customer support assistant, you are not describing desired behavior in abstract terms for someone else to implement—you are directly encoding that behavior into the system. Change the prompt, and you change what the product does. Add a sentence, remove a constraint, reorder instructions, and you get a meaningfully different user experience.
This is what some engineers and researchers mean when they say prompts are becoming the new application code. The logic of the product—how it handles edge cases, what tone it takes, what it refuses to do, how it prioritizes competing goals—lives in the prompt. If a PM isn't the one writing and iterating on that logic, then someone else is making product decisions by default.
That someone else is usually an engineer who is trying to make the system work, not trying to make it work for the user.
Engineers Are Not Positioned to Own This Work
This is not a criticism of engineers. It is a description of what their job is optimized for.
Engineers building AI systems are typically focused on:
- Model selection and integration
- Latency and cost optimization
- Infrastructure reliability
- Retrieval pipelines, context management, and tool use
These are hard, important problems. But none of them require a deep understanding of what users are trying to accomplish, what language resonates with them, or what a high-quality output actually looks like in context. Those are PM responsibilities—and they are exactly the inputs that drive good prompt iteration.
When an engineer writes a prompt, they tend to optimize for getting the model to produce output that looks correct to them. That is not the same as output that solves the user's problem. An engineer might write a prompt that reliably produces well-structured JSON and consider that a success. A PM who has talked to users knows that the actual failure mode is that the model's tone sounds robotic and users stop trusting its recommendations. Those are different problems that require different prompt strategies.
The gap between "technically functional" and "actually useful" is where AI products succeed or fail. PMs are the people trained to see that gap. Delegating prompts to engineers means delegating the responsibility of closing it.
Iteration Speed Is the Competitive Variable in Early AI Products
In early-stage AI products, the ability to run fast experiments matters more than almost anything else. The model is a black box that you probe through inputs. The faster you can form a hypothesis, test it, and read the results, the faster you can improve the product.
Prompt iteration is that experimentation loop. And it is accessible to PMs in a way that most hands-on product work is not.
Unlike a code change—which requires writing, reviewing, testing, and deploying—a prompt change can be tested in minutes. A PM can:
- Rewrite a system prompt and run it against a set of test cases
- Compare two prompt variants side by side on real user inputs
- Identify failure modes and adjust instructions directly
- Ship an improved behavior without touching a line of code
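The loop described above can be sketched as a small comparison harness. Everything here is illustrative: `call_model` is a placeholder for whatever model API the team actually uses, and the checks are whatever pass/fail criteria the PM has defined for the product.

```python
# Minimal sketch of a side-by-side prompt comparison harness.
# `call_model` is a stand-in for the team's real model API.

def call_model(system_prompt: str, user_input: str) -> str:
    # Stub: swap in a real call to your model provider here.
    raise NotImplementedError("wire this up to your model API")

def run_comparison(variants, test_cases, checks, model=call_model):
    """Score each prompt variant: the fraction of test cases whose
    output passes every PM-defined check."""
    results = {}
    for name, system_prompt in variants.items():
        passed = 0
        for case in test_cases:
            output = model(system_prompt, case)
            if all(check(output) for check in checks):
                passed += 1
        results[name] = passed / len(test_cases)
    return results
```

A PM can run something like this in a notebook: feed it two prompt variants and yesterday's real user inputs, and read off which variant passes more of the defined checks—no deployment required.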
When PMs treat prompt iteration as engineering work, they insert themselves into a queue. Every hypothesis about output quality becomes a ticket, a discussion, a sprint item. The feedback loop that should take an afternoon takes two weeks. By the time the change ships, the context is stale and the user behavior that motivated the change may have shifted.
PMs who run their own prompt experiments compress that loop dramatically. They can observe a failure in a user session on Monday and have a tested fix in staging by Tuesday. That speed compounds. Over three months, a PM who owns prompt iteration has run dozens of experiments that a PM who delegates it has not.
"Collaboration" Is Not an Accountability Model
The standard guidance on AI product development says PMs and engineers should "collaborate" on prompts. This sounds reasonable but is almost entirely useless as operational advice.
Collaboration without clear ownership produces predictable outcomes:
- Prompts drift over time as engineers make small changes to fix immediate bugs without understanding the downstream product implications
- When outputs miss the mark, it is unclear whether the problem is the prompt logic, the model behavior, or a misunderstanding of user needs—and no one person is accountable for diagnosing it
- Evaluations (the process of systematically testing whether outputs meet quality standards) get deprioritized because neither the PM nor the engineer feels fully responsible for running them
This is the finger-pointing problem. If a user complains that the AI assistant gave bad advice, and the PM says "engineering owns the prompts" while the engineer says "PM approved the requirements," no one is in a position to fix it quickly or prevent it from happening again.
Clear ownership resolves this. When the PM owns the prompt and the evaluation criteria, there is one person who can look at a failure, trace it back to a specific instruction or missing constraint, and ship a fix. The engineer's job becomes building the infrastructure that makes the PM's iteration fast and reliable—not making product decisions by proxy.
What PM Ownership of Prompts Actually Looks Like
Owning prompts does not mean PMs write every word of every prompt alone. It means PMs are accountable for the product behavior that prompts produce, and they do the hands-on work of defining and refining that behavior.
In practice, that looks like:
- Writing the first draft of system prompts based on user research and product goals, not waiting for engineering to produce something to react to
- Maintaining a prompt changelog that documents what changed, why, and what effect it had—treating prompts with the same rigor as any other product artifact
- Defining evaluation criteria before testing begins: what does a good output look like? What are the failure modes that matter most? These are product questions, not engineering questions
- Running structured prompt experiments using a set of representative test cases drawn from real user inputs, comparing variants against defined criteria
- Reviewing prompt changes from engineering the same way a PM reviews a feature change—understanding what changed and why, not just approving a diff
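The changelog practice above can be as lightweight as one structured record per change. This shape is an assumption, not a standard—the fields are simply the ones the list calls out (what changed, why, and what effect it had):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PromptChange:
    """One prompt changelog entry: what changed, why, and what happened."""
    changed_on: date
    summary: str          # what changed, e.g. "reordered safety constraints"
    rationale: str        # the user problem or failure mode it addresses
    observed_effect: str  # measured outcome once the change shipped

changelog: list[PromptChange] = []

def record_change(summary: str, rationale: str,
                  observed_effect: str = "pending") -> PromptChange:
    """Append an entry; effect starts as "pending" until measured."""
    entry = PromptChange(date.today(), summary, rationale, observed_effect)
    changelog.append(entry)
    return entry
```

Even a spreadsheet with these three columns works; the point is that prompt changes leave a trail a PM can trace a failure back through, the same way a feature changelog does.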
Engineers remain essential partners. They handle the infrastructure that makes prompt iteration possible—the tooling to run evals at scale, the retrieval systems that feed context into prompts, the deployment pipeline that ships changes safely. But they are building the system that executes the PM's product decisions, not making those decisions themselves.
The Accountability Gap Is a Product Risk
There is a version of this that sounds like a workflow preference—PMs who like being hands-on versus PMs who prefer to stay strategic. It is not. Unclear prompt ownership is a product risk with measurable consequences.
AI outputs are not deterministic. The same prompt can produce different outputs in different contexts, and small prompt changes can produce large behavioral shifts. Without a PM who is deeply familiar with the current prompt and its known failure modes, the product can degrade silently—no error logs, no broken builds, just gradually worse outputs that users stop trusting.
The PM who owns the prompt is the person who notices when a change introduced three weeks ago is causing the model to hedge excessively on questions it used to answer confidently. The PM who delegated the prompt to engineering is the person who finds out about it in a user interview six weeks later.
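One concrete way to make that degradation loud rather than silent is a scheduled regression check: compare the current eval pass rate against a baseline recorded when the prompt was last known-good. The tolerance value here is an illustrative assumption, not a recommendation:

```python
def check_for_regression(current_pass_rate: float,
                         baseline_pass_rate: float,
                         tolerance: float = 0.05) -> None:
    """Raise when the eval pass rate drops meaningfully below the
    baseline recorded at the last known-good prompt version."""
    if current_pass_rate < baseline_pass_rate - tolerance:
        raise RuntimeError(
            f"Prompt quality regressed: pass rate {current_pass_rate:.0%} "
            f"vs baseline {baseline_pass_rate:.0%}"
        )
```

Run nightly against a fixed test set, a check like this turns "gradually worse outputs" into an alert the owning PM sees in days, not a user interview six weeks later.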
Prompt ownership is not a nice-to-have for PMs who are technically curious. It is the mechanism by which PMs maintain visibility into what their AI product is actually doing—and the lever they use to change it when it is not doing the right thing.
The engineers on your team are good at building systems. They are not good at being you. Stop asking them to do both jobs.