Brent Haskins / Applied AI
AI Agents Fail When You Build Them Backwards — Here's the Product-First Fix
As of May 2026, the AI agent landscape is littered with projects that ship a fancy demo but fail in production. The root cause isn't model quality — it's building bottom-up from components rather than top-down from product requirements. This post argues that the fix is treating architecture as a requirement-driven process, encoded in a DESIGN.md that acts as the contract between product intent and agent behavior. Drawing from system design guides, production postmortems, and the emerging DESIGN.md pattern, it shows how product engineers can flip the approach and ship agents that actually work.
The short answer
Most AI agent projects fail in production not because the model isn't powerful enough, but because the architecture was built backwards. Teams start by choosing components — vector stores, function-calling frameworks, memory modules — then compose them into a system that mostly works for a demo but consistently falls apart on real user tasks. The reason is simple and avoidable: the architecture wasn't derived from what the product needed to prove.
Building top-down means starting with the product requirement (for example, "this agent must produce a mortgage pre-approval decision with 98% accuracy in under 10 seconds") and letting that constraint define every architectural choice: which retrieval method, which tool granularity, where to add human-in-the-loop boundaries. The AI System Design Guide and the DESIGN.md pattern both point to the same shift: encode product logic as a system contract before writing a single agent loop. The teams that get this right treat architecture as a requirement-driven process, not a technology selection exercise.
Key takeaways
- Bottom-up architecture (build components first, compose later) is the leading cause of production agent failure, not model quality.
- Top-down derivation starts with a specific, testable product requirement and lets it drive every component decision.
- DESIGN.md is emerging as the practical artifact to encode these requirements — brand rules, accessibility checks, latency budgets — and serve as the contract for both engineers and AI agents.
- The hardest part isn't designing the agent — it's saying no to components that don't serve the requirement. That's product discipline.
- Evaluation suites must be derived from the same requirements, not written after the fact to justify the architecture.
- Teams that invest in this upfront architecture definition ship agents that survive scaling, not just demos.
The Real Problem: Bottom-Up Architecture
The postmortem in "Most AI Agents Fail in Production Because They’re Built Backwards" nails it: "Components were designed first, evaluated on their own, and then composed in order to implement the desired functionality." This is the classic engineering trap — it feels productive because you're shipping code, but you're actually building a system whose emergent behavior you can't predict or control.
I've seen this pattern in every failed agent project I've audited. The team built a RAG pipeline with a state-of-the-art embedding model, connected to a tool-use framework, added a memory module from a popular library, and then discovered the agent couldn't handle ambiguous user input or correct itself after a wrong turn. The components were individually fine, but the architecture had no derivation from the product requirement — it was assembled bottom-up. The fix isn't better components; it's a top-down structure that forces each component to justify its existence against a product objective.
How Top-Down Works in Practice
A top-down approach starts with a single sentence: "The agent must accomplish [specific outcome] under [latency/accuracy/cost constraints]." From there, you derive the architecture. For example, if the requirement is "answer customer billing questions with 95% correctness and cite the exact policy," you immediately know you need a retrieval system that returns policy chunks with paragraph-level citations, a verification step that matches the answer to the cited text, and a fallback to human escalation when confidence is low. You don't debate vector databases until the retrieval structure is clear.
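To make the derivation concrete, here is a minimal sketch of the three pieces the billing requirement implies: retrieval that returns a paragraph-level citation, a verification step, and a human-escalation fallback. Everything in it is an assumption for illustration: the names `PolicyChunk`, `AgentReply`, `retrieve`, `is_supported`, and `answer_billing_question` are hypothetical, the keyword-overlap retrieval stands in for whatever retrieval method you actually choose, and the verbatim-match check is a crude proxy for a real confidence signal.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyChunk:
    policy_id: str
    paragraph: int
    text: str


@dataclass
class AgentReply:
    kind: str  # "answer" or "escalate"
    text: str
    citation: Optional[str] = None


def _words(text: str) -> set:
    # Lowercased word set, punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, corpus: List[PolicyChunk]) -> Optional[PolicyChunk]:
    # Toy retrieval (assumption): pick the chunk sharing the most words
    # with the question, or None when nothing overlaps at all.
    best, best_score = None, 0
    for chunk in corpus:
        score = len(_words(question) & _words(chunk.text))
        if score > best_score:
            best, best_score = chunk, score
    return best


def is_supported(answer: str, chunk: PolicyChunk) -> bool:
    # Verification step (assumption): the drafted answer must appear
    # verbatim, ignoring case, inside the cited paragraph.
    return answer.lower() in chunk.text.lower()


def answer_billing_question(
    question: str, draft_answer: str, corpus: List[PolicyChunk]
) -> AgentReply:
    chunk = retrieve(question, corpus)
    # Fallback: no citation, or an answer the citation does not support,
    # is treated as low confidence and goes to a human.
    if chunk is None or not is_supported(draft_answer, chunk):
        return AgentReply("escalate", "Routing to a human billing specialist.")
    citation = f"{chunk.policy_id}, paragraph {chunk.paragraph}"
    return AgentReply("answer", draft_answer, citation)
```

Note what is absent: no vector database, no framework. Those choices can be made later, once this structure has been checked against the requirement.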
The System Architecture Design Definition from SEBoK reminds us that architecture "defines system behavior and structure characteristics in accordance with derived requirements." This isn't abstract; it's the core engineering practice that most agent projects skip. In practice, I write a one-page DESIGN.md that captures the product requirement, the interaction flow, the success metrics, the fail states, and the constraints (latency, cost, security). This document, not the code, becomes the architecture. The code is just one implementation of it.
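One way to keep that one-pager honest is to mirror its five sections in a small data structure and refuse to start building while any section is empty. The sketch below assumes a hypothetical `DesignSpec` class and `missing_sections` helper. It is not a standard DESIGN.md schema, only an illustration of the five sections listed above.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class DesignSpec:
    # The five sections of the one-page DESIGN.md, in order.
    product_requirement: str = ""
    interaction_flow: List[str] = field(default_factory=list)
    success_metrics: Dict[str, float] = field(default_factory=dict)
    fail_states: List[str] = field(default_factory=list)
    constraints: Dict[str, str] = field(default_factory=dict)


def missing_sections(spec: DesignSpec) -> List[str]:
    # A section counts as missing when it is empty ("" or an empty container).
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


spec = DesignSpec(
    product_requirement=(
        "Answer customer billing questions with 95% correctness "
        "and cite the exact policy"
    ),
    success_metrics={"correctness": 0.95},
)
print(missing_sections(spec))
```

A spec that still reports missing sections is a signal to keep writing the document, not to start writing the agent loop.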
DESIGN.md as the Contract
By mid-2026, the DESIGN.md pattern has moved from experimental to essential. Collections like VoltAgent's awesome-design-md analyze how brands like Airbnb and Stripe encode design rules into files that AI agents can read. But the real value isn't styling — it's encoding product logic. A DESIGN.md for an agent system should specify: what the agent is allowed to say "I don't know" about, where it can confidently act autonomously, and where it must hand off to a human. This is the product-engineering equivalent of writing acceptance criteria before coding.
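Those three boundaries can be written down as data rather than left to the prompt. The following is a rough sketch under stated assumptions: the topic names, the `MIN_CONFIDENCE` threshold, and the `decide` function are all hypothetical, and a real system would load these values from the DESIGN.md rather than hard-code them.

```python
from enum import Enum


class Action(Enum):
    ACT = "act_autonomously"
    DECLINE = "say_i_dont_know"
    HANDOFF = "hand_off_to_human"


# Hypothetical boundaries that a DESIGN.md might declare.
AUTONOMOUS_TOPICS = {"invoice_lookup", "payment_due_date"}
HANDOFF_TOPICS = {"refund_dispute", "account_closure"}
MIN_CONFIDENCE = 0.8  # assumed threshold, not taken from any source


def decide(topic: str, confidence: float) -> Action:
    # Handoff topics always go to a human, regardless of confidence.
    if topic in HANDOFF_TOPICS:
        return Action.HANDOFF
    # The agent acts alone only on allowed topics and only when confident.
    if topic in AUTONOMOUS_TOPICS and confidence >= MIN_CONFIDENCE:
        return Action.ACT
    # Everything else, including unknown topics, gets an honest "I don't know".
    return Action.DECLINE
```

The design choice worth noticing is the default: anything the contract does not explicitly allow falls through to declining, so a topic nobody thought about cannot become autonomous behavior by accident.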
The June 2026 Design Trends piece notes: "The winners are people who can encode taste, constraints, brand rules, accessibility checks, and product logic into reusable files and workflows." I'd add: the winners are also people who encode interaction contracts — the boundary between what the agent promises and what the backend can prove. That's the prompt/UI contract that defines every loading state, every undo action, every audit trail. If you don't define that in your DESIGN.md, your agent will make it up at runtime.
Tradeoffs and When the Rule Breaks
There are cases where bottom-up makes sense: prototyping, research projects, and internal tools where the requirements are unknown. But if you're shipping to users, top-down isn't optional. The tradeoff is up-front time — deriving architecture takes days, not hours. But the alternative is weeks of debugging a system that was never designed to work in the first place.
The pattern also breaks when the product requirement is too vague. "Help users write better emails" isn't specific enough to derive an architecture. The requirement must be falsifiable: "Complete a draft email based on the user's brief and sender history, with no more than one major edit per draft." That's a requirement you can architect for.
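A falsifiable requirement is one you can turn into a check that fails. As a small sketch, the email requirement above can be read as "every draft needs at most one major edit". That strict per-draft reading, the definition of a "major edit", and the names `MAX_MAJOR_EDITS`, `violating_drafts`, and `requirement_met` are all assumptions made for illustration.

```python
from typing import Dict, List

MAX_MAJOR_EDITS = 1  # taken from the requirement: no more than one per draft


def violating_drafts(major_edits_by_draft: Dict[str, int]) -> List[str]:
    # Return the drafts that needed more major edits than the requirement allows.
    return sorted(
        draft_id
        for draft_id, edits in major_edits_by_draft.items()
        if edits > MAX_MAJOR_EDITS
    )


def requirement_met(major_edits_by_draft: Dict[str, int]) -> bool:
    # Strict reading (assumption): a single violating draft fails the requirement.
    return not violating_drafts(major_edits_by_draft)


observed = {"draft-1": 0, "draft-2": 1, "draft-3": 3}
print(violating_drafts(observed), requirement_met(observed))
```

"Help users write better emails" offers nothing to put on the right-hand side of that comparison, which is exactly why it cannot drive an architecture or an evaluation suite.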
What to Evaluate Next
Before your next agent project, write the DESIGN.md first. Derive the architecture from a single, testable requirement. Then evaluate whether each component — retrieval, tool-use, memory, prompt structure — can trace its existence back to that requirement. If a component can't, cut it. The goal isn't a fancy agent; it's a system that ships and scales.
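The traceability check itself can be mechanical. Below is a minimal sketch, assuming you have split the requirement into named clauses and recorded which clauses each component claims to serve. The clause names, the component map, and the `untraceable` function are hypothetical.

```python
from typing import Dict, List, Set


def untraceable(
    components: Dict[str, List[str]], requirement_clauses: Set[str]
) -> List[str]:
    # A component is untraceable when none of the clauses it claims to
    # serve actually appears in the requirement.
    return sorted(
        name
        for name, serves in components.items()
        if not set(serves) & requirement_clauses
    )


clauses = {"cite_exact_policy", "95_percent_correctness", "human_escalation"}
components = {
    "retrieval": ["cite_exact_policy"],
    "verifier": ["95_percent_correctness"],
    "memory": [],  # serves no clause
    "tool_use": ["send_marketing_email"],  # serves a clause nobody required
}
print(untraceable(components, clauses))
```

Anything the function returns is a candidate to cut, or a sign that the requirement is missing a clause the team quietly depends on.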
The teams that will dominate the next wave of AI products aren't the ones with the best models or the most components. They're the ones that treat architecture as a product discipline — and encode that discipline in a document their agents have to read.
FAQ
What is the single biggest mistake teams make when building AI agents?
Building bottom-up: designing components like retrieval, tool-use, or memory in isolation, then composing them. This ignores the product requirement that should drive architecture. The result is an agent that technically works but consistently fails on user-critical tasks — because nobody derived the architecture from what the product needed to prove.
How does DESIGN.md help avoid agent failures?
DESIGN.md encodes the product constraints, brand rules, accessibility checks, and interaction logic that define what the agent should and shouldn't do. It becomes the top-down contract that all components must satisfy. Instead of guessing prompt structure, you derive the system from these written constraints — the same way you'd write acceptance criteria before coding.
Can DESIGN.md work for existing agent projects that are already built?
Yes, but it's a refactor, not a retrofit. You start by documenting the product requirements the current agent fails at, then re-derive the architecture from those. Components that don't align get replaced. It's painful, but cheaper than continuing to ship a system that can't meet user expectations — and a fraction of the cost of starting from scratch.
Sources
- https://github.com/ombharatiya/ai-system-design-guide
- https://blog.mean.ceo/design-md-news-june-2026/
- https://github.com/VoltAgent/awesome-design-md
- https://towardsdatascience.com/most-ai-agents-fail-in-production-because-theyre-built-backwards/
- https://github.com/VILA-Lab/Dive-into-Claude-Code
- https://blog.mean.ceo/design-trends-june-2026/