The Prompt Is the New Component API

June 5, 2026 · 6 min read · By Brent Haskins

This post argues that the most critical design artifact for AI-powered product surfaces is the prompt/UI contract: the agreement between what the interface suggests the model can do and what the model can actually do. Drawing on experience shipping RAG systems, streaming interfaces, and agent handoffs, it covers citation placement, latency budgets, and the "I don't know" boundary as a product decision rather than a model limitation.

AI Product Engineering
UI/UX Engineering
Product Thinking

The short answer

The hardest thing about shipping AI-powered product interfaces isn't model accuracy, prompt engineering, or token costs. It's the contract between what the UI says the model can do and what the model reliably delivers. Call it the prompt/UI contract. Violate it once (a citation that doesn't exist, a confident answer to a question outside the knowledge base, a streaming response that stalls without indication) and you've lost the user's trust, which is much harder to rebuild than a broken button is to fix.

At its core, this is a product engineering problem, not an AI research problem. The UX patterns we choose — where to place citations, how to show confidence, when to require human confirmation — encode promises about the model's capabilities. When those promises are unrealistic, the interface becomes a liability. The teams that ship AI features that feel trustworthy are the ones that design the contract first and the model integration second.

Key takeaways

The prompt/UI contract must be explicit: every button, placeholder, and success state implies what the model can do. Audit your surface for over-promises.
Citations are a UX pattern, not a model output. Decide their position, density, and reveal behavior before wiring up retrieval; placement can shape perceived trust as much as retrieval accuracy does.
"I don't know" is a product quality signal. Design empty states and rephrase suggestions for low-confidence retrievals. Hiding uncertainty behind generic error text is worse than being honest.
Streaming vs batch UI is a contract decision. Long generations should stream with visible progress; deterministic transformations should batch and show results instantly. Mixing them confuses users about reliability.
Human-in-the-loop boundaries must be surfaced in the UI, not buried in a modal. If an action requires human approval, show the pending state, the expected delay, and an undo path before the agent acts.
Latency budgets are part of the design system. If your AI feature takes longer than 2 seconds to start producing output, the UI must show meaningful progress — not a spinner, but what's happening ("Searching your documents…", "Building the analysis…").

The real problem: The UI always makes promises the model can't keep

Most teams start by hooking up a model, picking a streaming or batch pattern, and then layering loading states on top. This is backwards. Every UI element — a text input labeled "Ask anything about your data," a button that says "Summarize this report," a progress bar that fills smoothly — implies a capability. When the model can't deliver, the user doesn't blame the model. They blame the product.

I've seen this play out in RAG systems where citations appear inline but link to irrelevant passages, and in agent interfaces where a "Confirm" button appears before the action has actually been validated. In both cases the UI promised a capability that the backend couldn't back up. The fix isn't better prompts alone; it's a stricter contract that treats model confidence, latency, and error boundaries as first-class UX concerns.

In practice, this means writing UX requirements that specify not just happy-path flows but also: what the UI shows when retrieval returns zero results, how the interface communicates that the model is "thinking" vs. "stuck," and what happens when the user asks a question outside the system's knowledge. These aren't edge cases — they're the majority of real interactions.
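One way to make those requirements concrete is to enumerate the non-happy-path states and map backend status onto them. The sketch below is illustrative only: `RequestStatus`, `top_score`, and both thresholds are hypothetical names and values, and a low relevance score is used as a rough stand-in for "outside the system's knowledge."

```python
from dataclasses import dataclass
from enum import Enum


class UiState(Enum):
    ANSWER = "answer"
    NO_RESULTS = "no_results"      # retrieval returned zero results
    OUT_OF_SCOPE = "out_of_scope"  # best match is too weak to answer from
    THINKING = "thinking"          # request is still making progress
    STUCK = "stuck"                # no progress within the stall window


@dataclass
class RequestStatus:
    finished: bool
    seconds_since_last_progress: float
    retrieved_count: int = 0
    top_score: float = 0.0  # hypothetical relevance score in [0, 1]


# Assumed thresholds; tune them against real traffic.
STALL_AFTER_SECONDS = 8.0
MIN_RELEVANCE = 0.5


def ui_state_for(status: RequestStatus) -> UiState:
    """Pick the UI state the interface should render for a request."""
    if not status.finished:
        if status.seconds_since_last_progress > STALL_AFTER_SECONDS:
            return UiState.STUCK
        return UiState.THINKING
    if status.retrieved_count == 0:
        return UiState.NO_RESULTS
    if status.top_score < MIN_RELEVANCE:
        return UiState.OUT_OF_SCOPE
    return UiState.ANSWER
```

Each enum member then needs its own designed screen; a state with no design is an unwritten clause in the contract.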

Tradeoffs and when the conventional wisdom breaks

The conventional wisdom says to stream everything for the best perceived performance. That's wrong for deterministic transformations like form fill or data extraction, where a batch response that arrives in 300ms feels more reliable than a stream that trickles in over two seconds. The user's expectation of speed depends on the action's perceived complexity, not the technical latency.
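As a rough sketch, the streaming decision can be written down as an explicit rule instead of a per-feature habit. The function name and the 500 ms cutoff are assumptions for illustration, not measured values.

```python
from enum import Enum


class Delivery(Enum):
    STREAM = "stream"
    BATCH = "batch"


# Assumed cutoff: below this, a complete result reads as "instant".
BATCH_LATENCY_CUTOFF_MS = 500


def choose_delivery(deterministic: bool, expected_latency_ms: int) -> Delivery:
    """Batch fast deterministic transformations; stream everything else."""
    if deterministic and expected_latency_ms <= BATCH_LATENCY_CUTOFF_MS:
        return Delivery.BATCH
    return Delivery.STREAM
```

Under these assumptions a 300 ms form fill batches, while a multi-second summary streams.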

Similarly, inline citations are often treated as a default good practice. But if your retrieval pipeline has poor precision, showing citations at all can backfire — users will click through, find irrelevant sources, and lose trust. Sometimes a "Sources used" footer with the option to expand details is safer than inline hotlinks that imply high confidence.
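A minimal sketch of that idea: tie the citation presentation to the retrieval precision you have actually measured. The two thresholds and the `measured_precision` input are hypothetical; the point is only that the display mode is derived from evidence, not chosen by default.

```python
from enum import Enum


class CitationMode(Enum):
    INLINE = "inline"  # hotlinks next to each claim
    FOOTER = "footer"  # expandable "Sources used" section
    HIDDEN = "hidden"  # no citations shown


# Assumed thresholds on offline-measured retrieval precision.
INLINE_MIN_PRECISION = 0.9
FOOTER_MIN_PRECISION = 0.6


def citation_mode(measured_precision: float) -> CitationMode:
    """Choose how prominently to show sources given measured precision."""
    if measured_precision >= INLINE_MIN_PRECISION:
        return CitationMode.INLINE
    if measured_precision >= FOOTER_MIN_PRECISION:
        return CitationMode.FOOTER
    return CitationMode.HIDDEN
```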

Another broken assumption: that human-in-the-loop is always a safe default. If your agent handoff requires approval for every action, users will either click through blindly or abandon the flow. The better pattern is to batch approvals — show a summary of pending actions with clear justification, then let the user confirm or reject in bulk. This respects their time while maintaining control.
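The batching pattern might look something like the following. `PendingAction` and `ApprovalBatch` are hypothetical names, and the example actions are made up; this is a sketch of the data shape, not a full approval workflow.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAction:
    action_id: str
    description: str
    justification: str


@dataclass
class ApprovalBatch:
    actions: list = field(default_factory=list)

    def summary(self) -> list:
        """One line per pending action, with the reason shown up front."""
        return [f"{a.description} (why: {a.justification})" for a in self.actions]

    def resolve(self, rejected_ids: set) -> tuple:
        """Split the batch into (approved, rejected) after one user decision."""
        approved = [a for a in self.actions if a.action_id not in rejected_ids]
        rejected = [a for a in self.actions if a.action_id in rejected_ids]
        return approved, rejected


batch = ApprovalBatch([
    PendingAction("a1", "Archive 3 stale invoices", "older than 90 days"),
    PendingAction("a2", "Email the vendor", "invoice total mismatch"),
])
approved, rejected = batch.resolve(rejected_ids={"a2"})
```

Nothing executes until `resolve` has been called, so the user sees the whole plan and its justifications before the agent acts.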

How this looks in a shipped product

In a real-time document analysis tool I helped build, the first version used a chat interface with inline citations and automatic summarization. Users complained that summaries were sometimes wrong, and citations linked to tangentially related paragraphs. We redesigned around the prompt/UI contract: the input now says "Ask a question about this document's financial sections" (not the whole document), citations appear as numbered footnotes at the bottom with a confidence indicator, and the "Summarize" button is replaced with a dropdown of specific section options. The model's capability is scoped honestly, and the interface communicates that scope without apology.

The result: fewer support tickets about incorrect outputs, higher feature retention, and user surveys showing increased trust in the AI features. The change wasn't a better model or prompt — it was a tighter contract between the UI's promises and the model's reliable delivery.

What to evaluate or watch for

When reviewing an AI-powered product surface, ask: Does this interface ever suggest a capability the model can't consistently deliver? Test with adversarial queries — questions outside the knowledge base, ambiguous inputs, edge-case document formats. Watch for UI states that hide uncertainty: generic spinners, vague error messages, or confirmations that appear before the backend has validated the action.

Also evaluate latency communication. If the feature takes more than two seconds to start responding, does the UI show specific progress? A spinner that spins for five seconds without context erodes trust faster than a message saying "Searching 12 documents…" that takes six seconds. The user needs to see that the system is working, not just be left waiting.
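A small sketch of stage-specific progress text, assuming the backend can report which stage it is in. The stage names, the message table, and the 2-second budget (borrowed from the key takeaways) are illustrative assumptions.

```python
from typing import Optional

# Assumed budget before the UI must explain what is happening.
PROGRESS_AFTER_SECONDS = 2.0

# Hypothetical backend stages mapped to user-facing text.
STAGE_MESSAGES = {
    "retrieving": "Searching {count} documents...",
    "generating": "Building the analysis...",
}


def progress_message(stage: str, elapsed_seconds: float, count: int = 0) -> Optional[str]:
    """Return specific progress text once the latency budget is exceeded."""
    if elapsed_seconds < PROGRESS_AFTER_SECONDS:
        return None  # still within budget; no message needed yet
    template = STAGE_MESSAGES.get(stage, "Still working...")
    return template.format(count=count)
```

For example, `progress_message("retrieving", 3.0, count=12)` yields "Searching 12 documents...", while any call under the budget returns `None`.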

Closing: Define the contract first

Before you wire up an API call to an LLM, write down what the interface promises. List every button, placeholder, success message, and error state. Then ask: can the model actually deliver this, reliably, for every user? Where the answer is "no," change the UI — not the prompt. The prompt is a technical detail. The contract is the product.
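That audit can be kept as data and checked mechanically. The sketch below assumes you have some evaluation set that yields a success rate per promise; the `Promise` type, the 0.95 bar, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Promise:
    ui_element: str               # e.g. a button label or placeholder
    claim: str                    # what the element implies the model can do
    measured_success_rate: float  # share of eval queries where the claim held


# Assumed bar for "reliably delivers"; set it per product.
REQUIRED_RATE = 0.95


def over_promises(promises: list, required_rate: float = REQUIRED_RATE) -> list:
    """Return the promises the model does not meet at the required rate."""
    return [p for p in promises if p.measured_success_rate < required_rate]


# Made-up numbers, for illustration only.
contract = [
    Promise("input placeholder", "answers questions about financial sections", 0.97),
    Promise("Summarize button", "summarizes any report", 0.80),
]
to_redesign = over_promises(contract)
```

Every entry in `to_redesign` is a UI element to rescope or reword, which is the "change the UI" step described above.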

Shipping AI features that users trust requires engineering discipline, not model enthusiasm. Design the surface around what the model can prove, not what you wish it could do.

FAQ

What is the prompt/UI contract and why does it matter for AI product engineering?

The prompt/UI contract is the explicit agreement between what the user interface promises and what the underlying model can reliably deliver. When a button says "Analyze this document" but the model hallucinates citations, the UI has lied. Defining this contract upfront, including error states, latency expectations, and fallback behavior, is the core discipline of shipping trustworthy AI features.

How should product engineers handle "I don't know" cases in RAG interfaces?

"I don't know" is a product quality signal, not a model bug. When the RAG system lacks evidence, the UI must surface that honestly: through a clear empty state, a suggestion to rephrase, or a fallback to manual search. Hiding uncertainty behind generic loading states or vague responses erodes trust. The interface should make the confidence level visible, not pretend the model always knows.

What's the right tradeoff between streaming and batch UI for AI features?

Streaming wins for long generations where users want progressive disclosure — think chat responses or document summaries. Batch UI is better for deterministic, latency-sensitive actions like form validations or quick transformations where the output is known to be small. The decision should be driven by the user's expectation of time-to-value, not by technical preference for one pattern over another.