
Conversational AI Is an Interface Problem, Not a Model Problem

July 2, 2026 · 5 min read · By Brent Haskins

In 2026, conversational AI platforms like Dialogflow CX and Retell AI have commoditized the model layer. The real engineering challenge is the interface contract: what the chat widget promises vs what the backend can deliver. This post argues that shipping judgment—latency budgets, citation placement, memory boundaries, and honest empty states—separates production-grade conversational AI from demo-ware. Drawing from production patterns at Posit and common failure modes in agent UX, it offers concrete evaluation criteria for product engineers.

AI Product Engineering
UI/UX Engineering

The short answer

In 2026, every conversational AI platform—Dialogflow CX, Retell, Rasa—packs the same core: a large language model, intent mapping, and some form of memory. The differentiator in shipped products isn't the model. It's the interface contract: what the chat widget promises the user and what the backend can actually prove.

Engineering teams that deploy fastest (source: Retell's own analysis) focus less on tuning prompts and more on designing latency budgets, citation placement, honest empty states, and human-in-the-loop boundaries. The Posit Assistant (described at AI in Production 2026) didn't win because its model was smarter—it won because it knew when to say "I don't know" and when to show its work.

Product engineers must stop treating conversational AI as a model integration problem and start treating it as a UX contract problem. The model is a commodity. The interface is your moat.

Key takeaways

Design the failure states first. Empty state, low-confidence fallback, timeout, and human handoff should be spec'd before the happy path.
Latency is a UX budget, not a performance metric. Every turn costs time. Signal progress honestly—blinking dots that appear before the backend commits are deceptive.
Citations are not optional. If your bot references a document, API, or knowledge base, the UI must surface the source inline. Users trust sources, not assertions.
Memory across sessions must be transparent. Let users review, clear, and correct what the system remembers. Source 4's interview question on session memory is exactly right: it's a product architecture decision, not a database schema.
The prompt is part of the UI contract. What you tell the model determines what it can promise. Version your prompts as you version your component props—with tests and changelogs.
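
To make the citation takeaway concrete, here is a minimal sketch of rendering an answer with inline source markers. The `Citation` and `Claim` shapes, the `[unsourced]` flag, and the sample knowledge-base locator are all assumptions made for illustration; none of them come from a specific platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One source backing a claim (hypothetical shape, not a platform API)."""
    label: str     # human-readable title shown in the UI
    locator: str   # URL, document ID, or API name


@dataclass(frozen=True)
class Claim:
    """One sentence of the answer plus the sources that support it."""
    text: str
    citations: tuple[Citation, ...] = ()


def render_with_citations(claims: list[Claim]) -> str:
    """Render claims with inline [n] markers and a numbered source list.

    Claims with no citation are flagged instead of being shown as if
    they were sourced.
    """
    sources: list[Citation] = []
    parts: list[str] = []
    for claim in claims:
        if not claim.citations:
            parts.append(f"{claim.text} [unsourced]")
            continue
        markers = []
        for citation in claim.citations:
            if citation not in sources:
                sources.append(citation)
            markers.append(f"[{sources.index(citation) + 1}]")
        parts.append(f"{claim.text} {''.join(markers)}")
    body = " ".join(parts)
    if not sources:
        return body
    footer = "\n".join(
        f"[{n}] {s.label} ({s.locator})" for n, s in enumerate(sources, 1)
    )
    return f"{body}\n\n{footer}"
```

The design choice worth noting is that an uncited claim is visible as such. Whether you flag it, suppress it, or route it to a fallback is a product decision; the point is that the renderer never lets an assertion pass as a sourced statement.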

The real problem: the illusion of agency

Most conversational AI tools ship with a blank text field and a smiling avatar. That UI implicitly promises: "You can ask me anything." But the backend is scoped to a knowledge base, a few APIs, and no fallback. The gap between what the UI implies and what the model can do is the #1 source of user frustration and support tickets.

The fix is not a bigger model. It's a narrower interface. Scope the input to known intents, show examples, and indicate topic boundaries. Retell's voice agents do this well by prompting users with what the agent can handle before the conversation starts. Dialogflow CX lets you design explicit fallback flows. The best products make the model's limitations visible.
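
A narrower interface can be sketched in a few lines. The example below assumes an upstream classifier that returns an intent name and a confidence score; the intent catalogue, the `Route` shape, and the 0.6 threshold are hypothetical values chosen for illustration, not settings from Retell or Dialogflow CX.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intent catalogue: intent name -> example prompt shown in the UI.
SUPPORTED_INTENTS = {
    "order_status": "Where is my order?",
    "return_item": "I want to return an item.",
    "update_address": "Change my delivery address.",
}


@dataclass(frozen=True)
class Route:
    kind: str               # "answer" or "out_of_scope"
    intent: Optional[str]   # set only when kind == "answer"
    message: str            # copy shown to the user when out of scope


def route_turn(
    intent: Optional[str], confidence: float, min_confidence: float = 0.6
) -> Route:
    """Answer only known, confident intents; otherwise show the boundaries."""
    if intent in SUPPORTED_INTENTS and confidence >= min_confidence:
        return Route("answer", intent, "")
    examples = "; ".join(SUPPORTED_INTENTS.values())
    return Route(
        "out_of_scope",
        None,
        f"I can't help with that yet. Here is what I can do: {examples}",
    )
```

The same catalogue can feed the example chips under the input field, so the UI and the router never disagree about what the bot handles.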

Latency budgets and honest loading copy

Every conversational turn consumes time. The model inference is the biggest chunk, but there's also context window building, tool calling, and response rendering. If your UI shows a typing indicator for three seconds then delivers a short answer, you've burned a trust token.

Define a latency budget per turn: 0.5 seconds for caching or a simple lookup, 2 seconds for model inference, and 1 second for rendering, or 3.5 seconds in total. If a turn exceeds that total, switch the UI to a processing state with an estimated wait. Don't show the dots until the backend has confirmed it will respond. That small discipline, inspired by real streaming patterns from Posit Assistant, turns latency from a defect into a design feature.
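
As a rough sketch, the budget and the indicator rule can live in one small module. The stage names, the `TurnTiming` shape, and the `"sent"`, `"typing"`, and `"processing"` state labels are assumptions for illustration. The per-stage figures are the ones given above, and the threshold for switching to the processing state is left as a parameter for the caller to set.

```python
from dataclasses import dataclass

# Per-stage budgets in seconds, using the figures from the text.
STAGE_BUDGET_S = {"lookup": 0.5, "inference": 2.0, "rendering": 1.0}


@dataclass(frozen=True)
class TurnTiming:
    """Measured seconds per stage for a single conversational turn."""
    lookup: float
    inference: float
    rendering: float

    @property
    def total(self) -> float:
        return self.lookup + self.inference + self.rendering


def over_budget_stages(timing: TurnTiming) -> list[str]:
    """Return the stages that exceeded their own budget, in budget order."""
    return [
        name
        for name, budget in STAGE_BUDGET_S.items()
        if getattr(timing, name) > budget
    ]


def ui_state(
    backend_committed: bool, elapsed_s: float, processing_after_s: float
) -> str:
    """Pick the indicator to show while a turn is in flight.

    No typing dots until the backend has committed to a response; past
    the threshold, switch to an explicit processing state.
    """
    if not backend_committed:
        return "sent"
    if elapsed_s > processing_after_s:
        return "processing"
    return "typing"
```

Logging `over_budget_stages` per turn tells you which stage is responsible for a slow response, which is more actionable than a single end-to-end number.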

Memory and the transparency principle

Memory across sessions is a product lever, not a technical checkbox. Source 4 poses it as an interview question: "Design a conversational AI system with memory across sessions." The product answer is: decide what the user expects to be remembered and what should be ephemeral.

Never silently persist. Show a conversation summary at the start of each session with a clear "That's not right" button. Let users delete individual memories. And design for the case where memory is wrong—your model will misattribute preferences or escalate a stale order status. The ability to correct the system is more important than the ability to remember.
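
Here is a minimal in-memory sketch of that contract: every stored item can be listed, corrected, or deleted by the user. The `MemoryStore` class and its method names are hypothetical, and a real system would sit on top of durable storage with consent checks; the sketch only shows the operations the UI needs to expose.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Memory:
    id: int
    text: str
    source: str  # "inferred" by the system, or "user" after a correction


class MemoryStore:
    """Per-user memory that the user can review, correct, and delete."""

    def __init__(self) -> None:
        self._items: dict[int, Memory] = {}
        self._next_id = count(1)

    def remember(self, text: str) -> int:
        memory = Memory(next(self._next_id), text, "inferred")
        self._items[memory.id] = memory
        return memory.id

    def summary(self) -> list[str]:
        """Lines for the start-of-session summary, one per memory."""
        return [f"{m.id}. {m.text}" for m in self._items.values()]

    def correct(self, memory_id: int, new_text: str) -> None:
        """Backs the "That's not right" button: the user's text wins."""
        memory = self._items[memory_id]
        memory.text = new_text
        memory.source = "user"

    def forget(self, memory_id: int) -> None:
        """Delete one memory; unknown IDs are ignored."""
        self._items.pop(memory_id, None)

    def clear(self) -> None:
        self._items.clear()
```

Tracking `source` is a small choice with a large payoff: a memory the user has corrected should outrank anything the model infers later.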

Saying "I don't know" as a product quality

Most conversational AI products treat uncertainty as a failure. It's not. A clear "I don't know, but here's what I can do" builds more trust than a confident hallucination. The Posit Assistant's design explicitly handles this: it surfaces the bounds of its capabilities and offers alternative actions.

As a product engineer, you should measure your bot's "I don't know" rate. If it's zero, either the input is scoped so narrowly that users never reach the edge of what the bot knows, or the bot is answering when it should be declining. Shoot for a 5-10% uncertainty rate in early interaction turns, with a seamless handoff to a human or a fallback flow.
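
Measuring this takes very little code once each turn is logged with an outcome. In the sketch below the outcome label `"dont_know"` and the assessment labels are assumptions; the 5-10% band is the target stated above, and your logging schema will differ.

```python
def uncertainty_rate(turn_outcomes: list[str]) -> float:
    """Share of turns where the bot answered "I don't know"."""
    if not turn_outcomes:
        return 0.0
    dont_know = sum(outcome == "dont_know" for outcome in turn_outcomes)
    return dont_know / len(turn_outcomes)


def assess(rate: float, low: float = 0.05, high: float = 0.10) -> str:
    """Compare a measured rate against the target band."""
    if rate == 0:
        return "suspicious_zero"   # either no hard questions or no honesty
    if rate < low:
        return "below_target"
    if rate <= high:
        return "in_target"
    return "above_target"
```

For example, 7 "I don't know" turns out of 100 logged turns gives a rate of 0.07, which falls inside the band. A rate above the band is a signal to widen the knowledge base or tighten the input scope before blaming the model.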

What to evaluate in your next conversational AI project

When you're evaluating platforms like Retell, Dialogflow CX, or building custom with Rasa, ask these questions:

Can I define latency budgets per turn and measure them in production?
Does the platform let me design custom empty, error, and fallback states—or is it just success-path templates?
Is memory transparent to the user? Can they see, edit, and delete what's stored?
How does the UI communicate uncertainty? Is there a native "I don't know" flow?
Is the prompt version-controlled and tied to the UI component that invokes it?

If the answer to more than one of these is "we'll figure that out later," you're building a demo, not a shipped product.
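
The last question on that list, prompt versioning, is the easiest to start on. One way to do it, sketched here under assumptions: treat the prompt as a versioned value with a changelog and a content fingerprint, and have the UI component's tests pin the fingerprint they were written against. The `PromptVersion` class, `check_pinned`, and the sample prompt are hypothetical names and content.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    version: str     # bumped by hand, like a component's props contract
    template: str    # prompt text with named placeholders
    changelog: str   # what changed and why

    @property
    def fingerprint(self) -> str:
        """Short hash of the template, so silent edits are detectable."""
        digest = hashlib.sha256(self.template.encode("utf-8")).hexdigest()
        return digest[:12]

    def render(self, **params: str) -> str:
        return self.template.format(**params)


# Hypothetical prompt, for illustration only.
SUPPORT_PROMPT = PromptVersion(
    version="1.1.0",
    template=(
        "You answer questions about {product} only. "
        "If you are unsure, say you don't know."
    ),
    changelog="1.1.0: added the explicit 'say you don't know' instruction.",
)


def check_pinned(prompt: PromptVersion, expected_fingerprint: str) -> bool:
    """True if the prompt text still matches what the UI was tested against."""
    return prompt.fingerprint == expected_fingerprint
```

A test that calls `check_pinned` with a stored fingerprint fails the moment someone edits the template without updating the version, the changelog, and the UI copy that depends on it.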

The shipping judgment

Conversational AI in 2026 is past the hype of model announcements. The AI Dev Craft conference and the AI Conference in San Francisco both emphasized applied implementations over architecture diagrams. The engineering challenge is now squarely in the product layer: how to design interfaces that set honest expectations, handle failure gracefully, and earn trust turn by turn.

Ship that, and the model layer almost doesn't matter.

FAQ

Questions people ask about this topic.

How do you handle latency in conversational AI without misleading users?

Set explicit expectations from the first interaction: streaming indicators, estimated wait times, and a fallback timeout. Never show the typing bubble until the backend has committed to a response. Log latency per turn and surface it in your monitoring—users will forgive a slow response if the UI is honest about the delay.

What's the most common failure mode in conversational AI products?

The illusion of understanding. The UI presents a blank text field and a friendly avatar, implying the bot can handle any query. But when the underlying model cannot answer, most products fall silent or hallucinate. The fix: design explicit "I don't know" states, scope the input to known intents, and link to human handoff when confidence drops.

How should memory across sessions be designed from a product perspective?

Memory is a product decision, not a technical one. Decide what the user expects to be remembered (preferences, previous orders, past questions) and what should remain ephemeral. Surface a clear "forget" button and a memory summary so the user can verify and correct. Never store sensitive data without explicit consent.

Sources