The AI Coding Agent Is Not Your Senior Engineer

As of May 2026, AI coding agents like Claude Code ship with curated skills: React Best Practices, performance audits, and more. Treating these skills as a substitute for engineering judgment is a mistake. This post argues that the real value lies not in the skill itself but in how you evaluate its output: whether it eliminates request waterfalls before touching bundle size, whether it understands your product's latency budget, and whether it encodes real product states. It is written for senior engineers and founders who need to separate signal from noise in AI-assisted development.

The short answer

AI coding agents (Claude Code, Cursor, and the rest) now ship curated skills that promise to enforce best practices. The React Best Practices skill, for example, claims to work in a fixed order: eliminate request waterfalls first, then address bundle size, server-side performance, data fetching, re-renders, rendering, JavaScript performance, and advanced patterns. That ordering is revealing: it puts network efficiency ahead of bundle size, which is the correct instinct for most products. But the skill itself is just a prompt. The real question is whether the agent can reason about your specific product's constraints.

I've seen teams adopt these agents and treat the skill output as gospel, merging PRs that optimize bundle size while ignoring that their real bottleneck is a waterfall of sequential API calls. The agent didn't know better because the team didn't tell it what matters. A skill is a checklist. Product engineering is judgment. If you treat the agent as a senior engineer, you'll get junior-level output dressed in confident prose.

Key takeaways

  • Skills are not understanding. A React Best Practices skill can flag missing keys and suggest memoization, but it cannot know that your product's critical user journey is a search flow where every millisecond of latency costs conversions.
  • Ordering matters more than coverage. The skill that eliminates request waterfalls before touching bundle size is more valuable than one that optimizes in reverse. Evaluate agents on their prioritization, not their breadth.
  • Your product's state model is the test. An agent that generates components without encoding empty, loading, error, and partial states is producing toy code. Real products live in the edge cases.
  • AI-driven design-to-code workflows reduce ambiguity, but only if the design system already encodes component states, behavior notes, and QA checklists. Garbage in, garbage out, even with AI.
  • Measure shipped quality, not agent throughput. Track whether the agent reduces regressions, not how many lines it generates. More code is not better code.
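
To make the state-model takeaway concrete, here is a minimal TypeScript sketch of what "encoding empty, loading, error, and partial states" can look like for a simple list view. The names ListState and statusText, the extra "ready" variant, and the message strings are all hypothetical, invented for illustration; your own model will differ.

```typescript
// Hypothetical state model for a list view. Every name here is illustrative.
type ListState<T> =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string }
  | { kind: "partial"; items: T[]; failedCount: number }
  | { kind: "ready"; items: T[] };

// One branch per state. The `never` assignment makes the compiler
// reject this function if a new state is added but not handled.
function statusText<T>(state: ListState<T>): string {
  switch (state.kind) {
    case "loading":
      return "Loading...";
    case "empty":
      return "No results yet.";
    case "error":
      return `Something went wrong: ${state.message}`;
    case "partial":
      return `Showing ${state.items.length} items; ${state.failedCount} failed to load.`;
    case "ready":
      return `Showing ${state.items.length} items.`;
    default: {
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

When reviewing agent output, a quick check is whether the generated component has a branch for each of these states or only the last one.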

The real problem: skills optimize for the average, not your product

The ordering in the React Best Practices skill is reasonable for a generic web app. But your product isn't generic. If you're building a real-time dashboard, server-side performance and re-renders matter more than initial bundle size. If you're building a mobile-first SaaS onboarding flow, data fetching and request waterfalls dominate. The skill doesn't know your context.

This is where the product engineer's job shifts. Instead of writing every line of code, you now evaluate and direct. You tell the agent: "Focus on the search endpoint's latency first. Ignore bundle size until we ship the next feature." The agent can execute, but it cannot prioritize without your input. Treating the skill as a turnkey solution is how you end up with a perfectly optimized landing page that still feels slow because the agent never touched the critical data path.
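
One way to turn "focus on the search endpoint first" into something you can hand to an agent is to rank endpoints by how far they exceed their latency budget. The sketch below assumes you already have p95 measurements and agreed budgets; the EndpointTiming shape, the prioritize function, the paths, and the numbers are all made up for illustration.

```typescript
// Hypothetical measurements; paths and numbers are invented for this example.
interface EndpointTiming {
  path: string;
  p95Ms: number;    // observed 95th-percentile latency
  budgetMs: number; // latency budget the team agreed on
}

// Return the endpoints that exceed their budget, worst overrun ratio first.
function prioritize(timings: EndpointTiming[]): string[] {
  return timings
    .filter((t) => t.p95Ms > t.budgetMs)
    .sort((a, b) => b.p95Ms / b.budgetMs - a.p95Ms / a.budgetMs)
    .map((t) => t.path);
}

const order = prioritize([
  { path: "/api/search", p95Ms: 900, budgetMs: 300 },
  { path: "/api/profile", p95Ms: 250, budgetMs: 400 },
  { path: "/api/export", p95Ms: 2400, budgetMs: 2000 },
]);
console.log(order); // /api/search first, then /api/export
```

The ranking is deliberately crude. Its job is to give the agent an explicit priority list instead of letting the skill's default ordering decide.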

Tradeoffs: when the conventional wisdom breaks

Conventional wisdom says: optimize bundle size, lazy load everything, use React.memo aggressively. But I've seen products where premature bundle optimization added complexity that slowed the team down without moving a single Core Web Vital. The agent will happily apply these patterns because they're in the skill. Your job is to say "no" when the cost of complexity exceeds the user-visible win.

Another tradeoff: the agent's performance skill will flag re-renders, but it won't distinguish between a re-render that costs 2ms and one that costs 200ms. The former is noise; the latter is a bug. A good product engineer teaches the agent to ignore the noise. A bad one lets the agent generate 50 memoization wrappers that make the codebase harder to read without improving perceived performance.
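
The 2ms-versus-200ms distinction is easy to encode. The sketch below filters render timings against a threshold so that only expensive re-renders reach the agent. It assumes you have collected durations somewhere, for example from the onRender callback of React's Profiler component; the RenderSample shape, the rendersWorthFixing function, and the 16 ms cutoff (roughly one frame at 60 Hz) are assumptions, not a recommendation.

```typescript
interface RenderSample {
  component: string;
  durationMs: number;
}

// About one frame at 60 Hz. An assumed cutoff; pick yours from real budgets.
const FRAME_BUDGET_MS = 16;

// Keep only renders at or above the threshold, slowest first.
function rendersWorthFixing(
  samples: RenderSample[],
  thresholdMs: number = FRAME_BUDGET_MS,
): RenderSample[] {
  return samples
    .filter((s) => s.durationMs >= thresholdMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}

const flagged = rendersWorthFixing([
  { component: "FilterChip", durationMs: 2 },
  { component: "LoanTable", durationMs: 200 },
]);
console.log(flagged.map((s) => s.component)); // only "LoanTable" survives
```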

How this looks in a shipped product

I recently worked on a mortgage origination dashboard where the primary user action was searching for loan applications. The initial implementation had a waterfall: auth check, then user profile, then loan list, then loan details. The agent's React Best Practices skill flagged bundle size and suggested code splitting. But the real problem was the waterfall. I redirected the agent to prefetch the loan list in parallel with the auth check, then stream the loan details as the user scrolled. The result: perceived load time dropped from 3 seconds to under 1 second, and the bundle size optimization was irrelevant.

The skill didn't fail; it just didn't know what mattered. The product engineer's job was to provide that context.
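
For readers who want the shape of that change, here is a simplified sketch of the waterfall and the parallel version. It is not the project's code: the LoanApi interface and every function name are hypothetical, loan details and streaming are left out, and it assumes the loan-list request carries its own credentials so it can start without waiting for the auth check to return.

```typescript
// Hypothetical API surface; none of these names come from the real project.
interface LoanApi {
  checkAuth(): Promise<{ userId: string }>;
  getProfile(userId: string): Promise<{ name: string }>;
  listLoans(): Promise<string[]>; // loan IDs
}

interface Dashboard {
  profile: { name: string };
  loans: string[];
}

// Waterfall: each request starts only after the previous one has finished.
async function loadSequential(api: LoanApi): Promise<Dashboard> {
  const auth = await api.checkAuth();
  const profile = await api.getProfile(auth.userId);
  const loans = await api.listLoans();
  return { profile, loans };
}

// Parallel: the loan list starts alongside the auth check.
// Only the profile request genuinely depends on the auth result.
async function loadParallel(api: LoanApi): Promise<Dashboard> {
  const authPromise = api.checkAuth();
  const loansPromise = api.listLoans();
  const profilePromise = authPromise.then((auth) => api.getProfile(auth.userId));
  // Promise.all attaches handlers to both promises immediately,
  // so a failure in either request surfaces here as one rejection.
  const [profile, loans] = await Promise.all([profilePromise, loansPromise]);
  return { profile, loans };
}
```

In the sequential version the total wait is the sum of the three requests. In the parallel version it is roughly the longer of the two chains: auth plus profile, or the loan list.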

What to evaluate in an AI coding agent

When you're evaluating an agent for your team, don't run a generic benchmark. Run a test that matters to your product:

  1. Give it a page with real network conditions — slow API, partial data, error states.
  2. Ask it to optimize for perceived performance, not Lighthouse score.
  3. Review whether it generates components that handle empty, loading, error, and partial states.
  4. Check whether it respects your existing component API conventions or generates ad-hoc patterns.
  5. Measure whether its changes reduce regressions in production, not just in the dev environment.

If the agent passes these tests, it's a tool worth investing in. If it only passes generic benchmarks, it's a liability.

The closing: your judgment is the product

AI coding agents are powerful. They can eliminate request waterfalls, enforce consistent component APIs, and generate state-aware UI. But they cannot replace the product engineer's judgment about what to optimize, when to stop, and what to leave alone. The skills are checklists. Your product's success depends on how you prioritize, evaluate, and direct.

Next time you run a Claude Code skill, ask yourself: is this making my product faster for the user, or just making my codebase more complex? If you can't answer that, the agent is driving, and you're along for the ride.

Questions people ask about this topic

Should I let an AI agent run its performance skill on my entire codebase?

Not without defining what 'performance' means for your product first. The skill will find low-hanging fruit, but it won't know that your dashboard's critical path is the initial data fetch, not the animation library. Run it on a targeted module, review every suggestion, and reject anything that optimizes for Lighthouse over user-perceived latency.

How do I evaluate whether an AI coding agent is actually improving my team's output?

Measure changes in shipped code quality, not agent throughput. Track whether the agent reduces request waterfalls, eliminates unnecessary re-renders, and enforces consistent component APIs. If the agent is generating more code but your bundle size grows or your accessibility regressions increase, it's a net negative regardless of skill count.
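
As a rough illustration of measuring quality instead of throughput, the sketch below compares regressions per 100 merged changes across two periods and deliberately ignores lines of code. The Period shape, the function names, and all numbers are hypothetical; how you attribute a regression to a change is up to your own process.

```typescript
interface Period {
  mergedChanges: number;
  regressions: number; // production regressions traced back to those changes
  linesAdded: number;  // recorded only to show it plays no part in the verdict
}

// Regressions per 100 merged changes; 0 when nothing was merged.
function regressionsPer100(p: Period): number {
  return p.mergedChanges === 0 ? 0 : (p.regressions * 100) / p.mergedChanges;
}

type Verdict = "improved" | "worse" | "unchanged";

function compare(before: Period, after: Period): Verdict {
  const delta = regressionsPer100(after) - regressionsPer100(before);
  if (delta < 0) return "improved";
  if (delta > 0) return "worse";
  return "unchanged";
}

// Invented numbers: throughput doubled, but so did more than the regressions.
const beforeAgent: Period = { mergedChanges: 40, regressions: 6, linesAdded: 3000 };
const withAgent: Period = { mergedChanges: 80, regressions: 16, linesAdded: 12000 };
console.log(compare(beforeAgent, withAgent)); // "worse": 20 vs 15 per 100 changes
```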

What's the most important thing to check before adopting an AI coding agent for production work?

Whether the agent understands your product's state model. A good agent encodes empty, loading, error, and partial states in every component it generates. A bad one produces happy-path code that breaks under real network conditions. Test it on your most stateful page (a multi-step form or a real-time dashboard) before letting it touch anything customer-facing.
