# What agent builders are learning from OpenClaw, Hermes, Claude Code, and Codex

The 2026 assistant stack is converging on the same pain points: context cost, handoff, permissions, local delivery, and reliable interrupts.

Published: 2026-05-14
Updated: 2026-05-14
Canonical: https://watch.qordinate.ai/blog/agent-runtime-pain-points-2026
Markdown: https://watch.qordinate.ai/blog/agent-runtime-pain-points-2026.md
Image: https://watch.qordinate.ai/images/blog/agent-runtime-pain-points-2026.jpg

Tags:
- OpenClaw
- Hermes Agent
- Claude Code
- Codex
- agent infrastructure

## Short answer

The hot agent products of 2026 are different on the surface, but they are exposing the same infrastructure gap. Claude Code and Codex make background coding work real. OpenClaw and Hermes make personal and local agents feel reachable. But users keep running into the same limits: context is expensive, long-running work stalls, permissions are awkward, local runtimes need safe delivery, and "proactive" often still means "poll everything on a timer."

The bigger opportunity sits around the model: the runtime substrate that keeps agent work cheap, safe, and continuous.

## Key takeaways

- Coding agents are becoming background workers, not autocomplete boxes.
- Personal assistants are becoming local control planes, not chatbots.
- Context cost is now an architectural constraint, not an accounting detail.
- Permission systems are moving from all-manual to classifier-mediated, but trust is still fragile.
- Proactivity needs event filtering and interrupt delivery because scheduled agent turns burn attention fast.

## What changed in 2026?

OpenAI describes Codex as a coding agent that can work in cloud sandboxes, run tasks in parallel, and create reviewable changes in repositories. Its docs position cloud tasks as delegated background work rather than interactive chat.

Anthropic’s Claude Code ecosystem has moved in the same direction from another angle: hooks, subagents, MCP integrations, and advanced patterns for scaling to real codebases. The user experience is no longer "ask a question and get code." It is "coordinate a small agentic work system."

OpenClaw and Hermes bring the same pattern to personal operations. OpenClaw is closer to a practical assistant runtime: channels, plugins, local execution, and connected tools. Hermes is discussed more as an orchestrator or self-improving agent framework: memory, skills, and a longer-lived loop.

Different products, same pressure: agents are becoming systems that run across time.

## Pain point 1: context cost dominates agent economics

Users do complain about model quality, but the louder operational pain is repeated context cost.

Hacker News discussions around Claude Code spending repeatedly point to the same mechanism: long contexts get sent or cached across turns, and cost scales with context length. Reddit threads show users measuring hidden token overhead, token utilization, and the practical need to compact or clear sessions.

This matters beyond coding. Any proactive assistant that wakes often and reloads broad context will eventually become expensive. Ask "does this event deserve a model turn?" before asking which model should think.

The infrastructure implication: agent systems need context budgets, retrieval discipline, event prefiltering, and durable state summaries before they need more autonomy.
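As a minimal sketch of what "does this event deserve a model turn?" could look like in code, the example below applies a cheap severity rule and only then charges a token budget. Every name here (`Event`, `ContextBudget`, `deserves_model_turn`, the severity scale, the token estimates) is a hypothetical illustration, not an API from any of the products discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str      # illustrative label such as "email" or "sentry"
    severity: int    # hypothetical scale: 0 = noise, higher = more urgent
    est_tokens: int  # rough estimate of context needed to handle the event


class ContextBudget:
    """Tracks how many tokens proactive turns may still spend in a window."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.spent = 0

    def try_spend(self, tokens: int) -> bool:
        # Refuse the spend rather than overshoot the budget.
        if self.spent + tokens > self.max_tokens:
            return False
        self.spent += tokens
        return True


def deserves_model_turn(event: Event, budget: ContextBudget, min_severity: int = 2) -> bool:
    # Cheap rule first; the budget is only charged for events that pass it.
    if event.severity < min_severity:
        return False
    return budget.try_spend(event.est_tokens)


budget = ContextBudget(max_tokens=10_000)
events = [
    Event("newsletter", 0, 3_000),
    Event("sentry", 3, 6_000),
    Event("email", 2, 5_000),
]
decisions = [deserves_model_turn(e, budget) for e in events]
print(decisions)  # [False, True, False]
```

The newsletter is dropped by the severity rule without touching the budget, the Sentry event is admitted, and the email is refused because it would push spending past the cap. A real system would likely queue or summarize the refused event rather than discard it.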

## Pain point 2: long-running work fails at handoff boundaries

The strongest agents can work for longer, but users still report that multi-step tasks stall, drift, or lose the reason they were doing something. This shows up in coding workflows as unfinished refactors, broken context handoffs, and confusion over when to delegate to a subagent.

It also shows up in personal agents. A local assistant may remember a preference, but does it know which event should resume which workflow? Does it know whether a delivery was acknowledged? Does it know which app connection expired while the user was away?

Long-running work needs checkpoints that live outside the model turn:

- What was the user’s durable intent?
- What state did the system observe?
- Which event caused the wakeup?
- Which agent or session accepted the work?
- What happened after delivery?

Without those records, the agent can sound confident while the system loses continuity.
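The five questions above map naturally onto a small record that is stored outside the model turn. The sketch below is one possible shape, assuming JSON-serializable state; the `Checkpoint` class, its field names, and the `save`/`load` helpers are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Checkpoint:
    durable_intent: str       # what the user wanted, in stable terms
    observed_state: dict      # what the system saw at wakeup time
    wake_event: str           # which event caused the wakeup
    accepted_by: str          # which agent or session took the work
    delivery_outcome: Optional[str] = None  # unknown until reported back


def save(cp: Checkpoint) -> str:
    # Serialize to JSON so the record survives process and session restarts.
    return json.dumps(asdict(cp), sort_keys=True)


def load(raw: str) -> Checkpoint:
    return Checkpoint(**json.loads(raw))


def needs_follow_up(cp: Checkpoint) -> bool:
    # Work with no recorded outcome is still open and may need to be resumed.
    return cp.delivery_outcome is None


cp = Checkpoint(
    durable_intent="tell me when the billing refactor PR is blocked",
    observed_state={"pr": "open", "ci": "failing"},
    wake_event="ci_status_changed",
    accepted_by="session-a",
)
restored = load(save(cp))
```

Because the record is plain data, a different session or a different agent can load it and pick up the work without relying on the previous model context.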

## Pain point 3: permissions are still too binary

Claude Code’s hook and permission systems show where developer agents are headed: more automation, but still with guardrails. Recent coverage of auto-mode frames it as a middle path between prompting the user for every action and letting the agent do anything.

OpenClaw’s security incidents and broader criticism point at the same tension for personal assistants. A local agent with access to email, shell, browser, files, and payments is powerful enough to be useful and dangerous enough to need a real policy layer.

The next runtime primitive looks more like scoped permission: event, intent, channel, and action all affect what the agent may do.

Give the agent different authority when it responds to a direct prompt, wakes from a customer email, handles a maintenance window, or reacts to an unknown webhook. The interrupt source should shape the permission envelope.
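One way to picture a source-shaped permission envelope is a lookup from interrupt source to allowed actions, with deny as the default. The sources mirror the four cases named above; the action names and the `ENVELOPES` table are hypothetical and only meant to show the shape of the policy.

```python
from enum import Enum


class Source(Enum):
    DIRECT_PROMPT = "direct_prompt"
    CUSTOMER_EMAIL = "customer_email"
    MAINTENANCE_WINDOW = "maintenance_window"
    UNKNOWN_WEBHOOK = "unknown_webhook"


# Hypothetical policy table: each interrupt source gets its own set of actions.
ENVELOPES = {
    Source.DIRECT_PROMPT: frozenset({"read_files", "run_shell", "send_email"}),
    Source.CUSTOMER_EMAIL: frozenset({"read_files", "draft_reply"}),
    Source.MAINTENANCE_WINDOW: frozenset({"read_files", "run_shell"}),
    Source.UNKNOWN_WEBHOOK: frozenset(),  # observe only, no actions
}


def is_allowed(source: Source, action: str) -> bool:
    # Deny by default: a source without an envelope may do nothing.
    return action in ENVELOPES.get(source, frozenset())


print(is_allowed(Source.DIRECT_PROMPT, "run_shell"))    # True
print(is_allowed(Source.CUSTOMER_EMAIL, "run_shell"))   # False
print(is_allowed(Source.UNKNOWN_WEBHOOK, "read_files")) # False
```

A fuller policy would also key on intent and channel, as the text describes. The table keeps only the source dimension to stay readable.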

## Pain point 4: local assistants need delivery, not inbound webhooks

Hosted agents can receive webhooks. Local agents usually need a safer path.

OpenClaw-style systems run on a user machine, a home server, or a small VPS. Hermes setups often stay personal and local too. These environments need outbound polling, durable pending deliveries, and session routing. Do not make every user expose a public endpoint.

"Proactive local assistant" quickly becomes a delivery problem. The agent needs a way to receive relevant interrupts without polling every source app directly.

The pattern is:

1. Source events arrive upstream.
2. A match layer filters them against durable user intent.
3. The local runtime pulls matched deliveries.
4. The runtime acknowledges what it accepted.

That beats asking the local agent to wake up every few minutes and inspect the world.
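The four steps can be simulated in memory to make the delivery semantics concrete. This is a sketch under stated assumptions: `UpstreamQueue`, `Delivery`, and the `matches` predicate are hypothetical names, and a real upstream would sit behind an authenticated outbound HTTP call from the local runtime rather than in the same process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Delivery:
    delivery_id: int
    payload: dict


class UpstreamQueue:
    """Holds matched events until the local runtime acknowledges them."""

    def __init__(self, matches: Callable[[dict], bool]) -> None:
        self._matches = matches
        self._pending: Dict[int, Delivery] = {}
        self._next_id = 1

    def ingest(self, event: dict) -> bool:
        # Steps 1 and 2: the event arrives and is filtered against intent.
        if not self._matches(event):
            return False
        delivery = Delivery(self._next_id, event)
        self._pending[delivery.delivery_id] = delivery
        self._next_id += 1
        return True

    def pull(self, limit: int = 10) -> List[Delivery]:
        # Step 3: the runtime pulls. Pulling does not remove anything,
        # so unacknowledged deliveries are returned again on the next pull.
        return list(self._pending.values())[:limit]

    def ack(self, delivery_id: int) -> bool:
        # Step 4: only an explicit acknowledgement clears a delivery.
        return self._pending.pop(delivery_id, None) is not None


queue = UpstreamQueue(matches=lambda e: e.get("kind") == "customer_email")
queue.ingest({"kind": "customer_email", "subject": "refund"})
queue.ingest({"kind": "newsletter", "subject": "weekly digest"})
queue.ingest({"kind": "customer_email", "subject": "outage"})

first_pull = queue.pull()
queue.ack(first_pull[0].delivery_id)
second_pull = queue.pull()
```

Keeping deliveries pending until they are acknowledged gives at-least-once behavior: if the local runtime crashes between pull and ack, the same delivery shows up again, so the runtime should treat handling as idempotent.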

## Pain point 5: proactivity is still confused with schedules

Many assistant demos still define proactivity as "run this prompt every N minutes." A timer is enough to start a demo. It is not enough to carry the category.

Reddit discussions around proactive assistants often make the sharper point: the human is still the sensor. The user notices the Sentry alert, the customer email, the blocked issue, the missed calendar prep, and then asks the assistant to help.

The deeper promise is event-level awareness: the system notices when something crosses a user-defined threshold and wakes the right agent with the right context.

That requires an interrupt layer. It also requires restraint. A proactive agent that alerts on everything becomes another inbox.
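Restraint can be partly mechanical. The sketch below fires only when a value crosses a user-defined threshold from below, and suppresses repeat interrupts inside a cooldown window. The `ThresholdInterrupt` class and its parameters are hypothetical, and timestamps are passed in explicitly so the behavior is easy to trace.

```python
from typing import Optional


class ThresholdInterrupt:
    """Fires on upward threshold crossings, at most once per cooldown window."""

    def __init__(self, threshold: float, cooldown_s: float) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._above = False
        self._last_fired: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        is_above = value >= self.threshold
        crossed = is_above and not self._above
        self._above = is_above
        if not crossed:
            # Staying above the threshold is not a new event.
            return False
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            # A fresh crossing, but too soon after the last interrupt.
            return False
        self._last_fired = now
        return True


detector = ThresholdInterrupt(threshold=5, cooldown_s=60)
samples = [(3, 0), (7, 10), (8, 20), (2, 30), (9, 40), (1, 50), (6, 100)]
fired = [detector.observe(value, now) for value, now in samples]
print(fired)  # [False, True, False, False, False, False, True]
```

Seven samples produce two interrupts: the first crossing at t=10 and the crossing at t=100. The crossing at t=40 is real but falls inside the cooldown, which is the kind of filtering that keeps a proactive agent from becoming another inbox.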

## FAQ

### Are OpenClaw and Hermes the same kind of agent?

No. OpenClaw is usually discussed as a practical personal assistant runtime with channels, plugins, and local execution. Hermes is often discussed as a more autonomous, self-improving orchestrator with memory and skills. The overlap is real, but the adoption pain is different.

### Are Claude Code and Codex replacing developer tools?

Not exactly. They are becoming developer work runtimes. The important shift is background delegation: tasks can run in a sandbox, branch, session, or subagent while the human reviews and steers.

### What is the biggest infrastructure gap?

The missing layer is the control plane around agent work: durable intent, event filtering, scoped permissions, delivery semantics, handoff state, and observability.

### Why does this matter for proactive agents?

Because proactivity multiplies cost and risk. A reactive agent runs when a user asks. A proactive agent may wake thousands of times. Without prefiltering and delivery discipline, the system spends too much and earns too little trust.

## Further reading

- https://platform.openai.com/docs/codex/overview
- https://openai.com/index/introducing-upgrades-to-codex/
- https://docs.anthropic.com/en/docs/claude-code/hooks
- https://www.anthropic.com/webinars/claude-code-advanced-patterns
- https://news.ycombinator.com/item?id=47976415
- https://www.reddit.com/r/ChatGPTCoding/comments/1sie75z/openai_codex_vs_claude_code_in_2026_spring/
- https://www.reddit.com/r/hermesagent/comments/1t9chdk/the_ai_agent_setup_that_finally_clicked_for_me/
