The framework trap: why LangChain, CrewAI, and the next graph abstraction collapse at scale
Every team I know that bet on an agent framework hit the same wall: the abstraction became the product. The rewrite always starts when debugging the framework costs more than writing the agent.
Every team I talk to starts in the same place.
Week one: LangChain, CrewAI, or the new graph abstraction feels like leverage. The demo works. The README feels reassuring. There is a box for memory, a box for tools, a box for planning, and a box for subagents. It looks like architecture.
Month three: the team is debugging the framework instead of the agent.
That is the trap. The framework is not helping you survive complexity. It is creating a second system you now have to understand before you can touch your own product.
The recent HN thread on dropping LangChain captured the mood perfectly. One engineer described having to go through “5 layers of abstraction just to change a minute detail.” Another said most LLM apps need little more than string handling, API calls, loops, and maybe a vector store. That matches what I keep seeing in practice.
Failure mode 1: the framework assumed the happy path
LangGraph's own overview describes it as a low-level orchestration runtime for long-running, stateful agents. That sounds precise. The problem shows up when your real system stops looking like the tutorial graph.
A support agent starts as four nodes. Then you add retries. Then approval. Then a human review path. Then a tool that should only run after a previous tool returned a certain shape. The graph grows, but the core behavior does not get clearer. It gets harder to inspect.
CrewAI has the same issue from the opposite direction. Its docs recommend YAML-defined agents with roles, goals, backstories, delegation, and crews. That is a clean metaphor right up until the work does not map to a fake org chart anymore.
When the model needs to call lookup_customer, issue_refund, and draft_reply, it does not care that you called the wrapper a crew, graph, or chain. It needs a goal, a tool surface, and state it can see.
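To make "a goal, a tool surface, and state it can see" concrete, here is a minimal TypeScript sketch. The three tool names come from the example above. Everything else (the `ToolSpec` and `AgentState` shapes, the stubbed return values, the helper names) is an assumption for illustration, not any vendor's schema.

```typescript
// Hypothetical shapes: only the tool names come from the post.
type ToolName = "lookup_customer" | "issue_refund" | "draft_reply";
type Json = Record<string, unknown>;

interface ToolSpec {
  name: ToolName;
  description: string;
  run: (args: Json) => Json; // plain function, stubbed below
}

interface AgentState {
  goal: string;
  ticket: { id: string; body: string };
  toolResults: { tool: ToolName; result: Json }[];
}

const tools: ToolSpec[] = [
  {
    name: "lookup_customer",
    description: "Fetch a customer record by email.",
    run: (args) => ({ email: args.email, plan: "pro" }), // stub data
  },
  {
    name: "issue_refund",
    description: "Refund an order for a given amount.",
    run: (args) => ({ refunded: true, amount: args.amount }), // stub data
  },
  {
    name: "draft_reply",
    description: "Draft a reply to the customer.",
    run: (args) => ({ draft: `Re: ${String(args.subject)}` }), // stub data
  },
];

// What the model is shown: names and descriptions, nothing hidden.
function toolSurface(specs: ToolSpec[]): { name: string; description: string }[] {
  return specs.map(({ name, description }) => ({ name, description }));
}

// Calling a tool is a lookup plus a function call; the result lands in visible state.
function callTool(state: AgentState, name: ToolName, args: Json): AgentState {
  const spec = tools.find((t) => t.name === name);
  if (!spec) throw new Error(`unknown tool: ${name}`);
  return {
    ...state,
    toolResults: [...state.toolResults, { tool: name, result: spec.run(args) }],
  };
}
```

Nothing here is named crew, graph, or chain. The wrapper could be any of them and the model would see the same list from `toolSurface`.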
Failure mode 2: the framework's mental model is wrong
Framework authors keep trying to freeze the right abstraction one year before the model vendors change the primitive.
First it was chains. Then graphs. Then role-playing agents. Now it is subagent planners on top of agent harnesses on top of orchestration runtimes. Meanwhile the model APIs keep collapsing toward the same boring shape: messages in, tool calls out, structured output back.
That is why the simplest thing keeps winning after the rewrite.
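name: support-agent
goal: resolve the ticket end-to-end
tools:
  - lookup_customer
  - search_docs
  - issue_refund
  - draft_reply
state:
  - ticket
  - customer
  - tool_results

// Assumes apply() sets state.done once the model returns a final answer.
while (!state.done) {
  // One model call per step: the prompt, the visible state, the tool list.
  const step = await model.respond({ prompt, state, tools })
  // Run whatever the step asked for and fold the result back into state.
  state = await apply(step, tools, state)
}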
That loop is not glamorous. It is just legible.
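For readers who want to run the loop, here is a self-contained TypeScript sketch. The `Model` interface, the `Step` union, the `maxSteps` budget, and the scripted stub model are all assumptions standing in for a real SDK call; no vendor API is implied.

```typescript
// Assumed shapes for illustration; a real SDK call would sit behind Model.respond.
type Step =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "final"; reply: string };

interface State {
  ticket: string;
  toolResults: { name: string; result: unknown }[];
  reply?: string;
}

type Tools = Record<string, (args: Record<string, unknown>) => Promise<unknown>>;

interface Model {
  respond(input: { prompt: string; state: State; tools: string[] }): Promise<Step>;
}

// Apply one step: run the requested tool, or record the final reply.
async function apply(step: Step, tools: Tools, state: State): Promise<State> {
  if (step.kind === "final") return { ...state, reply: step.reply };
  const tool = tools[step.name];
  if (!tool) throw new Error(`model asked for unknown tool: ${step.name}`);
  const result = await tool(step.args);
  return { ...state, toolResults: [...state.toolResults, { name: step.name, result }] };
}

// The whole agent: loop until a reply exists or the step budget runs out.
async function runAgent(
  model: Model,
  tools: Tools,
  prompt: string,
  initial: State,
  maxSteps = 10,
): Promise<State> {
  let state = initial;
  for (let i = 0; i < maxSteps && state.reply === undefined; i++) {
    const step = await model.respond({ prompt, state, tools: Object.keys(tools) });
    state = await apply(step, tools, state);
  }
  return state;
}

// Scripted stub model so the sketch runs without network access.
const stubModel: Model = {
  async respond({ state }) {
    return state.toolResults.length === 0
      ? { kind: "tool", name: "lookup_customer", args: { email: "a@example.com" } }
      : { kind: "final", reply: "Refund policy sent." };
  },
};

const stubTools: Tools = {
  lookup_customer: async (args) => ({ email: args.email, plan: "pro" }),
};
```

Calling `runAgent(stubModel, stubTools, "resolve the ticket", { ticket: "T-1", toolResults: [] })` takes two steps with this stub: one tool call, then the final reply. The step budget is my addition; a loop that trusts the model to stop is the kind of failure boundary worth making explicit.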
Failure mode 3: the framework hides the behavior you need to observe
The worst production bug is not a crash. It is a bad decision you cannot explain.
Once your prompt is assembled through templates, wrappers, middleware, memory injection, callback managers, role descriptions, and hidden retries, you lose the one thing agent systems demand: a clean view of what the model actually saw and why it chose the tool it chose.
If I cannot answer these three questions in under two minutes, I do not want the abstraction:
- What exact prompt hit the model?
- What exact tool schema did it see?
- What exact state changed after each step?
Most frameworks make those answers available eventually. That is not good enough. In production, eventual visibility is the same as being blindfolded.
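One way to make those three answers immediate is to record them at the call site. The sketch below is an assumption about how that could look in plain TypeScript, not a feature of any framework; `TraceEntry`, `diffKeys`, and `traceStep` are hypothetical names.

```typescript
// Hypothetical trace record: one entry per step, one field per question.
interface TraceEntry {
  step: number;
  prompt: string;        // exact text sent to the model
  toolSchema: string;    // exact tool definitions, serialized as sent
  changedKeys: string[]; // top-level state keys that differ after the step
}

type Snapshot = Record<string, unknown>;

// Shallow diff via JSON serialization; adequate for small, JSON-like state.
function diffKeys(before: Snapshot, after: Snapshot): string[] {
  return Object.keys({ ...before, ...after })
    .filter((k) => JSON.stringify(before[k]) !== JSON.stringify(after[k]))
    .sort();
}

// Append one entry; the trace is plain data you can print, store, or grep.
function traceStep(
  trace: TraceEntry[],
  prompt: string,
  toolDefs: object[],
  before: Snapshot,
  after: Snapshot,
): TraceEntry[] {
  const entry: TraceEntry = {
    step: trace.length + 1,
    prompt,
    toolSchema: JSON.stringify(toolDefs),
    changedKeys: diffKeys(before, after),
  };
  return [...trace, entry];
}
```

The point of the design is that the trace is built from the same values passed to the model, so there is no second code path that could drift from what was actually sent.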
What I would do instead
Start with plain SDK calls, a typed tool spec, and explicit state. Add a runtime only when you need durable execution, checkpoints, or human approval gates. Add abstractions after the failure mode is real, not before.
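Before reaching for a runtime, ordering rules and approval gates can often live in the tool spec itself. This is a minimal sketch under assumed shapes: `GuardedTool`, `GuardState`, `canRun`, and the `refundable` field are hypothetical, and the refund rule is invented for illustration.

```typescript
// Hypothetical types: a tool spec that carries its own guard and approval flag.
interface Customer {
  id: string;
  refundable: boolean;
}

interface GuardState {
  customer?: Customer;     // set once lookup_customer has returned
  approvedTools: string[]; // tools a human has signed off on for this ticket
}

interface GuardedTool {
  name: string;
  requiresApproval: boolean;
  // Returns a reason when the tool must not run yet, otherwise null.
  precondition: (state: GuardState) => string | null;
}

const issueRefund: GuardedTool = {
  name: "issue_refund",
  requiresApproval: true,
  precondition: (s) => {
    if (s.customer === undefined) return "lookup_customer has not run";
    if (!s.customer.refundable) return "customer is not refundable";
    return null;
  },
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// One explicit check in code, instead of conditional edges in a graph definition.
function canRun(tool: GuardedTool, state: GuardState): Decision {
  const blocked = tool.precondition(state);
  if (blocked !== null) return { allowed: false, reason: blocked };
  if (tool.requiresApproval && !state.approvedTools.includes(tool.name)) {
    return { allowed: false, reason: "waiting for human approval" };
  }
  return { allowed: true };
}
```

This does not give you durable execution or checkpoints. It covers the in-process cases, and when `canRun` stops being enough, that is the failure mode that justifies adding a runtime.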
Frameworks are attractive because they promise to remove thinking. But the whole job in agent engineering is thinking clearly about state, tools, and failure boundaries. If you outsource that too early, you do not remove complexity. You hide it under someone else's nouns.
YAML plus a tool spec plus a direct model call beats all of them more often than people want to admit.
Not because abstraction is bad.
Because in agent systems, the abstraction layer usually becomes the thing that breaks first.