
Remember when “prompt engineering” was the hottest skill in AI? When developers spent hours crafting the perfect phrasing, hoping to coax better responses from ChatGPT? Those days are rapidly fading. In 2025, a new discipline has emerged that’s fundamentally changing how we build AI-powered applications: context engineering.
As Andrej Karpathy puts it: “Context engineering is the delicate art and science of filling the context window with just the right information for the next step.”
This shift isn’t just semantics—it represents a fundamental evolution from clever wordsmithing to rigorous software architecture. For AI coding assistants like Claude Code and Cursor, mastering context engineering is the difference between an agent that understands your entire codebase and one that breaks everything it touches.
The Evolution: From Writing Prompts to Architecting Context
Prompt engineering was largely about clever wording. Maybe if you phrased a request differently, added “think step by step,” or included few-shot examples, the LLM would produce better output. It was an art form, often feeling more like persuasion than engineering.
Context engineering is different. It’s architecture for intelligence.
| Prompt Engineering | Context Engineering |
|---|---|
| Clever wording & phrasing | Dynamic information architecture |
| Single-turn interactions | Multi-turn agent workflows |
| Stateless conversations | Persistent state management |
| Art over science | Engineering discipline |
The key insight is this: Prompt engineering is about what you ask. Context engineering is about what the model already knows when you ask it.
Early generative AI was stateless—each interaction existed in isolation. A clever prompt was often sufficient. But autonomous AI agents are fundamentally different. They persist across multiple interactions, make sequential decisions, coordinate with other agents, and operate with varying levels of human oversight.
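The difference is easiest to see in code. Below is a minimal sketch, not any vendor's real API, of assembling a context window from several sources before each call instead of sending a bare prompt. All names are hypothetical, and the token budget is crudely approximated by character length:

```python
# Sketch: a context-engineered call assembles system instructions,
# persistent memory, and retrieved snippets around the user query.
def build_context(system: str, memory: list[str], retrieved: list[str],
                  query: str, budget: int = 400) -> str:
    """Concatenate context sources in priority order, skipping optional
    snippets once the (toy, character-based) budget would be exceeded."""
    parts = [system]
    for snippet in memory + retrieved:
        used = sum(len(p) for p in parts)
        if used + len(snippet) > budget - len(query):
            break  # memory and retrieval are optional; the query is not
        parts.append(snippet)
    parts.append(query)
    return "\n\n".join(parts)

prompt = build_context(
    system="You are a coding assistant.",
    memory=["User prefers type hints."],
    retrieved=["def parse(s): ..."],
    query="Refactor parse() to use type hints.",
)
```

A stateless prompt-engineered call would send only the last argument; here, what the model "already knows" is constructed deliberately on every turn.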
Why Prompt Engineering Breaks Down for Agents
Agentic AI systems suffer high failure rates under traditional prompt engineering. Here’s why:
Without shared context, agents misalign on priorities. One agent extracts data while another applies outdated rules because neither accesses the same operational state. They duplicate validation checks or ignore dependencies entirely.

Consider a typical multi-agent coding workflow:

Agent A (Analyzer) → Agent B (Generator) → Agent C (Tester) → Agent D (Reviewer)

With prompt engineering alone, each agent starts from scratch. Agent B doesn’t know what Agent A discovered. Agent C can’t access Agent B’s implementation rationale. The result? Inconsistent code, redundant work, and subtle bugs that slip through because no single agent had the full picture.

| Prompt Engineering Alone | With Context Engineering |
|---|---|
| No shared state | Shared context layer |
| Duplicated analysis | Cumulative knowledge |
| Conflicting decisions | Coordinated actions |
| Lost information | Persistent memory |
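A shared context layer can be as simple as a keyed store that every agent writes to and reads from. A minimal sketch, not any particular framework’s API:

```python
# Sketch: a shared context layer so later agents see what earlier
# agents discovered, instead of starting from scratch.
class SharedContext:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def write(self, agent: str, key: str, value: str) -> None:
        self._store[f"{agent}:{key}"] = value

    def read_all(self) -> dict[str, str]:
        return dict(self._store)

ctx = SharedContext()
ctx.write("analyzer", "finding", "parse() lacks input validation")
ctx.write("generator", "rationale", "added guard clause per analyzer finding")

# The tester agent now sees both the finding and the rationale.
visible = ctx.read_all()
```

Real systems add persistence and access control, but the principle is the same: knowledge accumulates across agents rather than evaporating between calls.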
Karpathy’s Operating System Analogy
Andrej Karpathy offers a powerful mental model: LLMs are like a new kind of operating system. The LLM itself is the CPU, and its context window is the RAM—the model’s working memory.
Just like RAM, the context window has limited capacity. And just as an operating system carefully curates what fits into RAM to maximize performance, context engineering plays the same role for LLM applications.
This analogy illuminates why context engineering matters so much:
- RAM is finite → Context windows have token limits (200K for Claude, 128K-200K for Cursor)
- RAM management is critical → Poor context management leads to performance degradation
- OS abstracts complexity → Good context engineering hides complexity from the model
- Priority scheduling exists → Not all context is equally important
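Carrying the analogy into code, “RAM management” amounts to priority-based eviction: when the budget is exceeded, the lowest-priority context is dropped first. A toy sketch, with word counts standing in for token counts:

```python
# Sketch: keep the highest-priority context items that fit the budget,
# like an OS deciding what stays resident in RAM.
def fit_to_budget(items: list[tuple[int, str]], budget: int) -> list[str]:
    """items: (priority, text); higher priority is kept first.
    Cost is approximated by word count in this toy version."""
    out, used = [], 0
    for prio, text in sorted(items, key=lambda it: -it[0]):
        cost = len(text.split())
        if used + cost <= budget:
            out.append(text)
            used += cost
    return out

window = fit_to_budget(
    [
        (3, "system instructions for the agent"),  # 5 words, critical
        (2, "current file under edit"),            # 4 words, important
        (1, "full git log of the repository"),     # 6 words, nice-to-have
    ],
    budget=10,
)
```

With a budget of 10 words, the git log is evicted while the two higher-priority items survive, which is exactly the scheduling behavior the analogy describes.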

The Four Pillars of Context Engineering
According to research from Anthropic and LangChain, effective context engineering for agents can be grouped into four strategies: Write, Select, Compress, and Isolate.
1. Write: Persisting Context Beyond the Window
Writing context means saving information outside the context window for later retrieval. This includes:
- Scratchpads for intermediate reasoning
- Memory stores for long-term facts
- Tool outputs persisted to files
- Conversation summaries for continuity
Claude Code exemplifies this with its /compact command, which automatically summarizes earlier conversation parts while retaining critical information. This “writes” compressed context that can be referenced later without consuming the full token budget.
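A minimal sketch of the write strategy: persist a scratchpad to disk, then reload only a short summary on the next turn. The file name and format here are illustrative, not any tool’s actual on-disk layout:

```python
# Sketch: intermediate reasoning is written outside the context window
# and only a compressed slice re-enters it later.
import json
import os
import tempfile

def save_scratchpad(path: str, notes: list[str]) -> None:
    with open(path, "w") as f:
        json.dump({"notes": notes}, f)

def load_summary(path: str, max_notes: int = 2) -> str:
    with open(path) as f:
        notes = json.load(f)["notes"]
    # Only the most recent notes re-enter the context window.
    return "; ".join(notes[-max_notes:])

path = os.path.join(tempfile.gettempdir(), "scratchpad.json")
save_scratchpad(path, [
    "ran tests: 2 failures",
    "root cause: off-by-one",
    "fix in diff #3",
])
summary = load_summary(path)
```

The full history survives on disk; only a few tokens’ worth of it occupies the window at any moment.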
2. Select: Retrieving the Right Information
Selecting context means pulling relevant information into the window when needed. Modern coding assistants use multiple selection strategies:
- RAG (Retrieval-Augmented Generation) for semantic search
- String-based search (grep, ripgrep) for exact matches
- File system indexing for code navigation
- Git history for understanding changes
Cursor’s approach combines RAG with traditional search tools, creating a two-pronged system that can find both semantically similar code and exact string matches.
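A toy version of that two-pronged system, with substring search standing in for grep and bag-of-words overlap standing in for embedding similarity (real tools use ripgrep and proper vector embeddings):

```python
# Sketch: hybrid retrieval combining exact matches and a crude
# "semantic" ranking over a tiny in-memory codebase.
def grep(files: dict[str, str], needle: str) -> list[str]:
    """Exact-match search, grep-style."""
    return [name for name, text in files.items() if needle in text]

def semantic_rank(files: dict[str, str], query: str) -> list[str]:
    """Rank files by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scores = {name: len(q & set(text.lower().split()))
              for name, text in files.items()}
    return sorted(scores, key=scores.get, reverse=True)

files = {
    "auth.py": "def login(user, password): check credentials",
    "db.py": "def connect(): open database session",
}
exact = grep(files, "login")                             # exact hits
ranked = semantic_rank(files, "user credentials check")  # similarity order
```

The exact search finds the literal symbol; the similarity ranking surfaces the same file even for a paraphrased query that never mentions `login`.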
3. Compress: Maximizing Information Density
Compression means retaining only the tokens necessary for the current task. Techniques include:
- Summarization of long conversations
- Chunking large files intelligently
- Filtering irrelevant code sections
- Truncation with smart boundaries
The goal isn’t just fitting more in—it’s ensuring the model focuses on what matters. As one practitioner notes: “Don’t dump a dozen files into the prompt ‘just in case.’ This creates noise and can confuse the model.”
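A sketch of compaction in that spirit: keep recent turns verbatim and collapse older ones into a single summary line. The “summarizer” here is a trivial truncation standing in for an actual LLM summarization call:

```python
# Sketch: /compact-style compression of a conversation history.
def compact(turns: list[str], keep_recent: int = 2) -> list[str]:
    """Keep the last `keep_recent` turns verbatim; collapse the rest."""
    if len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Trivial stand-in summarizer: first four words of each older turn.
    summary = "[earlier turns: " + " | ".join(
        " ".join(t.split()[:4]) for t in older) + "]"
    return [summary] + recent

history = [
    "user: explore the repo layout",
    "agent: found 3 packages and a test suite",
    "user: fix the failing test in parser",
    "agent: patched off-by-one in tokenizer",
]
compacted = compact(history)
```

Four turns shrink to three entries, and the freshest exchanges, the ones most likely to matter for the next step, remain untouched.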
4. Isolate: Parallel Context for Parallel Work
Isolation means splitting context across different agents or processes to enable parallel execution. Each agent gets a focused slice of context optimized for its specific task.
This is why multi-agent architectures are becoming dominant. Instead of one agent trying to hold everything in context, specialized agents work on isolated portions and coordinate through shared state.
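In code, isolation can be as simple as handing each agent a filtered slice of the project state. A hedged sketch:

```python
# Sketch: each agent receives only the context slice relevant to its task.
def isolate(full_context: dict[str, str],
            relevant: list[str]) -> dict[str, str]:
    """Return the subset of context keys this agent should see."""
    return {k: full_context[k] for k in relevant if k in full_context}

project = {
    "auth.py": "...",
    "db.py": "...",
    "ui.tsx": "...",
    "tests/": "...",
}
backend_slice = isolate(project, ["auth.py", "db.py"])  # backend agent
test_slice = isolate(project, ["tests/"])               # testing agent
```

Each slice fits comfortably in its agent’s window, and neither agent wastes tokens (or attention) on files outside its responsibility.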
Context Engineering in Practice: Claude Code vs Cursor
The competition between Claude Code and Cursor perfectly illustrates different approaches to context engineering.

Claude Code’s Approach
Claude Code emerged as what Karpathy calls “the first convincing demonstration of what an LLM Agent looks like.” Its context engineering strategy includes:
- 200K token context window — Larger than competitors, enabling true understanding of entire projects
- Automatic compaction — Intelligently summarizes conversations while preserving critical details
- Local-first architecture — Runs on your computer with access to your environment, data, and configuration
- Extended thinking — Can reason about complex architectural decisions before acting
The key insight from Anthropic: the agent runs on your computer, not in the cloud. This means it has natural access to your local context—files, environment variables, git history—without needing to upload everything.
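Much of that local context can be made explicit in a CLAUDE.md file at the repository root, which Claude Code reads as persistent project knowledge. A minimal illustrative example; the project layout and rules below are hypothetical:

```markdown
# CLAUDE.md

## Architecture
- `api/`: FastAPI service; entry point is `api/main.py`
- `core/`: business logic; no framework imports allowed here
- `tests/`: pytest suite; run with `make test`

## Conventions
- All new functions require type hints and docstrings
- Never edit generated files under `api/schemas/`
```

Because the agent runs locally, this file is picked up from disk on every session without the user re-explaining the project each time.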
Cursor’s Approach
Cursor takes a different path to context management:
- RAG + Search hybrid — Combines semantic search with traditional string matching
- Auto-managed windows — Limits chat sessions to ~20K tokens by default for speed
- Max Mode option — Allows up to 200K tokens when needed
- Aggressive caching — Indexes codebases for rapid retrieval
Cursor’s philosophy prioritizes latency over context size. By default, it keeps context small and fast, using retrieval to pull in relevant code on demand.
| Aspect | Claude Code | Cursor |
|---|---|---|
| Default Context | 200K tokens | 20K tokens |
| Max Context | 200K (1M beta) | 200K (Max Mode) |
| Retrieval Strategy | Direct file access | RAG + Search |
| Compression | /compact command | Auto-truncation |
| Philosophy | Large context, full understanding | Small context, fast retrieval |
The ACE Framework: Proof That Context Beats Fine-Tuning
Perhaps the strongest validation for context engineering comes from the ACE (Agentic Context Engineering) framework, developed by researchers at Stanford University, SambaNova Systems, and UC Berkeley.
Their findings were striking:
- 10.6% improvement on agentic tasks through context engineering alone
- 8.6% gains on financial reasoning benchmarks
- 86.9% average latency reduction compared to fine-tuning approaches
The key insight: editing input context outperformed model fine-tuning. Instead of expensive retraining, teams can achieve better results by engineering what goes into the context window.

This has massive implications for AI coding assistants. Rather than fine-tuning models on specific codebases (expensive, slow, requires retraining), tools can achieve superior performance through intelligent context engineering (cheap, fast, dynamic).
Practical Context Engineering for Developers
If you’re building AI-powered development tools—or just trying to get better results from existing ones—here are actionable strategies:
For Users of AI Coding Assistants
- Create CLAUDE.md files that describe project architecture, giving agents persistent project knowledge.
- Use @file references in Cursor, or let Claude Code’s search find relevant code automatically.
- Summarize long sessions with /compact before the context window fills.

The Future: Context-Aware AI Development
The shift from prompt engineering to context engineering signals a broader maturation of AI development. We’re moving from:
- Art to engineering — Reproducible systems over clever tricks
- Single-turn to multi-turn — Persistent agents over isolated queries
- Stateless to stateful — Accumulated knowledge over fresh starts
- Individual to collaborative — Multi-agent coordination over solo performance
For AI coding assistants, this evolution means increasingly sophisticated context management. Future tools will likely feature:
- Automatic context optimization — AI systems that manage their own context windows
- Cross-session memory — Agents that remember previous projects and preferences
- Team-level context sharing — Shared understanding across developer workflows
- Adaptive compression — Dynamic summarization based on task requirements
Conclusion
Prompt engineering isn’t dead—it remains important for crafting effective instructions. But it’s no longer sufficient for building reliable AI agents. Context engineering represents the next frontier: a discipline that treats the model’s working memory with the same rigor we apply to other system resources.
As Gartner notes, context engineering is rapidly replacing prompt engineering as the critical skill for enterprise AI success. For developers working with AI coding assistants, understanding these principles isn’t optional—it’s essential for getting the most out of tools like Claude Code and Cursor.
The question isn’t whether to adopt context engineering practices. It’s how quickly you can master them before the rest of the industry catches up.
Ready to level up your AI-assisted development? Start by auditing how you provide context to your tools. The difference between a helpful AI agent and a frustrating one often comes down to the quality of context it receives.

