Context Engineering: From Prompting to Production-Ready Coding Agents

The next frontier in AI software isn’t larger models or fancier prompts—it’s context engineering. By architecting how information, tools, and workflow states enter an LLM’s limited ‘working memory,’ teams can transform coding agents from brittle prototypes into reliable production systems.

Fernando

The Evolution: Why Prompt Engineering Isn’t Enough #

Early LLM applications focused on prompt engineering—clever wording, few-shot examples, elaborate system messages. While this improved single-call outputs, it fell short when:

  • Building multi-step agents that must recall project state, tool schemas, and human feedback
  • Scaling to complex legacy codebases, where ad hoc back-and-forth with an agent floods its context window with noise
  • Maintaining team alignment, since monolithic code reviews of generated PRs become unmanageable

As advanced practitioners note, “everything that makes agents good is context engineering”—the discipline of systematically curating what goes into the context window and when.

Defining Context Engineering #

Unlike static prompt templates, context engineering treats context as a dynamic, multi-layered system:

Instructions/System Prompt: Core agent behaviors and tool interfaces.

Short-Term Memory: Recent conversational exchanges, research outputs, and planning artifacts.

Long-Term Memory: Persistent knowledge—architecture docs, prior sprint notes, team conventions.

Retrieved Information (RAG): Domain- or repo-specific data dynamically fetched (e.g., API schemas, code snippets).

Tool Definitions: Available agent capabilities—file search, test execution, spec compaction.

The goal is to optimize for four metrics: correctness, completeness, size, and trajectory—while avoiding bad info, missing info, and noisy context.
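
To make the layering concrete, here is a minimal Python sketch of how these layers might be assembled into a single payload under a size budget. All names here (`ContextBundle`, `build_context`, the rough token estimate) are illustrative assumptions, not any particular framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Hypothetical container mirroring the layers described above."""
    system_prompt: str                                    # core behaviors + interfaces
    tool_defs: list[str] = field(default_factory=list)    # tool schemas
    long_term: list[str] = field(default_factory=list)    # docs, conventions
    retrieved: list[str] = field(default_factory=list)    # RAG results
    short_term: list[str] = field(default_factory=list)   # recent turns, plans

def build_context(bundle: ContextBundle, budget_tokens: int) -> str:
    """Concatenate layers in priority order, stopping before the
    (crudely estimated) token budget is exceeded: the 'size' metric."""
    layers = [
        bundle.system_prompt,
        *bundle.tool_defs,
        *bundle.long_term,
        *bundle.retrieved,
        *bundle.short_term,
    ]
    out, used = [], 0
    for chunk in layers:
        est = max(1, len(chunk) // 4)  # ~4 chars per token heuristic
        if used + est > budget_tokens:
            break                      # drop remaining lower-priority material
        out.append(chunk)
        used += est
    return "\n\n".join(out)
```

The budget check is where "size" and "noisy context" meet: whatever would overflow the window simply never enters it.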

The 12-Factor Agents Manifesto #

Inspired by Heroku’s Twelve-Factor App, the 12-Factor Agents framework provides a blueprint for reliable LLM applications:

  1. Natural Language → Tool Calls
  2. Own Your Prompts
  3. Own Your Context Window
  4. Treat Tools as Structured Outputs
  5. Unify Execution & Business State
  6. Pause/Resume APIs
  7. Human-in-the-Loop as First-Class
  8. Own Control Flow
  9. Compact Errors into Context
  10. Small, Focused Agents
  11. Trigger Anywhere (multi-channel)
  12. Stateless Reducer Design

Factor 3 (“Own Your Context Window”) and Factor 12 (“Stateless Reducer Design”) are especially critical: each agent turn must act only on the context it is given and produce explicit outputs, with no hidden state.
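
As a concrete illustration of Factor 12, here is a minimal Python sketch of an agent turn as a pure reducer. The `AgentState` and `Event` types and the `step` function are invented for this example; they show the shape of the principle, not a reference implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AgentState:
    """All state the agent needs, carried explicitly."""
    context: tuple[str, ...]   # everything the next turn may see
    done: bool = False

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "tool_result", "human_feedback", "llm_output"
    payload: str

def step(state: AgentState, event: Event) -> AgentState:
    """One agent turn as a stateless reducer: same (state, event) in,
    same state out, with no hidden globals or side channels."""
    if event.kind == "llm_output" and "DONE" in event.payload:
        return replace(state, done=True)
    # Otherwise fold the event into the explicit context for the next turn.
    return replace(state, context=state.context + (f"{event.kind}: {event.payload}",))
```

A reducer like this also makes pause/resume (Factor 6) nearly free: pausing is persisting the state, and resuming is replaying events into `step`.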

From Naïve Prompting to Intentional Compaction #

The Pitfall of “Shout-and-Cry” Prompting #

Teams often converse with agents until the context window overflows or they abandon the session—yielding sloppily generated code, copious rework, and misalignment.

Intentional Compaction #

Rather than using blunt context-clearing commands, advanced workflows:

  • Write a Structured Progress File: Summarizes research and planning in a compact format
  • Onboard Subagents: Small helper agents consume just the summary to execute focused tasks
  • Loop with Updated Plans: Keep context utilization under 40% by iteratively pruning and reinjecting only necessary details

This approach preserves correctness and team alignment, while avoiding noise and token bloat.
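
A minimal sketch of that compaction loop, assuming a hypothetical `summarize` callable (itself an LLM call that writes the structured progress file) and a crude token estimator; the 40% threshold and 170K window echo the figures used elsewhere in this article:

```python
MAX_CONTEXT_TOKENS = 170_000       # model window size (see outcomes below)
UTILIZATION_TARGET = 0.40          # keep utilization under 40%

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough chars-per-token heuristic

def maybe_compact(history: list[str], summarize) -> list[str]:
    """When utilization crosses the target, replace accumulated history
    with a single structured progress file; otherwise leave it alone."""
    used = sum(estimate_tokens(h) for h in history)
    if used < UTILIZATION_TARGET * MAX_CONTEXT_TOKENS:
        return history                  # under budget: no compaction yet
    progress_file = summarize(history)  # compact research + plan + edits
    return [progress_file]              # subagents consume only this file
```

Subagents are then onboarded with `[progress_file]` as their entire inherited context, rather than the parent's full transcript.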

A Spec-First Development Workflow #

Complex Go and Rust codebases benefit from a spec-first development process:

Research Phase: Generate a concise research summary listing the files, functions, and line numbers relevant to the task.

Planning Phase: Enumerate each change—files affected, code snippets to insert—and define how to test and verify every step.

Implementation Phase: Execute code changes guided by the plan, compacting context as needed to stay below the utilization threshold.

Over weeks, this workflow enables large PRs and rapid prototyping of complex features while maintaining mental alignment across engineers, without requiring anyone to manually read every line of generated code.
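
One way to make the planning phase tangible is to represent each enumerated change as structured data that engineers and subagents can both consume. The schema and file paths below are hypothetical, shown only to illustrate the idea:

```python
from dataclasses import dataclass

@dataclass
class PlannedChange:
    """One entry in a spec-first plan: what to change and how to verify it."""
    file: str         # path to the affected file
    function: str     # function or symbol to modify
    line_hint: int    # approximate line number from the research phase
    change: str       # description or snippet to insert
    verify: str       # command or test that proves the step worked

plan = [
    PlannedChange(
        file="internal/auth/token.go",   # illustrative path, not a real repo
        function="RefreshToken",
        line_hint=142,
        change="Add context.Context parameter and propagate cancellation.",
        verify="go test ./internal/auth/...",
    ),
]
```

Because each entry carries its own `verify` command, the implementation phase can check every step mechanically instead of relying on one monolithic review at the end.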

Subagents and Frequent Compaction #

Complex systems demand context control beyond single agents:

  • Subagents handle focused jobs (e.g., locating data flows across modules) and return structured results to the parent agent
  • Frequent Intentional Compaction stitches together research outputs, plans, and previous edits—discarding obsolete context and re-summarizing at each iteration

By orchestrating agent hierarchies and compaction, teams maintain high throughput and code quality—even in brownfield projects with legacy constraints.
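
The parent/subagent handoff might look like the following sketch, where `run_subagent` stands in for whatever agent runtime is in use; the essential point is that each subagent receives only the compacted summary and returns a structured result, not a raw transcript:

```python
from dataclasses import dataclass

@dataclass
class SubagentResult:
    """Structured result a focused subagent returns to its parent."""
    task: str
    findings: list[str]   # e.g. data-flow locations across modules
    ok: bool

def run_subagent(task: str, summary: str) -> SubagentResult:
    # Stand-in for a real agent-runtime call: a fresh agent is started
    # with `summary` as its entire inherited context, and its structured
    # reply is parsed into a SubagentResult.
    return SubagentResult(task=task, findings=[], ok=True)

def locate_data_flows(summary: str, modules: list[str]) -> list[SubagentResult]:
    """Fan out one focused subagent per module and collect results."""
    return [
        run_subagent(f"Trace data flow through {module}", summary)
        for module in modules
    ]
```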

Practical Outcomes #

  • Fast Onboarding: New engineers can ship multiple PRs within days by reading only specs, not raw code
  • Large-Scale Fixes: Live sessions have produced 300K-line PRs in under two hours, merged without manual edits
  • Engineering Efficiency: Tasks that once required weeks now take hours, leveraging up to 170K-token contexts while keeping utilization under 40% for the core work

Looking Ahead #

As LLM commoditization accelerates, workflow transformation—adopting context-centric communication, spec-first planning, and structured tool use—will be the ultimate competitive advantage. Digital gardeners and AI engineers should focus less on crafting the “perfect prompt” and more on engineering end-to-end context systems that empower LLMs to solve real-world software challenges reliably and at scale.


This article explores the emerging discipline of context engineering and its transformative impact on AI-assisted software development. The principles discussed here apply not just to coding agents, but to any AI application requiring sustained, multi-turn interactions with complex domain knowledge.