forge init
Your agent is probably good enough as is. It just needs an operating system.
We see them; organizations adopting one harness or the other to revolutionize their development process, mandating AI use, and soon thereafter, when productivity doesn’t skyrocket, they often do the same thing. They create their own system, scattered across various CLAUDE.md files, in institutional knowledge, reinvented per team, drifting per project, and of course flushing countless tokens down the drain. Which issue tracker? What gates? Who reviews? What do we let the agent do unsupervised and what not? Does it ring any bells?
You Already Have a Format
So here's the thing most AI wielders keep missing; chances are your company already has a system. You already have a Jira or equivalent, or a Github/Gitlab, or both, plus a CI and a review culture and a compliance regime. Whether you operate them well is a separate conversation, whether they are the right ones for the shape of your organization is another, but they exist; people know them, and they hold years of accumulated process knowledge and artifacts. Most agent frameworks however, respond to this by rebuilding the universe, a new tracker, a new memory store, a new workflow engine. All of it yours to maintain now. Congrats! You’ve adopted a pet, a rather complex and rapidly evolving one, to be exact.
If that’s not backwards, then what is? And if it is indeed backwards then why don’t we let the agents play to the strengths of the tools we already run and pay for?
A Generator, Not a Framework
forge is my attempt at that, and I open-sourced this week. Unlike most frameworks, forge is a generator; it interviews your organization, mapping the tools you already run, the process constraints you’re held to, the identities your agents act under, and emits a lightweight, standalone Claude Code plugin that your org owns. Owns. As in it lives in your repo, not mine. That plugin carries built-in adapters for your Jira, your GitHub, your CI; it enforces the boundaries between six distinct agent roles; and it keeps a wiki schema where code remains the source of truth. From then on every new project starts from it, as one command opens a project workspace with your org’s tools, roles, and rules already wired in. Custom by design: forge@Acme ≠ forge@Globex.
Code Is a Commodity. Judgment Isn’t
Every forge product keeps the engineer front and center; we don’t try to automate all the things. The agents type, the engineers solve the problems. That’s what the role boundaries are actually for — humans architect, review, and integrate; agents handle the bulk of implementation. Code was always the cheap part of software engineering, and it just got cheaper (yes cliche). What’s scarce is knowing what to build, what to refuse to build, and whether what shipped is right. Code is a commodity now. Judgment isn’t.
Which raises the obvious question; how do you know what shipped is right, when you didn’t type it? Let’s be honest with each other, nobody reviews tens of thousands of generated lines and does it well, and pretending otherwise is exactly how slop gets merged and shipped. The answer happens to be older than the agents themselves: Test-driven development (TDD). The engineer writes down what done means (an executable test, failing) and the agents iterate until it passes. The test is the spec, the contract, and the leash, all in one artifact. And this is where the engineer’s judgment actually lives now; not in reading every line, but in defining what correct looks like before a single line exists. TDD was always good advice that we mostly aspired to. With agents it stops being a discipline and becomes the steering wheel.
And the north star all of this serves? Continuous delivery. Once acceptance tests gate the work you can ship in slices small enough that any single one of them is boring; trunk-based, always releasable, integrated within hours instead of weeks. And you will need it, believe me, because agents produce a week’s worth of code in an afternoon, and the only safe way to absorb that kind of throughput is small batches through a pipeline you actually trust. Big-bang integration was already a bad idea back when humans were the bottleneck. With agents it’s malpractice.
And underneath both sits a single rule; undeniable proof. Agents are happy fools when it comes to their own progress, and “done” from an agent means precisely nothing until it cites an artifact. Think a green pipeline, a merged request, a manifest sitting in the repo where anyone can click it. forge products enforce this everywhere: no status claim without a receipt, and the wiki obeys the same law, pages cite code rather than restate it, because code is the only document that cannot drift from itself. If the proof doesn’t exist, the work didn’t happen. And if that sounds harsh, just wait until the first time an agent proudly reports shipping something that doesn’t compile, but you’ve most probably already been there by now.
It Pays to Be Lazy
The same economy applies to context. The fastest way to flush tokens down the drain is to dump your entire org brain into every agent's window and pray that relevance emerges. forge products work on progressive disclosure instead; a thin schema tells the agent what exists, the wiki layer gets pulled in only when the task actually needs it, raw sources sit behind that, and code remains the final word. Each role loads what its job requires and nothing more, and the reviewer sees the diff – never the implementer's reasoning. Which, if you think about it, is progressive disclosure doing double duty as a quality gate. Smaller contexts, sharper agents, fewer bucks into the token-grinder.
And forge stays deliberately thin, because the most underrated strategy in AI right now is laziness. Anthropic ships faster than you can maintain scaffolding. Every clever layer you build today is a layer you’ll rip out when the platform does it natively next quarter; remember subagents, hooks, plugins, skills, dynamic workflows? They were all somebody’s custom harness once. What really pays, is to build the minimum that encodes your org, and stay light enough to absorb the next Claude Code feature without throwing anything away. With forge you don’t fork and you don’t migrate: when the generator improves, you re-run it over your config and re-emit.
Everything Changed. Nothing Did
Everything changed, and yet everything remains stubbornly the same. TDD, continuous delivery, small batches, proofs over promises, code as the single source of truth, options; none of it is new. It’s the same boring canon we’ve all been nodding along to at conferences for at least twenty years, and practicing on our better days. The agents didn’t repeal software engineering, they just raised the price of skipping it. What was good advice yesterday is simply mandatory today, and the teams that already live the boring fundamentals are already moving embarrassingly fast.
It’s still very early, v0.3.0, working but incomplete. The full generator design lives in the repo (docs/GENERATOR.md), and the distance between it and the code is the roadmap. But the patterns aren’t theoretical: production platforms do ship like this, built by a handful of humans and teams of agents working exactly this way.
If your org has more than one team using Claude Code, you do have this problem. Whether you’ve noticed yet or not is up to you, but your bill and your outcomes certainly won’t lie.
github.com/ktoulgaridis/forge - MIT License. Issues welcome.
This is the first post of discoinferno.dev! learning through pain, fire, and production incidents. SRE, platform engineering, software craft, and AI from the trenches. Subscribe any of that’s your thing.





