"Give Me Six Hours to Chop Down a Tree and I Will Spend the First Four Sharpening the Axe."

By Suraj Kumar · Mar 15, 2026 · 10 min read

I took that literally. Six hours designing a CLI overhaul. Zero lines of code. Two published specs. Here's what sharpening the axe actually looks like.

[Image: An axe resting on a tree stump with wood shavings]

Two weeks ago I rewrote my entire workflow — removed the desk from coding, replaced six apps with one Telegram chat, built a system where my AI agent handles everything from research to publishing. That was the infrastructure. This is the first real test of it.

Agent Orchestrator's CLI was a mess — six-step onboarding, duplicate commands, hardcoded everything. The old me would've cloned the repo and started hacking. Instead, I opened Telegram, talked to my agent, and spent six hours doing something I'd never done before: designing before building. No laptop. No IDE. No code. Just questions, decisions, and two published specs.

By the end I had killed five features, added zero flags, and produced a 17-section design doc + 18-step implementation plan that any developer or AI agent can pick up cold and build. Here's how.

  • 6 hours
  • 0 lines of code
  • 17 design sections
  • 18 implementation steps
· · ·

The Problem

I contribute to Agent Orchestrator, an open-source system for managing parallel AI coding agents. AO is powerful. Its onboarding is not.

Before — 6 steps
  • Clone the repo
  • pnpm install && pnpm build
  • ao init (13-prompt wizard)
  • ao start
  • Hope the port isn't taken
  • Hope node-pty compiled
After — 2 steps
  • npm install -g @composio/ao
  • ao start ~/my-project

Six steps down to two. Sounds simple. Designing it was not.

The end-to-end journey we designed:

The New Onboarding — End to End

  1. npm install -g @composio/ao (package published, 67MB → 1MB, node-pty optional)
  2. ao start ~/my-project (auto-creates config, detects project type & agent runtime, finds a free port)
  Result: dashboard running, agent ready to spawn.

Adding a second project later:

  3. ao start ~/other-repo (auto-adds to config, sidebar appears in the same dashboard)
  Result: both projects managed. One dashboard. One port.

One command handles everything. No flags needed. The details of how — that's in the hard decisions below.
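To make the "auto-adds to config" step concrete, here's a minimal sketch of what an idempotent config update could look like. The AoConfig shape and ensureProject helper are my illustration, not AO's actual code:

```typescript
// Illustrative types only -- AO's real config schema may differ.
interface ProjectEntry {
  name: string;
  path: string;
}

interface AoConfig {
  port: number;
  projects: ProjectEntry[];
}

// Idempotently register a project: running `ao start` twice on the same
// path adds one entry, not two, so re-runs are always safe.
function ensureProject(config: AoConfig, projectPath: string): AoConfig {
  if (config.projects.some((p) => p.path === projectPath)) {
    return config; // already known: same dashboard, same port
  }
  const name = projectPath.split("/").filter(Boolean).pop() ?? "project";
  return {
    ...config,
    projects: [...config.projects, { name, path: projectPath }],
  };
}
```

Because the update is idempotent, ao start ~/other-repo can double as "add a project" without needing a separate command.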

· · ·

How I Actually Designed This

I didn't open a Google Doc. I didn't sketch wireframes. I didn't schedule a meeting.

I opened Telegram and started arguing with my AI agent.

Not "write me a design doc." More like thinking out loud — questioning assumptions, poking holes, killing bad ideas before they became code.

The Design Loop
  1. Question: I ask "but what if…"
  2. Analysis: the agent reads the codebase
  3. Decision: I say yes or kill it
  4. Document: the agent writes the spec
  ↻ repeat for six hours

The conversation looked like this:

Me: "Why does ao init exist separately from ao start?"

The agent goes through the codebase. Finds that init.ts is 481 lines, start.ts is 557 lines, and 6 of 14 capabilities are duplicated between them. Shows me the overlap matrix.

Me: "Absorb them. Kill init as a standalone."

Me: "What happens if someone runs ao start when it's already running?"

Agent proposes a detection system with running.json. Interactive menu for humans. Structured output for agents.

Me: "But I want users to be able to run multiple orchestrators on the same project."

Agent initially proposes a --id flag and separate ports. I kill it. We iterate until the answer is simple: same dashboard, same port, just another config entry with a unique session prefix. No architecture change.

[Screenshot: Telegram conversation, catching a bad design decision in real time]

The actual conversation. I caught a bad assumption — the agent pivoted in real time.

This went on for six hours.

· · ·

What I Killed

The biggest value wasn't what we added. It was what we killed before anyone built it.

What | Why it seemed right | Why I killed it
--id flag | Obvious solution for parallel instances | Users don't read flag docs. Agents don't guess flags. One instance, one port, sidebar handles it.
Hardcoded runtime detection | Just which claude, which codex | Growing project = growing list = maintenance debt. Plugin registry: each plugin detects itself.
The cd step | Standard pattern | ao start ~/my-project takes a path. Two steps, not three.
ao init | Been there since day one | 70% overlap with ao start. Users ran init, forgot start. Absorbed.
ao add-project | Clean separation of concerns | ao start ~/other-repo does the same thing. One command, not three.

Five things killed. Zero new flags added. The CLI got simpler, not more complex.

· · ·

The Hard Decisions

Two problems that took the longest to get right. Both went through multiple iterations before landing somewhere simple.

1. What happens when ao start hits a running orchestrator?

Current behavior: starts a second dashboard on another port. Two instances. User doesn't know which one to use.

But sometimes you want a second orchestrator on the same project. One for bugs, one for features. The question isn't "should we allow it?" — it's "how do we make it obvious?"

Three iterations:

  1. Block it. "Already running, exit." — Too restrictive. Killed it.
  2. --id staging flag. — Back to the flag problem. Users don't read docs. Killed it.
  3. Ask. — Interactive menu. Four choices. Zero flags.
$ ao start ~/my-app

ℹ An orchestrator is already running on this project.
  Dashboard: http://localhost:3000
  Sessions:  3 active | Up: 2h

? What would you like to do?
  ❯ Continue with the current one (open dashboard)
    Start a new orchestrator on the same project
    Override current config and restart
    Quit

"New orchestrator" generates a unique name (my-app-x7k2), duplicates the config entry, spawns it. Same dashboard. Same port. No architecture change — it's just another config entry pointing to the same repo.

AI agents don't get the menu. They get structured text — port, PID, uptime — and exit cleanly. Agents know what to do with data. They don't know what to do with menus.
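The unique-name step can be sketched in a few lines. randomSuffix and uniqueName are hypothetical helper names, not AO's real implementation:

```typescript
// Generate a short alphanumeric suffix, e.g. "x7k2".
function randomSuffix(len: number = 4): string {
  const chars = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < len; i++) {
    out += chars[Math.floor(Math.random() * chars.length)];
  }
  return out;
}

// Derive a session name like "my-app-x7k2", retrying on the (unlikely)
// collision with an existing orchestrator name.
function uniqueName(base: string, taken: Set<string>): string {
  let candidate = `${base}-${randomSuffix()}`;
  while (taken.has(candidate)) {
    candidate = `${base}-${randomSuffix()}`;
  }
  return candidate;
}
```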

The full decision tree for what ao start now handles:

ao start Decision Tree

ao start [path or URL]
│
├─ What's the argument?
│   ├─ URL → clone repo, detect & create config
│   └─ Path or empty → does a config exist?
│       ├─ YES → load it
│       └─ NO → auto-create (absorbed init logic)
│
├─ Already running?
│   ├─ YES (human) → interactive menu: Continue / New / Restart / Quit
│   └─ NO → register in running.json
│
└─ Dashboard + orchestrator running

One command. Every scenario. No flags.
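The first branch of that tree (URL vs. path vs. empty) is simple to sketch. The StartTarget type and classifyArg function are my illustration of the dispatch, not AO's code:

```typescript
// What did the user hand to `ao start`?
type StartTarget =
  | { kind: "url"; url: string }    // clone, then detect & create config
  | { kind: "path"; path: string }  // existing checkout
  | { kind: "cwd" };                // no argument: use the current directory

function classifyArg(arg?: string): StartTarget {
  if (!arg) return { kind: "cwd" };
  // Treat http(s) and SSH-style git remotes as clone targets.
  if (/^(https?:\/\/|git@)/.test(arg)) return { kind: "url", url: arg };
  return { kind: "path", path: arg };
}
```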

2. Hardcoded runtime detection

The first draft had this:

// First draft: a hardcoded list of known agent runtimes and their binaries
const checks = [
  { name: "claude-code", bin: "claude" },
  { name: "codex",       bin: "codex" },
  { name: "opencode",    bin: "opencode" },
];

Three lines. Works today. Ships fast.

But I've seen what happens to hardcoded lists in growing projects. New plugin, list not updated, user files issue, someone patches, another plugin comes. Repeat forever.

I killed it: "I never want things hardcoded. This project is continuously growing."

The replacement: each agent plugin exports its own detect() method and priority value. The CLI asks the plugin registry "what's available?" — plugins answer for themselves. New runtime? Publish an npm package. Zero code changes. Ever.
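A registry like that might look like this sketch. The RuntimePlugin interface and helper names are assumptions about the shape, not AO's published plugin API:

```typescript
// Each runtime plugin describes itself; the CLI owns no list of runtimes.
interface RuntimePlugin {
  name: string;
  priority: number;   // higher wins when several runtimes are installed
  detect(): boolean;  // e.g. look up the runtime's binary on PATH
}

const registry: RuntimePlugin[] = [];

function registerPlugin(plugin: RuntimePlugin): void {
  registry.push(plugin);
}

// Ask every plugin "are you available?" and take the highest priority.
function detectRuntime(): RuntimePlugin | undefined {
  return registry
    .filter((p) => p.detect())
    .sort((a, b) => b.priority - a.priority)[0];
}
```

A new runtime then ships as an npm package that calls registerPlugin on load; the CLI itself never changes.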

[Screenshot: Telegram conversation, "I never want things to be hardcoded"]

"I never want things hardcoded. This is a continuously growing project." — the exact moment.

The test for any design decision: does this break when the project grows? If yes, kill it now. Not later. Now.
· · ·

Designing for Two Audiences

Here's something most CLI tools get wrong: they're designed for either humans or machines. Not both.

AO gets used by developers typing in terminals AND by AI agents running it programmatically. Same command, very different needs.

Scenario | Human (TTY) | Agent (non-TTY)
ao start while running | Interactive menu: Continue / New orchestrator / Override & restart / Quit | Print info + exit 0
Multiple runtimes detected | "Which runtime?" prompt | Auto-pick highest priority
ao spawn without project | Auto-detects from AO_PROJECT_ID | Same (env var inherited)
Error | Colored, friendly message | Structured text, parseable

One check: process.stdout.isTTY. That's it. If true, you're talking to a human. If false, you're talking to a script or agent. Every command adapts.
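A sketch of that branch. Only process.stdout.isTTY is the real Node API here; the RunningInfo shape and output format are illustrative:

```typescript
interface RunningInfo {
  port: number;
  pid: number;
  uptimeSec: number;
}

// One check decides the whole output style.
function reportAlreadyRunning(info: RunningInfo, isTTY: boolean): string {
  if (isTTY) {
    // Human: friendly text; a real CLI would render the interactive menu here.
    return `ℹ An orchestrator is already running on http://localhost:${info.port}`;
  }
  // Script or agent: stable key=value lines it can parse, then exit 0.
  return [
    `port=${info.port}`,
    `pid=${info.pid}`,
    `uptime_sec=${info.uptimeSec}`,
  ].join("\n");
}

// In the CLI itself, the flag would come from: process.stdout.isTTY
```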

· · ·

Why Design-First Changed How I Think

I'm being honest: I've never done this before. I'm the developer who reads a problem, opens VS Code, and starts typing. Hit an edge case at hour 3. Refactor at hour 5. Rewrite at hour 8. Ship something that works but nobody understands why.

This session flipped that.

Hour 1 — Analyze
Read the existing CLI: 16 commands. Found 6 overlapping capabilities, 3 redundant commands. Built the overlap matrix.
Hour 2 — Absorb
Designed ao start absorption: init, add-project, and config creation merged into one command. 481 + 148 lines → thin wrappers.
Hour 3 — Detect
Already-running detection: running.json, PID liveness checks, interactive menu for humans, structured exit for agents.
Hour 4 — Kill
Multiple orchestrators on same project. Killed --id flag. Answer: duplicate config entry with unique session prefix. Same dashboard, same port.
Hour 5 — Generalize
Plugin registry for runtime detection. Killed hardcoded lists. Each plugin exports detect() + priority. Zero future maintenance.
Hour 6 — Audit
ao spawn simplification. Audited both docs end-to-end. Caught 7 bugs in the spec itself. Fixed and published.
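The PID liveness check from Hour 3 is one of the few pieces small enough to show whole. process.kill(pid, 0) is a standard Node idiom (signal 0 checks existence without sending anything); the helper name is mine:

```typescript
// Is the process recorded in running.json still alive?
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, kills nothing
    return true;
  } catch {
    return false; // ESRCH: the process is gone, so the entry is stale
  }
}
```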

Six hours. No code. The numbers tell the rest.

· · ·

The Impact

Six hours of conversation. Here's what it produced:

  • Onboarding steps: 6 → 2 (install + start, nothing else)
  • Commands to learn: 3 → 1 (init, add-project, start → just start)
  • Total code: 1,186 → ~910 lines (23% less; simpler, not bigger)
  • Features killed: 5 (flags, commands, hardcoded lists)
  • New flags added: 0 (every scenario handled through conversation or auto-detection)
  • Hardcoded runtime entries: 3 → 0 (plugin registry; grows forever without code changes)
  • Edge cases documented: 0 → 18 (every scenario tested in the spec)
  • Lines of code written: 0 (all design, all conversation, all from my phone)

Everything got simpler. Fewer commands, fewer lines, fewer things to learn. And every decision has a URL you can point to when someone asks "why?"

· · ·

The Meta

The agent read the AO codebase, analyzed 133 open PRs, checked the plugin registry in PR #474, and produced two publication-quality specs. I asked questions and made decisions. That's it.

The workflow isn't just productive. It's productive at producing workflows.

This is the loop: build a workflow → use the workflow to design something → hand the design to an agent → the agent builds it. Every cycle gets faster.

What's Next

The design doc goes to the owner for review. After that:

  1. Hand the implementation plan to a Claude agent on PR #463
  2. The agent builds it — 18 steps, exact code from the spec
  3. I review the output, not the approach (because the approach is already reviewed)

That's the shift. I'm not reviewing "how did you solve this?" I'm reviewing "did you follow the spec?"

design-first agent-orchestrator cli-design open-source workflow living-documents