"Give Me Six Hours to Chop Down a Tree and I Will Spend the First Four Sharpening the Axe."

By Suraj Kumar · Mar 15, 2026 · 10 min read · 0 views · 0 likes

I took that literally. Six hours designing a CLI overhaul. Zero lines of code. Two published specs. Here's what sharpening the axe actually looks like.

An axe resting on a tree stump with wood shavings

Two weeks ago I rewrote my entire workflow — removed the desk from coding, replaced six apps with one Telegram chat, built a system where my AI agent handles everything from research to publishing. That was the infrastructure. This is the first real test of it.

Agent Orchestrator's CLI was a mess — six-step onboarding, duplicate commands, hardcoded everything. The old me would've cloned the repo and started hacking. Instead, I opened Telegram, talked to my agent, and spent six hours doing something I'd never done before: designing before building. No laptop. No IDE. No code. Just questions, decisions, and two published specs.

By the end I had killed five features, added zero flags, and produced a 17-section design doc + 18-step implementation plan that any developer or AI agent can pick up cold and build. Here's how.

6Hours

0Lines of code

17Design sections

18Implementation steps

· · ·

The Problem

I contribute to Agent Orchestrator, an open-source system for managing parallel AI coding agents. AO is powerful. Its onboarding is not.

Before — 6 steps

Clone the repo
pnpm install && pnpm build
ao init (13-prompt wizard)
ao start
Hope the port isn't taken
Hope node-pty compiled

After — 2 steps

npm install -g @composio/ao
ao start ~/my-project

Six steps down to two. Sounds simple. Designing it was not.

The end-to-end journey we designed:

The New Onboarding — End to End

npm install -g @composio/ao

↓

package published, 67MB → 1MB, node-pty optional

ao start ~/my-project

↓

auto-creates config, detects project type & agent runtime, finds free port

Dashboard running. Agent ready to spawn.

↓

adding a second project later:

ao start ~/other-repo

↓

auto-adds to config, sidebar appears in same dashboard

Both projects managed. One dashboard. One port.

One command handles everything. No flags needed. The details of how — that's in the hard decisions below.

· · ·

How I Actually Designed This

I didn't open a Google Doc. I didn't sketch wireframes. I didn't schedule a meeting.

I opened Telegram and started arguing with my AI agent.

Not "write me a design doc." More like thinking out loud — questioning assumptions, poking holes, killing bad ideas before they became code.

The Design Loop

QuestionI ask "but what if…"

→

Analysisagent reads codebase

→

DecisionI say yes or kill it

→

Documentagent writes the spec

↻ repeat for six hours

The conversation looked like this:

Me: "Why does ao init exist separately from ao start?"

The agent goes through the codebase. Finds that init.ts is 481 lines, start.ts is 557 lines, and 6 of 14 capabilities are duplicated between them. Shows me the overlap matrix.

Me: "Absorb them. Kill init as a standalone."

Me: "What happens if someone runs ao start when it's already running?"

Agent proposes a detection system with running.json. Interactive menu for humans. Structured output for agents.

Me: "But I want users to be able to run multiple orchestrators on the same project."

Agent initially proposes a --id flag and separate ports. I kill it. We iterate until the answer is simple: same dashboard, same port, just another config entry with a unique session prefix. No architecture change.

Telegram conversation: catching a bad design decision in real time

The actual conversation. I caught a bad assumption — the agent pivoted in real time.

This went on for six hours.

· · ·

What I Killed

The biggest value wasn't what we added. It was what we killed before anyone built it.

What	Why it seemed right	Why I killed it
`--id` flag	Obvious solution for parallel instances	Users don't read flag docs. Agents don't guess flags. One instance, one port, sidebar handles it.
Hardcoded runtime detection	Just `which claude`, `which codex`…	Growing project = growing list = maintenance debt. Plugin registry — each plugin detects itself.
The `cd` step	Standard pattern	`ao start ~/my-project` takes a path. Two steps, not three.
`ao init`	Been there since day one	70% overlap with `ao start`. Users ran init, forgot start. Absorbed.
`ao add-project`	Clean separation of concerns	`ao start ~/other-repo` does the same thing. One command, not three.

Five things killed. Zero new flags added. The CLI got simpler, not more complex.

· · ·

The Hard Decisions

Two problems that took the longest to get right. Both went through multiple iterations before landing somewhere simple.

1. What happens when `ao start` hits a running orchestrator?

Current behavior: starts a second dashboard on another port. Two instances. User doesn't know which one to use.

But sometimes you want a second orchestrator on the same project. One for bugs, one for features. The question isn't "should we allow it?" — it's "how do we make it obvious?"

Three iterations:

Block it. "Already running, exit." — Too restrictive. Killed it.
--id staging flag. — Back to the flag problem. Users don't read docs. Killed it.
Ask. — Interactive menu. Four choices. Zero flags.

$ ao start ~/my-app

ℹ An orchestrator is already running on this project.
  Dashboard: http://localhost:3000
  Sessions:  3 active | Up: 2h

? What would you like to do?
  ❯ Continue with the current one (open dashboard)
    Start a new orchestrator on the same project
    Override current config and restart
    Quit

"New orchestrator" generates a unique name (my-app-x7k2), duplicates the config entry, spawns it. Same dashboard. Same port. No architecture change — it's just another config entry pointing to the same repo.

AI agents don't get the menu. They get structured text — port, PID, uptime — and exit cleanly. Agents know what to do with data. They don't know what to do with menus.

The full decision tree for what ao start now handles:

ao start Decision Tree

ao start [path or URL]

↓

What's the argument?

↓

URL

Clone repo
detect & create config

Path or empty

Config exists?

↓

YES

Load it

Auto-create
absorbed init logic

↓

Already running?

↓

YES (human)

Interactive menu
Continue / New / Restart / Quit

↓

Dashboard + orchestrator running

One command. Every scenario. No flags.

2. Hardcoded runtime detection

The first draft had this:

const checks = [
  { name: "claude-code", bin: "claude" },
  { name: "codex",       bin: "codex" },
  { name: "opencode",    bin: "opencode" },
];

Three lines. Works today. Ships fast.

But I've seen what happens to hardcoded lists in growing projects. New plugin, list not updated, user files issue, someone patches, another plugin comes. Repeat forever.

I killed it: "I never want things hardcoded. This project is continuously growing."

The replacement: each agent plugin exports its own detect() method and priority value. The CLI asks the plugin registry "what's available?" — plugins answer for themselves. New runtime? Publish an npm package. Zero code changes. Ever.

Telegram conversation: I never want things to be hardcoded

"I never want things hardcoded. This is a continuously growing project." — the exact moment.

The test for any design decision: does this break when the project grows? If yes, kill it now. Not later. Now.

· · ·

Designing for Two Audiences

Here's something most CLI tools get wrong: they're designed for either humans or machines. Not both.

AO gets used by developers typing in terminals AND by AI agents running it programmatically. Same command, very different needs.

Scenario	Human (TTY)	Agent (non-TTY)
`ao start` while running	Interactive menu: Continue / New orchestrator / Override & restart / Quit	Print info + exit 0
Multiple runtimes detected	"Which runtime?" prompt	Auto-pick highest priority
`ao spawn` without project	Auto-detects from `AO_PROJECT_ID`	Same — env var inherited
Error	Colored, friendly message	Structured text, parseable

One check: process.stdout.isTTY. That's it. If true, you're talking to a human. If false, you're talking to a script or agent. Every command adapts.

· · ·

Why Design-First Changed How I Think

I'm being honest: I've never done this before. I'm the developer who reads a problem, opens VS Code, and starts typing. Hit an edge case at hour 3. Refactor at hour 5. Rewrite at hour 8. Ship something that works but nobody understands why.

This session flipped that.

Hour 1 — Analyze

Read the existing CLI: 16 commands. Found 6 overlapping capabilities, 3 redundant commands. Built the overlap matrix.

Hour 2 — Absorb

Designed ao start absorption: init, add-project, and config creation merged into one command. 481 + 148 lines → thin wrappers.

Hour 3 — Detect

Already-running detection: running.json, PID liveness checks, interactive menu for humans, structured exit for agents.

Hour 4 — Kill

Multiple orchestrators on same project. Killed --id flag. Answer: duplicate config entry with unique session prefix. Same dashboard, same port.

Hour 5 — Generalize

Plugin registry for runtime detection. Killed hardcoded lists. Each plugin exports detect() + priority. Zero future maintenance.

Hour 6 — Audit

ao spawn simplification. Audited both docs end-to-end. Caught 7 bugs in the spec itself. Fixed and published.

Six hours. No code. But:

Every edge case is handled before it's hit. PID reuse? Stale running.json? Race conditions? All in the spec.
Decisions are documented, not buried in code. "Why doesn't ao spawn require a project name?" — section 16.
The architecture doesn't change. Multiple orchestrators? Just another config entry. Proved during design, not during a painful refactor.
Anyone can implement it. Human or AI agent. Exact code, exact file paths, exact test cases.

· · ·

The Impact

Six hours of conversation. Here's what it produced:

Onboarding steps

6 → 2

install + start. Nothing else.

Commands to learn

3 → 1

init, add-project, start → just start

Total code

1,186 → ~910

23% less. Simpler, not bigger.

Features killed

flags, commands, hardcoded lists

New flags added

every scenario handled through conversation or auto-detection

Hardcoded runtime entries

3 → 0

plugin registry. grows forever without code changes.

Edge cases documented

0 → 18

every scenario tested in the spec

0 lines of code written

all design. all conversation. all from my phone.

Everything got simpler. Fewer commands, fewer lines, fewer things to learn. And every decision has a URL you can point to when someone asks "why?"

· · ·

The Meta

The agent read the AO codebase, analyzed 133 open PRs, checked the plugin registry in PR #474, and produced two publication-quality specs. I asked questions and made decisions. That's it.

The workflow isn't just productive. It's productive at producing workflows.

This is the loop: build a workflow → use the workflow to design something → hand the design to an agent → the agent builds it. Every cycle gets faster.

What's Next

The design doc goes to the owner for review. After that:

Hand the implementation plan to a Claude agent on PR #463
The agent builds it — 18 steps, exact code from the spec
I review the output, not the approach (because the approach is already reviewed)

That's the shift. I'm not reviewing "how did you solve this?" I'm reviewing "did you follow the spec?"

Read the specs: Design Doc (17 sections) → | Implementation Plan (18 steps) →

design-first agent-orchestrator cli-design open-source workflow living-documents

The Problem

How I Actually Designed This

What I Killed

The Hard Decisions

1. What happens when ao start hits a running orchestrator?

2. Hardcoded runtime detection

Designing for Two Audiences

Why Design-First Changed How I Think

The Impact

The Meta

What's Next

1. What happens when `ao start` hits a running orchestrator?