notes, projects, and tools from the lab. everything runs on free tiers.
multi-agent orchestration built entirely on free LLM providers. two-ceo model: human decides what and why, LLM decides how. sub-agents never spawn sub-agents.
may 2026
the layer you're in determines the tool. most people skip straight to writing code. but every problem sits in one of four layers: configuration (wire up what already exists), instruction (write behavior, not code), integration (build on open source primitives), or custom code (only here when nothing else works). start at layer one. only move down when you have to.
behavioral guardrails can outperform technical ones. claude code's permission system has two layers — an allowlist that hard-blocks tool calls, and CLAUDE.md instructions that change how the ai reasons before it acts. a CLAUDE.md instruction fires before a plan executes. a permission prompt fires mid-stream, one tool call at a time. the instruction layer is stronger for complex operations and has zero maintenance cost.
never write infrastructure. always own your logic. ntfy and tailscale are infrastructure — battle-tested, maintained by others, not your problem. the hook that connects them to your workflow is logic — 30 lines, yours to control. the open source question is always: is this infrastructure or is this logic?
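a minimal sketch of what the "logic" side of that split might look like, assuming ntfy's plain http publish interface (post the message body to the topic url, with an optional Title header). the server address, topic name, and function names here are hypothetical, not taken from the project.

```python
import urllib.request

NTFY_SERVER = "https://ntfy.example.com"  # hypothetical self-hosted ntfy server
TOPIC = "agent-alerts"                    # hypothetical topic name


def build_notification(message, title="agent hook"):
    """Build (but do not send) an ntfy publish request."""
    return urllib.request.Request(
        url=f"{NTFY_SERVER}/{TOPIC}",
        data=message.encode("utf-8"),
        headers={"Title": title},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request; kept separate so the hook logic can be checked offline."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# build a request without touching the network
req = build_notification("session needs attention")
```

the delivery, storage, and push to the phone stay ntfy's problem. the only part you own is deciding when to call it and what to say.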
optimistic execution beats blocking approval for autonomous ai. freezing a session until a human approves is the wrong default for ai agents. the better pattern: proceed unless vetoed. send a notification, give a 15-second deny window, auto-approve if no response. sessions never freeze. you stay in control.
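the proceed-unless-vetoed pattern can be sketched in a few lines. this is an illustration only: the `notify` callback and the `veto_event` are assumptions standing in for whatever actually sends the notification and receives the deny tap.

```python
import threading


def proceed_unless_vetoed(notify, veto_event, window_seconds=15.0):
    """Return True if the action may proceed, False if it was vetoed in time."""
    notify(f"action pending; deny within {window_seconds:g}s to block it")
    # Event.wait returns True if the event was set before the timeout expired,
    # and False if the window ran out with no response.
    vetoed = veto_event.wait(timeout=window_seconds)
    return not vetoed


# usage with a short window so the example finishes quickly
sent = []
veto = threading.Event()
approved = proceed_unless_vetoed(sent.append, veto, window_seconds=0.05)

# a human (or another thread) sets the event to deny the next action
veto.set()
denied = proceed_unless_vetoed(sent.append, veto, window_seconds=0.05)
```

the default outcome is approval, so a missed notification costs nothing more than the length of the window.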
ship the pragmatic fix, backlog the right solution. the right answer was tailscale + self-hosted ntfy. the pragmatic fix was an expanded allowlist and a CLAUDE.md guardrail. both are correct — for different stages. knowing when to ship the fast version and when to build the proper one is the skill.
may 2026
don't pay for what you can get free. the whole project runs on free llm providers — groq, gemini, github models, cloudflare, and local ollama. paid apis are a last resort. the cost ladder works.
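one way to read "cost ladder" is an ordered fallback: try the cheapest provider first and step down only on failure. the sketch below assumes that reading. the provider callables are stubs, not real client code, and the ordering is illustrative.

```python
class ProviderError(Exception):
    """Raised by a provider stub when it cannot serve the request."""


def complete_with_ladder(prompt, ladder):
    """Try each (name, call) rung in order; return (name, reply) from the first success."""
    errors = []
    for name, call in ladder:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# hypothetical stubs standing in for real provider clients
def groq_stub(prompt):
    raise ProviderError("rate limited")


def gemini_stub(prompt):
    return f"echo: {prompt}"


def paid_stub(prompt):
    return f"paid: {prompt}"


# free rungs first, paid api as the last resort
ladder = [("groq", groq_stub), ("gemini", gemini_stub), ("paid", paid_stub)]
result = complete_with_ladder("hi", ladder)
```

because the paid rung sits last, it is only reached when every free rung above it has raised.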
two-ceo model. the human decides what and why. the llm decides how. the llm orchestrates sub-agents, but sub-agents never spawn sub-agents — that is a hard structural rule.
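the no-nesting rule is easy to enforce in code rather than by convention. a minimal sketch, assuming a hypothetical `Agent` class (the project's real orchestrator may look nothing like this):

```python
class NestedSpawnError(RuntimeError):
    """Raised when a sub-agent tries to spawn another agent."""


class Agent:
    def __init__(self, name, is_sub_agent=False):
        self.name = name
        self.is_sub_agent = is_sub_agent

    def spawn(self, name):
        # hard structural rule: only the top-level orchestrator may spawn
        if self.is_sub_agent:
            raise NestedSpawnError(
                f"{self.name} is a sub-agent and may not spawn {name}"
            )
        return Agent(name, is_sub_agent=True)


orchestrator = Agent("orchestrator")
researcher = orchestrator.spawn("researcher")  # allowed: depth one
```

calling `researcher.spawn("helper")` raises `NestedSpawnError`, so the agent tree can never grow deeper than one level.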
context is everything. sessions cap at ~80k tokens, then hand off. three strategies: compaction (summarize, discard redundant output), external memory (files as persistent state), and sub-agent delegation (child burns its own context, parent only sees a summary).
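a rough sketch of the budget check and the delegation strategy described above. the 4-characters-per-token estimate is a crude assumption, and `Session`, `run_child`, and `summarize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

TOKEN_CAP = 80_000  # from the text: sessions cap at ~80k tokens


def estimate_tokens(text):
    # crude assumption: roughly 4 characters per token
    return max(1, len(text) // 4)


@dataclass
class Session:
    cap: int = TOKEN_CAP
    used: int = 0
    messages: list = field(default_factory=list)

    def add(self, text):
        self.messages.append(text)
        self.used += estimate_tokens(text)

    def should_hand_off(self):
        return self.used >= self.cap

    def delegate(self, task, run_child, summarize):
        """Run a task in a child session; the parent records only the summary."""
        child = Session(cap=self.cap)
        output = run_child(task)
        child.add(output)      # the child burns its own context
        summary = summarize(output)
        self.add(summary)      # the parent pays only for the summary
        return summary, child.used


# small cap so the numbers are easy to follow
parent = Session(cap=100)
summary, child_used = parent.delegate(
    "scan logs",
    run_child=lambda task: "a" * 4000,
    summarize=lambda output: output[:40],
)
```

here the child consumes about 1000 estimated tokens while the parent's usage grows by only 10, which is the whole point of delegating.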
match rigor to scope. a one-session project doesn't need the full 9-agent pipeline. sometimes it is just researcher -> developer -> documenter.
ask when unsure. one clarifying question saves five fix messages. guessing burns context and time.
fresh context per task. commit after each task. if something fails, retry only the failed subtask, not everything.
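retrying only the failed subtask amounts to keeping a record of what already finished and skipping it on the next pass. a minimal sketch, assuming independent tasks and using an in-memory dict where the real workflow would use commits:

```python
def run_tasks(tasks, done=None):
    """Run each (name, fn) not already in `done`; return (done, failed)."""
    done = dict(done or {})
    failed = {}
    for name, fn in tasks:
        if name in done:
            continue  # finished on an earlier pass, do not rerun
        try:
            done[name] = fn()  # stands in for "commit after each task"
        except Exception as exc:
            failed[name] = str(exc)
    return done, failed


# hypothetical tasks: `develop` fails once, then succeeds
calls = {"research": 0, "develop": 0}


def research():
    calls["research"] += 1
    return "notes"


def develop():
    calls["develop"] += 1
    if calls["develop"] == 1:
        raise RuntimeError("flaky failure")
    return "code"


tasks = [("research", research), ("develop", develop)]
done, failed = run_tasks(tasks)        # first pass: develop fails
first_failed = dict(failed)
done, failed = run_tasks(tasks, done)  # second pass: only develop reruns
```

after the second pass `research` has still run exactly once, while `develop` has run twice.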
sub-agents cost ~2-3k tokens each. they are only worth it when you need isolation, a different model, or restart tolerance.