
Loom-Mode MCP for Advanced, Fast AI-Assisted Dev (Go-Native, Proxy+Daemon)

February 9, 2026 · 6 min read

lab · loom · loom-core · mcp · go · developer-tools · agents · hud · proxy

The beginner-friendly story for MCP is “add tools.”

The advanced story is “add a control plane.”

If you actually want AI-assisted dev to feel fast, reliable, and repeatable, you need to optimize for:

  • token efficiency (don’t turn every keystroke into a tool call)
  • latency (local stdio calls, bounded outputs, pagination)
  • drift resistance (one source of truth, one sync loop)
  • process isolation (avoid one tool’s runtime turning into your whole machine’s runtime)

This post is the companion to my Loom registry post. It’s the “experienced dev” version: Loom-mode, Go-native MCPs, and how to keep multi-agent setups from turning into memory/CPU soup.

If you haven’t read the registry overview yet, start there.

This post assumes you already believe “config drift is real” and you want the path that stays fast at scale.

The Architecture: One Proxy, One Daemon, Many Servers

Loom’s core idea in Loom-mode:

  • MCP clients talk to one entrypoint: loom proxy (stdio)
  • The proxy forwards to loomd (local daemon)
  • loomd spawns and routes to many MCP server binaries (cmd/mcp-*), also stdio

Why this matters:

  • You don’t have N client configs that must all agree.
  • You don’t have N “server processes” embedded in each editor.
  • You get one place to enforce routing, lifecycle, and policy.

In Loom’s docs, the dataflow looks like:

Client -> loom proxy (stdio) -> loomd -> mcp-<server> -> external API

That’s boring.

Boring is good.
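
Concretely, boring means the only thing a client ever knows about is the proxy. The exact config file and schema vary by client, but for one that uses the common mcpServers JSON shape, the generated entry might look roughly like this (the server label is illustrative; assume the loom binary is on your PATH):

{
  "mcpServers": {
    "loom": {
      "command": "loom",
      "args": ["proxy"]
    }
  }
}

Everything behind that single entry is loomd’s problem, not the client’s.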

The Data Reality: TOON vs JSON (Token Efficiency Without Losing Structure)

When you’re moving structured data through LLM prompts (or through tool calls that end up in prompts), JSON becomes a syntax tax.

This is where TOON (Token Optimized Object Notation) matters: it preserves the JSON data model, but it’s designed to be more token-efficient and easier for models to follow (explicit structure headers, compact tabular arrays).

This is separate from how MCP clients store their MCP configuration (those vary by platform). TOON is about how you represent data inside the agent loop.

Example (same data, different token profiles):

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

The same records in TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

In practice, TOON is a good fit for:

  • large uniform arrays (tables)
  • “agent asks tool for 200 rows” scenarios
  • audit logs you want to keep cheap but structured
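
To make the “large uniform arrays” case concrete, here’s a minimal Go sketch that emits the tabular shape shown earlier. It’s illustrative only (not loom-core’s encoder), and the row type is made up:

// toonRow is a hypothetical uniform record.
type toonRow struct {
  ID   int
  Name string
  Role string
}

// encodeUsersTOON renders a uniform slice as a TOON-style table:
// a header carrying the length and field names, then one compact
// comma-separated line per row. Uses only fmt and strings.
func encodeUsersTOON(rows []toonRow) string {
  var b strings.Builder
  fmt.Fprintf(&b, "users[%d]{id,name,role}:\n", len(rows))
  for _, r := range rows {
    fmt.Fprintf(&b, "  %d,%s,%s\n", r.ID, r.Name, r.Role)
  }
  return b.String()
}

For the two rows in the earlier example, that produces exactly the TOON block shown above.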

Loom’s job is to make the configuration differences boring (registry -> generate -> sync), so you can focus on the data and the loop.

That’s not glamorous, but it’s how you keep “works on one assistant” from becoming “works on none.”

TOON In The Wild: Loom’s HUD Bridge Accepts JSON or TOON

This is not theoretical. Loom-core’s HUD bridge explicitly supports both JSON and TOON payloads when it calls mcp-agent-context tools. It tries JSON first, then falls back to decoding TOON:

// supports both JSON and TOON (Token-Optimized Object Notation) text payloads.
if err := json.Unmarshal([]byte(c.Text), target); err != nil {
  jsonBytes, toonErr := mcp.DecodeTOONToJSON(c.Text)
  if toonErr != nil {
    return fmt.Errorf("unmarshal %s text (json: %v, toon: %v)", toolName, err, toonErr)
  }
  if err := json.Unmarshal(jsonBytes, target); err != nil {
    return fmt.Errorf("unmarshal %s decoded toon: %w", toolName, err)
  }
}

That’s the point: keep the agent loop cheap without giving up typed-ish structure, and make the UI layer resilient to both representations.

Why Go-Native MCP Servers Are a Feature

Nothing against Python or Node. I ship plenty of both.

But for the core “every day, always on” MCPs, Go has practical advantages:

  • fast startup (important when a daemon restarts a failed server)
  • predictable memory footprint
  • easy distribution as a single static-ish binary
  • straightforward concurrency without turning into event-loop archaeology

In loom-core, the common MCP servers (mcp-git, mcp-gitlab, mcp-github, mcp-k8s-ops, mcp-prometheus, etc.) are implemented as Go binaries. When you run Loom-mode, the daemon owns their lifecycle.

This keeps your editor from becoming the long-running process manager.
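
None of this requires exotic machinery. The following is not loomd’s actual code, just a minimal sketch of the supervision pattern it implies: spawn a server binary, wait on it, restart with a small backoff when it dies (binary path and backoff are illustrative):

// supervise runs one MCP server binary and restarts it if it exits.
// Illustrative sketch only; a real daemon wires per-server stdio pipes
// for JSON-RPC rather than inheriting the daemon's own stdout/stderr.
func supervise(ctx context.Context, bin string) {
  for {
    cmd := exec.CommandContext(ctx, bin)
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

    if err := cmd.Run(); err != nil {
      log.Printf("%s exited: %v; restarting", bin, err)
    }
    select {
    case <-ctx.Done():
      return
    case <-time.After(2 * time.Second): // illustrative backoff
    }
  }
}

With that shape, a crashed mcp-git doesn’t take your editor with it; the daemon just brings it back.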

The Performance Rule: Fewer, Higher-Leverage Tools

The fastest MCP call is the one you didn’t make.

If you want token-efficient and fast workflows:

  • Use MCP for coarse-grained actions (query a system, fetch logs, list PRs, render a report).
  • Don’t use MCP as a per-line code editor.
  • Prefer “batch” tools that do one meaningful unit of work.

In practice, I can do an enormous amount with a small set:

  • git (status/diff/log/show)
  • gitlab or github (MR/PR lifecycle)
  • k8s-ops (logs/rollout/describe)
  • loki / prometheus (debugging)
  • agent-context (sessions/tasks/decisions) when I’m doing multi-step work

That list is intentionally short.

Loom-Mode Quickstart (The Fast Path)

From services/loom-core:

make build
./bin/loom sync all --regen --loom-mode
./bin/loomd

Then for drift checks:

./bin/loom sync status

The point of Loom-mode is that every client sees the same proxy entry, and the daemon decides what exists behind it.

Multi-Agent Without Bloat: Where the State Lives

Multi-agent setups get messy when:

  • every agent spawns its own tool zoo
  • memory lives in chat logs
  • tool state is duplicated per client

In Loom-mode:

  • tools run as daemon-managed processes
  • policy lives in registry.yaml
  • multi-agent orchestration is a layer above that

This is where mcp-agent-context fits.

Agent Context: Memory and Tasks (When You Actually Need It)

I treat mcp-agent-context like a “workbench” for long-running tasks:

  • sessions provide continuity
  • tasks are an explicit queue
  • decisions/findings are recorded with titles and tags

A minimal loop is:

agent_session_start(namespace="project/feature-x")
agent_context_recall_enhanced(query="feature-x")
agent_task_add(tasks=[{title:"...", priority:"high"}])
agent_context_add(entries=[{entry_type:"decision", title:"...", content:"..."}])
agent_session_end(summarize=true)

Important constraint: adding context entries requires an embeddings key.

That’s the trade.

If you don’t need recall and search, skip it. Don’t pay for machinery you aren’t using.

Secrets: Make GUI Apps Not Lie To You

One of the most annoying failure modes in “real” MCP setups is when tools work in your terminal but fail in a GUI-launched context (launchd, VS Code, desktop clients) because your shell exports are missing.

In loom-core, there are two practical countermeasures:

  • loom check can warn when likely-required secrets referenced by the registry are missing for the default profile.
  • for secret-looking ${env:...} keys (suffixes like _TOKEN, _API_KEY, etc.), Loom can fall back to the secrets manager when the env var is unset.
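
A minimal sketch of that second fallback, assuming a resolver roughly like this (lookupSecret is a hypothetical stand-in for whatever secrets manager you use; the suffix list just mirrors the examples above):

// resolveEnvRef resolves a ${env:NAME} reference. If the variable is
// unset and the name looks secret-like, fall back to the secrets manager
// instead of silently handing the server an empty string.
func resolveEnvRef(name string) (string, error) {
  if v, ok := os.LookupEnv(name); ok {
    return v, nil
  }
  for _, suffix := range []string{"_TOKEN", "_API_KEY"} {
    if strings.HasSuffix(name, suffix) {
      return lookupSecret(name) // hypothetical secrets-manager call
    }
  }
  return "", fmt.Errorf("env var %s is not set", name)
}

The point is that a GUI-launched client and your terminal resolve the same name the same way.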

That’s not glamorous, but it’s what makes your agent loop predictable.

Token Efficiency: Design Your Tool Calls Like You Design APIs

Token efficiency is mostly about interface design.

Rules that have held up for me:

  • A tool call should return exactly what you need to decide the next step.
  • Every list/search tool should have pagination.
  • Every response should have size caps.
  • Prefer structured summaries over raw dumps.

Pagination: Make Lists Cheap

If a tool can return 10,000 rows, it will. Eventually.

The fix is simple and boring:

  • list/search tools accept per_page and page
  • responses include pagination metadata (next page pointers or counts)

This is not an “AI feature.” It’s an operations feature. It prevents timeouts and keeps tool calls predictable.

In loom-core, the pattern is intentionally boring and standardized. There’s a shared validator that clamps and normalizes page / per_page:

v := validate.NewArgs(args)
p := v.GetPagination() // default per_page=30, max per_page=100, default page=1
page := p.Page
perPage := p.PerPage

And then servers return pagination metadata alongside results (example: mcp-gitlab list merge requests):

path := fmt.Sprintf(
  "/projects/%s/merge_requests?state=%s&per_page=%d&page=%d",
  encodeProject(project), state, perPage, page,
)
result, meta, err := g.requestListWithMeta(ctx, path)
return mcp.JSONResult(map[string]any{
  "merge_requests": result,
  "count":          len(result),
  "pagination":     meta,
})

Response Caps: Don’t Turn MCP Into a Dump Truck

Large payloads are the silent killer of “fast agent workflows.”

Some loom-core MCPs enforce maximum response sizes for this reason. When a response would be too large, the tool should fail with a useful error message and tell you how to narrow the query (time range, selectors, page size).
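
The shape of that check is trivial. A sketch, with an illustrative 1 MiB limit rather than loom-core’s actual numbers:

const maxResponseBytes = 1 << 20 // illustrative cap, not loom-core's real limit

// capResponse refuses oversized payloads and tells the caller how to
// narrow the query instead of silently truncating or timing out.
func capResponse(payload []byte) ([]byte, error) {
  if len(payload) > maxResponseBytes {
    return nil, fmt.Errorf(
      "response is %d bytes (limit %d): narrow the time range, add selectors, or lower per_page",
      len(payload), maxResponseBytes,
    )
  }
  return payload, nil
}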

Loom-core MCPs follow these patterns (pagination metadata, response caps, etc.) because otherwise your “agent” becomes a log shipper.

The UX Layer: HUD + Native Overlay

Once you have a daemon/proxy architecture, you get a new UX opportunity: always-visible state.

Loom’s HUD is a local dashboard for:

  • server health
  • active sessions/tasks
  • workflows and approvals

On macOS, the native overlay is the “always-on” version: it keeps agent state visible without context switching.

That’s the UI version of the same design principle:

  • make state visible
  • make drift visible
  • make transitions explicit

Coordinator (Optional): FlexInfer-Backed “LLM Ops” for Agent Context

Loom also has an optional coordinator layer (started by the HUD) that can use FlexInfer (an OpenAI-compatible proxy) to do things like:

  • session summarization on session-end
  • on-demand summarization and compression
  • workflow planning from a natural language goal

It’s explicitly optional. If you don’t configure a FlexInfer URL, the HUD works exactly as before.

Example:

loom hud --port 3333 \
  --flexinfer-url http://127.0.0.1:8080 \
  --flexinfer-key "$FLEXINFER_API_KEY" \
  --coordinator-model qwen3-8b

The Anti-Pattern This Avoids

The failure mode I’m trying to avoid is:

  • each client has its own config
  • each agent has its own tools
  • tool runtimes pile up
  • latency and memory use scale with the number of assistants

Loom-mode is a direct countermeasure:

  • one proxy entrypoint
  • one daemon that owns process lifecycle
  • Go-native servers where it matters

The point isn’t to dunk on other ecosystems.

The point is to keep your machine (and your attention) from becoming the integration platform.
