
Loom-Mode MCP for Advanced, Fast AI-Assisted Dev (Go-Native, Proxy+Daemon)

February 9, 2026 · 6 min read

lab · loom · loom-core · mcp · go · developer-tools · agents · hud · proxy

The beginner-friendly story for MCP is “add tools.”

The advanced story is “add a control plane.”

If you actually want AI-assisted dev to feel fast, reliable, and repeatable, you need to optimize for:

  • token efficiency (don’t turn every keystroke into a tool call)
  • latency (local stdio calls, bounded outputs, pagination)
  • drift resistance (one source of truth, one sync loop)
  • process isolation (avoid one tool’s runtime turning into your whole machine’s runtime)

This post is the companion to my Loom registry post. It’s the “experienced dev” version: Loom-mode, Go-native MCPs, and how to keep multi-agent setups from turning into memory/CPU soup.

If you haven’t read the registry overview yet, start there.

This post assumes you already believe “config drift is real” and you want the path that stays fast at scale.

The Architecture: One Proxy, One Daemon, Many Servers

Loom’s core idea in Loom-mode:

  • MCP clients talk to one entrypoint: loom proxy (stdio)
  • The proxy forwards to loomd (local daemon)
  • loomd spawns and routes to many MCP server binaries (cmd/mcp-*), also stdio

Why this matters:

  • You don’t have N client configs that must all agree.
  • You don’t have N “server processes” embedded in each editor.
  • You get one place to enforce routing, lifecycle, and policy.

In Loom’s docs, the dataflow looks like:

Client -> loom proxy (stdio) -> loomd -> mcp-<server> -> external API

That’s boring.

Boring is good.
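
Concretely, boring means the only thing a client ever knows about is the proxy. The exact config file and schema vary by client, but for one that uses the common mcpServers JSON shape, the generated entry might look roughly like this (the server label is illustrative; assume the loom binary is on your PATH):

{
  "mcpServers": {
    "loom": {
      "command": "loom",
      "args": ["proxy"]
    }
  }
}

Everything behind that single entry is loomd’s problem, not the client’s.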

The Data Reality: TOON vs JSON (Token Efficiency Without Losing Structure)

When you’re moving structured data through LLM prompts (or through tool calls that end up in prompts), JSON becomes a syntax tax.

This is where TOON (Token Optimized Object Notation) matters: it preserves the JSON data model, but it’s designed to be more token-efficient and easier for models to follow (explicit structure headers, compact tabular arrays).

This is separate from how MCP clients store their MCP configuration (those vary by platform). TOON is about how you represent data inside the agent loop.

Example (same data, different token profiles):

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

The same records in TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

In practice, TOON is a good fit for:

  • large uniform arrays (tables)
  • “agent asks tool for 200 rows” scenarios
  • audit logs you want to keep cheap but structured
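
To make the “large uniform arrays” case concrete, here’s a minimal Go sketch that emits the tabular shape shown earlier. It’s illustrative only (not loom-core’s encoder), and the row type is made up:

// toonRow is a hypothetical uniform record.
type toonRow struct {
  ID   int
  Name string
  Role string
}

// encodeUsersTOON renders a uniform slice as a TOON-style table:
// a header carrying the length and field names, then one compact
// comma-separated line per row. Uses only fmt and strings.
func encodeUsersTOON(rows []toonRow) string {
  var b strings.Builder
  fmt.Fprintf(&b, "users[%d]{id,name,role}:\n", len(rows))
  for _, r := range rows {
    fmt.Fprintf(&b, "  %d,%s,%s\n", r.ID, r.Name, r.Role)
  }
  return b.String()
}

For the two rows in the earlier example, that produces exactly the TOON block shown above.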

Loom’s job is to make the configuration differences boring (registry -> generate -> sync), so you can focus on the data and the loop.

That’s not glamorous, but it’s how you keep “works on one assistant” from becoming “works on none.”

TOON In The Wild: Loom’s HUD Bridge Accepts JSON or TOON

This is not theoretical. Loom-core’s HUD bridge explicitly supports both JSON and TOON payloads when it calls mcp-agent-context tools. It tries JSON first, then falls back to decoding TOON:

// supports both JSON and TOON (Token-Optimized Object Notation) text payloads.
if err := json.Unmarshal([]byte(c.Text), target); err != nil {
  jsonBytes, toonErr := mcp.DecodeTOONToJSON(c.Text)
  if toonErr != nil {
    return fmt.Errorf("unmarshal %s text (json: %v, toon: %v)", toolName, err, toonErr)
  }
  if err := json.Unmarshal(jsonBytes, target); err != nil {
    return fmt.Errorf("unmarshal %s decoded toon: %w", toolName, err)
  }
}

That’s the point: keep the agent loop cheap without giving up typed-ish structure, and make the UI layer resilient to both representations.

Why Go-Native MCP Servers Are a Feature

Nothing against Python or Node. I ship plenty of both.

But for the core “every day, always on” MCPs, Go has practical advantages:

  • fast startup (important when a daemon restarts a failed server)
  • predictable memory footprint
  • easy distribution as a single static-ish binary
  • straightforward concurrency without turning into event-loop archaeology

In loom-core, the common MCP servers (mcp-git, mcp-gitlab, mcp-github, mcp-k8s-ops, mcp-prometheus, etc.) are implemented as Go binaries. When you run Loom-mode, the daemon owns their lifecycle.

This keeps your editor from becoming the long-running process manager.
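
None of this requires exotic machinery. The following is not loomd’s actual code, just a minimal sketch of the supervision pattern it implies: spawn a server binary, wait on it, restart with a small backoff when it dies (binary path and backoff are illustrative):

// supervise runs one MCP server binary and restarts it if it exits.
// Illustrative sketch only; a real daemon wires per-server stdio pipes
// for JSON-RPC rather than inheriting the daemon's own stdout/stderr.
func supervise(ctx context.Context, bin string) {
  for {
    cmd := exec.CommandContext(ctx, bin)
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

    if err := cmd.Run(); err != nil {
      log.Printf("%s exited: %v; restarting", bin, err)
    }
    select {
    case <-ctx.Done():
      return
    case <-time.After(2 * time.Second): // illustrative backoff
    }
  }
}

With that shape, a crashed mcp-git doesn’t take your editor with it; the daemon just brings it back.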

The Performance Rule: Fewer, Higher-Leverage Tools

The fastest MCP call is the one you didn’t make.

If you want token-efficient and fast workflows:

  • Use MCP for coarse-grained actions (query a system, fetch logs, list PRs, render a report).
  • Don’t use MCP as a per-line code editor.
  • Prefer “batch” tools that do one meaningful unit of work.

In practice, I can do an enormous amount with a small set:

  • git (status/diff/log/show)
  • gitlab or github (MR/PR lifecycle)
  • k8s-ops (logs/rollout/describe)
  • loki / prometheus (debugging)
  • agent-context (sessions/tasks/decisions) when I’m doing multi-step work

That list is intentionally short.

Loom-Mode Quickstart (The Fast Path)

From services/loom-core:

make build
./bin/loom sync all --regen --loom-mode
./bin/loomd

Then for drift checks:

./bin/loom sync status

The point of Loom-mode is that every client sees the same proxy entry, and the daemon decides what exists behind it.

Multi-Agent Without Bloat: Where the State Lives

Multi-agent setups get messy when:

  • every agent spawns its own tool zoo
  • memory lives in chat logs
  • tool state is duplicated per client

In Loom-mode:

  • tools run as daemon-managed processes
  • policy lives in registry.yaml
  • multi-agent orchestration is a layer above that

This is where mcp-agent-context fits.

Agent Context: Memory and Tasks (When You Actually Need It)

I treat mcp-agent-context like a “workbench” for long-running tasks:

  • sessions provide continuity
  • tasks are an explicit queue
  • decisions/findings are recorded with titles and tags

A minimal loop is:

agent_session_start(namespace="project/feature-x")
agent_context_recall_enhanced(query="feature-x")
agent_task_add(tasks=[{title:"...", priority:"high"}])
agent_context_add(entries=[{entry_type:"decision", title:"...", content:"..."}])
agent_session_end(summarize=true)

Important constraint: adding context entries requires an embeddings key.

That’s the trade.

If you don’t need recall and search, skip it. Don’t pay for machinery you aren’t using.

Secrets: Make GUI Apps Not Lie To You

One of the most annoying failure modes in “real” MCP setups is when tools work in your terminal but fail in a GUI-launched context (launchd, VS Code, desktop clients) because your shell exports are missing.

In loom-core, there are two practical countermeasures:

  • loom check can warn when likely-required secrets referenced by the registry are missing for the default profile.
  • for secret-looking ${env:...} keys (suffixes like _TOKEN, _API_KEY, etc.), Loom can fall back to the secrets manager when the env var is unset.
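
A minimal sketch of that second fallback, assuming a resolver roughly like this (lookupSecret is a hypothetical stand-in for whatever secrets manager you use; the suffix list just mirrors the examples above):

// resolveEnvRef resolves a ${env:NAME} reference. If the variable is
// unset and the name looks secret-like, fall back to the secrets manager
// instead of silently handing the server an empty string.
func resolveEnvRef(name string) (string, error) {
  if v, ok := os.LookupEnv(name); ok {
    return v, nil
  }
  for _, suffix := range []string{"_TOKEN", "_API_KEY"} {
    if strings.HasSuffix(name, suffix) {
      return lookupSecret(name) // hypothetical secrets-manager call
    }
  }
  return "", fmt.Errorf("env var %s is not set", name)
}

The point is that a GUI-launched client and your terminal resolve the same name the same way.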

That’s not glamorous, but it’s what makes your agent loop predictable.

Token Efficiency: Design Your Tool Calls Like You Design APIs

Token efficiency is mostly about interface design.

Rules that have held up for me:

  • A tool call should return exactly what you need to decide the next step.
  • Every list/search tool should have pagination.
  • Every response should have size caps.
  • Prefer structured summaries over raw dumps.

Pagination: Make Lists Cheap

If a tool can return 10,000 rows, it will. Eventually.

The fix is simple and boring:

  • list/search tools accept per_page and page
  • responses include pagination metadata (next page pointers or counts)

This is not an “AI feature.” It’s an operations feature. It prevents timeouts and keeps tool calls predictable.

In loom-core, the pattern is intentionally boring and standardized. There’s a shared validator that clamps and normalizes page / per_page:

v := validate.NewArgs(args)
p := v.GetPagination() // default per_page=30, max per_page=100, default page=1
page := p.Page
perPage := p.PerPage

And then servers return pagination metadata alongside results (example: mcp-gitlab list merge requests):

path := fmt.Sprintf(
  "/projects/%s/merge_requests?state=%s&per_page=%d&page=%d",
  encodeProject(project), state, perPage, page,
)
result, meta, err := g.requestListWithMeta(ctx, path)
return mcp.JSONResult(map[string]any{
  "merge_requests": result,
  "count":          len(result),
  "pagination":     meta,
})

Response Caps: Don’t Turn MCP Into a Dump Truck

Large payloads are the silent killer of “fast agent workflows.”

Some loom-core MCPs enforce maximum response sizes for this reason. When a response would be too large, the tool should fail with a useful error message and tell you how to narrow the query (time range, selectors, page size).
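
The shape of that check is trivial. A sketch, with an illustrative 1 MiB limit rather than loom-core’s actual numbers:

const maxResponseBytes = 1 << 20 // illustrative cap, not loom-core's real limit

// capResponse refuses oversized payloads and tells the caller how to
// narrow the query instead of silently truncating or timing out.
func capResponse(payload []byte) ([]byte, error) {
  if len(payload) > maxResponseBytes {
    return nil, fmt.Errorf(
      "response is %d bytes (limit %d): narrow the time range, add selectors, or lower per_page",
      len(payload), maxResponseBytes,
    )
  }
  return payload, nil
}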

Loom-core MCPs follow these patterns (pagination metadata, response caps, etc.) because otherwise your “agent” becomes a log shipper.

The UX Layer: HUD + Native Overlay

Once you have a daemon/proxy architecture, you get a new UX opportunity: always-visible state.

Loom’s HUD is a local dashboard for:

  • server health
  • active sessions/tasks
  • workflows and approvals

On macOS, the native overlay is the “always-on” version: it keeps agent state visible without context switching.

That’s the UI version of the same design principle:

  • make state visible
  • make drift visible
  • make transitions explicit

Coordinator (Optional): FlexInfer-Backed “LLM Ops” for Agent Context

Loom also has an optional coordinator layer (started by the HUD) that can use FlexInfer (an OpenAI-compatible proxy) to do things like:

  • session summarization on session-end
  • on-demand summarization and compression
  • workflow planning from a natural language goal

It’s explicitly optional. If you don’t configure a FlexInfer URL, the HUD works exactly as before.

Example:

loom hud --port 3333 \
  --flexinfer-url http://127.0.0.1:8080 \
  --flexinfer-key "$FLEXINFER_API_KEY" \
  --coordinator-model qwen3-8b

The Anti-Pattern This Avoids

The failure mode I’m trying to avoid is:

  • each client has its own config
  • each agent has its own tools
  • tool runtimes pile up
  • latency and memory use scale with the number of assistants

Loom-mode is a direct countermeasure:

  • one proxy entrypoint
  • one daemon that owns process lifecycle
  • Go-native servers where it matters

The point isn’t to dunk on other ecosystems.

The point is to keep your machine (and your attention) from becoming the integration platform.
