Self-hosted AI operations

FlexInferPrivate AI, operated like real infrastructure

Run and customize open models on your own GPUs, behind an OpenAI-compatible gateway. Loom governs how agents reach tools and context, MentatLab turns model calls into observable DAG workflows, and fi-fhir proves the stack on real healthcare data.

OpenAI-compatible APIGPU-aware, scale-to-zeroMCP-governed agent access

Explore FlexInfer Open playground

Product Platform Docs Playground

Route preview

Private runtime boundary

Ready

Boundary

Private

Ops

GitOps

Routes

4 lanes

Runtime

GPU-aware

Gateway

OpenAI-compatible

Context

Bounded MCP

Workflow

DAG-visible

flexinfer.route.yaml

model: gemma4-26b-a4b

route: /v1/chat/completions

placement: gpu-pool/ready

policy: tools.allowed[3]

01FlexInferPrivate inference runtime 02LoomMCP + agent governance 03MentatLabDAG agent orchestration 04fi-fhirHealthcare ETL proof

Measured on the clustersnapshot

Verified FlexInfer benchmark results

Serving throughput across language, embedding, and image workloads. The live feed takes over when available; otherwise this shows the latest verified cluster snapshot.

Snapshot verified Jul 1, 2026

106.0tok/s recorded peak

FlexInfer docs

mlc-llmAMD · gfx1100

Qwen3-8B-abliterated-q4f32_1-MLC

106.0tok/s

8 GiB VRAM

mlc-llmAMD · gfx1100

Qwen3-14B-q4f16_1-MLC

82.4tok/s

16 GiB VRAM

mlc-llmAMD · gfx1100

Qwen3-14B-abliterated-q4f16_1-MLC

75.0tok/s

16 GiB VRAM

teiCPU

BAAI/bge-large-en-v1.5

203.3emb/s

2 GiB memory

diffusersAMD · gfx1100

stabilityai/sdxl-turbo

61.6img/min

10 GiB VRAM

Snapshot scope. These legacy benchmark-result records preserve backend, hardware class, memory, and throughput, but not prompt, batch, or sample counts. Treat them as operational baselines, not a cross-vendor leaderboard.

Inspect the source records

Platform posture

Four products, one operating boundary

The stack is intentionally layered: run and customize models inside your boundary, govern how agents reach tools and context, orchestrate repeatable DAG workflows over direct API models, and use fi-fhir when the workload is sensitive healthcare ETL instead of generic demo data.

Explore Platform

Private inference and model customization

FlexInfer

Kubernetes-native model lifecycle, OpenAI-compatible routing, GPU-aware runtime controls, quantization, adapters, and artifact delivery for private or hybrid inference.

Deployment: Model runtime placement, scheduling, caching, activation, and adapter workflows stay inside your cluster boundary.

Integration: Applications hit standard inference APIs while runtime operations stay inside your network and observability stack.

Docs Playground

MCP, context, and agent fleet coordination

Loom

Loom plus Loom Core coordinate MCP servers, editor config, agent sessions, tiered context, Weaver routing, sandbox execution, Mills autonomous delivery, RBAC, audit, HUD, and mobile fleet visibility.

Deployment: Centralizes MCP server lifecycle, daemon routing, context memory, and policy boundaries for internal tools and agent access.

Integration: Defines how agents reach internal systems with bounded context, auditable routing, and least-privilege intent.

Docs Playground

DAG agent orchestration

MentatLab

Mission Control for DAG-based agent workflows over direct API model calls, internal services, and private inference endpoints such as FlexInfer-hosted models.

Deployment: UI, gateway, and orchestrator services run inside your Kubernetes footprint alongside private model and integration services.

Integration: Workflows call direct model/API nodes and MCP-governed tools without depending on frontier coding harness semantics.

Docs Playground

Healthcare ETL and integration proof path

fi-fhir

Healthcare-focused ingestion and transformation workflows across HL7v2, FHIR, CSV, EDI X12, and CDA/CCDA with profile-driven, testable data handling.

Deployment: Data transformation pipeline runs in your controlled environment and deployment topology.

Integration: Source Profiles, semantic events, validation, routing, and terminology mapping isolate source variability while preserving operational traceability.

Docs Playground

Integration points

How platform surfaces connect in production

Use these contracts to map deployment and integration boundaries before implementation.

View canonical platform map

Runtime and context boundary

FlexInfer + Loom

FlexInfer runs private model workloads in-cluster while Loom governs MCP routing, context memory, policy boundaries, and agent/tool access.

Deployment: Model runtime placement, rollout safety, and capacity controls stay inside your cluster boundary.

Integration: Agent access is routed through auditable context policies before reaching internal runtime services and private model endpoints.

FlexInfer docs Loom Core docs

Control plane and operator UX

Loom + MentatLab

Loom provides policy-governed context and tool routing while MentatLab delivers mission-control UX for DAG design, execution planning, and run visibility.

Deployment: UI, gateway, and orchestrator services run in your private Kubernetes footprint.

Integration: Operator workflows connect to direct model/API nodes, internal MCP-governed tools, and runtime services without moving control to shared SaaS planes.

MentatLab product Loom Core product

Inference and sensitive-data integration

FlexInfer + fi-fhir

fi-fhir handles profile-driven healthcare transformation while FlexInfer provides private runtime execution for downstream model workflows and AI-assisted integration development.

Deployment: Data transformation and runtime execution remain inside your controlled environment.

Integration: Profile-driven mapping isolates source variability while preserving operational traceability.

fi-fhir docs fi-fhir playground

Core Stack

Loom suite, FlexInfer, fi-fhir, and MentatLab

FlexInfer runs the models, Loom governs context and policy, MentatLab gives operators the DAG view, and fi-fhir carries the healthcare workload. Product pages explain where each piece fits; docs and playground show it running.

Operational surface

MentatLab mission control

Loom Core governs context routing and policy boundaries. MentatLab provides the DAG design and run-visibility layer over direct API model calls, internal services, and private FlexInfer endpoints.

Product page Docs Code

Mobile

Loom Companion

SwiftUI app for fleet monitoring, session management, real-time alerts, and lightweight operator control from iPhone and iPad.

API docs

Core

Loom Core

Single-entry MCP and context control plane: registry sync, tiered memory, Weaver orchestration, proxy + daemon routing, and controlled execution.

Docs Playground Product page

Core

Loom

VS Code extension for multi-platform MCP config sync, skills registry, and agent context workflows.

Docs Playground

Core

Loom Zed

Zed integration for Loom workflows: consistent servers, shared registry, and repeatable profiles.

Docs

Core

FlexInfer

Inference control plane for private and hybrid workloads: model lifecycle, serverless activation, routing, and GPU-aware placement.

Docs Playground FlexInfer product

Core

fi-fhir

Healthcare integration tooling: Source Profiles, parsing pipeline, and HL7v2 to FHIR workflows.

Docs Playground

Enterprise capabilities

Operator controls already shipping across the platform stack

MCP gateway routing, RBAC, sandbox execution, and OTel-traced operations are available now. HUD cost and audit visibility are landing next.

FlexInfer product Capability map

In progress

HUD Cost Dashboard

In progress

Cost monitoring integration: loom/cost-stats RPC, CostMonitor polling, SSE events, and OverviewPanel KPI tile.

HUD RBAC + Audit Visibility

In progress

RBAC config RPC, denied-calls ring buffer, ServersPanel RBAC sub-tab, and OverviewPanel badge.

Available now

MCP Gateway

Available

Centralized MCP routing via loom proxy with streamable HTTP transport, bearer/OIDC/mTLS auth, and hub failover.

Single context ingress with controlled routing, auditing, and automatic local fallback.

Role-based Access Control (RBAC)

Available

Role-aware permissions for MCP tool access with audit trail, cost tracking, and OAuth 2.1.

Enforces least-privilege context access with auditable decision logs across teams and environments.

Sandbox Executor (Docker + K8s)

Available

mcp-devbox sandbox runtime with Docker and Kubernetes backends for isolated agent execution.

Runs builds, tests, and automation in controlled containers with consistent isolation and audit trails.

Operational Foundations

Available

OTel tracing across all 59 MCP servers, JSON log correlation, observability stack, and deployment controls.

Full production observability with distributed tracing, structured logging, and repeatable deployment workflows.

Start Here

Pick a product, then prove it in the playground

Docs, playground, and product pages stay aligned on the same config shapes and schemas, so what you validate here is what you deploy.

Browse Products Open Playground Browse Docs

Map the boundaries

Start with the platform map to see which product owns which deployment and integration boundary.

Review platform

Validate the config

Use playground workflows to check config, routing, and integration behavior before you deploy.

Open playground

Bring in help

Scoped consulting engagements for hardening private AI infrastructure.

See services

Labs

Live dashboards and read-only demos

CI, cluster topology, model status, and infra diagrams.

Project Library

Map of maintained repos

Browse OSS and production systems by category and readiness.

Consulting

Bring it into production

If you want this stack inside your environment, the fastest path is a scoped audit or build engagement.

Readiness Audit

Fixed-scope diagnostic for sensitive workloads: architecture review, risk register, deployment constraints, and a 90-day plan.

Learn more about Readiness Audit

Cluster Build

Build private or hybrid inference infrastructure with observability, GitOps, and safer rollout controls.

Learn more about Cluster Build

Fractional Lead

Hands-on senior guidance to align teams around an executable platform roadmap.

Learn more about Fractional Lead

Writing

Case studies and deep dives

Implementation notes, architecture decisions, and reusable patterns.

Open Writing hub

Recent case studies

All

FlexInferPrivate AI, operated like real infrastructure

Verified FlexInfer benchmark results

Qwen3-8B-abliterated-q4f32_1-MLC

Qwen3-14B-q4f16_1-MLC

Qwen3-14B-abliterated-q4f16_1-MLC

BAAI/bge-large-en-v1.5

stabilityai/sdxl-turbo

Four products, one operating boundary

FlexInfer

Loom

MentatLab

fi-fhir

How platform surfaces connect in production

FlexInfer + Loom

Loom + MentatLab

FlexInfer + fi-fhir

Loom suite, FlexInfer, fi-fhir, and MentatLab

MentatLab mission control

Loom Companion

Loom Core

Loom

Loom Zed

FlexInfer

fi-fhir

Operator controls already shipping across the platform stack

In progress

Available now

Pick a product, then prove it in the playground

Map the boundaries

Validate the config

Bring in help

Bring it into production

Readiness Audit

Cluster Build

Fractional Lead

Case studies and deep dives

Recent case studies

Recent posts