About

FlexInfer Platform Lab

I help teams ship private AI infrastructure for sensitive workloads: on-prem/hybrid inference operations, context governance, and healthcare-aligned integration reliability.

Recent outcomes

Lab demos wired to live K8s data, not screenshots: cluster overview.
CI visibility you can reason about end-to-end: pipeline visualization.
Operational UX for inference platforms: model gallery.

Why this exists

Most teams don't fail at model quality. They fail at operations: runaway GPU spend, deploys that feel risky, and dashboards that don't answer "is it working?" quickly.

I publish practical patterns for operating private AI platforms in environments where reliability, traceability, and data boundaries matter.

How I work

Baseline

Measure cost and latency under a representative workload. Identify bottlenecks that actually move spend.

Outputs

Workload profile (traffic shape + prompt mix)
Cost curve (throughput vs spend)
Bottleneck map (what is actually limiting)

Signals I track

p50/p95 latency
tokens/sec
GPU utilization
error rate

Stabilize

Put guardrails in place: routing, budgets, autoscaling, and a rollout/rollback process that doesn’t require heroics.

Outputs

Capacity model + safe scaling strategy
Rollout/rollback plan with guardrails
Cost controls (budgets, caps, routing rules)

Signals I track

queue depth
timeouts
retries
saturation
budget burn

Operate

Alerts that point to ownership, runbooks that match reality, and dashboards that answer “is it working?” fast.

Outputs

SLO-style targets (latency, error, cost)
Dashboards that match on-call questions
Runbooks tied to alerts (not a wiki graveyard)

Signals I track

SLO burn
MTTR
deploy health
customer impact

Deliverables

I try to leave artifacts, not vibes. If we work together, you should end up with a baseline you can reproduce, a rollout plan you can execute, and operational docs that your team will actually use.

Typical artifacts

Baseline report (workload, costs, bottlenecks)
Architecture notes + deployment model
Dashboards + alert thresholds (first pass)
Runbooks (alerts -> actions)
90-day plan with owners and milestones

Clarity

You know what drives cost and what to change first.

Control

Deploys have guardrails and rollbacks are rehearsed.

Operations

Dashboards + runbooks answer on-call questions fast.

The lab

Live, read-only dashboards. No screenshots. If it breaks, you see it.

Cluster Overview

Read-only homelab dashboards wired to live K8s data.

CI Pipeline

GitLab CI/CD visualization with stage tracking.

Model Gallery

Live model status with metrics and UI instrumentation.

Founder

Cody Blevins

I've spent a decade building and operating integration platforms in healthcare, then took that operational mindset into GPU clusters, inference tooling, and GitOps workflows for private environments. FlexInfer is where I publish those patterns and offer hands-on consulting.

Focus

Production AI infra

Style

Measurable, ops-first

Mode

Remote-friendly

Want help shipping this?

If inference is unstable, costs are spiking, or deploys feel risky, tell me what you're building and where it's getting stuck.
