Skip to main content
Resume

Profile & Capabilities

Production AI infra consulting: GPU clusters, Kubernetes + GitOps workflows, cost baselines, and operational guardrails so deploys stop feeling risky.

Recent outcomes

  • Shippable artifacts (dashboards, runbooks, baselines) as default deliverables: readiness audit.
  • Proof you can click through (not marketing diagrams): live demos.
  • Notes on constraints, verification, and what changed: case studies.

Capabilities

What I do

The work is usually a mix of platform engineering and operational design. I focus on the places where teams lose time: unclear bottlenecks, risky deploys, and dashboards that don't match the on-call questions.

  • GPU platform build: scheduling, routing, scaling, day-2 ops
  • Cost baseline: throughput vs spend, cache/batch/routing levers
  • GitOps reliability: safer rollouts, environments, rollbacks
  • Observability: signals, alerts, and runbooks tied to ownership

What you get

I try to ship artifacts you can reuse after I'm gone: baselines, docs, dashboards, and an execution plan that fits your constraints.

  • Baseline report: workload profile + bottlenecks + cost curve
  • Deployment model: topology, capacity, and scaling strategy
  • Dashboards + alert thresholds (first pass)
  • Runbooks: alerts -> actions
  • 90-day plan: owners, milestones, and risk register

Typical engagement

Week 1Baseline
1/4

Agree on the workload, instrument the right signals, and produce a cost/latency baseline you can reproduce.

Weeks 2-3Stabilize
2/4

Put guardrails in place: routing, budgets, autoscaling, and rollout/rollback with clear abort conditions.

Week 4Operate
3/4

Dashboards + alerts that map to on-call questions. Runbooks tied to ownership, plus a practical roadmap.

OngoingIterate
4/4

Tight loops: measure, change one thing, validate. Reduce surprises and keep spend predictable.

Background

Prior roles that shaped the ops-first approach: integrations, reliability, and shipping under SLA constraints.

Read the about page

Let's talk

Tell me what you're building, your constraints, and what you've tried. I'll respond with a concrete next step (even if that means “don’t hire me for this”).