FlexInfer Tools
AI inference deployment toolkit. Build deployment configurations with live validation, generate CLI commands, and explore the full configuration schema.
Config Editor
Interactive YAML editor for FlexInfer deployment configurations. Build configurations, validate them with live schema checking, and export production-ready YAML.
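For a sense of what the editor produces, here is a minimal sketch of a flexinfer-config.yaml. Every key name below is an assumption made for illustration, not the authoritative schema — consult the configuration reference under Documentation.

```yaml
# Hypothetical flexinfer-config.yaml -- key names are illustrative
# assumptions, not the authoritative schema.
apiVersion: flexinfer/v1          # assumed schema identifier
deployment:
  name: llama-chat                # name used for the generated Kubernetes resources
  model:
    source: meta-llama/Llama-3.1-8B-Instruct   # example model reference
  backend: vllm                   # one of the supported backends: vllm, tgi, ollama
  resources:
    gpus: 1                       # GPUs requested per replica
    memory: 24Gi
  replicas: 2
```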
CLI Builder
Visual command builder for FlexInfer CLI. Configure models, backends, and resource allocation with parameter hints and copy-ready output.
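As an example of the kind of copy-ready output the builder generates, a hedged sketch follows; the deploy subcommand and all flag names are assumptions for illustration, not confirmed FlexInfer CLI syntax.

```sh
# Hypothetical command -- subcommand and flags are illustrative assumptions.
flexinfer deploy \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --backend vllm \
  --gpus 1 \
  --replicas 2
```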
About FlexInfer
FlexInfer is an AI inference orchestration platform for Kubernetes. It simplifies deploying and scaling large language models with support for multiple backends (vLLM, TGI, Ollama), automatic GPU allocation, and production-ready configurations.
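To make the backend choice concrete, here is a sketch of how backend-specific tuning might sit in the config. The nesting (backendOptions and its children) is an assumption; the individual settings are modeled on real vLLM, TGI, and Ollama options.

```yaml
# Hypothetical backendOptions block; the nesting is assumed, while the
# settings mirror real options of each backend.
backend: vllm
backendOptions:
  vllm:
    tensorParallelSize: 2   # vLLM tensor parallelism: shard the model across 2 GPUs
    maxModelLen: 8192       # vLLM maximum context length
  tgi:
    quantize: awq           # TGI quantization mode
  ollama:
    keepAlive: 5m           # Ollama: keep the model loaded for 5 minutes after use
```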
Documentation
Quickstart
Get started with FlexInfer in minutes. Install, configure, and deploy your first model (a sketch of these steps follows this list).
Configuration schema
Complete reference for flexinfer-config.yaml with all options and examples.
Model backends
Supported inference backends: vLLM, TGI, Ollama, and custom containers.
Operations guide
Production deployment, scaling, monitoring, and troubleshooting.
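For orientation, a hedged sketch of the quickstart's install-configure-deploy flow referenced above. The install URL is a placeholder and every subcommand is an assumption, not the documented CLI.

```sh
# Illustrative quickstart flow -- install source and subcommands are assumed.
# 1. Install the CLI (placeholder URL, not a real distribution channel).
curl -fsSL https://example.com/flexinfer/install.sh | sh

# 2. Validate the config built in the Config Editor (hypothetical subcommand).
flexinfer validate -f flexinfer-config.yaml

# 3. Deploy to the current Kubernetes context (hypothetical subcommand).
flexinfer apply -f flexinfer-config.yaml
```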
These playground tools mirror the FlexInfer CLI and configuration concepts. Changes made here can be exported and used directly in your deployments.