Infrastructure layer that routes AI inference between local models and cloud APIs based on task complexity, latency requirements, and cost. An LLM-as-judge scores output quality in real time.
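A minimal sketch of a routing decision over those three factors, assuming each task already carries a complexity estimate (for example from a classifier), a latency budget, and a cost ceiling. The `Task`/`Backend` fields, the backend names, and all numbers are hypothetical and only illustrate the idea: choose the cheapest backend that satisfies every constraint, and fall back to the most capable one otherwise.

```python
from dataclasses import dataclass

# Hypothetical task descriptor; field names and scales are illustrative assumptions.
@dataclass(frozen=True)
class Task:
    complexity: float        # 0.0 (trivial) to 1.0 (hard), e.g. from a task classifier
    latency_budget_ms: int   # maximum acceptable end-to-end latency
    max_cost_usd: float      # per-request spend ceiling

# Hypothetical backend profile; values would come from measurement, not guesses.
@dataclass(frozen=True)
class Backend:
    name: str
    capability: float        # highest complexity this backend handles acceptably
    p95_latency_ms: int
    cost_usd: float          # estimated cost per request

def route(task: Task, backends: list[Backend]) -> Backend:
    """Pick the cheapest backend that meets capability, latency, and cost constraints."""
    eligible = [
        b for b in backends
        if b.capability >= task.complexity
        and b.p95_latency_ms <= task.latency_budget_ms
        and b.cost_usd <= task.max_cost_usd
    ]
    if not eligible:
        # No backend satisfies every constraint: prefer quality over cost/latency.
        return max(backends, key=lambda b: b.capability)
    # Cheapest first; break cost ties by lower latency.
    return min(eligible, key=lambda b: (b.cost_usd, b.p95_latency_ms))

# Assumed example fleet: two local models and one cloud API.
BACKENDS = [
    Backend("local-small", capability=0.4, p95_latency_ms=300, cost_usd=0.0),
    Backend("local-large", capability=0.6, p95_latency_ms=1200, cost_usd=0.0),
    Backend("cloud", capability=1.0, p95_latency_ms=2000, cost_usd=0.01),
]

if __name__ == "__main__":
    print(route(Task(0.3, 500, 0.05), BACKENDS).name)   # local-small
    print(route(Task(0.5, 1500, 0.05), BACKENDS).name)  # local-large
    print(route(Task(0.9, 3000, 0.05), BACKENDS).name)  # cloud
```

Ordering by cost first keeps cheap local models in use whenever they qualify; a real router would likely also weigh recent judge scores rather than a static capability number.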
Problem
Cloud models are powerful but expensive; local models are cheap but uneven across task types.
Experiment
Route work between local and cloud models using task classification, quality evaluation, and promotion gates.
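One way a promotion gate could work, sketched under stated assumptions: judge scores are normalized to [0, 1], both models are evaluated on the same task class, and a local model is promoted only after a minimum number of samples shows its mean score within a tolerated gap of the cloud baseline. `MIN_SAMPLES`, `MAX_GAP`, and the function names are hypothetical, not the project's actual values.

```python
from statistics import mean

# Hypothetical gate parameters; real thresholds would be tuned per task class.
MIN_SAMPLES = 20   # minimum judged outputs per model before any decision
MAX_GAP = 0.05     # tolerated drop in mean judge score versus the cloud baseline

def should_promote(local_scores: list[float], cloud_scores: list[float],
                   min_samples: int = MIN_SAMPLES, max_gap: float = MAX_GAP) -> bool:
    """Return True if the local model's mean judge score is within max_gap of the cloud's."""
    if len(local_scores) < min_samples or len(cloud_scores) < min_samples:
        return False  # not enough evidence yet: keep this task class on cloud
    return mean(cloud_scores) - mean(local_scores) <= max_gap

def promoted_classes(scores_by_class: dict[str, tuple[list[float], list[float]]]) -> set[str]:
    """Apply the gate independently to each task class: (local_scores, cloud_scores)."""
    return {
        task_class
        for task_class, (local, cloud) in scores_by_class.items()
        if should_promote(local, cloud)
    }

if __name__ == "__main__":
    scores = {
        "summarize": ([0.90] * 25, [0.92] * 25),             # small gap: promote
        "multi_step_reasoning": ([0.60] * 25, [0.90] * 25),  # large gap: stay on cloud
        "extraction": ([0.95] * 5, [0.95] * 5),              # too few samples: stay on cloud
    }
    print(promoted_classes(scores))  # {'summarize'}
```

Gating per task class rather than per model is what lets a local model win on some workloads while harder reasoning stays on cloud; a stricter version might compare confidence intervals instead of raw means.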
Shipped artifact
A control plane with health checks, promoted model slots, rebalancing, and LLM-as-judge evaluation.
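A minimal sketch of how promoted model slots, health checks, and rebalancing might fit together, assuming one slot per task class that normally serves its promoted local model and shifts traffic to cloud after repeated failed health checks. `Slot`, `ControlPlane`, `FAILURE_THRESHOLD`, and the model name `local-7b` are hypothetical illustrations, not the shipped control plane's API.

```python
from dataclasses import dataclass, field

# Hypothetical constants; the real control plane's policy may differ.
CLOUD = "cloud"
FAILURE_THRESHOLD = 3  # consecutive failed health checks before rebalancing to cloud

@dataclass
class Slot:
    task_class: str
    promoted_model: str                  # local model that passed the promotion gate
    consecutive_failures: int = 0
    active: str = field(init=False)      # model currently serving this task class

    def __post_init__(self) -> None:
        self.active = self.promoted_model

@dataclass
class ControlPlane:
    slots: dict[str, Slot] = field(default_factory=dict)

    def record_health(self, task_class: str, healthy: bool) -> None:
        """Update one slot from a health-check result and rebalance if needed."""
        slot = self.slots[task_class]
        if healthy:
            # Simplification: a single healthy check restores the promoted local model.
            slot.consecutive_failures = 0
            slot.active = slot.promoted_model
        else:
            slot.consecutive_failures += 1
            if slot.consecutive_failures >= FAILURE_THRESHOLD:
                slot.active = CLOUD  # rebalance this task class to the cloud fallback

    def target(self, task_class: str) -> str:
        """Return the model that should serve this task class right now."""
        slot = self.slots.get(task_class)
        return slot.active if slot else CLOUD  # unpromoted classes stay on cloud

if __name__ == "__main__":
    cp = ControlPlane({"summarize": Slot("summarize", "local-7b")})
    for _ in range(FAILURE_THRESHOLD):
        cp.record_health("summarize", healthy=False)
    print(cp.target("summarize"))  # cloud
    cp.record_health("summarize", healthy=True)
    print(cp.target("summarize"))  # local-7b
```

Keeping the failure counter per slot means one unhealthy local model only moves its own task class to cloud; a production version would probably add hysteresis before restoring the local model.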
Result
Certain task classes can be routed to local models while higher-risk reasoning stays on stronger cloud models.
What we learned
Model choice should be empirical and task-specific, not brand-specific.
Proof assets