RESEARCH · Active

Ralph Lab

Hypothesis-driven model routing research. We ran structured waves of experiments comparing local Ollama models against Claude Sonnet across four task types: code generation, bug detection, repair, and structured output. The work produced a validated routing table, a published paper draft, and two public blog posts. Wave 7 is in progress.

Case study

Problem

Local model routing needed evidence instead of vibes: which models can replace cloud models, and for which tasks?

Experiment

Run thousands of shadow tests across local models with independent judge evaluation and task-specific hypotheses.
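A shadow test can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ralph Lab's actual harness: `shadow_test`, `ShadowResult`, and the `baseline`, `candidate`, and `judge` callables are hypothetical names. The baseline (for example, the cloud model) serves the real response, the local candidate runs alongside it without being served, and a separate judge decides whether the shadow output is at least as good.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical record of one shadow comparison.
@dataclass
class ShadowResult:
    task_type: str
    local_model: str
    local_wins: bool  # judge's verdict: local output at least as good as baseline

def shadow_test(
    prompt: str,
    task_type: str,
    local_name: str,
    baseline: Callable[[str], str],            # assumed: the model that actually serves users
    candidate: Callable[[str], str],           # assumed: the local model under test
    judge: Callable[[str, str, str], bool],    # assumed: independent evaluator
) -> Tuple[str, ShadowResult]:
    served = baseline(prompt)    # this output is returned to the caller
    shadow = candidate(prompt)   # this output is only recorded, never served
    verdict = judge(prompt, served, shadow)
    return served, ShadowResult(task_type, local_name, verdict)
```

Keeping the judge separate from both models being compared is what makes the verdict independent; in practice the callables would wrap real model clients.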

Shipped artifact

A validated routing table, published analysis, and an active research program feeding the lab operating model.

Result

Ralph Lab turned model selection into a measurable system rather than a default-provider choice.

What we learned

Local models can win specific jobs when evaluation is task-specific, statistically grounded, and continuously updated.
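One way to make "task-specific and statistically grounded" concrete is to route a task to a local model only when the lower bound of its judged win rate clears a threshold, and otherwise fall back to the cloud model. The sketch below assumes a Wilson score interval at roughly 95% confidence (`z = 1.96`) and a threshold of 0.5; the function names, the threshold, and the `"cloud"` fallback label are hypothetical choices for illustration, not the lab's documented criteria.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def wilson_lower_bound(wins: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = wins / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)

def build_routing_table(
    results: Iterable[Tuple[str, str, bool]],  # (task_type, local_model, local_wins)
    fallback: str = "cloud",                   # hypothetical label for the default provider
    threshold: float = 0.5,                    # hypothetical minimum lower bound
) -> Dict[str, str]:
    # Tally wins and trials per (task, model) pair.
    tallies = defaultdict(lambda: [0, 0])
    for task, model, won in results:
        tallies[(task, model)][0] += int(won)
        tallies[(task, model)][1] += 1
    # For each task, keep the local model with the highest qualifying lower bound.
    best: Dict[str, Tuple[float, str]] = {}
    for (task, model), (wins, n) in tallies.items():
        lb = wilson_lower_bound(wins, n)
        if lb >= threshold and lb > best.get(task, (-1.0, ""))[0]:
            best[task] = (lb, model)
    tasks = {task for task, _ in tallies}
    return {task: best[task][1] if task in best else fallback for task in tasks}
```

Using the lower bound rather than the raw win rate penalizes small samples: a model that wins 5 of 10 comparisons does not qualify, while one that wins 20 of 20 does. Rerunning the table as new shadow results arrive is one way to keep it continuously updated.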

Proof assets

4,193 shadow tests · 8 local models · 200 evaluated runs