LLM Routing

Per-stage provider routing. Each call goes through LLMClient.complete(stage="…").

Failover policy
Each stage has a primary and a fallback provider. On a provider error, timeout, or rate limit, the runtime falls back automatically. Failovers are recorded in TurnTrace and surface as "provider degradation" anomalies in Insights.
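As a rough illustration of the policy above, here is a minimal sketch of per-stage failover. Only the `LLMClient.complete(stage=…)` call shape comes from this page; the `Route` type, the `ProviderError` exception, the `prompt` parameter, the `trace` list (a stand-in for TurnTrace), and the provider names are all hypothetical, and the real runtime may work differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical umbrella for a provider error, timeout, or rate limit."""


@dataclass
class Route:
    primary: str
    fallback: str


@dataclass
class LLMClient:
    """Illustrative stand-in, not the actual client described on this page."""
    routes: Dict[str, Route]
    providers: Dict[str, Callable[[str], str]]
    # Assumed stand-in for TurnTrace: (stage, failed provider, reason)
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def complete(self, stage: str, prompt: str) -> str:
        route = self.routes[stage]
        try:
            return self.providers[route.primary](prompt)
        except ProviderError as exc:
            # Record the failover, then retry once on the fallback provider.
            self.trace.append((stage, route.primary, str(exc)))
            return self.providers[route.fallback](prompt)


def flaky(prompt: str) -> str:
    raise ProviderError("rate limit")


def steady(prompt: str) -> str:
    return f"ok: {prompt}"


client = LLMClient(
    routes={"drafting": Route("provider_a", "provider_b")},
    providers={"provider_a": flaky, "provider_b": steady},
)
answer = client.complete(stage="drafting", prompt="hello")
```

In this sketch an error from the fallback itself propagates to the caller; whether the real runtime retries further is not stated here.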

Per-stage routing

| Stage | Primary | Fallback | Rationale |
| drafting: Final grounded answer to visitor | | | Multilingual strength for 10 langs; 2M ctx; ~½ cost of GPT-4.1 |
| kq_classifier: Knowledge-question binary classifier | | | Trivial; sub-200ms; cheap |
| intent_rerank: Intent reranker post-embedding | | | Same as above |
| triage_cluster: Failed query clustering naming | | | Batched; cost-sensitive |
| triage_fix_suggest: Quick-fix suggestion in triage queue | | | Quality matters here |
| judge: Sampled LLM-as-judge audit | | | Independent model; avoid Gemini judging Gemini |
| agent_suggested_reply: Agent suggested replies | | | Quality + voice matters |
| handback_summary: Bot context summary after agent hand-back | | | Short summary; cheap |
| realtime_voice: Realtime voice generation | | | OpenAI Realtime is best realtime today |
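The judge rationale (an independent model, so Gemini does not judge Gemini) can be read as a constraint on the routing table. The sketch below checks a routing map for that constraint under stated assumptions: the provider names are placeholders rather than the actual assignments, `check_routes` is a hypothetical helper, and the rule that a fallback should differ from its primary is an assumption, not something this page states.

```python
from typing import Dict, List, Tuple

# stage -> (primary, fallback). Placeholder values, not the real configuration.
ROUTES: Dict[str, Tuple[str, str]] = {
    "drafting": ("provider_a", "provider_b"),
    "judge": ("provider_b", "provider_a"),
}


def check_routes(routes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return a list of human-readable problems; empty means the map passes."""
    problems: List[str] = []
    for stage, (primary, fallback) in routes.items():
        # Assumed rule: a fallback identical to the primary gives no failover.
        if primary == fallback:
            problems.append(f"{stage}: fallback equals primary")
    # Interpretation of the judge rationale: the judge should not share a
    # primary with the stage whose output it audits.
    if "judge" in routes and "drafting" in routes:
        if routes["judge"][0] == routes["drafting"][0]:
            problems.append("judge: same primary as drafting")
    return problems


problems = check_routes(ROUTES)
```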

Configured providers

Google Gemini: connected (GEMINI_API_KEY · last used 12s ago)

OpenAI: connected (OPENAI_API_KEY · last used 2 min ago)
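A minimal sketch of how the two key names above might be checked at startup. It assumes the keys are read from environment variables, which this page does not confirm, and `configured_providers` is a hypothetical helper. A key being set does not prove the provider is reachable, so this is weaker than the "connected" status shown above.

```python
import os
from typing import Dict, Mapping

# Provider display name -> key name, as listed on this page.
PROVIDER_KEYS: Dict[str, str] = {
    "Google Gemini": "GEMINI_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which providers have a non-empty key in the given environment."""
    return {name: bool(env.get(var)) for name, var in PROVIDER_KEYS.items()}


# Example with an explicit mapping instead of the real environment.
status = configured_providers({"GEMINI_API_KEY": "example-value"})
```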