ClawSwitch Documentation
Route every AI request to the right model at the right price. Get started in minutes.
§Overview
ClawSwitch is an intelligent LLM proxy gateway. It sits between your AI agents and LLM providers, analyzing each request, picking the best model/provider combination, handling failover, and tracking every dollar — so you ship faster and spend less.
Smart Routing
Every request scored and sent to the optimal model
Multi-Provider
Anthropic, OpenAI, Gemini, Moonshot — one endpoint
API Key Pool
Multiple keys per provider with auto-rotation on rate limits
Fallback Chains
If primary fails, cascades through backup models
Cost Tracking
Every request logs actual cost vs. baseline savings
Response Caching
Identical requests hit cache instead of burning tokens
Budget Controls
Daily/monthly limits per agent with alerts
OpenAI Compatible
Drop-in replacement — change one URL
§Quick Start
Get running in 3 steps — no installation needed.
1. Pick a plan: go to Pricing and select Starter, Pro, or Enterprise.
2. Create your account: you'll be redirected to the Dashboard. Sign up with email and password, then complete checkout.
3. Create an agent and get your API key: in the Dashboard, go to Agents → Create Agent. Name it (e.g., "chatbot") and copy the generated API key. It is shown only once.
Your first request
```bash
curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Set "model": "auto" and ClawSwitch picks the best model for each request. To force a specific model, pass its name instead.
§How It Works
Every request to POST /v1/chat/completions flows through this pipeline:
1. Auth: verify your agent API key
2. Cache: exact or semantic match? Return instantly at $0
3. Score: analyze request complexity
4. Route: pick the cheapest model for that complexity tier
5. Guard: check budget limits, token limits, blocked models
6. Key Pool: select the best available API key for the provider
7. Execute: call the provider with an automatic fallback chain
8. Track: log tokens, cost, savings
9. Respond: return an OpenAI-format response plus cost headers
Every response includes cost metadata headers:

```
X-clawswitch-Cost: $0.000142      ← what you paid
X-clawswitch-Saved: $7.533000     ← what you saved vs. the Claude Sonnet baseline
X-clawswitch-Model: claude-haiku-4-5-20251001
X-clawswitch-Cache: miss
X-clawswitch-Tier: simple
```

§Smart Routing
When you set model: "auto", ClawSwitch's proprietary routing engine analyzes each request and selects the optimal model for quality and cost. Requests are classified into tiers:
| Tier | Routes to | Examples |
|---|---|---|
| simple | Haiku / GPT-4o-mini | Greetings, lookups, formatting |
| standard | Haiku / GPT-4o-mini / Kimi | Code, summaries, Q&A |
| complex | Sonnet / GPT-4o | Architecture, analysis, writing |
| premium | Opus / GPT-4-turbo / Gemini Pro | Critical reasoning, long review |
The router also detects task domains (code, reasoning, chat, writing) and applies domain-specific model preferences. Each tier has a fallback chain — if the primary model fails, the request cascades to the next model automatically.
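The fallback behavior can be pictured with a small sketch. It is not ClawSwitch's proprietary router: the `FALLBACK_CHAINS` mapping (including the order inside each tier) and the `call_model` callback are hypothetical, built loosely from the tier table and the Available Models list. It only shows the cascade: try each model in the tier's chain until one succeeds, and fail once the chain is exhausted.

```python
from typing import Callable

# Hypothetical fallback chains per tier, loosely based on the tier table above.
# The real membership and order are decided by ClawSwitch's routing engine.
FALLBACK_CHAINS: dict[str, list[str]] = {
    "simple": ["claude-haiku-4-5-20251001", "gpt-4o-mini"],
    "standard": ["claude-haiku-4-5-20251001", "gpt-4o-mini", "moonshot/kimi-k2.5"],
    "complex": ["claude-sonnet-4-20250514", "gpt-4o"],
    "premium": ["claude-opus-4-20250514", "gemini-2.5-pro"],
}


class AllProvidersFailed(Exception):
    """Every model in the chain failed (the proxy reports this case as HTTP 500)."""


def execute_with_fallback(tier: str, call_model: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in the tier's chain in order; return (model_used, response_text)."""
    failures: list[str] = []
    for model in FALLBACK_CHAINS[tier]:
        try:
            return model, call_model(model)
        except Exception as exc:  # a real client would catch narrower provider errors
            failures.append(f"{model}: {exc}")
    raise AllProvidersFailed(f"tier {tier!r} exhausted: {failures}")


if __name__ == "__main__":
    def flaky(model: str) -> str:
        # Simulate an Anthropic outage so the complex tier cascades to gpt-4o.
        if model.startswith("claude"):
            raise RuntimeError("provider unavailable")
        return f"answer from {model}"

    print(execute_with_fallback("complex", flaky))
```

The chain is walked strictly in order, so the first entry of each tier is the "primary" model and later entries only run when earlier ones raise.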
Guardrails
Configure safety limits on routing from the Dashboard Switch page:
- Max cost per request: reject requests that would exceed a cost threshold
- Max input tokens: reject inputs that are too long
- Allowed tiers: restrict which routing tiers can be used
- Blocked models: prevent specific models from being selected
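The four guardrails amount to a set of independent checks on a candidate route. The sketch below is a hypothetical client-side mirror of those settings (`Guardrails` and `check_guardrails` are illustrative names, not a ClawSwitch API). It simply reports every violated rule; in ClawSwitch itself a blocked model may be skipped during selection rather than causing a rejection.

```python
from dataclasses import dataclass, field
from typing import Optional

ALL_TIERS = {"simple", "standard", "complex", "premium"}


@dataclass
class Guardrails:
    """Hypothetical mirror of the four guardrail settings on the Switch page."""
    max_cost_per_request: Optional[float] = None  # USD; None means no limit
    max_input_tokens: Optional[int] = None         # None means no limit
    allowed_tiers: set[str] = field(default_factory=lambda: set(ALL_TIERS))
    blocked_models: set[str] = field(default_factory=set)


def check_guardrails(g: Guardrails, tier: str, model: str,
                     input_tokens: int, estimated_cost: float) -> list[str]:
    """Return the names of violated guardrails; an empty list means the request may proceed."""
    violations: list[str] = []
    if g.max_cost_per_request is not None and estimated_cost > g.max_cost_per_request:
        violations.append("max_cost_per_request")
    if g.max_input_tokens is not None and input_tokens > g.max_input_tokens:
        violations.append("max_input_tokens")
    if tier not in g.allowed_tiers:
        violations.append("allowed_tiers")
    if model in g.blocked_models:
        violations.append("blocked_models")
    return violations


if __name__ == "__main__":
    g = Guardrails(max_cost_per_request=0.05, allowed_tiers={"simple", "standard"},
                   blocked_models={"gpt-4o"})
    print(check_guardrails(g, tier="complex", model="gpt-4o",
                           input_tokens=800, estimated_cost=0.01))
    # ['allowed_tiers', 'blocked_models']
```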
§API Key Pool
Add multiple API keys per provider. ClawSwitch automatically rotates between them when one gets rate-limited — zero downtime for your agents.
- Keys are tried in priority order (lower number = preferred).
- Same-priority keys rotate round-robin for even distribution.
- Rate-limited keys are cooled down with exponential backoff.
- When all keys for a provider are exhausted, an email alert is sent.
Manage keys in the Dashboard under Settings → API Key Pool:
| Stat | Description |
|---|---|
| Status indicator | Green = active, Red = rate-limited, Gray = inactive |
| Request count | Total requests served by this key |
| Error count | Total errors (429s, 500s, etc.) |
| Last used | Timestamp of last successful request |
| Rate limited until | When the cooldown expires |
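The rotation rules above (priority first, round-robin among equal priorities, exponential cooldown after a rate limit) can be modeled in a few lines. This is a sketch under assumptions: `PoolKey` and `KeyPool` are hypothetical, and the 1-second base cooldown, 300-second cap, and reset-on-success behavior are illustrative values, since the docs only say "exponential backoff".

```python
import itertools
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolKey:
    key_id: str
    priority: int                    # lower number = preferred
    rate_limited_until: float = 0.0  # epoch seconds; 0 means available
    strikes: int = 0                 # consecutive rate limits, drives the backoff


class KeyPool:
    """Hypothetical pool: priority order, round-robin among ties, exponential backoff."""

    def __init__(self, keys: list[PoolKey], base_cooldown: float = 1.0, max_cooldown: float = 300.0):
        self.keys = keys
        self.base_cooldown = base_cooldown
        self.max_cooldown = max_cooldown
        self._cursor = itertools.count()  # shared round-robin counter

    def select(self, now: Optional[float] = None) -> Optional[PoolKey]:
        """Return the next usable key, or None when every key is cooling down."""
        now = time.time() if now is None else now
        available = [k for k in self.keys if k.rate_limited_until <= now]
        if not available:
            return None  # pool exhausted: the point where an email alert would fire
        best = min(k.priority for k in available)
        tied = [k for k in available if k.priority == best]
        return tied[next(self._cursor) % len(tied)]

    def mark_rate_limited(self, key: PoolKey, now: Optional[float] = None) -> None:
        """Cool the key down for base * 2^(strikes - 1) seconds, capped at max_cooldown."""
        now = time.time() if now is None else now
        key.strikes += 1
        cooldown = min(self.base_cooldown * 2 ** (key.strikes - 1), self.max_cooldown)
        key.rate_limited_until = now + cooldown

    def mark_success(self, key: PoolKey) -> None:
        key.strikes = 0  # a successful call resets the backoff
```

Lower-priority keys are only reached once every preferred key is cooling down, which matches the "Red = rate-limited" state in the Dashboard table.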
§Caching
Identical or semantically similar requests are served from cache — $0 cost, instant response.
| Layer | Matches | TTL | Cost |
|---|---|---|---|
| Exact | Byte-identical request | 1 hour | $0 |
| Semantic | Same meaning, different wording | 24 hours | $0 |
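The exact layer is simple enough to sketch. This hypothetical `ExactCache` takes "byte-identical" literally by hashing the raw request body with SHA-256 and expiring entries after the 1-hour TTL; whether ClawSwitch hashes raw bytes or a normalized form is not documented. The semantic layer needs embedding similarity and is not shown.

```python
import hashlib
import time
from typing import Optional


class ExactCache:
    """Hypothetical exact-match layer: byte-identical request body, 1-hour TTL."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}  # key -> (expires_at, response)

    @staticmethod
    def key(body: bytes) -> str:
        # Byte-identical bodies share a key; any change, even whitespace, is a miss.
        return hashlib.sha256(body).hexdigest()

    def get(self, body: bytes, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._store.get(self.key(body))
        if entry is None or entry[0] <= now:
            return None  # miss, or the entry has expired
        return entry[1]  # hit: served without calling a provider

    def put(self, body: bytes, response: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[self.key(body)] = (now + self.ttl, response)
```

The "any change is a miss" property is exactly the gap the semantic layer is meant to close.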
The X-clawswitch-Cache response header reports the result: hit, miss, or bypass.

§Budget Controls
Set daily and monthly spend limits per agent to prevent runaway costs. Manage budgets in the Dashboard Budget page.
| Budget used | Behavior |
|---|---|
| < 50% | All routing tiers available |
| 50–80% | Prefer cheaper tiers, alert sent |
| 80–100% | Cheapest models only, urgent alert |
| > 100% (hard stop) | Requests rejected with 403 |
| > 100% (soft) | Warning logged, continues |
Each agent shows a progress bar with current spend vs. limit, plus savings amount. Budget alerts notify you before limits are hit.
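Reading the table as a function of spent / limit makes the bands explicit. A minimal sketch: `budget_behavior` and its return labels are hypothetical, and treating exactly 50% and 80% as the start of the higher band is an assumption, because the table's ranges meet at those points.

```python
def budget_behavior(spent: float, limit: float, hard_stop: bool = True) -> str:
    """Map spend vs. limit onto the rows of the budget table (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    used = spent / limit
    if used > 1.0:
        # Hard stop rejects with 403; soft mode only logs a warning and continues.
        return "reject_403" if hard_stop else "warn_and_continue"
    if used >= 0.8:
        return "cheapest_models_only+urgent_alert"
    if used >= 0.5:
        return "prefer_cheaper_tiers+alert"
    return "all_tiers"
```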
§Dashboard Guide
The Dashboard is your control center. Access it at app.clawswitch.com.
| Page | What It Does |
|---|---|
| Home | Today's spend, savings, request volume, cache hit rate, 7-day trend chart |
| Agents | Create/manage API keys for your AI agents with per-agent budget limits |
| Models | View all available models across providers with cost per 1K tokens |
| Wallet | Manage prepaid credits, set low-balance threshold, enable auto-topup |
| Budget | Per-agent daily/monthly spend vs. limits with progress bars |
| Logs | Request history: agent, model, tier, provider, cost, savings, latency, cache |
| Settings | Provider credentials and API Key Pool management |
| Switch | Configure routing: guardrails, endpoints, routing matrix, test routing |
| Playground | Test the full routing pipeline live |
| Billing | View/change subscription plan, sync status |
Home (Analytics)
- Today's Spend: total cost of all requests today
- Today's Saved: amount saved vs. the Claude Sonnet baseline
- Request Count: total requests processed today
- Cache Hit Rate: percentage of requests served from cache
Agents
1. Click Create Agent.
2. Enter a name (e.g., "code-reviewer", "chatbot", "research-agent").
3. Optionally set a daily budget limit.
4. Copy the generated API key. It is shown only once.
Logs
| Column | Description |
|---|---|
| Time | When the request was made |
| Agent | Which agent made the request |
| Original Model | What model was requested |
| Routed Model | What model actually served it |
| Tier | Routing tier (simple/standard/complex/premium) |
| Cost / Savings | Actual cost and amount saved |
| Latency | Response time in milliseconds |
| Cache | Whether it was a cache hit |
§Playground
Test the complete proxy pipeline from the Dashboard. Send real requests through the same routing, key pool, and fallback chain your agents use — inspect every detail.
Analyze (Routing Only)
Click Analyze to see how a message would be routed without executing:
- Complexity score and breakdown by factor
- Tier classification and domain detection
- Selected model, provider, and full fallback chain
- Guardrail violation warnings
Run (Full Execution)
Click Run to send the message through the complete pipeline. Results across four tabs:
| Tab | Shows |
|---|---|
| Response | The actual LLM response with token counts |
| Routing | Primary vs. actual model/provider, fallback used |
| Complexity | Score ring (0–100%), tier + domain badges, factor breakdown |
| Cost | Actual cost, baseline cost, savings, latency |
§SDK & Framework Integration
ClawSwitch's proxy endpoint is OpenAI-compatible: any SDK or tool that talks to the OpenAI Chat Completions API can point at ClawSwitch. Just change the base URL and API key.
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",  # from Dashboard → Agents
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Streaming:
```python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g., a final usage chunk) can arrive with no choices; skip them.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.clawswitch.com/v1",
  apiKey: "YOUR_API_KEY",
});

// Top-level await requires an ES module (e.g., "type": "module" in package.json).
const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

cURL
```bash
curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```

LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",
    model="auto",
)
response = llm.invoke("Explain microservices in simple terms")
```

LlamaIndex / AutoGen
```python
# LlamaIndex
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",
    model="auto",
    is_chat_model=True,  # use /v1/chat/completions, the endpoint ClawSwitch exposes
)

# AutoGen
config_list = [{"model": "auto", "base_url": "https://api.clawswitch.com/v1", "api_key": "YOUR_API_KEY"}]
```

Cursor / Continue / AI Code Editors
| Setting | Value |
|---|---|
| API Base URL | https://api.clawswitch.com/v1 |
| API Key | Your API key from Dashboard → Agents |
| Model | auto |
Request Format
| Parameter | Type | Description |
|---|---|---|
| model | string | "auto" for smart routing, or a specific model name |
| messages | array | Chat messages array (required) |
| temperature | float | Sampling temperature 0.0–1.0 (default: 0.7) |
| max_tokens | int | Maximum output tokens |
| stream | bool | Enable SSE streaming (default: false) |
| tools | array | Tool/function definitions |
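The parameters above map directly onto the JSON request body. A minimal sketch, assuming you assemble the body yourself (for example with an HTTP client other than the OpenAI SDK); `build_chat_request` is a hypothetical helper, and its client-side 0.0–1.0 temperature check simply mirrors the table.

```python
import json
from typing import Any, Optional


def build_chat_request(messages: list[dict[str, Any]], model: str = "auto",
                       temperature: Optional[float] = None,
                       max_tokens: Optional[int] = None, stream: bool = False,
                       tools: Optional[list[dict[str, Any]]] = None) -> str:
    """Hypothetical helper: JSON body for POST /v1/chat/completions with documented fields only."""
    if not messages:
        raise ValueError("messages is required and must not be empty")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    body: dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None:
        body["temperature"] = temperature  # omitted: server default (0.7 per the table)
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True  # responses arrive as SSE chunks
    if tools:
        body["tools"] = tools
    return json.dumps(body)


if __name__ == "__main__":
    print(build_chat_request([{"role": "user", "content": "What is 2+2?"}], max_tokens=64))
```

Optional fields are left out entirely rather than sent as null, so the server's own defaults apply.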
Response Headers
| Header | Description |
|---|---|
| X-clawswitch-Cost | Actual cost paid (e.g., $0.001234) |
| X-clawswitch-Saved | Savings vs. the Claude Sonnet baseline |
| X-clawswitch-Model | Model that served the request |
| X-clawswitch-Cache | Cache status: hit, miss, or bypass |
| X-clawswitch-Tier | Routing tier used |
| X-Clawswitch-Balance | Remaining wallet balance |
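Because the metadata arrives as plain headers, a small parser turns it into numbers you can log or aggregate. A sketch: `parse_clawswitch_headers` is a hypothetical helper, header names are matched case-insensitively (as HTTP allows, which also covers the mixed casing in the table), and the balance header is assumed to use the same "$amount" format as the cost headers.

```python
from typing import Mapping, Optional


def _dollars(raw: Optional[str]) -> Optional[float]:
    # "$0.001234" -> 0.001234; a missing or empty header -> None
    return float(raw.strip().lstrip("$")) if raw else None


def parse_clawswitch_headers(headers: Mapping[str, str]) -> dict:
    """Collect ClawSwitch metadata; header names are matched case-insensitively."""
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "cost": _dollars(h.get("x-clawswitch-cost")),
        "saved": _dollars(h.get("x-clawswitch-saved")),
        "balance": _dollars(h.get("x-clawswitch-balance")),  # format assumed to be "$<amount>"
        "model": h.get("x-clawswitch-model"),
        "cache": h.get("x-clawswitch-cache"),  # "hit", "miss" or "bypass"
        "tier": h.get("x-clawswitch-tier"),
    }
```

With the OpenAI Python SDK (v1), `raw = client.chat.completions.with_raw_response.create(...)` exposes the headers as `raw.headers`, and `raw.parse()` returns the usual completion object, so `parse_clawswitch_headers(raw.headers)` can run on a live response.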
§API Reference
Proxy Endpoint
```
POST https://api.clawswitch.com/v1/chat/completions
Authorization: Bearer <your-api-key>
```

OpenAI-compatible chat completions with streaming support.
Management API
All management endpoints require JWT authentication (handled by the Dashboard automatically).
| Endpoint | Method | Description |
|---|---|---|
| /api/agents | GET/POST | List/create agents |
| /api/agents/{id} | GET/PATCH | Get/update agent |
| /api/stats/overview | GET | Today/month stats |
| /api/stats/daily | GET | Daily cost/request breakdown |
| /api/stats/requests | GET | Recent request log |
| /api/models | GET | Available models with pricing |
| /api/budget | GET | Per-agent budget status |
| /api/key-pool | GET/POST | List/add API pool keys |
| /api/key-pool/{id} | PATCH/DELETE | Update/remove pool key |
| /api/key-pool/{id}/reset | POST | Reset rate limit on a key |
| /api/key-pool/status | GET | Pool status per provider |
| /api/switch | GET | Full switch configuration |
| /api/switch/test | POST | Test routing for a message |
| /api/playground/analyze | POST | Analyze routing without execution |
| /api/playground/run | POST | Run request through full pipeline |
| /api/wallet | GET | Wallet balance |
| /api/billing/subscription | GET | Current subscription status |
Error Codes
| Code | Meaning |
|---|---|
| 200 | Success |
| 401 | Invalid or missing API key |
| 402 | Insufficient wallet balance |
| 403 | Account inactive or budget exceeded |
| 429 | Rate limit exceeded |
| 500 | All providers failed |
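On the client side, these codes split into ones worth retrying and ones that need a configuration change. The sketch below is an assumed client policy, not ClawSwitch behavior: `call_with_retry` is hypothetical, `send` stands in for any function that performs the request and returns the HTTP status, and retrying 500 is a judgment call because ClawSwitch has already walked the fallback chain by then.

```python
import random
import time
from typing import Callable

# Client-side policy (an assumption, not ClawSwitch behavior): retry only transient errors.
RETRYABLE = {429, 500}
FATAL = {
    401: "invalid or missing API key",
    402: "insufficient wallet balance",
    403: "account inactive or budget exceeded",
}


def call_with_retry(send: Callable[[], int], max_attempts: int = 4, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() (which returns an HTTP status) and retry 429/500 with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        status = send()
        if status == 200:
            return status
        if status in FATAL:
            # Retrying will not help: fix the key, top up the wallet, or raise the budget.
            raise RuntimeError(f"HTTP {status}: {FATAL[status]}")
        if status not in RETRYABLE or attempt >= max_attempts:
            raise RuntimeError(f"HTTP {status} after {attempt} attempt(s)")
        # Exponential backoff (0.5s, 1s, 2s, ...) scaled by a random 50-100% jitter.
        sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
```

Injecting `sleep` keeps the policy testable without real delays.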
§Available Models
| Model | Provider | Best For |
|---|---|---|
| "auto" | Auto-selected | Let ClawSwitch pick the optimal model |
| "claude-sonnet-4-20250514" | Anthropic | Code, reasoning, analysis |
| "claude-opus-4-20250514" | Anthropic | Top-tier complex tasks |
| "claude-haiku-4-5-20251001" | Anthropic | Fast, cheap tasks |
| "gpt-4o" | OpenAI | General purpose |
| "gpt-4o-mini" | OpenAI | Fast, cheap |
| "gemini-2.5-pro" | Google | Long context, reasoning |
| "gemini-2.5-flash" | Google | Fast, cheap |
| "moonshot/kimi-k2.5" | Moonshot | 256K context, very cheap |
View all models with live pricing on the Dashboard Models page. To force a model, set it as the model value instead of "auto".
§Cost Savings
Example: Single agent (8h/day active)
| Request type | Without ClawSwitch | With ClawSwitch | Savings |
|---|---|---|---|
| Heartbeats 480/day | $180.00 | $0.15 (cache) | 99.9% |
| Code tasks 50/day | $15.00 | $0.90 (Haiku) | 94% |
| Analysis 20/day | $6.00 | $3.60 (Haiku) | 40% |
| Complex tasks 5/day | $7.50 | $7.50 (Opus) | 0% |
| Daily total | $208.50 | $12.15 | 94.2% |
| Monthly (30 days) | ~$6,255 | ~$365 | $5,890 saved |
Actual savings depend on your request mix. Check Home in the Dashboard for real-time savings tracking.
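The totals in the example table can be checked with a few lines of arithmetic. This sketch works in integer cents to avoid floating-point rounding, and assumes a 30-day month, which is what the Monthly row implies ($208.50 × 30 = $6,255). Plug in your own per-category costs to estimate your mix.

```python
# Example table in integer cents per day: (without ClawSwitch, with ClawSwitch).
ROWS = {
    "Heartbeats (480/day)": (18_000, 15),
    "Code tasks (50/day)": (1_500, 90),
    "Analysis (20/day)": (600, 360),
    "Complex tasks (5/day)": (750, 750),
}
DAYS_PER_MONTH = 30  # assumption behind the Monthly row


def savings_pct(without: int, with_: int) -> float:
    return 100 * (without - with_) / without


daily_without = sum(w for w, _ in ROWS.values())
daily_with = sum(c for _, c in ROWS.values())
monthly_saved = DAYS_PER_MONTH * (daily_without - daily_with)

for name, (w, c) in ROWS.items():
    print(f"{name:<24} {savings_pct(w, c):5.1f}%")
print(f"Daily total: ${daily_without / 100:,.2f} -> ${daily_with / 100:,.2f} "
      f"({savings_pct(daily_without, daily_with):.1f}% saved)")
print(f"Monthly savings: ${monthly_saved / 100:,.2f}")
```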
§Plans & Pricing
| Plan | Price | Includes |
|---|---|---|
| Starter | $29/mo | Cloud dashboard, email alerts, 5 agents, 30-day log history |
| Pro | $79/mo | Semantic cache, unlimited agents, Slack/Discord alerts, heartbeat optimizer |
| Enterprise | $299/mo | SSO, audit logs, custom routing rules, dedicated support |
All plans include smart routing, API key pool, fallback chains, cost tracking, and the full dashboard. You only pay for tokens used with your own provider API keys — ClawSwitch doesn't mark up provider costs.