
ClawSwitch Documentation

Route every AI request to the right model at the right price. Get started in minutes.

§Overview

ClawSwitch is an intelligent LLM proxy gateway. It sits between your AI agents and LLM providers, analyzing each request, picking the best model/provider combination, handling failover, and tracking every dollar — so you ship faster and spend less.

Your AI Agents / SDKs / Tools
       ↓ OpenAI-compatible API
┌── ClawSwitch Cloud ──────────────────────────┐
│ Auth → Cache → Score → Route → Guard → Exec  │
│ Key Pool · Fallback Chains · Cost Tracking   │
└──────┬──────────┬──────────┬─────────────────┘
       ↓          ↓          ↓
   Anthropic    OpenAI     Google
     Claude     GPT-4o     Gemini
             GPT-4o-mini Moonshot/Kimi

•Smart Routing: Every request is scored and sent to the optimal model
•Multi-Provider: Anthropic, OpenAI, Google Gemini, and Moonshot behind one endpoint
•API Key Pool: Multiple keys per provider, with automatic rotation on rate limits
•Fallback Chains: If the primary model fails, the request cascades through backup models
•Cost Tracking: Every request logs its actual cost and its savings vs. a baseline model
•Response Caching: Identical requests are served from cache instead of burning tokens
•Budget Controls: Daily and monthly limits per agent, with alerts
•OpenAI Compatible: A drop-in replacement; change the base URL and API key

§Quick Start

Get running in 3 steps — no installation needed.

1. Pick a plan: Go to Pricing and select Starter, Pro, or Enterprise.
2. Create your account: You'll be redirected to the Dashboard. Sign up with email and password, then complete checkout.
3. Create an agent and get your API key: In the Dashboard, go to Agents → Create Agent. Name it (e.g., "chatbot"), then copy the generated API key. It's shown only once.

Your first request

curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Set "model": "auto" and ClawSwitch picks the best model for each request, or pass a specific model name to force it.

§How It Works

Every request to POST /v1/chat/completions flows through this pipeline:

1. Auth: Verify your agent API key
2. Cache: On an exact or semantic match, return the cached response instantly at $0
3. Score: Analyze request complexity
4. Route: Pick the cheapest model for that complexity tier
5. Guard: Check budget limits, token limits, and blocked models
6. Key Pool: Select the best available API key for the provider
7. Execute: Call the provider, with an automatic fallback chain
8. Track: Log tokens, cost, and savings
9. Respond: Return an OpenAI-format response plus cost headers
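The stage order can be modelled as a toy Python class. This is a sketch for intuition, not ClawSwitch's implementation: the class, its fields, and the callables are hypothetical, the cache is keyed on the message text alone, and key selection plus fallback are folded into a single `execute` callable. What it does show is the control flow the list describes: a cache hit returns before scoring or any provider call, and a guard rejection stops the request before execution.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Result:
    content: str
    cost: float
    cache: str  # "hit" or "miss", mirroring the X-clawswitch-Cache header


@dataclass
class ToyPipeline:
    """Illustrative only: hypothetical stand-ins for ClawSwitch's internal stages."""
    valid_keys: Set[str]
    score: Callable[[str], str]                        # message -> tier
    route: Callable[[str], str]                        # tier -> model
    guard: Callable[[str, str], bool]                  # (tier, model) -> allowed?
    execute: Callable[[str, str], Tuple[str, float]]   # (model, message) -> (text, cost)
    cache: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str, float]] = field(default_factory=list)

    def handle(self, api_key: str, message: str) -> Result:
        if api_key not in self.valid_keys:             # 1. Auth
            raise PermissionError("401: invalid or missing API key")
        if message in self.cache:                      # 2. Cache hit returns at $0
            return Result(self.cache[message], 0.0, "hit")
        tier = self.score(message)                     # 3. Score
        model = self.route(tier)                       # 4. Route
        if not self.guard(tier, model):                # 5. Guard
            raise PermissionError("403: rejected by guardrails or budget")
        text, cost = self.execute(model, message)      # 6-7. Key pool + provider call
        self.cache[message] = text                     # remember for future cache hits
        self.log.append((tier, model, cost))           # 8. Track
        return Result(text, cost, "miss")              # 9. Respond
```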

Every response includes cost metadata headers:

X-clawswitch-Cost:    $0.000142   ← what you paid
X-clawswitch-Saved:   $7.533000   ← what you saved vs default model
X-clawswitch-Model:   claude-haiku-4-5-20251001
X-clawswitch-Cache:   miss
X-clawswitch-Tier:    simple
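Because the values arrive as strings such as $0.000142, client code has to parse them before it can sum or compare costs. A small helper, assuming the formats shown above (a leading "$" on the two money headers, plain strings on the others); the function name is hypothetical. It uses `Decimal` to avoid float rounding on tiny dollar amounts, and lower-cases header names because HTTP header names are case-insensitive.

```python
from decimal import Decimal
from typing import Dict, Mapping, Optional


def parse_clawswitch_headers(headers: Mapping[str, str]) -> Dict[str, object]:
    """Extract ClawSwitch cost metadata from a response-header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    def dollars(name: str) -> Optional[Decimal]:
        raw = lowered.get(name)
        # "$0.000142" -> Decimal("0.000142"); a missing header -> None
        return Decimal(raw.strip().lstrip("$")) if raw else None

    return {
        "cost": dollars("x-clawswitch-cost"),
        "saved": dollars("x-clawswitch-saved"),
        "model": lowered.get("x-clawswitch-model"),
        "cache": lowered.get("x-clawswitch-cache"),
        "tier": lowered.get("x-clawswitch-tier"),
    }
```

With the OpenAI Python SDK (v1.x), `client.chat.completions.with_raw_response.create(...)` returns a response whose `.headers` can be passed to this helper and whose `.parse()` yields the usual completion object.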

§Smart Routing

When you set model: "auto", ClawSwitch's proprietary routing engine analyzes each request and selects the optimal model for quality and cost. Requests are classified into tiers:

Tier	Routes to	Examples
simple	Haiku / GPT-4o-mini	Greetings, lookups, formatting
standard	Haiku / GPT-4o-mini / Kimi	Code, summaries, Q&A
complex	Sonnet / GPT-4o	Architecture, analysis, writing
premium	Opus / GPT-4-turbo / Gemini Pro	Critical reasoning, long review

The router also detects task domains (code, reasoning, chat, writing) and applies domain-specific model preferences. Each tier has a fallback chain — if the primary model fails, the request cascades to the next model automatically.
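A minimal sketch of a fallback cascade, assuming each model in a tier's chain is tried in order and any exception counts as a failure. The chain contents and the `call_model` callable are hypothetical; ClawSwitch's real chains and failure rules live server-side. Exhausting the whole chain corresponds to the 500 "All providers failed" error listed under Error Codes.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every model in the chain has failed."""


def run_with_fallback(chain: Sequence[str],
                      call_model: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first success."""
    errors: List[str] = []
    for model in chain:
        try:
            return model, call_model(model)
        except Exception as exc:  # a real router would distinguish error types
            errors.append(f"{model}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

For example, a complex-tier chain of claude-sonnet-4-20250514 followed by gpt-4o would return the GPT-4o answer if the Sonnet call raised.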

Guardrails

Configure safety limits on routing from the Dashboard Switch page:

•Max cost per request — Reject requests that would exceed a cost threshold
•Max input tokens — Reject inputs that are too long
•Allowed tiers — Restrict which routing tiers can be used
•Blocked models — Prevent specific models from being selected
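The four guardrails amount to simple pre-execution checks. A sketch under stated assumptions: the field names are hypothetical, `estimated_cost` is assumed to be computed before the provider call (the guardrail rejects requests that would exceed the threshold), and all violations are collected rather than stopping at the first, similar to the guardrail warnings the Playground's Analyze view lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

ALL_TIERS = {"simple", "standard", "complex", "premium"}


@dataclass
class Guardrails:
    # Hypothetical field names mirroring the four Dashboard settings; None means "no limit".
    max_cost_per_request: Optional[float] = None
    max_input_tokens: Optional[int] = None
    allowed_tiers: Set[str] = field(default_factory=lambda: set(ALL_TIERS))
    blocked_models: Set[str] = field(default_factory=set)


def check_guardrails(g: Guardrails, *, tier: str, model: str,
                     input_tokens: int, estimated_cost: float) -> List[str]:
    """Return every violation found; an empty list means the request may proceed."""
    violations = []
    if g.max_cost_per_request is not None and estimated_cost > g.max_cost_per_request:
        violations.append(f"estimated cost ${estimated_cost:.6f} exceeds ${g.max_cost_per_request:.6f}")
    if g.max_input_tokens is not None and input_tokens > g.max_input_tokens:
        violations.append(f"{input_tokens} input tokens exceeds the limit of {g.max_input_tokens}")
    if tier not in g.allowed_tiers:
        violations.append(f"tier '{tier}' is not allowed")
    if model in g.blocked_models:
        violations.append(f"model '{model}' is blocked")
    return violations
```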

§API Key Pool

Add multiple API keys per provider. ClawSwitch automatically rotates between them when one gets rate-limited — zero downtime for your agents.

Request arrives for Anthropic provider
  ↓
Try Key A (priority 1) → HTTP 429 Rate Limited
  ↓ auto-rotate
Try Key B (priority 2) → Success!
  ↓
Key A cooled down with exponential backoff

•Keys are tried in priority order (lower number = preferred)
•Same-priority keys rotate round-robin for even distribution
•Rate-limited keys are cooled down with exponential backoff
•When all keys for a provider are exhausted, an email alert is sent
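The rotation rules above can be sketched in a few lines. Everything here is an assumption made for illustration: the class and method names, the 1-second base cooldown that doubles on each consecutive 429, the 300-second cap, and the caller-supplied clock. `select` returning None marks the point at which, per the docs, ClawSwitch would send the exhausted-pool email alert.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PoolKey:
    name: str
    priority: int                 # lower number = preferred
    failures: int = 0             # consecutive 429s; drives the backoff
    cooldown_until: float = 0.0   # time at which the key is usable again


class KeyPool:
    """Hypothetical key pool: priority order, round-robin within a priority, exponential backoff."""

    def __init__(self, keys: Iterable[PoolKey], base_cooldown: float = 1.0,
                 max_cooldown: float = 300.0):
        self.keys = list(keys)
        self.base_cooldown = base_cooldown
        self.max_cooldown = max_cooldown
        self._next: Dict[int, int] = {}   # priority -> round-robin counter

    def select(self, now: float) -> Optional[PoolKey]:
        """Return the next usable key, or None if every key is cooling down."""
        for prio in sorted({k.priority for k in self.keys}):
            ready = [k for k in self.keys if k.priority == prio and k.cooldown_until <= now]
            if ready:
                i = self._next.get(prio, 0)
                self._next[prio] = i + 1
                return ready[i % len(ready)]
        return None  # pool exhausted: the point where an alert would be sent

    def report_rate_limited(self, key: PoolKey, now: float) -> None:
        key.failures += 1
        delay = min(self.base_cooldown * 2 ** (key.failures - 1), self.max_cooldown)
        key.cooldown_until = now + delay

    def report_success(self, key: PoolKey) -> None:
        key.failures = 0  # reset the backoff once the key works again
```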

Manage keys in the Dashboard under Settings → API Key Pool:

The page has one tab per provider (Anthropic, OpenAI, Gemini, Moonshot). For each key it shows:

Stat	Description
Status indicator	Green = active, Red = rate-limited, Gray = inactive
Request count	Total requests served by this key
Error count	Total errors (429s, 500s, etc.)
Last used	Timestamp of last successful request
Rate limited until	When the cooldown expires

§Caching

Identical or semantically similar requests are served from cache — $0 cost, instant response.

Layer	Matches	TTL	Cost
Exact	Byte-identical request	1 hour	$0
Semantic	Same meaning, different wording	24 hours	$0

Cache status is shown in every response via the X-clawswitch-Cache header: hit, miss, or bypass.
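The exact layer can be sketched as a hash of the raw request body with a one-hour expiry. Assumptions: SHA-256 as the hash, an in-memory dict as the store, and a caller-supplied clock for testing; a production cache would presumably also scope entries per account. The semantic layer is not sketched, since it depends on embedding and similarity-threshold details the docs don't specify.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class ExactCache:
    """Exact-match layer sketch: key = SHA-256 of the raw request bytes, fixed TTL."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds            # 1 hour, per the table
        self.clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    @staticmethod
    def key(body: bytes) -> str:
        return hashlib.sha256(body).hexdigest()

    def get(self, body: bytes) -> Optional[bytes]:
        k = self.key(body)
        entry = self._store.get(k)
        if entry is None:
            return None                   # miss: never seen
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[k]            # miss: expired, evict
            return None
        return response                   # hit: served at $0

    def put(self, body: bytes, response: bytes) -> None:
        self._store[self.key(body)] = (self.clock() + self.ttl, response)
```

Because the key covers the raw bytes, reordering JSON fields or adding whitespace produces a different key, which is what "byte-identical" implies.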

§Budget Controls

Set daily and monthly spend limits per agent to prevent runaway costs. Manage budgets on the Dashboard's Budget page.

Budget used	Behavior
< 50%	All routing tiers available
50–80%	Prefer cheaper tiers, alert sent
80–100%	Cheapest models only, urgent alert
> 100% (hard stop)	Requests rejected with 403
> 100% (soft)	Warning logged, continues

Each agent shows a progress bar with current spend vs. limit, plus savings amount. Budget alerts notify you before limits are hit.
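The budget table maps directly onto a threshold function. A sketch, assuming the lower bound of each band is inclusive (exactly 50% falls in 50–80%, exactly 80% in 80–100%) and using hypothetical return labels; how ClawSwitch treats the exact boundaries isn't stated.

```python
def budget_behavior(spent: float, limit: float, hard_stop: bool = True) -> str:
    """Map spend against a budget limit to the behavior in the Budget Controls table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    used = spent / limit
    if used > 1.0:
        return "reject_403" if hard_stop else "warn_and_continue"
    if used >= 0.8:
        return "cheapest_only"   # an urgent alert is also sent
    if used >= 0.5:
        return "prefer_cheaper"  # an alert is also sent
    return "all_tiers"
```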

§Dashboard Guide

The Dashboard is your control center. Access it at app.clawswitch.com.

Page	What It Does
Home	Today's spend, savings, request volume, cache hit rate, 7-day trend chart
Agents	Create/manage API keys for your AI agents with per-agent budget limits
Models	View all available models across providers with cost per 1K tokens
Wallet	Manage prepaid credits, set low-balance threshold, enable auto-topup
Budget	Per-agent daily/monthly spend vs. limits with progress bars
Logs	Request history: agent, model, tier, provider, cost, savings, latency, cache
Settings	Provider credentials and API Key Pool management
Switch	Configure routing: guardrails, endpoints, routing matrix, test routing
Playground	Test the full routing pipeline live
Billing	View/change subscription plan, sync status

Home (Analytics)

•Today's Spend — Total cost of all requests today
•Today's Saved — Amount saved vs. Claude Sonnet baseline
•Request Count — Total requests processed today
•Cache Hit Rate — Percentage served from cache

Agents

1. Click Create Agent
2. Enter a name (e.g., "code-reviewer", "chatbot", "research-agent")
3. Optionally set a daily budget limit
4. Copy the generated API key — shown only once

Each agent gets its own API key, budget limits, and usage statistics.

Logs

Column	Description
Time	When the request was made
Agent	Which agent made the request
Original Model	What model was requested
Routed Model	What model actually served it
Tier	Routing tier (simple/standard/complex/premium)
Cost / Savings	Actual cost and amount saved
Latency	Response time in milliseconds
Cache	Whether it was a cache hit

§Playground

Test the complete proxy pipeline from the Dashboard. Send real requests through the same routing, key pool, and fallback chain your agents use — inspect every detail.

Analyze (Routing Only)

Click Analyze to see how a message would be routed without executing:

•Complexity score and breakdown by factor
•Tier classification and domain detection
•Selected model, provider, and full fallback chain
•Guardrail violation warnings

Run (Full Execution)

Click Run to send the message through the complete pipeline. Results appear in four tabs:

Tab	Shows
Response	The actual LLM response with token counts
Routing	Primary vs. actual model/provider, fallback used
Complexity	Score ring (0–100%), tier + domain badges, factor breakdown
Cost	Actual cost, baseline cost, savings, latency

Use the Playground before deploying changes — test how your configuration affects real requests.

§SDK & Framework Integration

ClawSwitch exposes an OpenAI-compatible chat completions endpoint, so any SDK or tool that talks to OpenAI's Chat Completions API can point at ClawSwitch. Just change the base URL and API key.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",  # from Dashboard → Agents
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
)
print(response.choices[0].message.content)

Streaming:

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)
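for chunk in stream:
    # Some chunks (e.g., a final usage-only chunk) can arrive with an empty choices list.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)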

Node.js / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.clawswitch.com/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

cURL

curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",
    model="auto",
)
response = llm.invoke("Explain microservices in simple terms")

LlamaIndex / AutoGen

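# LlamaIndex (pip install llama-index-llms-openai-like)
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(
    api_base="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",
    model="auto",
    is_chat_model=True,  # send requests to /chat/completions, the documented proxy endpoint
)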

# AutoGen
config_list = [{"model": "auto", "base_url": "https://api.clawswitch.com/v1", "api_key": "YOUR_API_KEY"}]

Cursor / Continue / AI Code Editors

Setting	Value
API Base URL	`https://api.clawswitch.com/v1`
API Key	Your API key from Dashboard → Agents
Model	`auto`

Request Format

Parameter	Type	Description
`model`	string	`"auto"` for smart routing, or a specific model name
`messages`	array	Chat messages array (required)
`temperature`	float	Sampling temperature 0.0–1.0 (default: 0.7)
`max_tokens`	int	Maximum output tokens
`stream`	bool	Enable SSE streaming (default: false)
`tools`	array	Tool/function definitions
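
The parameter table translates into a small request builder. This is a hypothetical helper, not part of any SDK: it enforces only what the table states (messages required, temperature within 0.0–1.0) and leaves unset optional fields out of the body so the server's defaults apply. The resulting JSON can be used as the -d payload of the cURL example.

```python
import json
from typing import Any, Dict, List, Optional


def build_chat_request(messages: List[Dict[str, Any]], model: str = "auto", *,
                       temperature: Optional[float] = None,
                       max_tokens: Optional[int] = None,
                       stream: bool = False,
                       tools: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Build a /v1/chat/completions body from the documented parameters."""
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None:
        body["temperature"] = temperature  # omitted -> server default (0.7)
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True              # omitted -> false
    if tools:
        body["tools"] = tools
    return body


payload = json.dumps(build_chat_request([{"role": "user", "content": "What is 2+2?"}]))
```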

Response Headers

Header	Description
`X-clawswitch-Cost`	Actual cost paid (e.g., $0.001234)
`X-clawswitch-Saved`	Savings vs. Sonnet baseline
`X-clawswitch-Model`	Model that served the request
`X-clawswitch-Cache`	Cache status: hit, miss, or bypass
`X-clawswitch-Tier`	Routing tier used
`X-clawswitch-Balance`	Remaining wallet balance

§API Reference

Proxy Endpoint

POST https://api.clawswitch.com/v1/chat/completions
Authorization: Bearer <your-api-key>

OpenAI-compatible chat completions with streaming support.

Management API

All management endpoints require JWT authentication (handled by the Dashboard automatically).

Endpoint	Method	Description
`/api/agents`	GET/POST	List/create agents
`/api/agents/{id}`	GET/PATCH	Get/update agent
`/api/stats/overview`	GET	Today/month stats
`/api/stats/daily`	GET	Daily cost/request breakdown
`/api/stats/requests`	GET	Recent request log
`/api/models`	GET	Available models with pricing
`/api/budget`	GET	Per-agent budget status
`/api/key-pool`	GET/POST	List/add API pool keys
`/api/key-pool/{id}`	PATCH/DELETE	Update/remove pool key
`/api/key-pool/{id}/reset`	POST	Reset rate limit on a key
`/api/key-pool/status`	GET	Pool status per provider
`/api/switch`	GET	Full switch configuration
`/api/switch/test`	POST	Test routing for a message
`/api/playground/analyze`	POST	Analyze routing without execution
`/api/playground/run`	POST	Run request through full pipeline
`/api/wallet`	GET	Wallet balance
`/api/billing/subscription`	GET	Current subscription status

Error Codes

Code	Meaning
200	Success
401	Invalid or missing API key
402	Insufficient wallet balance
403	Account inactive or budget exceeded
429	Rate limit exceeded
500	All providers failed

ClawSwitch handles retries internally through its fallback chain. Only retry on your end if you get a 500 with "All providers failed".
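That rule can be expressed as a small wrapper. A sketch, assuming the 500 response body contains the literal text "All providers failed"; the function names, the three-attempt limit, and the doubling backoff starting at 1 s are illustrative choices. `call` stands for whatever function performs the HTTP request and returns (status_code, body_text). Note that the official OpenAI Python SDK retries some failures on its own (twice by default, including 429 and 5xx responses); to follow the rule above exactly, you can create the client with `max_retries=0` and apply your own policy.

```python
import time
from typing import Callable, Tuple


def should_retry(status: int, body: str) -> bool:
    """Retry only when ClawSwitch reports that its entire fallback chain failed."""
    return status == 500 and "All providers failed" in body


def call_with_retry(call: Callable[[], Tuple[int, str]], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Invoke `call`, retrying with exponential backoff only when should_retry() says so."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = call()
        if not should_retry(status, body) or attempt == attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("not reached: the last attempt always returns")
```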

§Available Models

Model	Provider	Best For
`"auto"`	Auto-selected	Let ClawSwitch pick the optimal model
`"claude-sonnet-4-20250514"`	Anthropic	Code, reasoning, analysis
`"claude-opus-4-20250514"`	Anthropic	Top-tier complex tasks
`"claude-haiku-4-5-20251001"`	Anthropic	Fast, cheap tasks
`"gpt-4o"`	OpenAI	General purpose
`"gpt-4o-mini"`	OpenAI	Fast, cheap
`"gemini-2.5-pro"`	Google	Long context, reasoning
`"gemini-2.5-flash"`	Google	Fast, cheap
`"moonshot/kimi-k2.5"`	Moonshot	256K context, very cheap

View all models with live pricing on the Dashboard Models page. To force a model, set it as the model value instead of "auto".

§Cost Savings

Example: Single agent (8h/day active)

Request type	Without ClawSwitch	With ClawSwitch	Savings
Heartbeats 480/day	$180.00	$0.15 (cache)	99.9%
Code tasks 50/day	$15.00	$0.90 (Haiku)	94%
Analysis 20/day	$6.00	$3.60 (Haiku)	40%
Complex tasks 5/day	$7.50	$7.50 (Opus)	0%
Daily total	$208.50	$12.15	94.2%
Monthly	~$6,255	~$365	$5,890 saved

Actual savings depend on your request mix. Check Home in the Dashboard for real-time savings tracking.
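The table's totals can be recomputed from its own per-row figures. This short script does that, assuming a 30-day month, which is what the Monthly row uses ($208.50 × 30 = $6,255 and $12.15 × 30 = $364.50, shown rounded as ~$365).

```python
# Daily costs in dollars, copied from the example table: (without, with ClawSwitch).
ROWS = {
    "Heartbeats 480/day": (180.00, 0.15),
    "Code tasks 50/day": (15.00, 0.90),
    "Analysis 20/day": (6.00, 3.60),
    "Complex tasks 5/day": (7.50, 7.50),
}
DAYS_PER_MONTH = 30  # the Monthly row is the Daily total times 30


def savings_pct(without: float, with_cs: float) -> float:
    """Percentage saved relative to the cost without ClawSwitch."""
    return 100 * (1 - with_cs / without)


daily_without = sum(w for w, _ in ROWS.values())
daily_with = sum(c for _, c in ROWS.values())

for name, (w, c) in ROWS.items():
    print(f"{name:<22} ${w:>7.2f} -> ${c:>5.2f}  ({savings_pct(w, c):.1f}% saved)")
print(f"Daily total: ${daily_without:.2f} -> ${daily_with:.2f} "
      f"({savings_pct(daily_without, daily_with):.1f}% saved)")
print(f"Monthly: ${daily_without * DAYS_PER_MONTH:,.2f} -> ${daily_with * DAYS_PER_MONTH:,.2f}, "
      f"${(daily_without - daily_with) * DAYS_PER_MONTH:,.2f} saved")
```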

§Plans & Pricing

Plan	Price	Includes
Starter	$29/mo	Cloud dashboard, email alerts, 5 agents, 30-day log history
Pro	$79/mo	Semantic cache, unlimited agents, Slack/Discord alerts, heartbeat optimizer
Enterprise	$299/mo	SSO, audit logs, custom routing rules, dedicated support

All plans include smart routing, API key pool, fallback chains, cost tracking, and the full dashboard. You only pay for tokens used with your own provider API keys — ClawSwitch doesn't mark up provider costs.


Last updated March 2026