
ClawSwitch Documentation

Route every AI request to the right model at the right price. Get started in minutes.

§Overview

ClawSwitch is an intelligent LLM proxy gateway. It sits between your AI agents and LLM providers, analyzing each request, picking the best model/provider combination, handling failover, and tracking every dollar — so you ship faster and spend less.

        Your AI Agents / SDKs / Tools
              ↓ OpenAI-compatible API
┌── ClawSwitch Cloud ──────────────────────────┐
│ Auth → Cache → Score → Route → Guard → Exec  │
│ Key Pool · Fallback Chains · Cost Tracking   │
└──────┬──────────┬──────────┬─────────────────┘
       ↓          ↓          ↓
   Anthropic    OpenAI     Google
    Claude      GPT-4o     Gemini
              GPT-4o-mini  Moonshot/Kimi

Smart Routing

Every request scored and sent to the optimal model

Multi-Provider

Anthropic, OpenAI, Gemini, Moonshot — one endpoint

API Key Pool

Multiple keys per provider with auto-rotation on rate limits

Fallback Chains

If primary fails, cascades through backup models

Cost Tracking

Every request logs actual cost vs. baseline savings

Response Caching

Identical requests hit cache instead of burning tokens

Budget Controls

Daily/monthly limits per agent with alerts

OpenAI Compatible

Drop-in replacement — change one URL

§Quick Start

Get running in 3 steps — no installation needed.

  1. Pick a plan: Go to Pricing and select Starter, Pro, or Enterprise.
  2. Create your account: You'll be redirected to the Dashboard. Sign up with email and password, then complete checkout.
  3. Create an agent & get your API key: In the Dashboard, go to Agents → Create Agent. Name it (e.g., "chatbot") and copy the generated API key; it's shown only once.

Your first request

curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Set "model": "auto" to let ClawSwitch pick the best model, or specify a model name to force that model.

§How It Works

Every request to POST /v1/chat/completions flows through this pipeline:

  1. Auth: Verify your agent API key
  2. Cache: Exact or semantic match? → return instantly at $0
  3. Score: Analyze request complexity
  4. Route: Pick cheapest model for that complexity tier
  5. Guard: Check budget limits, token limits, blocked models
  6. Key Pool: Select best available API key for the provider
  7. Execute: Call the provider with automatic fallback chain
  8. Track: Log tokens, cost, savings
  9. Respond: Return OpenAI-format response + cost headers

Every response includes cost metadata headers:

X-clawswitch-Cost:    $0.000142   ← what you paid
X-clawswitch-Saved:   $7.533000   ← what you saved vs default model
X-clawswitch-Model:   claude-haiku-4-5-20251001
X-clawswitch-Cache:   miss
X-clawswitch-Tier:    simple
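
A client can read these headers to track spend per request. The sketch below is a hypothetical helper (parse_cost_headers is not part of any ClawSwitch SDK) that assumes the header names and "$"-prefixed values shown above. Lookups are case-insensitive because HTTP header names are.

```python
from decimal import Decimal

def parse_cost_headers(headers):
    """Extract ClawSwitch cost metadata from a mapping of response headers.

    Hypothetical helper: assumes the header names and "$0.000142"-style
    values shown in this guide.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    def money(name):
        raw = lowered.get(name)
        # Decimal avoids float rounding on very small dollar amounts.
        return Decimal(raw.lstrip("$")) if raw else None

    return {
        "cost": money("x-clawswitch-cost"),
        "saved": money("x-clawswitch-saved"),
        "model": lowered.get("x-clawswitch-model"),
        "cache": lowered.get("x-clawswitch-cache"),
        "tier": lowered.get("x-clawswitch-tier"),
    }

example = {
    "X-clawswitch-Cost": "$0.000142",
    "X-clawswitch-Saved": "$7.533000",
    "X-clawswitch-Model": "claude-haiku-4-5-20251001",
    "X-clawswitch-Cache": "miss",
    "X-clawswitch-Tier": "simple",
}
info = parse_cost_headers(example)
print(info["model"], info["cost"], info["cache"])
```

With the OpenAI Python SDK, one way to reach the raw headers is client.chat.completions.with_raw_response.create(...), whose result exposes .headers alongside .parse().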

§Smart Routing

When you set model: "auto", ClawSwitch's proprietary routing engine analyzes each request and selects the optimal model for quality and cost. Requests are classified into tiers:

| Tier | Routes to | Examples |
|---|---|---|
| simple | Haiku / GPT-4o-mini | Greetings, lookups, formatting |
| standard | Haiku / GPT-4o-mini / Kimi | Code, summaries, Q&A |
| complex | Sonnet / GPT-4o | Architecture, analysis, writing |
| premium | Opus / GPT-4-turbo / Gemini Pro | Critical reasoning, long review |

The router also detects task domains (code, reasoning, chat, writing) and applies domain-specific model preferences. Each tier has a fallback chain — if the primary model fails, the request cascades to the next model automatically.
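
The cascade behavior can be pictured with a small sketch. This is not ClawSwitch's routing engine (which is proprietary); the chain contents, their order, and the names FALLBACK_CHAINS, run_with_fallback, and flaky_provider are assumptions made for illustration.

```python
# Hypothetical fallback chains; the real chains are configured in ClawSwitch.
FALLBACK_CHAINS = {
    "simple": ["claude-haiku-4-5-20251001", "gpt-4o-mini"],
    "complex": ["claude-sonnet-4-20250514", "gpt-4o"],
}

class AllProvidersFailed(Exception):
    """Raised when every model in the chain has failed."""

def run_with_fallback(tier, call_model):
    """Try each model in the tier's chain until one succeeds.

    call_model is any callable that takes a model name and either returns
    a response or raises an exception.
    """
    errors = []
    for model in FALLBACK_CHAINS[tier]:
        try:
            return model, call_model(model)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append((model, exc))
    raise AllProvidersFailed(errors)

def flaky_provider(model):
    # Simulate an outage on the primary model only.
    if model == "claude-sonnet-4-20250514":
        raise TimeoutError("primary unavailable")
    return f"response from {model}"

served_by, reply = run_with_fallback("complex", flaky_provider)
print(served_by, reply)
```

Returning the model that actually served the request mirrors the X-clawswitch-Model header, so callers can tell when a fallback was used.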

Guardrails

Configure safety limits on routing from the Dashboard Switch page:

  • Max cost per request — Reject requests that would exceed a cost threshold
  • Max input tokens — Reject inputs that are too long
  • Allowed tiers — Restrict which routing tiers can be used
  • Blocked models — Prevent specific models from being selected
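
As a rough illustration of how the four limits combine, the sketch below checks a request against each one. The Guardrails class, its field names, and its default values are hypothetical; the actual settings live on the Dashboard Switch page.

```python
from dataclasses import dataclass, field

@dataclass
class Guardrails:
    # Field names and defaults are illustrative, not ClawSwitch's actual config keys.
    max_cost_per_request: float = 0.50
    max_input_tokens: int = 32_000
    allowed_tiers: set = field(default_factory=lambda: {"simple", "standard", "complex"})
    blocked_models: set = field(default_factory=set)

def check_guardrails(rules, *, tier, model, input_tokens, estimated_cost):
    """Return a list of violations; an empty list means the request may proceed."""
    violations = []
    if estimated_cost > rules.max_cost_per_request:
        violations.append("max cost per request exceeded")
    if input_tokens > rules.max_input_tokens:
        violations.append("input too long")
    if tier not in rules.allowed_tiers:
        violations.append(f"tier '{tier}' not allowed")
    if model in rules.blocked_models:
        violations.append(f"model '{model}' is blocked")
    return violations

rules = Guardrails(blocked_models={"claude-opus-4-20250514"})
ok = check_guardrails(rules, tier="simple", model="gpt-4o-mini",
                      input_tokens=120, estimated_cost=0.0002)
bad = check_guardrails(rules, tier="premium", model="claude-opus-4-20250514",
                       input_tokens=120, estimated_cost=0.75)
print(ok, bad)
```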

§API Key Pool

Add multiple API keys per provider. ClawSwitch automatically rotates between them when one gets rate-limited — zero downtime for your agents.

Request arrives for Anthropic provider
Try Key A (priority 1) → HTTP 429 Rate Limited
↓ auto-rotate
Try Key B (priority 2) → Success!
Key A cooled down with exponential backoff

  • Keys are tried in priority order (lower number = preferred)
  • Same-priority keys rotate round-robin for even distribution
  • Rate-limited keys are cooled down with exponential backoff
  • When all keys for a provider are exhausted, an email alert is sent
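
The rotation rules above can be sketched as a toy key pool. This is an illustration under stated assumptions, not ClawSwitch's implementation: the KeyPool class, the 1-second base cooldown, and the doubling schedule are all hypothetical.

```python
import itertools

class KeyPool:
    """Toy key pool: priority order, round-robin within a priority, backoff on 429."""

    def __init__(self, keys, base_cooldown=1.0):
        # keys: list of (name, priority); lower priority number = preferred.
        self.keys = sorted(keys, key=lambda k: k[1])
        self.base_cooldown = base_cooldown
        self.cooldown_until = {}   # key name -> time it becomes usable again
        self.strikes = {}          # key name -> consecutive rate limits
        self._counter = itertools.count()

    def pick(self, now):
        """Return the best available key name, or None if all are cooling down."""
        available = [k for k in self.keys if self.cooldown_until.get(k[0], 0) <= now]
        if not available:
            return None
        best = available[0][1]
        group = [name for name, prio in available if prio == best]
        # Rotate among same-priority keys for even distribution.
        return group[next(self._counter) % len(group)]

    def report_rate_limited(self, name, now):
        # Cooldown doubles with each consecutive 429: base, 2*base, 4*base, ...
        strikes = self.strikes.get(name, 0)
        self.cooldown_until[name] = now + self.base_cooldown * (2 ** strikes)
        self.strikes[name] = strikes + 1

    def report_success(self, name):
        # A successful call resets the backoff for that key.
        self.strikes.pop(name, None)

pool = KeyPool([("key-a", 1), ("key-b", 2)])
first = pool.pick(now=0.0)            # key-a is preferred
pool.report_rate_limited(first, now=0.0)
second = pool.pick(now=0.5)           # key-a is cooling down, so key-b is used
third = pool.pick(now=1.0)            # the 1.0s cooldown has expired
print(first, second, third)
```

Passing the current time in explicitly keeps the sketch deterministic; a real pool would read a clock and would also trigger the alert when pick returns None.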

Manage keys in the Dashboard under Settings → API Key Pool:

Providers: Anthropic · OpenAI · Gemini · Moonshot

| Stat | Description |
|---|---|
| Status indicator | Green = active, Red = rate-limited, Gray = inactive |
| Request count | Total requests served by this key |
| Error count | Total errors (429s, 500s, etc.) |
| Last used | Timestamp of last successful request |
| Rate limited until | When the cooldown expires |

§Caching

Identical or semantically similar requests are served from cache — $0 cost, instant response.

| Layer | Matches | TTL | Cost |
|---|---|---|---|
| Exact | Byte-identical request | 1 hour | $0 |
| Semantic | Same meaning, different wording | 24 hours | $0 |

Cache status is shown in every response via the X-clawswitch-Cache header: hit, miss, or bypass.
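
To make the Exact layer concrete, here is a minimal sketch of a byte-identical cache with a 1-hour TTL. The ExactCache class and the choice of SHA-256 as the key are assumptions for illustration; the Semantic layer is not shown because it would need an embedding model.

```python
import hashlib

class ExactCache:
    """Toy exact-match cache keyed by a hash of the raw request bytes."""

    def __init__(self, ttl_seconds=3600):  # 1 hour, matching the Exact layer above
        self.ttl = ttl_seconds
        self._store = {}  # digest -> (expires_at, response)

    @staticmethod
    def _key(request_body: bytes) -> str:
        return hashlib.sha256(request_body).hexdigest()

    def get(self, request_body, now):
        entry = self._store.get(self._key(request_body))
        if entry is None or entry[0] <= now:
            return None  # miss or expired
        return entry[1]

    def put(self, request_body, response, now):
        self._store[self._key(request_body)] = (now + self.ttl, response)

cache = ExactCache()
body = b'{"model":"auto","messages":[{"role":"user","content":"Hello!"}]}'
cache.put(body, "Hi there!", now=0)
hit = cache.get(body, now=10)
miss_other = cache.get(body + b" ", now=10)   # one extra byte is a different key
miss_expired = cache.get(body, now=3600)      # TTL has elapsed
print(hit, miss_other, miss_expired)
```

Because the key is derived from raw bytes, even whitespace or key-order differences in the JSON body count as a different request at this layer.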

§Budget Controls

Set daily and monthly spend limits per agent to prevent runaway costs. Manage budgets in the Dashboard Budget page.

| Budget used | Behavior |
|---|---|
| < 50% | All routing tiers available |
| 50–80% | Prefer cheaper tiers, alert sent |
| 80–100% | Cheapest models only, urgent alert |
| > 100% (hard stop) | Requests rejected with 403 |
| > 100% (soft) | Warning logged, continues |

Each agent shows a progress bar with current spend vs. limit, plus savings amount. Budget alerts notify you before limits are hit.
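
The thresholds in the table can be read as a simple function of spend over limit. The sketch below is illustrative only: budget_policy and its return labels are hypothetical, and the handling of exact boundaries (for example, whether exactly 50% falls in the second band) is an assumption the guide does not specify.

```python
def budget_policy(spent, limit, hard_stop=True):
    """Map budget usage to the behavior described in the table above.

    Boundary handling is an assumption; the guide does not specify it.
    """
    used = spent / limit
    if used < 0.5:
        return "all_tiers"
    if used < 0.8:
        return "prefer_cheaper"
    if used <= 1.0:
        return "cheapest_only"
    # Over budget: hard stop rejects with 403, soft mode logs a warning.
    return "reject_403" if hard_stop else "warn_and_continue"

for spent in (10, 60, 90, 120):
    print(spent, budget_policy(spent, limit=100))
```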

§Dashboard Guide

The Dashboard is your control center. Access it at app.clawswitch.com.

| Page | What It Does |
|---|---|
| Home | Today's spend, savings, request volume, cache hit rate, 7-day trend chart |
| Agents | Create/manage API keys for your AI agents with per-agent budget limits |
| Models | View all available models across providers with cost per 1K tokens |
| Wallet | Manage prepaid credits, set low-balance threshold, enable auto-topup |
| Budget | Per-agent daily/monthly spend vs. limits with progress bars |
| Logs | Request history: agent, model, tier, provider, cost, savings, latency, cache |
| Settings | Provider credentials and API Key Pool management |
| Switch | Configure routing: guardrails, endpoints, routing matrix, test routing |
| Playground | Test the full routing pipeline live |
| Billing | View/change subscription plan, sync status |

Home (Analytics)

  • Today's Spend — Total cost of all requests today
  • Today's Saved — Amount saved vs. Claude Sonnet baseline
  • Request Count — Total requests processed today
  • Cache Hit Rate — Percentage served from cache

Agents

  1. Click Create Agent
  2. Enter a name (e.g., "code-reviewer", "chatbot", "research-agent")
  3. Optionally set a daily budget limit
  4. Copy the generated API key (shown only once)

Each agent gets its own API key, budget limits, and usage statistics.

Logs

| Column | Description |
|---|---|
| Time | When the request was made |
| Agent | Which agent made the request |
| Original Model | What model was requested |
| Routed Model | What model actually served it |
| Tier | Routing tier (simple/standard/complex/premium) |
| Cost / Savings | Actual cost and amount saved |
| Latency | Response time in milliseconds |
| Cache | Whether it was a cache hit |

§Playground

Test the complete proxy pipeline from the Dashboard. Send real requests through the same routing, key pool, and fallback chain your agents use — inspect every detail.

Analyze (Routing Only)

Click Analyze to see how a message would be routed without executing:

  • Complexity score and breakdown by factor
  • Tier classification and domain detection
  • Selected model, provider, and full fallback chain
  • Guardrail violation warnings

Run (Full Execution)

Click Run to send the message through the complete pipeline. Results across four tabs:

| Tab | Shows |
|---|---|
| Response | The actual LLM response with token counts |
| Routing | Primary vs. actual model/provider, fallback used |
| Complexity | Score ring (0–100%), tier + domain badges, factor breakdown |
| Cost | Actual cost, baseline cost, savings, latency |

Use the Playground before deploying changes to test how your configuration affects real requests.

§SDK & Framework Integration

ClawSwitch is 100% OpenAI-compatible. Any SDK or tool that talks to OpenAI can point at ClawSwitch. Just change the base URL and API key.

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",  # from Dashboard → Agents
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
)
print(response.choices[0].message.content)

Streaming:

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Node.js / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.clawswitch.com/v1",
  apiKey: "YOUR_API_KEY",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

cURL

curl https://api.clawswitch.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.clawswitch.com/v1",
    api_key="YOUR_API_KEY",
    model="auto",
)
response = llm.invoke("Explain microservices in simple terms")

LlamaIndex / AutoGen

# LlamaIndex
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(api_base="https://api.clawswitch.com/v1", api_key="YOUR_API_KEY", model="auto")

# AutoGen
config_list = [{"model": "auto", "base_url": "https://api.clawswitch.com/v1", "api_key": "YOUR_API_KEY"}]

Cursor / Continue / AI Code Editors

| Setting | Value |
|---|---|
| API Base URL | https://api.clawswitch.com/v1 |
| API Key | Your API key from Dashboard → Agents |
| Model | auto |

Request Format

| Parameter | Type | Description |
|---|---|---|
| model | string | "auto" for smart routing, or a specific model name |
| messages | array | Chat messages array (required) |
| temperature | float | Sampling temperature 0.0–1.0 (default: 0.7) |
| max_tokens | int | Maximum output tokens |
| stream | bool | Enable SSE streaming (default: false) |
| tools | array | Tool/function definitions |
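
For reference, a request body that uses every parameter in the table might look like the sketch below. The tool definition follows the OpenAI function-calling shape, on the assumption that ClawSwitch accepts it unchanged; get_weather is a made-up tool for illustration.

```python
import json

payload = {
    "model": "auto",                      # or a specific model name
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "temperature": 0.2,
    "max_tokens": 256,
    "stream": False,
    # Tool definition in the OpenAI function-calling shape; get_weather is
    # a made-up tool for illustration.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize to the JSON body sent to POST /v1/chat/completions.
body = json.dumps(payload)
print(body)
```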

Response Headers

| Header | Description |
|---|---|
| X-clawswitch-Cost | Actual cost paid (e.g., $0.001234) |
| X-clawswitch-Saved | Savings vs. Sonnet baseline |
| X-clawswitch-Model | Model that served the request |
| X-clawswitch-Cache | Cache status: hit, miss, or bypass |
| X-clawswitch-Tier | Routing tier used |
| X-Clawswitch-Balance | Remaining wallet balance |

§API Reference

Proxy Endpoint

POST https://api.clawswitch.com/v1/chat/completions
Authorization: Bearer <your-api-key>

OpenAI-compatible chat completions with streaming support.

Management API

All management endpoints require JWT authentication (handled by the Dashboard automatically).

| Endpoint | Method | Description |
|---|---|---|
| /api/agents | GET/POST | List/create agents |
| /api/agents/{id} | GET/PATCH | Get/update agent |
| /api/stats/overview | GET | Today/month stats |
| /api/stats/daily | GET | Daily cost/request breakdown |
| /api/stats/requests | GET | Recent request log |
| /api/models | GET | Available models with pricing |
| /api/budget | GET | Per-agent budget status |
| /api/key-pool | GET/POST | List/add API pool keys |
| /api/key-pool/{id} | PATCH/DELETE | Update/remove pool key |
| /api/key-pool/{id}/reset | POST | Reset rate limit on a key |
| /api/key-pool/status | GET | Pool status per provider |
| /api/switch | GET | Full switch configuration |
| /api/switch/test | POST | Test routing for a message |
| /api/playground/analyze | POST | Analyze routing without execution |
| /api/playground/run | POST | Run request through full pipeline |
| /api/wallet | GET | Wallet balance |
| /api/billing/subscription | GET | Current subscription status |

Error Codes

| Code | Meaning |
|---|---|
| 200 | Success |
| 401 | Invalid or missing API key |
| 402 | Insufficient wallet balance |
| 403 | Account inactive or budget exceeded |
| 429 | Rate limit exceeded |
| 500 | All providers failed |

ClawSwitch handles retries internally through its fallback chain. Only retry on your end if you get a 500 with "All providers failed".
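
A client-side retry wrapper that follows this guidance could look like the sketch below. The function name call_with_retry, the attempt count, and the exponential delay schedule are assumptions; the guide only says to retry on 500.

```python
import time

def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry only on HTTP 500.

    send is any zero-argument callable returning (status_code, body).
    All other statuses are returned immediately, following the guidance above.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 500 or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...

# Simulated responses: two failures, then success.
responses = iter([(500, "All providers failed"), (500, "All providers failed"), (200, "ok")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting sleep keeps the example fast to run; in production code the default time.sleep would apply the real delays.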

§Available Models

| Model | Provider | Best For |
|---|---|---|
| "auto" | Auto-selected | Let ClawSwitch pick the optimal model |
| "claude-sonnet-4-20250514" | Anthropic | Code, reasoning, analysis |
| "claude-opus-4-20250514" | Anthropic | Top-tier complex tasks |
| "claude-haiku-4-5-20251001" | Anthropic | Fast, cheap tasks |
| "gpt-4o" | OpenAI | General purpose |
| "gpt-4o-mini" | OpenAI | Fast, cheap |
| "gemini-2.5-pro" | Google | Long context, reasoning |
| "gemini-2.5-flash" | Google | Fast, cheap |
| "moonshot/kimi-k2.5" | Moonshot | 256K context, very cheap |

View all models with live pricing on the Dashboard Models page. To force a model, set it as the model value instead of "auto".

§Cost Savings

Example: Single agent (8h/day active)

| Request type | Without ClawSwitch | With ClawSwitch | Savings |
|---|---|---|---|
| Heartbeats (480/day) | $180.00 | $0.15 (cache) | 99.9% |
| Code tasks (50/day) | $15.00 | $0.90 (Haiku) | 94% |
| Analysis (20/day) | $6.00 | $3.60 (Haiku) | 40% |
| Complex tasks (5/day) | $7.50 | $7.50 (Opus) | 0% |
| Daily total | $208.50 | $12.15 | 94.2% |
| Monthly | ~$6,255 | ~$365 | $5,890 saved |

Actual savings depend on your request mix. Check Home in the Dashboard for real-time savings tracking.
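
The totals in the example can be reproduced from the per-row figures. The short script below uses only the numbers in the table; the 30-day month is an assumption inferred from the Monthly row.

```python
# (request type, daily cost without ClawSwitch, daily cost with ClawSwitch)
rows = [
    ("Heartbeats", 180.00, 0.15),
    ("Code tasks", 15.00, 0.90),
    ("Analysis", 6.00, 3.60),
    ("Complex tasks", 7.50, 7.50),
]

daily_without = sum(r[1] for r in rows)
daily_with = sum(r[2] for r in rows)
savings_pct = (1 - daily_with / daily_without) * 100

days = 30  # assumed month length behind the "Monthly" row
print(f"Daily: ${daily_without:.2f} -> ${daily_with:.2f} ({savings_pct:.1f}% saved)")
print(f"Monthly: ${daily_without * days:,.0f} -> ${daily_with * days:,.1f}")
```

Swapping in your own request mix and per-request costs gives a quick estimate before checking real figures on the Dashboard.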

§Plans & Pricing

| Plan | Price | Includes |
|---|---|---|
| Starter | $29/mo | Cloud dashboard, email alerts, 5 agents, 30-day log history |
| Pro | $79/mo | Semantic cache, unlimited agents, Slack/Discord alerts, heartbeat optimizer |
| Enterprise | $299/mo | SSO, audit logs, custom routing rules, dedicated support |

All plans include smart routing, API key pool, fallback chains, cost tracking, and the full dashboard. You only pay for tokens used with your own provider API keys — ClawSwitch doesn't mark up provider costs.

Last updated March 2026