Cloudflare AI stack. Use the skills in this subdomain when building on Workers AI, AI Gateway, or Vectorize.
Three services compose into the 2nth.ai AI pipeline:
| Service | Role | When to use |
|---|---|---|
| tech/cloudflare/ai/workers-ai | Edge inference | Classification, routing, embeddings (cheap, low latency) |
| tech/cloudflare/ai/ai-gateway | Proxy + observability | All Claude calls: token metering, caching, fallback |
| tech/cloudflare/ai/vectorize | Vector database | RAG, semantic search, skill discovery |
1. Request arrives.
2. Workers AI (Llama 3.1 8B) classifies intent at the edge (5–20 tokens, ~1ms).
3. If complex: AI Gateway forwards to Claude for a domain expert response.
4. Vectorize retrieves relevant skills/context for RAG.
5. Response streams back to the client.
6. AI Gateway logs token usage (→ Penny's token economy).

This pattern minimises Claude API costs by filtering and enriching at the edge before the expensive call.
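
The routing logic can be sketched in TypeScript as follows. This is a minimal illustration, not the 2nth.ai implementation: `PipelineDeps`, `buildPrompt`, and `runPipeline` are hypothetical names, and each stage is injected as a plain function so the sketch runs without Cloudflare bindings. It assumes retrieval happens before the Claude call (so the context can enrich the prompt) and leaves out streaming.

```typescript
// Hypothetical sketch: every stage is an injected function, so no
// Cloudflare bindings are needed to run or test the control flow.
type Intent = "simple" | "complex";

interface PipelineDeps {
  // Stage 1: a small edge model (e.g. Workers AI) labels the request.
  classify: (text: string) => Promise<Intent>;
  // Cheap path for requests that do not need the expert model.
  answerAtEdge: (text: string) => Promise<string>;
  // Stage 2: vector search (e.g. Vectorize) returns context snippets.
  retrieve: (text: string, topK: number) => Promise<string[]>;
  // Stage 3: the expensive call (e.g. Claude via AI Gateway).
  askExpert: (prompt: string) => Promise<{ text: string; tokens: number }>;
  // Usage sink (e.g. token metering).
  logUsage: (tokens: number) => void;
}

interface PipelineResult {
  route: Intent;
  text: string;
}

// Combine retrieved context and the user question into one prompt.
function buildPrompt(question: string, context: string[]): string {
  return ["Context:", ...context, "", "Question:", question].join("\n");
}

async function runPipeline(
  text: string,
  deps: PipelineDeps,
): Promise<PipelineResult> {
  const route = await deps.classify(text);

  // Filter: simple requests never reach the expensive model.
  if (route === "simple") {
    return { route, text: await deps.answerAtEdge(text) };
  }

  // Enrich: fetch context first, then make a single expert call.
  const context = await deps.retrieve(text, 3);
  const reply = await deps.askExpert(buildPrompt(text, context));
  deps.logUsage(reply.tokens);
  return { route, text: reply.text };
}
```

In a Worker, each injected function would wrap the corresponding binding or HTTP call. Keeping them behind an interface also makes the classify-then-escalate branch easy to unit test with stubs.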