llm-providers

GitHub: Stackbilt-dev/llm-providers · MIT

Part of the Stackbilt ecosystem. It provides the multi-provider inference layer used across the Stackbilder platform and is also available as a standalone npm package. For image generation, use img-forge; ImageProvider in this package is deprecated.

A multi-provider LLM abstraction layer with automatic failover, graduated circuit breakers, cost tracking, and intelligent retry. Extracted from an 80K+ LOC production orchestration platform. Built for Cloudflare Workers, but runs anywhere with a standard fetch API.

Install

npm install @stackbilt/llm-providers

Zero runtime dependencies.

Quick Start

import { LLMProviders, MODELS } from '@stackbilt/llm-providers';

// In a Worker, secrets and the Workers AI binding both live on `env`
const llm = new LLMProviders({
  anthropic: { apiKey: env.ANTHROPIC_API_KEY },
  cloudflare: { ai: env.AI },
  groq: { apiKey: env.GROQ_API_KEY },
  defaultProvider: 'auto',
  costOptimization: true,
  enableCircuitBreaker: true,
});

const response = await llm.generateResponse({
  messages: [{ role: 'user', content: 'Summarize the circuit breaker pattern.' }],
  maxTokens: 1000,
});

console.log(response.message);
console.log(`Provider: ${response.provider}, Cost: $${response.usage.cost}`);

Auto-discover from environment:

// Scans env for ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY, CEREBRAS_API_KEY,
// NVIDIA_API_KEY, and the AI binding — configures only what's present
const llm = LLMProviders.fromEnv(env, { costOptimization: true, enableCircuitBreaker: true });
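The scan described in the comment above comes down to checking which keys and bindings are present. A minimal sketch of that idea, assuming `env` is a plain object; `detectProviders` and the provider ids are hypothetical, and the real `fromEnv` may apply further checks.

```typescript
// Hypothetical sketch of env-based provider discovery; not the package's fromEnv().
type ProviderId = 'anthropic' | 'openai' | 'groq' | 'cerebras' | 'nvidia' | 'cloudflare';

// Env var names from the fromEnv() comment, mapped to illustrative provider ids.
const KEY_VARS: ReadonlyArray<[string, ProviderId]> = [
  ['ANTHROPIC_API_KEY', 'anthropic'],
  ['OPENAI_API_KEY', 'openai'],
  ['GROQ_API_KEY', 'groq'],
  ['CEREBRAS_API_KEY', 'cerebras'],
  ['NVIDIA_API_KEY', 'nvidia'],
];

function detectProviders(env: Record<string, unknown>): ProviderId[] {
  const found: ProviderId[] = [];
  for (const [name, id] of KEY_VARS) {
    const value = env[name];
    // Only non-empty string keys count as configured.
    if (typeof value === 'string' && value.length > 0) found.push(id);
  }
  // The Workers AI binding is an object rather than a string key.
  if (env['AI'] != null) found.push('cloudflare');
  return found;
}
```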

Providers

Provider	Streaming	Tool support	Default model or notes
Anthropic	Yes	Yes	`claude-haiku-4-5-20251001`
OpenAI	Yes	Yes	`gpt-4o-mini`
Cloudflare Workers AI	Yes	Selected	Catalog-driven
Cerebras	Yes	GLM, Qwen, GPT-OSS	~2,200 tok/s
Groq	Yes	LLaMA 3.3 70B, GPT-OSS 120B	Ultra-fast inference
NVIDIA NIM	Yes	Llama, Nemotron, Mistral Large 2	Dev-tier credits

Circuit Breaker

Each provider has a graduated 4-state circuit breaker that routes traffic away from failing providers with probabilistic degradation instead of a hard on/off switch.

State	Behavior
Closed	100% traffic to primary
Degraded	Probabilistic split: 90% → 70% → 40% → 10% as failures accumulate
Recovering	Success steps traffic back up one level at a time
Open	0% traffic. After `resetTimeout` ms, failures decay and traffic resumes

Default: 5-step curve [1.0, 0.9, 0.7, 0.4, 0.1], 60s reset, 5-minute monitoring window.
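The state table can be read as one index that walks down the curve on failure and back up on success. A minimal sketch under that reading, assuming each failure drops one level and one more failure past the last degraded level opens the breaker; `GraduatedBreakerSketch` is a hypothetical class, not this package's breaker, and it leaves out the monitoring window and failure decay.

```typescript
// Hypothetical graduated breaker; not the package's implementation.
class GraduatedBreakerSketch {
  private level = 0; // index into curve; level === curve.length means Open
  private openedAt = 0;
  private readonly curve: number[];
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    curve: number[] = [1.0, 0.9, 0.7, 0.4, 0.1],
    resetTimeoutMs = 60_000,
    now: () => number = Date.now,
  ) {
    this.curve = curve;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  /** Fraction of traffic this provider should currently receive. */
  trafficShare(): number {
    if (this.level >= this.curve.length) {
      if (this.now() - this.openedAt < this.resetTimeoutMs) return 0; // Open
      this.level = this.curve.length - 1; // timeout elapsed: resume at the lowest share
    }
    return this.curve[this.level];
  }

  /** Probabilistic routing: route when a uniform draw falls under the share. */
  shouldRoute(random: () => number = Math.random): boolean {
    return random() < this.trafficShare();
  }

  recordFailure(): void {
    if (this.level >= this.curve.length) return; // already Open
    this.level += 1;
    if (this.level === this.curve.length) this.openedAt = this.now();
  }

  recordSuccess(): void {
    // Recovering: step back up one level at a time (no-op while Open or Closed).
    if (this.level > 0 && this.level < this.curve.length) this.level -= 1;
  }
}
```

Keeping a trickle of traffic flowing to a degraded provider lets recovery show up in real requests, without a separate health probe.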

In Cloudflare Workers, circuit state is held in memory per isolate, so it is not shared between isolates and is lost when an isolate is evicted. Persist it to KV, D1, or Durable Objects if it needs to outlive one isolate:

import { defaultCircuitBreakerManager } from '@stackbilt/llm-providers';

// On request start: restore
const json = await env.LLM_STATE.get('circuit-breakers');
if (json) defaultCircuitBreakerManager.restore(json);

// On request end: persist
await env.LLM_STATE.put('circuit-breakers', defaultCircuitBreakerManager.serialize());
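Restoring and persisting around each request is easy to get wrong when the handler throws. A minimal sketch that wraps the work in a restore/persist pair, assuming only the `serialize()` and `restore(json)` methods shown above and the `get`/`put` shape of a Workers KV namespace; `withPersistedBreakers`, `StateStore`, and `SerializableState` are hypothetical names.

```typescript
// Structural stand-ins: a KV namespace and the breaker manager are assumed to fit these.
interface StateStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}
interface SerializableState {
  serialize(): string;
  restore(json: string): void;
}

// Hypothetical helper: restore before the work, persist afterwards even if it throws.
async function withPersistedBreakers<T>(
  store: StateStore,
  key: string,
  state: SerializableState,
  work: () => Promise<T>,
): Promise<T> {
  const json = await store.get(key);
  if (json) state.restore(json);
  try {
    return await work();
  } finally {
    await store.put(key, state.serialize());
  }
}
```

Separate isolates write the same key, so the last writer wins; a Durable Object is the usual choice when concurrent updates must not be lost.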

Cost Tracking

import { CreditLedger, LLMProviders } from '@stackbilt/llm-providers';

const ledger = new CreditLedger({
  budgets: [
    { provider: 'anthropic', monthlyBudget: 100, rateLimits: { rpm: 60 } },
    { provider: 'openai', monthlyBudget: 50 },
  ],
});

ledger.on((event) => {
  if (event.type === 'threshold_crossed') {
    console.warn(`${event.provider}: ${event.tier} — ${event.utilizationPct.toFixed(0)}% of budget`);
  }
});

const llm = new LLMProviders({ anthropic: { apiKey: '...' }, ledger });

Threshold alerts fire at 80%, 90%, and 95% utilization.
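One way to make each alert fire exactly once is to compare spend before and after each charge against every threshold. A minimal sketch under that assumption; `crossedThresholds` is hypothetical, and `CreditLedger` may track this differently.

```typescript
const ALERT_THRESHOLDS_PCT = [80, 90, 95] as const;

// Hypothetical helper: thresholds (in %) crossed by moving from prevSpend to newSpend.
function crossedThresholds(prevSpend: number, newSpend: number, monthlyBudget: number): number[] {
  if (monthlyBudget <= 0) return [];
  return ALERT_THRESHOLDS_PCT.filter((pct) => {
    const line = (monthlyBudget * pct) / 100; // spend at which this tier is reached
    return prevSpend < line && newSpend >= line;
  });
}
```

A single large charge can cross several tiers at once, which is why the result is a list.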

Tool-Use Loop

generateResponseWithTools owns the full generate → parse → execute → repeat cycle with iteration caps and cost limits:

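// Hypothetical helper: supply a safe arithmetic parser of your own. eval() is
// disallowed in Cloudflare Workers and must never run model-supplied strings.
declare function evaluateArithmetic(expr: string): number;

const result = await llm.generateResponseWithTools(
  {
    messages: [{ role: 'user', content: 'What is 2 + 2 * 3?' }],
    tools: [{ type: 'function', function: { name: 'calculate', description: '...', parameters: { type: 'object', properties: { expr: { type: 'string' } }, required: ['expr'] } } }],
  },
  {
    execute: async (name, args) => {
      if (name !== 'calculate') throw new Error(`Unknown tool: ${name}`);
      return evaluateArithmetic((args as { expr: string }).expr);
    },
  },
  { maxIterations: 5, maxCostUSD: 0.10 }
);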
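The cycle that generateResponseWithTools manages can be pictured as a plain loop. A minimal sketch, assuming a hypothetical `generate` callback that returns already-parsed tool calls plus the cost of that turn; `runToolLoop`, `Turn`, `ToolCall`, and `Message` are illustrative names, not this package's types, and the real method's stopping rules may differ.

```typescript
// Hypothetical types for the sketch.
interface ToolCall { name: string; args: unknown }
interface Turn { text: string; toolCalls: ToolCall[]; costUSD: number }
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };

async function runToolLoop(
  messages: Message[],
  generate: (history: Message[]) => Promise<Turn>,
  execute: (name: string, args: unknown) => Promise<unknown>,
  limits: { maxIterations: number; maxCostUSD: number },
): Promise<{ text: string; iterations: number; costUSD: number }> {
  const history = [...messages];
  let costUSD = 0;
  for (let i = 1; i <= limits.maxIterations; i++) {
    // Generate; in this sketch tool calls arrive already parsed.
    const turn = await generate(history);
    costUSD += turn.costUSD;
    if (turn.toolCalls.length === 0) return { text: turn.text, iterations: i, costUSD };
    // Refuse to keep looping once the accumulated cost reaches the cap.
    if (costUSD >= limits.maxCostUSD) throw new Error(`Cost cap reached: $${costUSD.toFixed(4)}`);
    history.push({ role: 'assistant', content: turn.text });
    // Execute each call and feed the result back for the next iteration.
    for (const call of turn.toolCalls) {
      const result = await execute(call.name, call.args);
      history.push({ role: 'tool', content: JSON.stringify(result) });
    }
  }
  throw new Error(`No final answer within ${limits.maxIterations} iterations`);
}
```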

Canonical Provider Contract

Downstream gateways hand requests to llm-providers through one canonical contract:

client protocol → gateway adapter → CanonicalLLMRequest → llm-providers → vendor API

import { canonicalToLLMRequest, normalizeLLMRequest, normalizeLLMResponse } from '@stackbilt/llm-providers';

normalizeLLMRequest() maps legacy aliases into the canonical shape. normalizeLLMResponse() provides a stable response shape with normalized routing metadata.
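A gateway adapter is the piece that turns a client protocol into the canonical request before it reaches llm-providers. A minimal sketch for an OpenAI-style chat-completions body; `CanonicalRequestSketch` is a hypothetical stand-in for CanonicalLLMRequest, whose exact fields are not listed here, so check the package's exported types before relying on any field name.

```typescript
// Client body uses OpenAI chat-completions field names.
interface OpenAIStyleBody {
  model?: string;
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  max_tokens?: number;
  temperature?: number;
}
// Hypothetical canonical shape; not the package's CanonicalLLMRequest.
interface CanonicalRequestSketch {
  model?: string;
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  maxTokens?: number;
  temperature?: number;
}

function openAIBodyToCanonical(body: OpenAIStyleBody): CanonicalRequestSketch {
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  // Rename snake_case protocol fields to the camelCase names used elsewhere in this README.
  return {
    model: body.model,
    messages: body.messages.map((m) => ({ role: m.role, content: m.content })),
    maxTokens: body.max_tokens,
    temperature: body.temperature,
  };
}
```

Keeping the protocol-specific renaming in the adapter means llm-providers only ever sees one request shape, whatever the client speaks.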

Routing Introspection

Pre-flight routing inspection without dispatching:

import { getRoutingInfo } from '@stackbilt/llm-providers';

const info = getRoutingInfo(
  { messages: [{ role: 'user', content: 'Call the weather tool' }], tools: [...] },
  ['cloudflare', 'cerebras', 'groq']
);

// info.useCase    → 'TOOL_CALLING'
// info.provider   → 'cerebras'
// info.model      → 'zai-glm-4.7'
// info.modelLifecycle → 'active'
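The introspection result pairs a detected use case with a chosen provider. A toy sketch of that two-step shape, assuming a hand-written preference table; `classify`, `pickRoute`, `PREFERENCES`, and the `GENERAL` label are hypothetical, and the package's real classifier and model catalog are richer.

```typescript
type UseCase = 'TOOL_CALLING' | 'GENERAL';
interface RouteRequest { messages: { role: string; content: string }[]; tools?: unknown[] }

// Hypothetical preference table: providers in order of preference per use case.
const PREFERENCES: Record<UseCase, string[]> = {
  TOOL_CALLING: ['cerebras', 'groq', 'cloudflare'],
  GENERAL: ['cloudflare', 'groq', 'cerebras'],
};

// Step 1: classify the request (here, only by the presence of tools).
function classify(req: RouteRequest): UseCase {
  return req.tools && req.tools.length > 0 ? 'TOOL_CALLING' : 'GENERAL';
}

// Step 2: take the first preferred provider that is actually configured.
function pickRoute(req: RouteRequest, available: string[]): { useCase: UseCase; provider: string } | null {
  const useCase = classify(req);
  const provider = PREFERENCES[useCase].find((p) => available.includes(p));
  return provider ? { useCase, provider } : null;
}
```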

Built-in Tools (Server-Side)

Some models run tools server-side — no execute callback required:

const res = await llm.generateResponse({
  model: MODELS.GROQ_COMPOUND,
  messages: [{ role: 'user', content: 'Find sources on topic X.' }],
  builtInTools: [{ type: 'web_search' }],
});
// res.metadata.builtInToolResults → [{type, name, results:[{title, url, content, score}]}]

Supported: web_search, visit_website, browser_automation, code_interpreter, wolfram_alpha (model-gated). Built-in tool surcharges are billed by the provider, not tracked in TokenUsage.
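The `res.metadata.builtInToolResults` shape in the comment above can be flattened into a ranked source list. A minimal sketch that types that shape by hand from the comment; the field types (for example `score` as a number where higher is better) are assumptions, and `BuiltInToolResult` and `topSources` are hypothetical names.

```typescript
interface BuiltInToolResult {
  type: string;
  name: string;
  results: { title: string; url: string; content: string; score: number }[];
}

// Flatten every tool's results, keep the best-scored entry per URL, return the top n.
function topSources(toolResults: BuiltInToolResult[], n: number): { title: string; url: string; score: number }[] {
  const best = new Map<string, { title: string; url: string; score: number }>();
  for (const tool of toolResults) {
    for (const r of tool.results) {
      const seen = best.get(r.url);
      if (!seen || r.score > seen.score) best.set(r.url, { title: r.title, url: r.url, score: r.score });
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score).slice(0, n);
}
```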

Cloudflare AI Gateway

Route requests through Cloudflare AI Gateway for network-layer caching and rate-limit absorption:

import { AnthropicProvider } from '@stackbilt/llm-providers';

const provider = new AnthropicProvider({
  apiKey: 'sk-ant-...',
  cfGateway: { accountId: 'YOUR_ACCOUNT_ID', gatewayId: 'my-gateway' },
});

This composes with llm-providers' semantic routing: the factory picks the provider and model, and the gateway sits beneath each provider's HTTP calls, handling caching transparently.
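AI Gateway works by replacing a provider's base URL with a gateway URL that embeds the account and gateway IDs. A minimal sketch of the provider-specific URL layout from Cloudflare's AI Gateway documentation; `gatewayBaseUrl` is hypothetical, and this package builds its URL internally from `cfGateway`, possibly differently.

```typescript
// Hypothetical helper; layout: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}
function gatewayBaseUrl(accountId: string, gatewayId: string, provider: string): string {
  const parts = [accountId, gatewayId, provider].map((p) => {
    // Reject empty or slash-containing segments so the path cannot be reshaped.
    if (!p || p.includes('/')) throw new Error(`Invalid path segment: "${p}"`);
    return encodeURIComponent(p);
  });
  return `https://gateway.ai.cloudflare.com/v1/${parts.join('/')}`;
}

// The provider's own API path (here Anthropic's Messages API) is appended to the base.
const messagesUrl = `${gatewayBaseUrl('YOUR_ACCOUNT_ID', 'my-gateway', 'anthropic')}/v1/messages`;
```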

Streaming

const stream = await llm.generateResponseStream({
  messages: [{ role: 'user', content: 'Tell me a story.' }],
});
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value); // Node.js output; use your runtime's sink elsewhere
}

Errors raised before the first chunk is emitted (HTTP 401, 429, 503, or an open circuit) fail over to the next provider.
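Failing over only before the first chunk keeps a client from ever receiving two providers' output spliced together. A minimal sketch of that rule, assuming each provider is a hypothetical function that opens a `ReadableStream<string>`; `streamWithFailover` is illustrative, not this package's implementation, and it also treats an error on the first read as a pre-stream failure.

```typescript
type OpenStream = () => Promise<ReadableStream<string>>;

async function streamWithFailover(providers: OpenStream[]): Promise<ReadableStream<string>> {
  let lastError: unknown = new Error('no providers configured');
  for (const open of providers) {
    try {
      const reader = (await open()).getReader();
      const first = await reader.read(); // commit to this provider only once a chunk arrives
      return new ReadableStream<string>({
        start(controller) {
          if (first.done) controller.close();
          else controller.enqueue(first.value);
        },
        async pull(controller) {
          // After the first chunk, errors propagate to the caller: no more failover.
          const next = await reader.read();
          if (next.done) controller.close();
          else controller.enqueue(next.value);
        },
        cancel(reason) {
          return reader.cancel(reason);
        },
      });
    } catch (err) {
      lastError = err; // failed before any output: try the next provider
    }
  }
  throw lastError;
}
```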

Deprecation Notice

ImageProvider in this package is deprecated. Use img-forge or its MCP tools for image generation. llm-providers handles text inference and vision understanding only. ImageProvider, ImageProviderConfig, and IMAGE_MODELS will be removed in the next major version.