OpenRouter - Unified AI API Gateway

Overview

OpenRouter provides a single API for accessing 200+ language models from OpenAI, Anthropic, Google, Meta, Mistral, and other providers. It offers intelligent routing, streaming, cost optimization, and a standardized, OpenAI-compatible interface.

Key Features:

Access 200+ models through one API
OpenAI-compatible interface (drop-in replacement)
Intelligent model routing and fallbacks
Real-time streaming responses
Cost tracking and optimization
Model performance analytics
Function calling support
Vision model support

Pricing Model:

Pay-per-token (no subscriptions)
Volume discounts available
Free tier with credits
Per-model pricing varies

Installation:

npm install openai  # Use OpenAI SDK
# or
pip install openai  # Python

Quick Start

1. Get API Key

# Sign up at https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-..."

2. Basic Chat Completion

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com',  // Optional
    'X-Title': 'Your App Name',              // Optional
  }
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms' }
    ],
  });

  console.log(completion.choices[0].message.content);
}

3. Streaming Response

async function streamChat() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'Write a short story about AI' }
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}

Model Selection Strategy

Available Model Categories

Flagship Models (Highest Quality):

const flagshipModels = {
  claude: 'anthropic/claude-3.5-sonnet',      // Best reasoning
  gpt4: 'openai/gpt-4-turbo',                 // Best general purpose
  gemini: 'google/gemini-pro-1.5',            // Best long context
  opus: 'anthropic/claude-3-opus',            // Best complex tasks
};

Fast Models (Low Latency):

const fastModels = {
  claude: 'anthropic/claude-3-haiku',         // Fastest Claude
  gpt35: 'openai/gpt-3.5-turbo',             // Fast GPT
  gemini: 'google/gemini-flash-1.5',         // Fast Gemini
  llama: 'meta-llama/llama-3.1-8b-instruct', // Fast open source
};

Cost-Optimized Models:

// Prices are input/output in USD; see https://openrouter.ai/models for current rates
const budgetModels = {
  haiku: 'anthropic/claude-3-haiku',          // $0.25/$1.25 per 1M tokens
  gemini: 'google/gemini-flash-1.5',         // $0.075/$0.30 per 1M tokens
  llama: 'meta-llama/llama-3.1-8b-instruct', // $0.06/$0.06 per 1M tokens
  mixtral: 'mistralai/mixtral-8x7b-instruct', // $0.24/$0.24 per 1M tokens
};

Specialized Models:

const specializedModels = {
  vision: 'openai/gpt-4-vision-preview',     // Image understanding
  code: 'anthropic/claude-3.5-sonnet',       // Code generation
  longContext: 'google/gemini-pro-1.5',      // 2M token context
  function: 'openai/gpt-4-turbo',            // Function calling
};

Model Selection Logic

interface ModelSelector {
  task: 'chat' | 'code' | 'vision' | 'function' | 'summary';
  priority: 'quality' | 'speed' | 'cost';
  maxCost?: number;  // Max cost per 1M tokens
  contextSize?: number;
}

function selectModel(criteria: ModelSelector): string {
  if (criteria.task === 'vision') {
    return 'openai/gpt-4-vision-preview';
  }

  if (criteria.task === 'code') {
    return criteria.priority === 'quality'
      ? 'anthropic/claude-3.5-sonnet'
      : 'meta-llama/llama-3.1-70b-instruct';
  }

  if (criteria.contextSize && criteria.contextSize > 100000) {
    return 'google/gemini-pro-1.5';  // 2M context
  }

  // Default selection by priority
  switch (criteria.priority) {
    case 'quality':
      return 'anthropic/claude-3.5-sonnet';
    case 'speed':
      return 'anthropic/claude-3-haiku';
    case 'cost':
      // Compare against undefined so that a maxCost of 0 is still honored
      return criteria.maxCost !== undefined && criteria.maxCost < 0.5
        ? 'google/gemini-flash-1.5'
        : 'anthropic/claude-3-haiku';
    default:
      return 'openai/gpt-4-turbo';
  }
}

// Usage
const model = selectModel({
  task: 'code',
  priority: 'quality',
});

Streaming Implementation

TypeScript Streaming with Error Handling

async function robustStreamingChat(
  prompt: string,
  model: string = 'anthropic/claude-3.5-sonnet'
) {
  try {
    const stream = await client.chat.completions.create({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
      max_tokens: 4000,
    });

    let fullResponse = '';

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;

      if (delta?.content) {
        fullResponse += delta.content;
        process.stdout.write(delta.content);
      }

      // Handle tool (function) calls; arguments arrive in fragments across chunks
      if (delta?.tool_calls) {
        console.log('\nTool call delta:', JSON.stringify(delta.tool_calls));
      }

      // Check for finish reason
      if (chunk.choices[0]?.finish_reason) {
        console.log(`\n[Finished: ${chunk.choices[0].finish_reason}]`);
      }
    }

    return fullResponse;
  } catch (error) {
    if (error instanceof Error) {
      console.error('Streaming error:', error.message);
    }
    throw error;
  }
}

Python Streaming

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

def stream_chat(prompt: str, model: str = "anthropic/claude-3.5-sonnet"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    print()  # New line
    return full_response

React Streaming Component

import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSubmit(prompt: string) {
    setIsStreaming(true);
    setResponse('');

    try {
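      // NOTE: calling OpenRouter directly from the browser exposes the API key.
      // In production, send this request to a server-side proxy that adds the key.
      const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {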
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3.5-sonnet',
          messages: [{ role: 'user', content: prompt }],
          stream: true,
        }),
      });

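      if (!res.ok || !res.body) {
        throw new Error(`Request failed with status ${res.status}`);
      }

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';  // Holds a partial SSE line split across network chunks

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across chunk boundaries
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';  // Keep the trailing incomplete line for the next read

        for (const rawLine of lines) {
          const line = rawLine.trim();
          if (!line.startsWith('data: ')) continue;

          const data = line.slice(6);
          if (data === '[DONE]') continue;

          try {
            const parsed = JSON.parse(data);
            const content = parsed.choices[0]?.delta?.content || '';
            setResponse(prev => prev + content);
          } catch (e) {
            // Skip malformed JSON payloads
          }
        }
      }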
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  }

  return (
    <div>
      <textarea
        value={response}
        readOnly
        rows={20}
        cols={80}
        placeholder="Response will appear here..."
      />
      <button onClick={() => handleSubmit('Explain AI')}>
        {isStreaming ? 'Streaming...' : 'Send'}
      </button>
    </div>
  );
}
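
Buffering Partial SSE Lines

A network chunk can end in the middle of a `data:` line, so parsing each chunk on its own may drop tokens. The sketch below shows one way to buffer partial lines outside of React. `createSseParser` and `extractDelta` are hypothetical helper names, and the sketch assumes the `data: {...}` / `data: [DONE]` framing used in the component above.

```typescript
// Hypothetical helper: accumulates raw text chunks and returns only the
// payloads of complete `data:` lines, keeping any trailing partial line.
function createSseParser(): (chunk: string) => string[] {
  let buffer = '';

  return function feed(chunk: string): string[] {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is '' (chunk ended on a newline) or a partial line.
    buffer = lines.pop() ?? '';

    const payloads: string[] = [];
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line.startsWith('data: ')) continue; // Skip blank lines and comments
      const data = line.slice(6);
      if (data === '[DONE]') continue;          // End-of-stream marker
      payloads.push(data);
    }
    return payloads;
  };
}

// Extracts the text delta from one JSON payload; returns '' if absent or invalid.
function extractDelta(payload: string): string {
  try {
    const parsed = JSON.parse(payload);
    const content = parsed?.choices?.[0]?.delta?.content;
    return typeof content === 'string' ? content : '';
  } catch {
    return '';
  }
}
```

Feed each decoded chunk to the parser and append `extractDelta(payload)` for every payload it returns.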

Function Calling

Basic Function Calling

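// Typed so that `type: 'function'` is checked as a literal, not widened to string
const tools: OpenAI.Chat.ChatCompletionTool[] = [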
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City name, e.g. San Francisco',
          },
          unit: {
            type: 'string',
            enum: ['celsius', 'fahrenheit'],
          },
        },
        required: ['location'],
      },
    },
  },
];

async function chatWithFunctions() {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'What is the weather in Tokyo?' }
    ],
    tools,
    tool_choice: 'auto',
  });

  const message = completion.choices[0].message;

  if (message.tool_calls) {
    // Collect one tool result per tool call; every call needs a matching reply
    const toolResults: OpenAI.Chat.ChatCompletionToolMessageParam[] = [];

    for (const toolCall of message.tool_calls) {
      console.log('Function:', toolCall.function.name);
      console.log('Arguments:', toolCall.function.arguments);

      // Execute function (getWeather is your own implementation)
      const args = JSON.parse(toolCall.function.arguments);
      const result = await getWeather(args.location, args.unit);

      toolResults.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    // Send all results back in a single follow-up request
    const followUp = await client.chat.completions.create({
      model: 'openai/gpt-4-turbo',
      messages: [
        { role: 'user', content: 'What is the weather in Tokyo?' },
        message,
        ...toolResults,
      ],
      tools,
    });

    console.log(followUp.choices[0].message.content);
  }
}

Multi-Step Function Calling

async function multiStepFunctionCall(userQuery: string) {
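  // Explicit type so assistant and tool messages can be pushed later
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'user', content: userQuery },
  ];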
  let iterationCount = 0;
  const maxIterations = 5;

  while (iterationCount < maxIterations) {
    const completion = await client.chat.completions.create({
      model: 'openai/gpt-4-turbo',
      messages,
      tools,
      tool_choice: 'auto',
    });

    const message = completion.choices[0].message;
    messages.push(message);

    if (!message.tool_calls) {
      // No more function calls, return final response
      return message.content;
    }

    // Execute all function calls
    for (const toolCall of message.tool_calls) {
      const functionName = toolCall.function.name;
      const args = JSON.parse(toolCall.function.arguments);

      // Execute function (implement your function registry)
      const result = await executeFunctionCall(functionName, args);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    iterationCount++;
  }

  throw new Error('Max iterations reached');
}
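
Function Registry Sketch

The loop above calls `executeFunctionCall` but leaves the registry to the reader. A minimal sketch of one possible shape follows. The `ToolHandler` type and the canned weather data are assumptions for illustration, not part of the OpenRouter API.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical registry; the weather handler returns canned data for illustration.
const functionRegistry = new Map<string, ToolHandler>([
  [
    'get_weather',
    async (args) => {
      const location = String(args['location'] ?? '');
      const unit = args['unit'] === 'fahrenheit' ? 'fahrenheit' : 'celsius';
      if (!location) throw new Error('location is required');
      return { location, unit, temperature: unit === 'celsius' ? 22 : 72 };
    },
  ],
]);

async function executeFunctionCall(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  const handler = functionRegistry.get(name);
  if (!handler) {
    // Return an error object instead of throwing so the model can see the failure.
    return { error: `Unknown function: ${name}` };
  }
  try {
    return await handler(args);
  } catch (error) {
    return { error: error instanceof Error ? error.message : String(error) };
  }
}
```

Returning errors as data keeps the conversation going: the tool message carries the failure back to the model, which can correct its arguments on the next iteration.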

Cost Optimization

Token Counting and Cost Estimation

import { encoding_for_model } from 'tiktoken';

interface CostEstimate {
  promptTokens: number;
  completionTokens: number;
  promptCost: number;
  completionCost: number;
  totalCost: number;
}

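// Example prices in USD per 1M tokens; check https://openrouter.ai/models for current rates
const modelPricing: Record<string, { input: number; output: number }> = {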
  'anthropic/claude-3.5-sonnet': { input: 3.00, output: 15.00 },  // per 1M tokens
  'anthropic/claude-3-haiku': { input: 0.25, output: 1.25 },
  'openai/gpt-4-turbo': { input: 10.00, output: 30.00 },
  'openai/gpt-3.5-turbo': { input: 0.50, output: 1.50 },
  'google/gemini-flash-1.5': { input: 0.075, output: 0.30 },
};

function estimateCost(
  prompt: string,
  expectedCompletion: number,
  model: string
): CostEstimate {
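  const encoder = encoding_for_model('gpt-4');  // Approximation: other providers tokenize differently
  const promptTokens = encoder.encode(prompt).length;
  encoder.free();  // tiktoken encoders hold WASM memory and must be freed explicitly
  const completionTokens = expectedCompletion;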

  const pricing = modelPricing[model] || { input: 0, output: 0 };

  const promptCost = (promptTokens / 1_000_000) * pricing.input;
  const completionCost = (completionTokens / 1_000_000) * pricing.output;

  return {
    promptTokens,
    completionTokens,
    promptCost,
    completionCost,
    totalCost: promptCost + completionCost,
  };
}

// Usage
const estimate = estimateCost(
  'Explain quantum computing',
  500,  // Expected response tokens
  'anthropic/claude-3.5-sonnet'
);

console.log(`Estimated cost: $${estimate.totalCost.toFixed(4)}`);

Dynamic Model Selection by Budget

async function budgetOptimizedChat(
  prompt: string,
  maxCostPerRequest: number = 0.01  // $0.01 max
) {
  // Estimate with expensive model
  const expensiveEstimate = estimateCost(
    prompt,
    1000,
    'anthropic/claude-3.5-sonnet'
  );

  let selectedModel = 'anthropic/claude-3.5-sonnet';

  if (expensiveEstimate.totalCost > maxCostPerRequest) {
    // Try cheaper models
    const cheapEstimate = estimateCost(
      prompt,
      1000,
      'anthropic/claude-3-haiku'
    );

    if (cheapEstimate.totalCost > maxCostPerRequest) {
      selectedModel = 'google/gemini-flash-1.5';
    } else {
      selectedModel = 'anthropic/claude-3-haiku';
    }
  }

  console.log(`Selected model: ${selectedModel}`);

  const completion = await client.chat.completions.create({
    model: selectedModel,
    messages: [{ role: 'user', content: prompt }],
  });

  return completion.choices[0].message.content;
}

Batching for Cost Reduction

async function batchProcess(prompts: string[], model: string) {
  // Process multiple prompts in parallel with rate limiting
  const concurrency = 5;
  const results = [];

  for (let i = 0; i < prompts.length; i += concurrency) {
    const batch = prompts.slice(i, i + concurrency);

    const batchResults = await Promise.all(
      batch.map(prompt =>
        client.chat.completions.create({
          model,
          messages: [{ role: 'user', content: prompt }],
          max_tokens: 500,  // Limit tokens to control cost
        })
      )
    );

    results.push(...batchResults);

    // Rate limiting delay
    if (i + concurrency < prompts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }

  return results;
}

Model Fallback and Retry Strategy

Automatic Fallback

const modelFallbackChain = [
  'anthropic/claude-3.5-sonnet',
  'openai/gpt-4-turbo',
  'anthropic/claude-3-haiku',
  'google/gemini-flash-1.5',
];

async function chatWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;

  for (const model of modelFallbackChain) {
    try {
      console.log(`Trying model: ${model}`);

      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 2000,
      });

      return completion.choices[0].message.content || '';
    } catch (error) {
      // Remember the failure and continue to the next model in the chain
      console.warn(`Model ${model} failed:`, error);
      lastError = error;
    }
  }

  throw new Error(`All models failed. Last error: ${String(lastError)}`);
}

Exponential Backoff for Rate Limits

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  let lastError: unknown;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;

      // Only rate-limit errors (HTTP 429) are retried
      const status = (error as { status?: number } | null)?.status;
      if (status !== 429) {
        throw error;  // Non-retryable error
      }

      // Skip the delay after the final attempt
      if (i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000;  // Exponential backoff: 1s, 2s, 4s, ...
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }

  throw lastError;
}

// Usage
const result = await retryWithBackoff(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
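
Backoff with Jitter

When many clients retry on the same fixed schedule, their retries can arrive in synchronized bursts. A common mitigation is to cap the exponential delay and randomize it. The sketch below is an optional variation on the delay calculation in `retryWithBackoff`, not OpenRouter guidance. `backoffDelayMs`, `baseMs`, and `capMs` are hypothetical names.

```typescript
// "Full jitter": pick a random delay between 0 and the capped exponential value.
function backoffDelayMs(
  attempt: number,                    // 0-based retry attempt
  baseMs: number = 1000,
  capMs: number = 30_000,
  random: () => number = Math.random  // Injectable so tests can be deterministic
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * exponential);
}
```

To use it, replace `Math.pow(2, i) * 1000` in the retry loop with `backoffDelayMs(i)`.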

Prompt Engineering Best Practices

System Prompts for Consistency

const systemPrompts = {
  concise: 'You are a helpful assistant. Be concise and direct.',
  detailed: 'You are a knowledgeable expert. Provide comprehensive answers with examples.',
  code: 'You are an expert programmer. Provide clean, well-commented code with explanations.',
  creative: 'You are a creative writing assistant. Be imaginative and engaging.',
};

async function chatWithPersonality(
  prompt: string,
  personality: keyof typeof systemPrompts
) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'system', content: systemPrompts[personality] },
      { role: 'user', content: prompt },
    ],
  });

  return completion.choices[0].message.content;
}

Few-Shot Prompting

async function fewShotClassification(text: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'Classify text sentiment as positive, negative, or neutral.',
      },
      { role: 'user', content: 'I love this product!' },
      { role: 'assistant', content: 'positive' },
      { role: 'user', content: 'This is terrible.' },
      { role: 'assistant', content: 'negative' },
      { role: 'user', content: 'It works fine.' },
      { role: 'assistant', content: 'neutral' },
      { role: 'user', content: text },
    ],
  });

  return completion.choices[0].message.content;
}

Chain of Thought Prompting

async function reasoningTask(problem: string) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      {
        role: 'user',
        content: `${problem}\n\nLet's solve this step by step:\n1.`,
      },
    ],
    max_tokens: 3000,
  });

  return completion.choices[0].message.content;
}

Rate Limits and Throttling

Rate Limit Handler

class RateLimitedClient {
  private requestQueue: Array<() => Promise<any>> = [];
  private processing = false;
  private requestsPerMinute = 60;
  private requestInterval = 60000 / this.requestsPerMinute;

  async enqueue<T>(request: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.requestQueue.push(async () => {
        try {
          const result = await request();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });

      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.requestQueue.length === 0) return;

    this.processing = true;

    while (this.requestQueue.length > 0) {
      const request = this.requestQueue.shift()!;
      await request();
      await new Promise(resolve => setTimeout(resolve, this.requestInterval));
    }

    this.processing = false;
  }
}

// Usage
const rateLimitedClient = new RateLimitedClient();

const result = await rateLimitedClient.enqueue(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);

Vision Models

Image Understanding

async function analyzeImage(imageUrl: string, question: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-vision-preview',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: 1000,
  });

  return completion.choices[0].message.content;
}

// Usage
const result = await analyzeImage(
  'https://example.com/image.jpg',
  'What objects are in this image?'
);

Multi-Image Analysis

async function compareImages(imageUrls: string[]) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-vision-preview',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Compare these images and describe the differences:' },
          ...imageUrls.map(url => ({
            type: 'image_url' as const,
            image_url: { url },
          })),
        ],
      },
    ],
  });

  return completion.choices[0].message.content;
}

Error Handling and Monitoring

Comprehensive Error Handler

interface ErrorResponse {
  error: {
    message: string;
    type: string;
    code: string;
  };
}

async function robustCompletion(prompt: string) {
  try {
    const completion = await client.chat.completions.create({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    });

    return completion.choices[0].message.content;
  } catch (error: any) {
    // Rate limit errors
    if (error.status === 429) {
      console.error('Rate limit exceeded. Please wait.');
      throw new Error('RATE_LIMIT_EXCEEDED');
    }

    // Invalid API key
    if (error.status === 401) {
      console.error('Invalid API key');
      throw new Error('INVALID_API_KEY');
    }

    // Model not found
    if (error.status === 404) {
      console.error('Model not found');
      throw new Error('MODEL_NOT_FOUND');
    }

    // Server errors
    if (error.status >= 500) {
      console.error('OpenRouter server error');
      throw new Error('SERVER_ERROR');
    }

    // Unknown error
    console.error('Unknown error:', error);
    throw error;
  }
}

Request/Response Logging

class LoggingClient {
  async chat(prompt: string, model: string) {
    const startTime = Date.now();

    console.log('[Request]', {
      timestamp: new Date().toISOString(),
      model,
      promptLength: prompt.length,
    });

    try {
      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      });

      const duration = Date.now() - startTime;

      console.log('[Response]', {
        timestamp: new Date().toISOString(),
        duration,
        usage: completion.usage,
        finishReason: completion.choices[0].finish_reason,
      });

      return completion;
    } catch (error) {
      console.error('[Error]', {
        timestamp: new Date().toISOString(),
        duration: Date.now() - startTime,
        error,
      });
      throw error;
    }
  }
}
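
Usage Tracking Sketch

The logger above prints `completion.usage` for each request but does not aggregate it. One way to track token usage and spend per model is sketched below. `UsageTracker` is a hypothetical helper, and the prices passed to it are whatever per-1M-token rates you maintain (for example the `modelPricing` table from the Cost Optimization section).

```typescript
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface ModelPrice {
  input: number;   // USD per 1M prompt tokens
  output: number;  // USD per 1M completion tokens
}

interface ModelTotals {
  requests: number;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
}

// Hypothetical helper that accumulates usage per model.
class UsageTracker {
  private pricing: Record<string, ModelPrice>;
  private totals = new Map<string, ModelTotals>();

  constructor(pricing: Record<string, ModelPrice>) {
    this.pricing = pricing;
  }

  record(model: string, usage: TokenUsage): void {
    // Unknown models are tracked with zero cost rather than rejected.
    const price = this.pricing[model] ?? { input: 0, output: 0 };
    const cost =
      (usage.prompt_tokens / 1_000_000) * price.input +
      (usage.completion_tokens / 1_000_000) * price.output;

    const current = this.totals.get(model) ??
      { requests: 0, promptTokens: 0, completionTokens: 0, costUsd: 0 };

    this.totals.set(model, {
      requests: current.requests + 1,
      promptTokens: current.promptTokens + usage.prompt_tokens,
      completionTokens: current.completionTokens + usage.completion_tokens,
      costUsd: current.costUsd + cost,
    });
  }

  forModel(model: string): ModelTotals | undefined {
    return this.totals.get(model);
  }

  totalCostUsd(): number {
    let sum = 0;
    this.totals.forEach(t => { sum += t.costUsd; });
    return sum;
  }
}
```

Inside `LoggingClient.chat`, call `tracker.record(model, completion.usage)` after checking that `completion.usage` is defined.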

Best Practices

Model Selection:

- Use fast models (Haiku, Flash) for simple tasks
- Use flagship models (Sonnet, GPT-4) for complex reasoning
- Consider context size requirements
- Test multiple models for your use case

Cost Optimization:

- Estimate costs before requests
- Use cheaper models when possible
- Implement token limits
- Cache common responses
- Batch similar requests

Streaming:

- Always use streaming for user-facing apps
- Handle connection interruptions
- Show progress indicators
- Buffer partial responses

Error Handling:

- Implement retry logic with exponential backoff
- Use model fallbacks for reliability
- Log all errors for debugging
- Handle rate limits gracefully

Prompt Engineering:

- Use system prompts for consistency
- Implement few-shot learning for specific tasks
- Use chain-of-thought for complex reasoning
- Keep prompts concise to reduce costs

Rate Limiting:

- Respect API rate limits
- Implement request queuing
- Use exponential backoff
- Monitor usage metrics

Security:

- Never expose API keys in client code
- Use environment variables
- Implement server-side proxies
- Validate user inputs

Monitoring:

- Track token usage
- Monitor response times
- Log errors and failures
- Analyze model performance

Common Pitfalls

❌ Exposing API keys in frontend:

// WRONG - API key exposed
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: 'sk-or-v1-...',  // Exposed!
});

✅ Correct - Server-side proxy:

// Backend proxy
app.post('/api/chat', async (req, res) => {
  const { prompt } = req.body;

  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: prompt }],
  });

  res.json(completion);
});
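
Validating Proxy Input

The proxy above forwards `req.body.prompt` without checking it. A minimal validation sketch follows. `validateChatRequest` and the `MAX_PROMPT_CHARS` limit are hypothetical, so choose limits that fit your own application.

```typescript
interface ChatRequest {
  prompt: string;
}

type ValidationResult =
  | { ok: true; value: ChatRequest }
  | { ok: false; error: string };

const MAX_PROMPT_CHARS = 8000;  // Hypothetical limit; tune for your use case

function validateChatRequest(body: unknown): ValidationResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' };
  }

  const prompt = (body as Record<string, unknown>)['prompt'];
  if (typeof prompt !== 'string') {
    return { ok: false, error: 'prompt must be a string' };
  }

  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    return { ok: false, error: 'prompt must not be empty' };
  }
  if (trimmed.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }

  return { ok: true, value: { prompt: trimmed } };
}
```

In the route handler, call `validateChatRequest(req.body)` first and respond with HTTP 400 and the error message when `ok` is false.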

❌ Not handling streaming errors:

// WRONG - no error handling
for await (const chunk of stream) {
  console.log(chunk.choices[0].delta.content);
}

✅ Correct - with error handling:

try {
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  console.error('Stream error:', error);
  // Implement retry or fallback
}

❌ Ignoring rate limits:

// WRONG - no rate limiting
const promises = prompts.map(prompt => chat(prompt));
await Promise.all(promises);  // May hit rate limits

✅ Correct - with rate limiting:

const results = [];
for (let i = 0; i < prompts.length; i += 5) {
  const batch = prompts.slice(i, i + 5);
  const batchResults = await Promise.all(batch.map(chat));
  results.push(...batchResults);
  await new Promise(r => setTimeout(r, 1000));  // Delay between batches
}

Performance Optimization

Caching Responses

const responseCache = new Map<string, string>();

async function cachedChat(prompt: string, model: string) {
  const cacheKey = `${model}:${prompt}`;

  if (responseCache.has(cacheKey)) {
    console.log('Cache hit');
    return responseCache.get(cacheKey)!;
  }

  const completion = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });

  const response = completion.choices[0].message.content || '';
  responseCache.set(cacheKey, response);

  return response;
}
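
Bounded Cache with Expiry

The `Map` above never evicts entries, so it grows for the lifetime of the process and can serve stale answers. One possible refinement is a size-bounded cache with a time-to-live, sketched below. `TtlCache` is a hypothetical helper, and the injectable clock exists only to make it testable.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);  // Expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Deleting first moves a re-inserted key to the end of Map's insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest inserted entry (Map iterates in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In `cachedChat`, swap `responseCache` for something like `new TtlCache<string>(5 * 60_000, 1000)` and treat an `undefined` result from `get` as a cache miss.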

Parallel Processing

async function parallelChat(prompts: string[], model: string) {
  const results = await Promise.all(
    prompts.map(prompt =>
      client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      })
    )
  );

  return results.map(r => r.choices[0].message.content);
}
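
Bounded Concurrency

`Promise.all` over every prompt sends all requests at once, which can run into the rate limits discussed under Common Pitfalls. A sketch of a small worker pool that keeps at most `limit` requests in flight follows. `mapWithConcurrency` is a hypothetical helper that accepts any async worker.

```typescript
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runWorker(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;  // Same order as the input, regardless of completion order
}
```

For example, `mapWithConcurrency(prompts, 5, prompt => cachedChat(prompt, model))` processes the prompts five at a time. As with `Promise.all`, the first rejected request rejects the whole call.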

Resources

Documentation: https://openrouter.ai/docs
API Reference: https://openrouter.ai/docs/api-reference
Model List: https://openrouter.ai/models
Pricing: https://openrouter.ai/docs/pricing
Status Page: https://status.openrouter.ai

Related Skills

MCP Servers: Integration with Model Context Protocol (when built)
TypeScript API Integration: Type-safe OpenRouter clients
Python API Integration: Python SDK usage patterns

Summary

OpenRouter provides unified access to 200+ LLMs
OpenAI-compatible API for easy migration
Cost optimization through model selection and token management
Streaming for responsive user experiences
Function calling for tool integration
Vision models for image understanding
Fallback strategies for reliability
Rate limiting and error handling are essential
Well suited to multi-model apps, cost-sensitive deployments, and avoiding vendor lock-in

OpenRouter - Unified AI API Gateway

Overview

Key Features:

Access 200+ models through one API
OpenAI-compatible interface (drop-in replacement)
Intelligent model routing and fallbacks
Real-time streaming responses
Cost tracking and optimization
Model performance analytics
Function calling support
Vision model support

Pricing Model:

Pay-per-token (no subscriptions)
Volume discounts available
Free tier with credits
Per-model pricing varies

Installation:

npm install openai  # Use OpenAI SDK
# or
pip install openai  # Python

Quick Start

1. Get API Key

# Sign up at https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-v1-..."

2. Basic Chat Completion

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com',  // Optional
    'X-Title': 'Your App Name',              // Optional
  }
});

async function chat() {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms' }
    ],
  });

  console.log(completion.choices[0].message.content);
}

3. Streaming Response

async function streamChat() {
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'Write a short story about AI' }
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}

Model Selection Strategy

Available Model Categories

Flagship Models (Highest Quality):

const flagshipModels = {
  claude: 'anthropic/claude-3.5-sonnet',      // Best reasoning
  gpt4: 'openai/gpt-4-turbo',                 // Best general purpose
  gemini: 'google/gemini-pro-1.5',            // Best long context
  opus: 'anthropic/claude-3-opus',            // Best complex tasks
};

Fast Models (Low Latency):

const fastModels = {
  claude: 'anthropic/claude-3-haiku',         // Fastest Claude
  gpt35: 'openai/gpt-3.5-turbo',             // Fast GPT
  gemini: 'google/gemini-flash-1.5',         // Fast Gemini
  llama: 'meta-llama/llama-3.1-8b-instruct', // Fast open source
};

Cost-Optimized Models:

const budgetModels = {
  haiku: 'anthropic/claude-3-haiku',          // $0.25/$1.25 per 1M tokens
  gemini: 'google/gemini-flash-1.5',         // $0.075/$0.30 per 1M tokens
  llama: 'meta-llama/llama-3.1-8b-instruct', // $0.06/$0.06 per 1M tokens
  mixtral: 'mistralai/mixtral-8x7b-instruct', // $0.24/$0.24 per 1M tokens
};

Specialized Models:

const specializedModels = {
  vision: 'openai/gpt-4-vision-preview',     // Image understanding
  code: 'anthropic/claude-3.5-sonnet',       // Code generation
  longContext: 'google/gemini-pro-1.5',      // 2M token context
  function: 'openai/gpt-4-turbo',            // Function calling
};

Model Selection Logic

interface ModelSelector {
  task: 'chat' | 'code' | 'vision' | 'function' | 'summary';
  priority: 'quality' | 'speed' | 'cost';
  maxCost?: number;  // Max cost per 1M tokens
  contextSize?: number;
}

function selectModel(criteria: ModelSelector): string {
  if (criteria.task === 'vision') {
    return 'openai/gpt-4-vision-preview';
  }

  if (criteria.task === 'code') {
    return criteria.priority === 'quality'
      ? 'anthropic/claude-3.5-sonnet'
      : 'meta-llama/llama-3.1-70b-instruct';
  }

  if (criteria.contextSize && criteria.contextSize > 100000) {
    return 'google/gemini-pro-1.5';  // 2M context
  }

  // Default selection by priority
  switch (criteria.priority) {
    case 'quality':
      return 'anthropic/claude-3.5-sonnet';
    case 'speed':
      return 'anthropic/claude-3-haiku';
    case 'cost':
      return criteria.maxCost && criteria.maxCost < 0.5
        ? 'google/gemini-flash-1.5'
        : 'anthropic/claude-3-haiku';
    default:
      return 'openai/gpt-4-turbo';
  }
}

// Usage
const model = selectModel({
  task: 'code',
  priority: 'quality',
});

Streaming Implementation

TypeScript Streaming with Error Handling

async function robustStreamingChat(
  prompt: string,
  model: string = 'anthropic/claude-3.5-sonnet'
) {
  try {
    const stream = await client.chat.completions.create({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
      max_tokens: 4000,
    });

    let fullResponse = '';

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;

      if (delta?.content) {
        fullResponse += delta.content;
        process.stdout.write(delta.content);
      }

      // Handle function calls
      if (delta?.function_call) {
        console.log('\nFunction call:', delta.function_call);
      }

      // Check for finish reason
      if (chunk.choices[0]?.finish_reason) {
        console.log(`\n[Finished: ${chunk.choices[0].finish_reason}]`);
      }
    }

    return fullResponse;
  } catch (error) {
    if (error instanceof Error) {
      console.error('Streaming error:', error.message);
    }
    throw error;
  }
}

Python Streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY"),
)

def stream_chat(prompt: str, model: str = "anthropic/claude-3.5-sonnet"):
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    full_response = ""
    for chunk in stream:
        # Some chunks (for example a trailing usage chunk) may carry no choices
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    print()  # New line
    return full_response

React Streaming Component

import { useState } from 'react';

function StreamingChat() {
  const [response, setResponse] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSubmit(prompt: string) {
    setIsStreaming(true);
    setResponse('');

    try {
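      // NOTE: never ship the API key to the browser. In production, send this
      // request to your own server-side proxy, which adds the Authorization header.
      const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {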
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'anthropic/claude-3.5-sonnet',
          messages: [{ role: 'user', content: prompt }],
          stream: true,
        }),
      });

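      if (!res.ok || !res.body) {
        throw new Error(`Request failed with status ${res.status}`);
      }

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';  // Holds a partial line until the rest of it arrives

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true keeps multi-byte characters intact across chunk boundaries
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';  // The last element may be an incomplete line

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;  // Skip blank lines and SSE comments

          const data = line.slice(6).trim();
          if (data === '[DONE]') continue;

          try {
            const parsed = JSON.parse(data);
            const content = parsed.choices?.[0]?.delta?.content ?? '';
            setResponse(prev => prev + content);
          } catch (e) {
            // Skip invalid JSON
          }
        }
      }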
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsStreaming(false);
    }
  }

  return (
    <div>
      <textarea
        value={response}
        readOnly
        rows={20}
        cols={80}
        placeholder="Response will appear here..."
      />
      <button onClick={() => handleSubmit('Explain AI')}>
        {isStreaming ? 'Streaming...' : 'Send'}
      </button>
    </div>
  );
}

Function Calling

Basic Function Calling

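// Typed so that `type: 'function'` is checked as a literal, not widened to string
const tools: OpenAI.Chat.ChatCompletionTool[] = [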
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City name, e.g. San Francisco',
          },
          unit: {
            type: 'string',
            enum: ['celsius', 'fahrenheit'],
          },
        },
        required: ['location'],
      },
    },
  },
];

async function chatWithFunctions() {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      { role: 'user', content: 'What is the weather in Tokyo?' }
    ],
    tools,
    tool_choice: 'auto',
  });

  const message = completion.choices[0].message;

  if (message.tool_calls?.length) {
    // Run every requested tool and collect one `tool` message per call.
    // The follow-up request must answer all tool calls, not just one of them.
    const toolMessages = [];
    for (const toolCall of message.tool_calls) {
      console.log('Function:', toolCall.function.name);
      console.log('Arguments:', toolCall.function.arguments);

      // Execute function (getWeather is your own implementation)
      const args = JSON.parse(toolCall.function.arguments);
      const result = await getWeather(args.location, args.unit);

      toolMessages.push({
        role: 'tool' as const,
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    // Send all results back in a single follow-up request
    const followUp = await client.chat.completions.create({
      model: 'openai/gpt-4-turbo',
      messages: [
        { role: 'user', content: 'What is the weather in Tokyo?' },
        message,
        ...toolMessages,
      ],
      tools,
    });

    console.log(followUp.choices[0].message.content);
  }
}

Multi-Step Function Calling

async function multiStepFunctionCall(userQuery: string) {
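  // Explicit type so assistant and tool messages can be pushed later
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'user', content: userQuery },
  ];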
  let iterationCount = 0;
  const maxIterations = 5;

  while (iterationCount < maxIterations) {
    const completion = await client.chat.completions.create({
      model: 'openai/gpt-4-turbo',
      messages,
      tools,
      tool_choice: 'auto',
    });

    const message = completion.choices[0].message;
    messages.push(message);

    if (!message.tool_calls?.length) {
      // No more function calls, return final response
      return message.content;
    }

    // Execute all function calls
    for (const toolCall of message.tool_calls) {
      const functionName = toolCall.function.name;
      const args = JSON.parse(toolCall.function.arguments);

      // Execute function (implement your function registry)
      const result = await executeFunctionCall(functionName, args);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    iterationCount++;
  }

  throw new Error('Max iterations reached');
}
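
Parsing Tool Arguments Safely

The loop above calls JSON.parse directly on the model-generated arguments string, which throws if the model emits malformed JSON. A minimal sketch of a non-throwing alternative follows; parseToolArguments and the ParsedArgs type are hypothetical names, not part of the OpenAI SDK. On failure, the error text can be sent back as the tool message content so the model has a chance to correct itself on the next iteration.

```typescript
type ParsedArgs =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

// Hypothetical helper: parse tool-call arguments without throwing.
function parseToolArguments(raw: string): ParsedArgs {
  try {
    const value: unknown = JSON.parse(raw);
    // Tool arguments are expected to be a JSON object, not an array or scalar
    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
      return { ok: false, error: 'Arguments must be a JSON object' };
    }
    return { ok: true, args: value as Record<string, unknown> };
  } catch (e) {
    const reason = e instanceof Error ? e.message : String(e);
    return { ok: false, error: `Invalid JSON in arguments: ${reason}` };
  }
}

// Usage
const parsed = parseToolArguments('{"location": "Tokyo"}');
if (parsed.ok) {
  console.log('Location:', parsed.args.location);
} else {
  console.log('Report back to the model:', parsed.error);
}
```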

Cost Optimization

Token Counting and Cost Estimation

import { encoding_for_model } from 'tiktoken';

interface CostEstimate {
  promptTokens: number;
  completionTokens: number;
  promptCost: number;
  completionCost: number;
  totalCost: number;
}

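// USD per 1M tokens. Prices change over time; check the pricing page before relying on these.
const modelPricing: Record<string, { input: number; output: number }> = {
  'anthropic/claude-3.5-sonnet': { input: 3.00, output: 15.00 },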
  'anthropic/claude-3-haiku': { input: 0.25, output: 1.25 },
  'openai/gpt-4-turbo': { input: 10.00, output: 30.00 },
  'openai/gpt-3.5-turbo': { input: 0.50, output: 1.50 },
  'google/gemini-flash-1.5': { input: 0.075, output: 0.30 },
};

function estimateCost(
  prompt: string,
  expectedCompletion: number,
  model: string
): CostEstimate {
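  // tiktoken only ships OpenAI tokenizers, so counts for other providers are approximate
  const encoder = encoding_for_model('gpt-4');
  const promptTokens = encoder.encode(prompt).length;
  encoder.free();  // Release the WASM-backed encoder
  const completionTokens = expectedCompletion;

  // Unknown models fall back to zero pricing, so their estimate reads as $0
  const pricing = modelPricing[model] || { input: 0, output: 0 };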

  const promptCost = (promptTokens / 1_000_000) * pricing.input;
  const completionCost = (completionTokens / 1_000_000) * pricing.output;

  return {
    promptTokens,
    completionTokens,
    promptCost,
    completionCost,
    totalCost: promptCost + completionCost,
  };
}

// Usage
const estimate = estimateCost(
  'Explain quantum computing',
  500,  // Expected response tokens
  'anthropic/claude-3.5-sonnet'
);

console.log(`Estimated cost: $${estimate.totalCost.toFixed(4)}`);
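
Actual Cost from Reported Usage

The estimate above is computed before the request. Once a response comes back, its usage field reports the real token counts, and the same per-1M-token arithmetic gives the actual cost. The sketch below assumes the OpenAI-compatible usage shape (prompt_tokens, completion_tokens); the costFromUsage helper and the Pricing interface are hypothetical.

```typescript
// Token counts as reported in the `usage` field of a chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// USD per 1M tokens (same unit as the pricing table above).
interface Pricing {
  input: number;
  output: number;
}

// Hypothetical helper: convert reported usage into a dollar figure.
function costFromUsage(usage: Usage, pricing: Pricing): number {
  const promptCost = (usage.prompt_tokens / 1_000_000) * pricing.input;
  const completionCost = (usage.completion_tokens / 1_000_000) * pricing.output;
  return promptCost + completionCost;
}

// 1,000 prompt tokens and 500 completion tokens at $3 / $15 per 1M tokens:
// 0.001 * 3 + 0.0005 * 15 = 0.003 + 0.0075 = 0.0105
const actualCost = costFromUsage(
  { prompt_tokens: 1000, completion_tokens: 500 },
  { input: 3.0, output: 15.0 }
);
console.log(`Actual cost: $${actualCost.toFixed(4)}`);
```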

Dynamic Model Selection by Budget

async function budgetOptimizedChat(
  prompt: string,
  maxCostPerRequest: number = 0.01  // $0.01 max
) {
  // Estimate with expensive model
  const expensiveEstimate = estimateCost(
    prompt,
    1000,
    'anthropic/claude-3.5-sonnet'
  );

  let selectedModel = 'anthropic/claude-3.5-sonnet';

  if (expensiveEstimate.totalCost > maxCostPerRequest) {
    // Try cheaper models
    const cheapEstimate = estimateCost(
      prompt,
      1000,
      'anthropic/claude-3-haiku'
    );

    if (cheapEstimate.totalCost > maxCostPerRequest) {
      selectedModel = 'google/gemini-flash-1.5';
    } else {
      selectedModel = 'anthropic/claude-3-haiku';
    }
  }

  console.log(`Selected model: ${selectedModel}`);

  const completion = await client.chat.completions.create({
    model: selectedModel,
    messages: [{ role: 'user', content: prompt }],
  });

  return completion.choices[0].message.content;
}
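
Generalized Budget Selection

The function above hard-codes two comparisons and falls through to the cheapest model even when that model would also exceed the budget. One possible generalization, sketched here under the assumption that token counts are already known, walks a preference-ordered list and returns null when nothing fits. The names ModelPrice, candidates, and pickModelWithinBudget are hypothetical; the prices are copied from the pricing table earlier in this section.

```typescript
interface ModelPrice {
  id: string;
  input: number;   // USD per 1M prompt tokens
  output: number;  // USD per 1M completion tokens
}

// Ordered from most to least preferred.
const candidates: ModelPrice[] = [
  { id: 'anthropic/claude-3.5-sonnet', input: 3.0, output: 15.0 },
  { id: 'anthropic/claude-3-haiku', input: 0.25, output: 1.25 },
  { id: 'google/gemini-flash-1.5', input: 0.075, output: 0.3 },
];

// Hypothetical helper: first candidate whose estimated cost fits, or null.
function pickModelWithinBudget(
  promptTokens: number,
  completionTokens: number,
  maxCost: number,
  models: ModelPrice[] = candidates
): string | null {
  for (const m of models) {
    const cost =
      (promptTokens / 1_000_000) * m.input +
      (completionTokens / 1_000_000) * m.output;
    if (cost <= maxCost) return m.id;
  }
  return null; // Caller decides: shorten the prompt, lower max_tokens, or reject
}
```

Returning null rather than silently choosing the cheapest model makes an over-budget request an explicit decision for the caller.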

Batching for Cost Reduction

async function batchProcess(prompts: string[], model: string) {
  // Process multiple prompts in parallel with rate limiting
  const concurrency = 5;
  const results = [];

  for (let i = 0; i < prompts.length; i += concurrency) {
    const batch = prompts.slice(i, i + concurrency);

    const batchResults = await Promise.all(
      batch.map(prompt =>
        client.chat.completions.create({
          model,
          messages: [{ role: 'user', content: prompt }],
          max_tokens: 500,  // Limit tokens to control cost
        })
      )
    );

    results.push(...batchResults);

    // Rate limiting delay
    if (i + concurrency < prompts.length) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }

  return results;
}

Model Fallback and Retry Strategy

Automatic Fallback

const modelFallbackChain = [
  'anthropic/claude-3.5-sonnet',
  'openai/gpt-4-turbo',
  'anthropic/claude-3-haiku',
  'google/gemini-flash-1.5',
];

async function chatWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;

  for (const model of modelFallbackChain) {
    try {
      console.log(`Trying model: ${model}`);

      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 2000,
      });

      return completion.choices[0].message.content || '';
    } catch (error) {
      // Remember the failure and continue to the next model in the chain
      console.warn(`Model ${model} failed:`, error);
      lastError = error;
    }
  }

  throw new Error(`All models failed. Last error: ${String(lastError)}`);
}

Exponential Backoff for Rate Limits

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  let lastError: unknown;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;

      // Only rate limit errors (HTTP 429) are retried
      const status = (error as { status?: number } | null)?.status;
      if (status !== 429) {
        throw error;  // Non-retryable error
      }

      // Skip the wait after the final attempt
      if (i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000;  // Exponential backoff: 1s, 2s, 4s, ...
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }

  throw lastError;
}

// Usage
const result = await retryWithBackoff(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
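
Backoff with Jitter

Pure exponential backoff makes every client that was rate limited at the same moment retry at the same moment. A common refinement is to cap the delay and randomize it ("full jitter"). The sketch below is one way to compute such a delay; backoffDelay and its default values are hypothetical, and the random source is injectable only so the function can be tested deterministically.

```typescript
// Hypothetical helper: "full jitter" backoff delay in milliseconds.
// The delay is drawn uniformly from [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30_000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Usage: delays for the first five attempts (values vary per run)
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ${backoffDelay(attempt)}ms`);
}
```

In the retry loop above, this would stand in for the Math.pow(2, i) * 1000 expression.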

Prompt Engineering Best Practices

System Prompts for Consistency

const systemPrompts = {
  concise: 'You are a helpful assistant. Be concise and direct.',
  detailed: 'You are a knowledgeable expert. Provide comprehensive answers with examples.',
  code: 'You are an expert programmer. Provide clean, well-commented code with explanations.',
  creative: 'You are a creative writing assistant. Be imaginative and engaging.',
};

async function chatWithPersonality(
  prompt: string,
  personality: keyof typeof systemPrompts
) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'system', content: systemPrompts[personality] },
      { role: 'user', content: prompt },
    ],
  });

  return completion.choices[0].message.content;
}

Few-Shot Prompting

async function fewShotClassification(text: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: 'Classify text sentiment as positive, negative, or neutral.',
      },
      { role: 'user', content: 'I love this product!' },
      { role: 'assistant', content: 'positive' },
      { role: 'user', content: 'This is terrible.' },
      { role: 'assistant', content: 'negative' },
      { role: 'user', content: 'It works fine.' },
      { role: 'assistant', content: 'neutral' },
      { role: 'user', content: text },
    ],
  });

  return completion.choices[0].message.content;
}
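
Building Few-Shot Messages from Data

The example above writes each demonstration pair inline. When the examples live in a file or database, the message array can be assembled from data instead. This is a minimal sketch; ChatMessage, Example, and buildFewShotMessages are hypothetical names.

```typescript
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

interface Example {
  input: string;
  label: string;
}

// Hypothetical helper: system instruction, then alternating user/assistant
// pairs for each example, then the real input as the final user message.
function buildFewShotMessages(
  instruction: string,
  examples: Example[],
  input: string
): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: 'system', content: instruction }];
  for (const ex of examples) {
    messages.push({ role: 'user', content: ex.input });
    messages.push({ role: 'assistant', content: ex.label });
  }
  messages.push({ role: 'user', content: input });
  return messages;
}

// Usage: reproduces the message list from the example above
const fewShotMessages = buildFewShotMessages(
  'Classify text sentiment as positive, negative, or neutral.',
  [
    { input: 'I love this product!', label: 'positive' },
    { input: 'This is terrible.', label: 'negative' },
    { input: 'It works fine.', label: 'neutral' },
  ],
  'The delivery was late.'
);
```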

Chain of Thought Prompting

async function reasoningTask(problem: string) {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      {
        role: 'user',
        content: `${problem}\n\nLet's solve this step by step:\n1.`,
      },
    ],
    max_tokens: 3000,
  });

  return completion.choices[0].message.content;
}

Rate Limits and Throttling

Rate Limit Handler

class RateLimitedClient {
  private requestQueue: Array<() => Promise<any>> = [];
  private processing = false;
  private requestsPerMinute = 60;
  private requestInterval = 60000 / this.requestsPerMinute;

  async enqueue<T>(request: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.requestQueue.push(async () => {
        try {
          const result = await request();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });

      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.requestQueue.length === 0) return;

    this.processing = true;

    while (this.requestQueue.length > 0) {
      const request = this.requestQueue.shift()!;
      await request();
      await new Promise(resolve => setTimeout(resolve, this.requestInterval));
    }

    this.processing = false;
  }
}

// Usage
const rateLimitedClient = new RateLimitedClient();

const result = await rateLimitedClient.enqueue(() =>
  client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
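
Sliding-Window Alternative

The queue above waits a fixed interval after each request finishes, so slow requests push real throughput below the configured rate. A sliding-window limiter instead allows bursts as long as no more than a fixed number of requests start within any window. The sketch below is one possible design, not an OpenRouter API; SlidingWindowLimiter is a hypothetical class, and the clock is passed in so the logic can be tested without waiting.

```typescript
// Hypothetical limiter: at most `limit` request starts per `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 if a request may start at `now` (and records it);
  // otherwise returns how many milliseconds to wait before trying again.
  tryAcquire(now: number = Date.now()): number {
    // Drop request timestamps that have left the window
    while (
      this.timestamps.length > 0 &&
      now - this.timestamps[0] >= this.windowMs
    ) {
      this.timestamps.shift();
    }

    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }

    // The oldest recorded request frees a slot when it leaves the window
    return this.timestamps[0] + this.windowMs - now;
  }
}

// Usage: 60 requests per minute
const limiter = new SlidingWindowLimiter(60, 60_000);
const waitMs = limiter.tryAcquire();
console.log(waitMs === 0 ? 'Send now' : `Wait ${waitMs}ms`);
```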

Vision Models

Image Understanding

async function analyzeImage(imageUrl: string, question: string) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',  // Any vision-capable model works here
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: 1000,
  });

  return completion.choices[0].message.content;
}

// Usage
const result = await analyzeImage(
  'https://example.com/image.jpg',
  'What objects are in this image?'
);

Multi-Image Analysis

async function compareImages(imageUrls: string[]) {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4-turbo',  // Any vision-capable model works here
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Compare these images and describe the differences:' },
          ...imageUrls.map(url => ({
            type: 'image_url' as const,
            image_url: { url },
          })),
        ],
      },
    ],
  });

  return completion.choices[0].message.content;
}

Error Handling and Monitoring

Comprehensive Error Handler

interface ErrorResponse {
  error: {
    message: string;
    type: string;
    code: string;
  };
}

async function robustCompletion(prompt: string) {
  try {
    const completion = await client.chat.completions.create({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    });

    return completion.choices[0].message.content;
  } catch (error: any) {
    // Rate limit errors
    if (error.status === 429) {
      console.error('Rate limit exceeded. Please wait.');
      throw new Error('RATE_LIMIT_EXCEEDED');
    }

    // Invalid API key
    if (error.status === 401) {
      console.error('Invalid API key');
      throw new Error('INVALID_API_KEY');
    }

    // Model not found
    if (error.status === 404) {
      console.error('Model not found');
      throw new Error('MODEL_NOT_FOUND');
    }

    // Server errors
    if (error.status >= 500) {
      console.error('OpenRouter server error');
      throw new Error('SERVER_ERROR');
    }

    // Unknown error
    console.error('Unknown error:', error);
    throw error;
  }
}
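
Separating Classification from Handling

The handler above mixes two concerns: deciding what kind of error occurred and deciding what to do about it. Pulling the first part into a pure function makes it easy to reuse from retry and fallback code. The sketch below covers only the status codes the handler already checks; classifyStatus and ClassifiedError are hypothetical names, and treating 5xx responses as retryable is an assumption rather than documented OpenRouter behavior.

```typescript
interface ClassifiedError {
  code: string;
  retryable: boolean;
}

// Hypothetical helper: map an HTTP status to an error code and retry hint.
function classifyStatus(status: number | undefined): ClassifiedError {
  if (status === 429) return { code: 'RATE_LIMIT_EXCEEDED', retryable: true };
  if (status === 401) return { code: 'INVALID_API_KEY', retryable: false };
  if (status === 404) return { code: 'MODEL_NOT_FOUND', retryable: false };
  if (status !== undefined && status >= 500) {
    return { code: 'SERVER_ERROR', retryable: true }; // Assumed transient
  }
  return { code: 'UNKNOWN', retryable: false };
}

// Usage
const classified = classifyStatus(429);
console.log(classified.code, classified.retryable ? '(retry)' : '(give up)');
```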

Request/Response Logging

class LoggingClient {
  async chat(prompt: string, model: string) {
    const startTime = Date.now();

    console.log('[Request]', {
      timestamp: new Date().toISOString(),
      model,
      promptLength: prompt.length,
    });

    try {
      const completion = await client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      });

      const duration = Date.now() - startTime;

      console.log('[Response]', {
        timestamp: new Date().toISOString(),
        duration,
        usage: completion.usage,
        finishReason: completion.choices[0].finish_reason,
      });

      return completion;
    } catch (error) {
      console.error('[Error]', {
        timestamp: new Date().toISOString(),
        duration: Date.now() - startTime,
        error,
      });
      throw error;
    }
  }
}

Best Practices

Model Selection:
- Use fast models (Haiku, Flash) for simple tasks
- Use flagship models (Sonnet, GPT-4) for complex reasoning
- Consider context size requirements
- Test multiple models for your use case

Cost Optimization:
- Estimate costs before requests
- Use cheaper models when possible
- Implement token limits
- Cache common responses
- Batch similar requests

Streaming:
- Always use streaming for user-facing apps
- Handle connection interruptions
- Show progress indicators
- Buffer partial responses

Error Handling:
- Implement retry logic with exponential backoff
- Use model fallbacks for reliability
- Log all errors for debugging
- Handle rate limits gracefully

Prompt Engineering:
- Use system prompts for consistency
- Implement few-shot learning for specific tasks
- Use chain-of-thought for complex reasoning
- Keep prompts concise to reduce costs

Rate Limiting:
- Respect API rate limits
- Implement request queuing
- Use exponential backoff
- Monitor usage metrics

Security:
- Never expose API keys in client code
- Use environment variables
- Implement server-side proxies
- Validate user inputs

Monitoring:
- Track token usage
- Monitor response times
- Log errors and failures
- Analyze model performance

Common Pitfalls

❌ Exposing API keys in frontend:

// WRONG - API key exposed
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: 'sk-or-v1-...',  // Exposed!
});

✅ Correct - Server-side proxy:

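// Backend proxy (Express-style handler; the API key stays on the server)
app.post('/api/chat', async (req, res) => {
  const { prompt } = req.body;

  // Validate user input before spending tokens on it
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    return res.status(400).json({ error: 'prompt must be a non-empty string' });
  }

  try {
    const completion = await client.chat.completions.create({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    });

    res.json(completion);
  } catch (error) {
    console.error('Upstream error:', error);
    res.status(502).json({ error: 'Upstream request failed' });
  }
});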

❌ Not handling streaming errors:

// WRONG - no error handling
for await (const chunk of stream) {
  console.log(chunk.choices[0].delta.content);
}

✅ Correct - with error handling:

try {
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  console.error('Stream error:', error);
  // Implement retry or fallback
}

❌ Ignoring rate limits:

// WRONG - no rate limiting
const promises = prompts.map(prompt => chat(prompt));
await Promise.all(promises);  // May hit rate limits

✅ Correct - with rate limiting:

const results = [];
for (let i = 0; i < prompts.length; i += 5) {
  const batch = prompts.slice(i, i + 5);
  const batchResults = await Promise.all(batch.map(chat));
  results.push(...batchResults);
  if (i + 5 < prompts.length) {
    await new Promise(r => setTimeout(r, 1000));  // Delay between batches
  }
}

Performance Optimization

Caching Responses

const responseCache = new Map<string, string>();

async function cachedChat(prompt: string, model: string) {
  const cacheKey = `${model}:${prompt}`;

  if (responseCache.has(cacheKey)) {
    console.log('Cache hit');
    return responseCache.get(cacheKey)!;
  }

  const completion = await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });

  const response = completion.choices[0].message.content || '';
  responseCache.set(cacheKey, response);

  return response;
}
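
Bounding the Cache

The Map above grows without limit and never expires entries, which is fine for a script but not for a long-running server. A minimal sketch of a bounded cache with a time-to-live follows. TtlCache is a hypothetical class; it evicts the oldest inserted entry when full (not a true LRU, since reads do not refresh an entry's position), and the clock is injectable so the behavior can be tested.

```typescript
// Hypothetical cache: entries expire after ttlMs, and the oldest inserted
// entry is evicted once maxEntries is reached.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // Expired: remove and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key); // Re-inserting moves the key to the end
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: up to 500 responses, each kept for 10 minutes
const boundedCache = new TtlCache<string>(500, 10 * 60 * 1000);
boundedCache.set('anthropic/claude-3.5-sonnet:Hello', 'Hi there!');
```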

Parallel Processing

async function parallelChat(prompts: string[], model: string) {
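  // Unbounded concurrency: suitable for small prompt lists only.
  // For larger lists, process in batches to stay within rate limits.
  const results = await Promise.all(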
    prompts.map(prompt =>
      client.chat.completions.create({
        model,
        messages: [{ role: 'user', content: prompt }],
      })
    )
  );

  return results.map(r => r.choices[0].message.content);
}

Resources

Documentation: https://openrouter.ai/docs
API Reference: https://openrouter.ai/docs/api-reference
Model List: https://openrouter.ai/models
Pricing: https://openrouter.ai/docs/pricing
Status Page: https://status.openrouter.ai

Related Skills

MCP Servers: Integration with Model Context Protocol (when built)
TypeScript API Integration: Type-safe OpenRouter clients
Python API Integration: Python SDK usage patterns

Summary

OpenRouter provides unified access to 200+ LLMs
OpenAI-compatible API for easy migration
Cost optimization through model selection and token management
Streaming for responsive user experiences
Function calling for tool integration
Vision models for image understanding
Fallback strategies for reliability
Rate limiting and error handling are essential
Well suited to multi-model apps, cost-sensitive deployments, and avoiding vendor lock-in
