
jamie-bitflight/litellm

Name: litellm
Author: jamie-bitflight

plugins/litellm/skills/litellm/SKILL.md

npx skillsauth add jamie-bitflight/claude_skills litellm


LiteLLM

Unified Python interface for calling 100+ LLM APIs using a consistent OpenAI format. Provides standardized exception handling, retry/fallback logic, and cost tracking across multiple providers.

When to Use This Skill

Use this skill when:

Integrating with multiple LLM providers through a single interface
Routing requests to local llamafile servers using OpenAI-compatible endpoints
Implementing retry and fallback logic for LLM calls
Building applications requiring consistent error handling across providers
Tracking LLM usage costs across different providers
Converting between provider-specific APIs and OpenAI format
Deploying LLM proxy servers with unified configuration
Testing applications against both cloud and local LLM endpoints

Core Capabilities

Provider Support

LiteLLM supports 100+ providers through a consistent OpenAI-style API:

Cloud Providers: OpenAI, Anthropic, Google, Azure, AWS Bedrock
Local Servers: llamafile, Ollama, LocalAI, vLLM
Unified Format: All requests use OpenAI message format
Exception Mapping: All provider errors map to OpenAI exception types

Key Features

Unified API: Single completion() function for all providers
Exception Handling: All exceptions inherit from OpenAI types
Retry Logic: Built-in retry with configurable attempts
Streaming Support: Sync and async streaming for all providers
Cost Tracking: Automatic usage and cost calculation
Proxy Mode: Deploy centralized LLM gateway
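
Cost tracking comes down to token counts multiplied by a per-token price. The sketch below shows only that arithmetic. `Usage`, `estimate_cost`, and both prices are hypothetical and are not LiteLLM's pricing data or API; the field names follow the OpenAI usage format (`prompt_tokens`, `completion_tokens`).

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts in the shape of an OpenAI-format usage object."""
    prompt_tokens: int
    completion_tokens: int


def estimate_cost(usage: Usage, input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost = tokens / 1000 * price per 1k tokens, summed over input and output."""
    return (
        usage.prompt_tokens / 1000 * input_price_per_1k
        + usage.completion_tokens / 1000 * output_price_per_1k
    )


# Hypothetical prices, for illustration only.
cost = estimate_cost(Usage(prompt_tokens=1200, completion_tokens=300), 0.50, 1.50)
print(f"Estimated cost: ${cost:.4f}")
```

A local llamafile model has no per-token price, so this matters mainly when the same code also targets cloud providers.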

Installation

# Using pip
pip install litellm

# Using uv
uv add litellm

Llamafile Integration

Provider Configuration

All llamafile models MUST use the llamafile/ prefix for routing:

model = "llamafile/mistralai/mistral-7b-instruct-v0.2"
model = "llamafile/gemma-3-3b"
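
A small guard can make the prefix rule hard to forget. The sketch below uses a hypothetical helper, `ensure_llamafile_prefix`, which is not part of LiteLLM; it only normalizes the model string before it is passed to `completion()`.

```python
LLAMAFILE_PREFIX = "llamafile/"


def ensure_llamafile_prefix(model: str) -> str:
    """Return the model name with a leading llamafile/ prefix, adding it if absent."""
    model = model.strip()
    if not model:
        raise ValueError("model name must not be empty")
    if model.startswith(LLAMAFILE_PREFIX):
        return model
    return LLAMAFILE_PREFIX + model


print(ensure_llamafile_prefix("gemma-3-3b"))
print(ensure_llamafile_prefix("llamafile/mistralai/mistral-7b-instruct-v0.2"))
```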

API Base URL

The api_base MUST point to llamafile's OpenAI-compatible endpoint:

api_base = "http://localhost:8080/v1"

Critical Requirements:

Include /v1 suffix
Do NOT add endpoint paths like /chat/completions (LiteLLM adds these automatically)
Default llamafile port is 8080
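
The requirements above can be checked before any request is sent. This is a minimal sketch using only the standard library; `check_api_base` is a hypothetical helper, and treating a non-8080 port as worth flagging is an assumption (a llamafile server can be started on another port).

```python
from urllib.parse import urlparse


def check_api_base(api_base: str) -> list[str]:
    """Return a list of problems found in a llamafile api_base URL (empty if none)."""
    problems = []
    parsed = urlparse(api_base)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    path = parsed.path.rstrip("/")
    if "/chat/completions" in path:
        problems.append("remove /chat/completions; LiteLLM appends the endpoint path")
    elif not path.endswith("/v1"):
        problems.append("path must end with /v1")
    if parsed.port is not None and parsed.port != 8080:
        problems.append(f"port {parsed.port} differs from the llamafile default 8080")
    return problems


print(check_api_base("http://localhost:8080/v1"))
print(check_api_base("http://localhost:8000/v1/chat/completions"))
```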

Environment Variable Configuration

import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

Basic Usage Patterns

Synchronous Completion

import litellm

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    messages=[{"role": "user", "content": "Summarize this diff"}],
    api_base="http://localhost:8080/v1",
    temperature=0.2,
    max_tokens=80,
)

print(response.choices[0].message.content)

Asynchronous Completion

from litellm import acompletion
import asyncio

async def generate_message():
    response = await acompletion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Write a commit message"}],
        api_base="http://localhost:8080/v1",
        temperature=0.3,
        max_tokens=200,
    )
    return response.choices[0].message.content

result = asyncio.run(generate_message())
print(result)

Async Streaming

from litellm import acompletion
import asyncio

async def stream_response():
    response = await acompletion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        api_base="http://localhost:8080/v1",
        stream=True,
    )

    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(stream_response())
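
When the full reply is needed as a string rather than printed incrementally, the same loop can collect the deltas. The sketch below runs without a server: `fake_chunk` and `fake_stream` are hypothetical stand-ins that mimic the `chunk.choices[0].delta.content` shape used above, and `collect_stream` should work the same way on the iterator returned by `acompletion(..., stream=True)`.

```python
import asyncio
from types import SimpleNamespace


def fake_chunk(text):
    """Build an object shaped like an OpenAI-format streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


async def fake_stream():
    """Stand-in for the async iterator returned by acompletion(stream=True)."""
    for text in ["Hel", "lo", None, "!"]:  # None mimics a chunk without content
        yield fake_chunk(text)


async def collect_stream(stream) -> str:
    """Join the content deltas of a stream into the full reply."""
    parts = []
    async for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:  # skip empty or missing deltas
            parts.append(content)
    return "".join(parts)


full_text = asyncio.run(collect_stream(fake_stream()))
print(full_text)
```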

Embeddings

from litellm import embedding
import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

response = embedding(
    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
    input=["Hello world"],
)

print(response)
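
Embedding vectors are usually compared with cosine similarity. The sketch below assumes the vectors have already been pulled out of the response (in the OpenAI format they sit under `response.data[i]["embedding"]`; verify that access path against your LiteLLM version). The vectors here are toy values, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # identical direction
print(cosine_similarity(v1, v3))  # orthogonal vectors
```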

Exception Handling

Import Pattern

All exceptions can be imported directly from litellm:

from litellm import (
    BadRequestError,           # 400 errors
    AuthenticationError,       # 401 errors
    NotFoundError,             # 404 errors
    Timeout,                   # 408 errors (subclass of openai.APITimeoutError)
    RateLimitError,            # 429 errors
    APIConnectionError,        # 500 errors / connection issues (default)
    ServiceUnavailableError,   # 503 errors
)

Exception Types Reference

| Status Code | Exception Type              | Inherits from                | Description                 |
| ----------- | --------------------------- | ---------------------------- | --------------------------- |
| 400         | BadRequestError             | openai.BadRequestError       | Invalid request             |
| 400         | ContextWindowExceededError  | litellm.BadRequestError      | Token limit exceeded        |
| 400         | ContentPolicyViolationError | litellm.BadRequestError      | Content policy violation    |
| 401         | AuthenticationError         | openai.AuthenticationError   | Auth failure                |
| 403         | PermissionDeniedError       | openai.PermissionDeniedError | Permission denied           |
| 404         | NotFoundError               | openai.NotFoundError         | Invalid model/endpoint      |
| 408         | Timeout                     | openai.APITimeoutError       | Request timeout             |
| 429         | RateLimitError              | openai.RateLimitError        | Rate limited                |
| 500         | APIConnectionError          | openai.APIConnectionError    | Default for unmapped errors |
| 500         | APIError                    | openai.APIError              | Generic 500 error           |
| 503         | ServiceUnavailableError     | openai.APIStatusError        | Service unavailable         |
| >=500       | InternalServerError         | openai.InternalServerError   | Unmapped 500+ errors        |

Exception Attributes

All LiteLLM exceptions include:

status_code: HTTP status code
message: Error message
llm_provider: Provider that raised the exception
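
These attributes make it easy to build one uniform log line for any provider. The sketch below reads them defensively with `getattr`, so it also tolerates exceptions that lack them. `describe_llm_error` and `FakeProviderError` are hypothetical names used for illustration; the fake class only stands in for a LiteLLM exception so the example runs without a server.

```python
def describe_llm_error(exc: Exception) -> str:
    """Format status_code, message, and llm_provider, tolerating missing attributes."""
    status = getattr(exc, "status_code", None)
    provider = getattr(exc, "llm_provider", None)
    message = getattr(exc, "message", None) or str(exc)
    return f"[{provider or 'unknown'}] HTTP {status if status is not None else 'n/a'}: {message}"


class FakeProviderError(Exception):
    """Hypothetical stand-in carrying the same three attributes."""

    def __init__(self, message, status_code, llm_provider):
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.llm_provider = llm_provider


print(describe_llm_error(FakeProviderError("rate limited", 429, "llamafile")))
print(describe_llm_error(ValueError("plain error")))
```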

Exception Handling Example

import litellm
import openai

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
        timeout=30.0,
    )
except openai.APITimeoutError as e:
    # LiteLLM exceptions inherit from OpenAI types
    print(f"Timeout: {e}")
except litellm.APIConnectionError as e:
    print(f"Connection failed: {e.message}")
    print(f"Provider: {e.llm_provider}")

Alternative Import from litellm.exceptions

import litellm
from litellm.exceptions import BadRequestError, AuthenticationError, APIError

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
    )
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except BadRequestError as e:
    print(f"Bad request: {e}")
except APIError as e:
    print(f"API error: {e}")

Checking If Exception Should Retry

import litellm

try:
    response = litellm.completion(
        model="llamafile/gemma-3-3b",
        messages=[{"role": "user", "content": "Hello"}],
        api_base="http://localhost:8080/v1",
    )
except Exception as e:
    if hasattr(e, 'status_code'):
        should_retry = litellm._should_retry(e.status_code)
        print(f"Should retry: {should_retry}")
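
The leading underscore marks `_should_retry` as a private helper, so its behavior may change between releases. Where that is a concern, an explicit policy can live in application code. The sketch below is one such policy; the set of status codes is an assumption based on common HTTP retry practice, not a copy of LiteLLM's internal logic.

```python
RETRYABLE_STATUS_CODES = {408, 409, 429}


def is_retryable_status(status_code: int) -> bool:
    """Retry on timeouts, conflicts, rate limits, and any 5xx server error."""
    return status_code in RETRYABLE_STATUS_CODES or 500 <= status_code <= 599


for code in (400, 401, 408, 429, 503):
    print(code, is_retryable_status(code))
```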

Retry and Fallback Configuration

from litellm import completion

response = completion(
    model="llamafile/gemma-3-3b",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:8080/v1",
    num_retries=3,      # Retry 3 times on failure
    timeout=30.0,       # 30 second timeout
)
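
The call above covers retries only. Fallback across models can be sketched as a plain loop that tries each model in order. `complete_with_fallbacks` and `fake_call` are hypothetical; the loop accepts any callable with a `completion()`-style signature, so `litellm.completion` could be passed in its place. LiteLLM also documents built-in fallback options, which are worth checking for your version before hand-rolling this.

```python
from typing import Any, Callable, Sequence


def complete_with_fallbacks(call: Callable[..., Any], models: Sequence[str], **kwargs: Any) -> Any:
    """Try each model in order and return the first successful response."""
    if not models:
        raise ValueError("at least one model is required")
    last_error = None
    for model in models:
        try:
            return call(model=model, **kwargs)
        except Exception as exc:  # narrow this to specific LiteLLM exceptions in real code
            last_error = exc
    raise RuntimeError(f"all {len(models)} models failed") from last_error


# Demo with a fake callable; in real use pass litellm.completion instead.
def fake_call(model, messages):
    if model == "llamafile/primary":
        raise ConnectionError("primary is down")
    return f"answer from {model}"


result = complete_with_fallbacks(
    fake_call,
    ["llamafile/primary", "llamafile/backup"],
    messages=[{"role": "user", "content": "Hello"}],
)
print(result)
```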

Proxy Server Configuration

For proxy deployments, use config.yaml:

model_list:
  - model_name: commit-polish-model
    litellm_params:
      model: llamafile/gemma-3-3b          # add llamafile/ prefix
      api_base: http://localhost:8080/v1   # add api base for OpenAI compatible provider

Application Integration Patterns

Connection Verification Pattern

import litellm
from litellm import APIConnectionError

def verify_llamafile_connection(api_base: str = "http://localhost:8080/v1") -> bool:
    """Check if llamafile server is running."""
    try:
        litellm.completion(
            model="llamafile/test",
            messages=[{"role": "user", "content": "test"}],
            api_base=api_base,
            max_tokens=1,
        )
        return True
    except APIConnectionError:
        return False
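
The check above sends a real completion request, which costs a model invocation. A cheaper pre-check is to test whether anything is listening on the port at all. The sketch below does that with a plain TCP connect; `is_port_open` is a hypothetical helper, and an open port does not prove the listener is llamafile, so it complements the completion-based check rather than replacing it.

```python
import socket
from urllib.parse import urlparse


def is_port_open(api_base: str = "http://localhost:8080/v1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the api_base host and port succeeds."""
    parsed = urlparse(api_base)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


print(is_port_open())
```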

Async Service Pattern

import litellm
from litellm import acompletion, APIConnectionError
import asyncio

class AIService:
    """LiteLLM wrapper with llamafile routing."""

    def __init__(self, model: str, api_base: str, temperature: float = 0.3, max_tokens: int = 200):
        self.model = model
        self.api_base = api_base
        self.temperature = temperature
        self.max_tokens = max_tokens

    async def generate_commit_message(self, diff: str, system_prompt: str) -> str:
        """Generate a commit message using the LLM."""
        try:
            response = await acompletion(
                model=self.model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": f"Generate a commit message for this diff:\n\n{diff}"},
                ],
                api_base=self.api_base,
                temperature=self.temperature,
                max_tokens=self.max_tokens,
            )
            return response.choices[0].message.content.strip()
        except APIConnectionError as e:
            raise RuntimeError(f"Failed to connect to llamafile server at {self.api_base}: {e.message}") from e

Common Pitfalls to Avoid

Missing llamafile/ prefix: Without the prefix, LiteLLM won't route the request to the OpenAI-compatible endpoint
Wrong port: Llamafile uses 8080 by default, not 8000
Missing /v1 suffix: API base must end with /v1
Adding extra path segments: Do NOT use http://localhost:8080/v1/chat/completions - LiteLLM adds the endpoint path automatically
API key requirement: No API key needed for local llamafile (use empty string or any value if required by validation)

Configuration Examples

TOML Configuration

# ~/.config/commit-polish/config.toml
[ai]
model = "llamafile/gemma-3-3b"  # MUST have llamafile/ prefix
temperature = 0.3
max_tokens = 200

Environment Variables

export LLAMAFILE_API_BASE="http://localhost:8080/v1"
export LITELLM_LOG="INFO"  # INFO-level logging; set to "DEBUG" for verbose debug output

Related Skills

For comprehensive documentation on related tools:

llamafile: Activate the llamafile skill using Skill(command: "llamafile:llamafile") for llamafile server setup, model management, and local LLM deployment patterns
uv: Activate the uv skill using Skill(command: "python3-development:uv") for Python project management, dependency handling, and virtual environment workflows


Version Information

Documentation verified against: LiteLLM GitHub repository (main branch, accessed 2025-01-15)
Python: 3.11+
Llamafile: 0.9.3+


When calling LLM APIs from Python code. When connecting to llamafile or local LLM servers. When switching between OpenAI/Anthropic/local providers. When implementing retry/fallback logic for LLM calls. When code imports litellm or uses completion() patterns.

39 stars

development

Updated Apr 20, 2026


npx skillsauth add jamie-bitflight/claude_skills litellm

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner         | Purpose                                        | Status  | Percentage |
| --------------- | ---------------------------------------------- | ------- | ---------- |
| Trivy           | Container and dependency vulnerability scanner | Clean   | 95%        |
| Semgrep         | Static code analysis for vulnerabilities       | Clean   | 95%        |
| mcp-scan (Snyk) | Model Context Protocol security validation     | Clean   | 95%        |
| Snyk (dep)      | Open source security scanning                  | Skipped | 50%        |
| Socket.dev      | Supply chain security analysis                 | Skipped | 50%        |
| VirusTotal      | Multi-engine malware detection                 | Skipped | 50%        |
| CrowdStrike     | Advanced threat intelligence                   | Skipped | 50%        |
| OSV-Scanner     | Open Source Vulnerability database check       | Skipped | 50%        |
| OWASP Dep-Check |                                                | Skipped | 50%        |

Last scanned: Apr 20, 2026, 7:03 AM (16.4s, 1 file scanned)


Related Skills

jamie-bitflight/xdg-base-directory

development | Verified | Trusted | Community

When an application needs to store config, data, cache, or state files. When designing where user-specific files should live. When code writes to ~/.appname or hardcoded home paths. When implementing cross-platform file storage with platformdirs.

39 | SKILL.md | Updated Apr 30, 2026

jamie-bitflight/verification-gate

testing | Verified | Trusted | Community

Enforce mandatory pre-action verification checkpoints to prevent pattern-matching from overriding explicit reasoning. Use this skill when about to execute implementation actions (Bash, Write, Edit) to verify hypothesis-action alignment. Blocks execution when the hypothesis is unverified or the action targets a different system than the hypothesis identified. Critical for preventing cognitive dissonance where a correct diagnosis leads to the wrong implementation.

39 | SKILL.md | Updated Apr 30, 2026

jamie-bitflight/twelve-factor-app

tools | Verified | Trusted | Community

Reference guide for the Twelve-Factor App methodology: 15 principles (12 original + 3 modern extensions) for building portable, resilient, cloud-native applications. Use when evaluating application architecture, designing cloud-native services, reviewing codebases for methodology compliance, advising on configuration, scaling, observability, security, and deployment patterns. Incorporates the 2025 open-source community evolution and cloud-native reinterpretations of each factor.

39 | SKILL.md | Updated Apr 30, 2026

jamie-bitflight/user-docs-to-ai-skill

tools | Verified | Trusted | Community

Converts user-facing documentation (how-to guides, tutorials, API references, examples) in any format (Markdown, PDF, DOCX, PPTX, XLSX, AsciiDoc, RST, HTML, Jupyter notebooks, man pages, TOML/YAML/JSON configs, and plain text) into Claude Code skill directories with SKILL.md plus thematically grouped references/*.md files. Use when given a docs directory or mixed-format documentation to transform into an AI skill. Uses MCP file-reader server for binary formats.

39 | SKILL.md | Updated Apr 30, 2026

Install manually

# Clone the repo
git clone https://github.com/jamie-bitflight/claude_skills.git

# Copy into Claude Code skills folder (global)
cp -r claude_skills/plugins/litellm/skills/litellm ~/.claude/skills/


Repository

jamie-bitflight/claude_skills

39 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT