Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

anhvth/llm-integration

Name: llm-integration
Author: anhvth

.github/skills/llm-integration/SKILL.md

npx skillsauth add anhvth/speedy_utils llm-integration

LLM Integration Guide

This skill provides comprehensive guidance for using the LLM utilities in speedy_utils.

When to Use This Skill

Use this skill when you need to:

- Make OpenAI API calls with automatic caching (memoization) to save costs and time.
- Transform chat messages between different formats (ChatML, ShareGPT, Text).
- Prepare prompts for local LLM inference.

Prerequisites

- speedy_utils installed.
- openai package installed for API clients.

Core Capabilities

Memoized OpenAI Clients (`MOpenAI`, `MAsyncOpenAI`)

- Drop-in replacements for OpenAI and AsyncOpenAI.
- Automatically cache post requests (chat completions).
- Use the speedy_utils caching backend (disk/memory).
- Caching is configurable per instance.
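
The drop-in idea above can be pictured with a small sketch. This is not the speedy_utils implementation: `MemoizedCaller` and `fake_completion` are hypothetical names, an in-memory dict stands in for the caching backend, and the `cache` flag is only an assumed way to model per-instance configuration.

```python
import json
from typing import Any, Callable, Dict, List


class MemoizedCaller:
    """Hypothetical wrapper: cache a callable's results by its keyword arguments."""

    def __init__(self, func: Callable[..., Any], cache: bool = True) -> None:
        self.func = func
        self.cache_enabled = cache  # assumed per-instance switch, for illustration only
        self._store: Dict[str, Any] = {}

    def __call__(self, **kwargs: Any) -> Any:
        if not self.cache_enabled:
            return self.func(**kwargs)
        # Serialize the arguments deterministically so equal requests share a key.
        key = json.dumps(kwargs, sort_keys=True)
        if key not in self._store:
            self._store[key] = self.func(**kwargs)
        return self._store[key]


calls: List[Dict[str, Any]] = []  # records each time the stand-in "API" is reached


def fake_completion(**kwargs: Any) -> str:
    """Stand-in for a network request; no real API is called."""
    calls.append(kwargs)
    return f"reply to {kwargs['messages'][-1]['content']}"


create = MemoizedCaller(fake_completion)
request = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}
first = create(**request)
second = create(**request)  # served from the dict; fake_completion is not called again
print(first == second, len(calls))
```

The wrapper keeps the call signature of the wrapped function, which is what lets a memoized client be swapped in without changing calling code.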

Chat Format Transformation (`transform_messages`)

Converts between:
- chatml: List of {"role": "...", "content": "..."} dicts.
- sharegpt: Dict with {"conversations": [{"from": "...", "value": "..."}]}.
- text: String with <|im_start|> tokens.
- simulated_chat: Human/AI transcript format.
Supports applying tokenizer templates.
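
To make the chatml and sharegpt shapes listed above concrete, here is a hand-written sketch of the mapping between them. The helper names are hypothetical and are not part of speedy_utils; the "system" role entry is an assumption added for completeness.

```python
from typing import Dict, List

# ShareGPT speaker names mapped to ChatML roles.
# The "system" entry is an assumption; only human/gpt appear in the guide.
TO_CHATML_ROLE = {"human": "user", "gpt": "assistant", "system": "system"}
TO_SHAREGPT_FROM = {role: name for name, role in TO_CHATML_ROLE.items()}


def sharegpt_to_chatml(data: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Hypothetical helper: turn {"from", "value"} turns into {"role", "content"} dicts."""
    return [
        {"role": TO_CHATML_ROLE[turn["from"]], "content": turn["value"]}
        for turn in data["conversations"]
    ]


def chatml_to_sharegpt(messages: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: the reverse mapping, wrapped in a "conversations" dict."""
    return {
        "conversations": [
            {"from": TO_SHAREGPT_FROM[m["role"]], "value": m["content"]}
            for m in messages
        ]
    }


sample = {
    "conversations": [
        {"from": "human", "value": "What is 2 + 2?"},
        {"from": "gpt", "value": "4"},
    ]
}
as_chatml = sharegpt_to_chatml(sample)
round_trip = chatml_to_sharegpt(as_chatml)
print(as_chatml)
print(round_trip == sample)
```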

Usage Examples

Example 1: Memoized OpenAI Call

Make repeated calls without hitting the API twice.

from llm_utils.lm.openai_memoize import MOpenAI

# Initialize just like OpenAI client
client = MOpenAI(api_key="sk-...")

# First call hits the API
response1 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)

# Second call returns cached result instantly
response2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)

Example 2: Async Memoized Call

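The same pattern for async workflows, using MAsyncOpenAI.

import asyncio

from llm_utils.lm.openai_memoize import MAsyncOpenAI


async def main():
    # Initialize just like the AsyncOpenAI client
    client = MAsyncOpenAI(api_key="sk-...")
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hi"}]
    )
    return response


# Run the coroutine; without this the request is never sent
response = asyncio.run(main())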

Example 3: Transforming Chat Formats

Convert ShareGPT data to a ChatML list, then render that list as a text string.

from llm_utils.chat_format.transform import transform_messages

sharegpt_data = {
    "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello there"}
    ]
}

# Convert to ChatML list
chatml_data = transform_messages(sharegpt_data, frm="sharegpt", to="chatml")
# Result: [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello there'}]

# Convert to Text string
text_data = transform_messages(chatml_data, frm="chatml", to="text")
# Result: "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello there<|im_end|>\n<|im_start|>assistant\n"
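
For readers who want to see how the text result above is put together, the following sketch rebuilds the same string by hand. `chatml_to_text` and its `add_generation_prompt` parameter are hypothetical and only mirror the output shown in the example; they are not the library's implementation.

```python
from typing import Dict, List


def chatml_to_text(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: render ChatML messages with <|im_start|>/<|im_end|> markers."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so a model would continue from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chatml_messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello there"},
]
rendered = chatml_to_text(chatml_messages)
print(repr(rendered))
```

The trailing `<|im_start|>assistant\n` is the open turn a local model is expected to complete, which is why the text format suits prompt preparation for local inference.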

Guidelines

Caching Behavior:
- The cache key is generated from the arguments passed to create.
- Changing any parameter (e.g., temperature, model) counts as a new request and is not served from the cache.
- The cache persists across runs if persistence is configured, which is the default behavior of memoize.

Format Detection:
- transform_messages tries to auto-detect the input format, but specifying frm explicitly is safer.

Tokenizer Support:
- You can pass a HuggingFace tokenizer to transform_messages to use its specific chat template.
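
The caching rules above can be illustrated with a hand-rolled key function. This is a sketch under assumptions: `request_key` is a hypothetical name, and hashing the JSON form of the arguments is one common scheme, not necessarily the one speedy_utils uses.

```python
import hashlib
import json
from typing import Any


def request_key(**kwargs: Any) -> str:
    """Hypothetical key function: hash the JSON form of the call arguments."""
    payload = json.dumps(kwargs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Hello"}]
base = request_key(model="gpt-3.5-turbo", messages=messages)
same = request_key(messages=messages, model="gpt-3.5-turbo")  # keyword order is irrelevant
warmer = request_key(model="gpt-3.5-turbo", messages=messages, temperature=0.7)

print(base == same)    # True: identical arguments give the same key
print(base == warmer)  # False: an extra parameter gives a different key
```

Under this scheme, passing a parameter explicitly, even at its default value, also changes the key. Whether speedy_utils behaves the same way is not stated in this guide.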

Limitations

Streaming: Memoization does NOT work with streaming responses (stream=True).
Side Effects: If your LLM calls rely on randomness (high temperature) and you want different results each time, disable caching or change the seed/input.
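
The guide does not say why streaming is unsupported. One plausible reason, offered here only as an illustration, is that a streamed response is a lazy iterator that can be consumed once, so caching the object itself is not enough. The names below are hypothetical stand-ins and no real API is called.

```python
from typing import Dict, Iterator


def fake_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: chunks are produced lazily."""
    for chunk in ("Hello", " ", "there"):
        yield chunk


_cache: Dict[str, Iterator[str]] = {}


def naive_cached_stream(prompt: str) -> Iterator[str]:
    """Cache the iterator object itself, which is the flaw being illustrated."""
    if prompt not in _cache:
        _cache[prompt] = fake_stream(prompt)
    return _cache[prompt]


first_read = "".join(naive_cached_stream("Hi"))
second_read = "".join(naive_cached_stream("Hi"))  # same generator, already exhausted

print(repr(first_read))   # 'Hello there'
print(repr(second_read))  # ''
```

A cache that handled streams would need to buffer every chunk and replay it, which is a different design from memoizing a single return value.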

Guide for using LLM utilities in speedy_utils, including memoized OpenAI clients and chat format transformations.

7 stars

tools

Updated Apr 15, 2026

$ install --global

npx skillsauth add anhvth/speedy_utils llm-integration

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: Container and dependency vulnerability scanner. Clean, 95%
- Semgrep: Static code analysis for vulnerabilities. Clean, 95%
- mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
- Snyk (dep): Open source security scanning. Skipped, 50%
- Socket.dev: Supply chain security analysis. Skipped, 50%
- VirusTotal: Multi-engine malware detection. Skipped, 50%
- CrowdStrike: Advanced threat intelligence. Skipped, 50%
- OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 8:47 PM (1.7s, 1 file scanned)

Related Skills

anhvth/vision-utilities (documentation)
Verified · Trusted · Community
Guide for using vision utilities in speedy_utils, including fast GPU image loading, memory-mapped datasets, and notebook visualization.
7 · SKILL.md · Updated Apr 15, 2026

anhvth/skill-creation (development)
Verified · Trusted · Community
Guide for creating new Agent Skills with proper structure, frontmatter, bundled assets, and validation. Includes templates, best practices, and examples for building reusable skill resources.
7 · SKILL.md · Updated Apr 15, 2026

anhvth/.github/skills/ray-distributed-computing (documentation)
Verified · Trusted · Community
Comprehensive guide to using Ray for scalable distributed computing, including Ray Core, Data, Train, Tune, Serve, and RLlib with practical examples.
7 · SKILL.md · Updated Apr 15, 2026

anhvth/multi-threading-processing (development)
Verified · Trusted · Community
Comprehensive guide for using multi-threading and multi-processing in Python, including when to choose each approach, best practices, and practical examples using the speedy_utils library.
7 · SKILL.md · Updated Apr 15, 2026

Install manually

# Clone the repo
git clone https://github.com/anhvth/speedy_utils.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r speedy_utils/.github/skills/llm-integration ~/.claude/skills/

Repository: anhvth/speedy_utils (7 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
