.github/skills/llm-integration/SKILL.md
Guide for using LLM utilities in speedy_utils, including memoized OpenAI clients and chat format transformations.
npx skillsauth add anhvth/speedy_utils llm-integration
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides comprehensive guidance for using the LLM utilities in speedy_utils.
Use this skill when you need memoized OpenAI clients or chat format transformations.

Prerequisites:
- speedy_utils installed.
- openai package installed for API clients.

Memoized OpenAI clients (MOpenAI, MAsyncOpenAI):
- Memoized counterparts of OpenAI and AsyncOpenAI.
- Cache post (chat completion) requests.
- Use the speedy_utils caching backend (disk/memory).

Chat format transformation (transform_messages):
- chatml: List of {"role": "...", "content": "..."} dicts.
- sharegpt: Dict with {"conversations": [{"from": "...", "value": "..."}]}.
- text: String with <|im_start|> tokens.
- simulated_chat: Human/AI transcript format.

Make repeated calls without hitting the API twice.
from llm_utils.lm.openai_memoize import MOpenAI

# Initialize just like the OpenAI client
client = MOpenAI(api_key="sk-...")

# First call hits the API
response1 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

# Second call with identical arguments returns the cached result
response2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
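To make the caching idea concrete, here is a minimal, self-contained sketch of argument-keyed memoization. It is not the speedy_utils implementation: `make_cache_key`, `MemoizedCreate`, and `fake_create` are hypothetical names introduced for illustration, and the in-memory dict stands in for whatever backend the library actually uses.

```python
import hashlib
import json


def make_cache_key(**kwargs):
    """Build a deterministic key from keyword arguments (hypothetical helper)."""
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MemoizedCreate:
    """Wrap a callable so identical keyword arguments are served from a dict."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        self.calls = 0  # how many times the wrapped function actually ran

    def __call__(self, **kwargs):
        key = make_cache_key(**kwargs)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.func(**kwargs)
        return self.cache[key]


def fake_create(model, messages, temperature=0.0):
    # Stand-in for a network call; returns a plain dict, not an API response.
    return {"model": model, "reply": "echo: " + messages[-1]["content"]}


create = MemoizedCreate(fake_create)
request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}],
}

first = create(**request)                    # runs fake_create
second = create(**request)                   # served from the cache
third = create(temperature=0.7, **request)   # different arguments, runs again
```

After these three calls, `create.calls` is 2: the repeated request was answered from the cache, while the changed `temperature` produced a new key.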
Same as above but for async workflows.
from llm_utils.lm.openai_memoize import MAsyncOpenAI
import asyncio

async def main():
    client = MAsyncOpenAI(api_key="sk-...")
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hi"}],
    )

# Run the coroutine from synchronous code
asyncio.run(main())
Convert ShareGPT format to ChatML.
from llm_utils.chat_format.transform import transform_messages

sharegpt_data = {
    "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello there"},
    ]
}

# Convert to a ChatML list
chatml_data = transform_messages(sharegpt_data, frm="sharegpt", to="chatml")
# Result: [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello there'}]

# Convert to a text string
text_data = transform_messages(chatml_data, frm="chatml", to="text")
# Result: "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello there<|im_end|>\n<|im_start|>assistant\n"
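For readers who want to see what these two conversions amount to, the following plain-Python sketch reproduces the results shown above without importing the library. The function names, the `add_generation_prompt` parameter, and the two-entry role map are assumptions made for illustration. The real transform_messages may handle more roles and edge cases.

```python
# Role names taken from the example above; other roles are not covered here.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def sharegpt_to_chatml(data):
    """Turn a ShareGPT dict into a list of ChatML message dicts."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in data["conversations"]
    ]


def chatml_to_text(messages, add_generation_prompt=True):
    """Render ChatML messages as a string with <|im_start|> / <|im_end|> tokens."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so a model can continue from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


sharegpt_example = {
    "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello there"},
    ]
}

chatml_example = sharegpt_to_chatml(sharegpt_example)
text_example = chatml_to_text(chatml_example)
```

The trailing `<|im_start|>assistant\n` in the text output is the open assistant turn, which matches the result string in the example above.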
Caching Behavior:
- The cache is keyed on the arguments passed to create. If any argument changes (temperature, model), it counts as a new request.
- Caching is handled by the speedy_utils caching backend (memoize).

Format Detection:
- transform_messages tries to auto-detect the input format, but it is safer to specify frm explicitly.

Tokenizer Support:
- Pass a tokenizer to transform_messages to use its specific chat template.

… (stream=True).
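The advice to pass frm explicitly is easier to appreciate with a rough sketch of shape-based detection. `detect_format` below is a hypothetical helper, not the library's detection logic. It recognizes the three formats whose structure is described in this guide and gives up on anything else, such as a plain transcript string.

```python
def detect_format(data):
    """Guess a chat format name, or return None when the input is ambiguous."""
    if isinstance(data, dict) and "conversations" in data:
        return "sharegpt"
    if isinstance(data, list) and data and all(
        isinstance(m, dict) and {"role", "content"}.issubset(m) for m in data
    ):
        return "chatml"
    if isinstance(data, str) and "<|im_start|>" in data:
        return "text"
    # Plain strings, empty lists, and unknown shapes cannot be classified safely.
    return None


guesses = [
    detect_format({"conversations": [{"from": "human", "value": "Hi"}]}),
    detect_format([{"role": "user", "content": "Hi"}]),
    detect_format("<|im_start|>user\nHi<|im_end|>\n"),
    detect_format("Human: Hi"),
]
```

Heuristics like this break down on empty or unusual inputs, which is one reason an explicit frm is the safer choice.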