.github/skills/io-utilities/SKILL.md
Guide for using IO utilities in speedy_utils, including fast JSONL reading, multi-format loading, and file serialization.
This skill covers the IO utilities in speedy_utils. Use it when you need fast JSONL reading, multi-format loading, or file serialization.
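Prerequisites:
- `speedy_utils` installed.
- `orjson`: for faster JSON parsing.
- `zstandard`: for `.zst` file support.
- `pandas`: for CSV/TSV loading.
- `pyarrow`: for faster CSV reading with pandas.

Fast JSONL reading (`fast_load_jsonl`):
- Uses `orjson` if available for speed.
- Shows progress with `tqdm`.

Multi-format loading (`load_by_ext`):
- Accepts glob patterns (e.g. `data/*.json`) and lists of files.
- Supports memoization with `do_memoize=True`.

File serialization (`dump_json_or_pickle`, `load_json_or_pickle`):
- Save and load Python objects as JSON or pickle files.

Examples:

Read a large compressed JSONL file line by line.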
```python
from speedy_utils import fast_load_jsonl

def process(item):
    ...  # placeholder: replace with your own per-record logic

# Iterates lazily, keeping memory usage low
for item in fast_load_jsonl('large_data.jsonl.gz', progress=True):
    process(item)
```
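To show what lazy streaming involves, here is a stdlib-only sketch of reading a gzip-compressed JSONL file one record at a time, using `orjson` when it is installed and falling back to `json` otherwise. This is not speedy_utils' implementation; `iter_jsonl` is a hypothetical name, and `.zst` support is left out because it needs the third-party `zstandard` package.

```python
import gzip
import json
from typing import Any, Iterator

# Prefer orjson when installed (faster); otherwise use the stdlib parser.
# Both accept bytes input.
try:
    import orjson
    _loads = orjson.loads
except ImportError:
    _loads = json.loads


def iter_jsonl(path: str) -> Iterator[Any]:
    """Hypothetical helper: yield one parsed record per non-blank line, lazily."""
    # gzip.open and open both return binary file objects that iterate by line,
    # so only one line is held in memory at a time.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        for line in f:
            stripped = line.strip()
            if stripped:  # skip blank lines
                yield _loads(stripped)
```

Because `iter_jsonl` is a generator, the file stays open only while the caller keeps iterating, which is what keeps memory flat for files larger than RAM.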
Load a file without worrying about its format; `load_by_ext` picks a loader based on the file extension.
```python
from speedy_utils import load_by_ext

data = load_by_ext('config.json')   # parsed JSON content
df = load_by_ext('data.csv')        # pandas DataFrame (requires pandas)
items = load_by_ext('dataset.pkl')  # the unpickled object
```
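Extension-based dispatch can be sketched as a table mapping file suffixes to loader functions. This is a minimal stdlib illustration, not the library's code: `load_by_suffix` and `LOADERS` are hypothetical names, and the CSV loader returns a list of dicts via `csv.DictReader` rather than the pandas DataFrame `load_by_ext` produces.

```python
import csv
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict


def _load_json(path: Path) -> Any:
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _load_jsonl(path: Path) -> list:
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def _load_csv(path: Path) -> list:
    # stdlib stand-in: one dict per row, keyed by the header line
    with path.open("r", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def _load_pickle(path: Path) -> Any:
    with path.open("rb") as f:
        return pickle.load(f)


# Hypothetical dispatch table: lowercase file suffix -> loader
LOADERS: Dict[str, Callable[[Path], Any]] = {
    ".json": _load_json,
    ".jsonl": _load_jsonl,
    ".csv": _load_csv,
    ".pkl": _load_pickle,
}


def load_by_suffix(path: str) -> Any:
    p = Path(path)
    try:
        loader = LOADERS[p.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported extension: {p.suffix!r}") from None
    return loader(p)
```

Adding a format then means registering one more entry in the table, and callers never change.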
Load multiple files in parallel by passing a glob pattern (a list of paths also works).
```python
from speedy_utils import load_by_ext

# Returns a list of results, one for each matching file
all_data = load_by_ext('logs/*.jsonl')
```
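The glob-plus-parallel pattern can be sketched with `glob` and a thread pool. This is an assumption-laden illustration, not how speedy_utils schedules its workers: `load_many`, `load_json_file`, and the default of 8 workers are hypothetical choices.

```python
import glob
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def load_json_file(path: str) -> Any:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def load_many(pattern: str,
              loader: Callable[[str], Any] = load_json_file,
              workers: int = 8) -> List[Any]:
    """Hypothetical helper: load every file matching `pattern` concurrently.

    Returns one result per file, in sorted-path order, all held in memory.
    """
    paths = sorted(glob.glob(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though loads run concurrently
        return list(pool.map(loader, paths))
```

Threads suit IO-bound loading; if parsing dominates, a `ProcessPoolExecutor` may scale better. Either way the full result list lives in memory at once.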
Save data to disk, creating directories as needed.
```python
from speedy_utils import dump_json_or_pickle

data = {"key": "value"}
# Parent directories (output/processed/) are created if missing
dump_json_or_pickle(data, 'output/processed/result.json')
```
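Creating parent directories before writing is the key step. The following sketch shows it with the stdlib; `save_by_ext` is a hypothetical name, and the "JSON for `.json`, pickle otherwise" rule is an assumption made for illustration, not a description of `dump_json_or_pickle`.

```python
import json
import os
import pickle
from typing import Any


def save_by_ext(obj: Any, path: str) -> None:
    """Hypothetical helper: write JSON for .json paths, pickle otherwise."""
    parent = os.path.dirname(path)
    if parent:  # dirname is "" for bare filenames, and makedirs("") would fail
        os.makedirs(parent, exist_ok=True)
    if path.endswith(".json"):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=False, indent=2)
    else:
        with open(path, "wb") as f:
            pickle.dump(obj, f)
```

`exist_ok=True` makes repeated saves into the same directory safe.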
Best practices:

Prefer JSONL for large datasets:
- Use `fast_load_jsonl` for datasets that don't fit in memory.
- Store them compressed (`.jsonl.gz` or `.jsonl.zst`) to save space.

Use `load_by_ext` for scripts:
- When a script accepts input files, load them with `load_by_ext` so it works with any supported format.

Error handling:
- `fast_load_jsonl` has an `on_error` parameter (`raise`, `warn`, `skip`) to handle malformed lines gracefully.

Performance:
- Install `orjson` for significantly faster JSON operations.
- `load_by_ext` uses the `pyarrow` engine for CSVs when available, which is much faster.

Memory considerations:
- `load_by_ext` loads the entire file into memory. Use `fast_load_jsonl` for streaming.
- `load_by_ext` with a glob pattern loads all matching files into memory at once, as a list. Be careful with very large datasets.
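A raise/warn/skip policy for malformed lines can be sketched as below. This only illustrates the idea behind an `on_error` switch; `parse_lines` is a hypothetical function, and its exact messages and blank-line handling are assumptions rather than `fast_load_jsonl`'s documented behavior.

```python
import json
import warnings
from typing import Any, Iterable, Iterator


def parse_lines(lines: Iterable[str], on_error: str = "raise") -> Iterator[Any]:
    """Hypothetical sketch of a raise/warn/skip policy for malformed JSONL lines."""
    if on_error not in ("raise", "warn", "skip"):
        raise ValueError(f"on_error must be 'raise', 'warn' or 'skip', got {on_error!r}")
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are not records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            if on_error == "raise":
                raise ValueError(f"Malformed JSON on line {lineno}") from exc
            if on_error == "warn":
                warnings.warn(f"Skipping malformed JSON on line {lineno}: {exc}")
            # "warn" and "skip" both drop the bad line and continue
```

`raise` suits curated data where a bad line signals a bug; `warn` or `skip` suits scraped or user-generated logs where a few broken records are expected.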