skills/ollama/SKILL.md
Run LLMs locally with Ollama — model management, REST API, Python SDK, OpenAI-compatible endpoints, tool calling, structured output, vision, embeddings, and production deployment. MANDATORY TRIGGERS: ollama, Ollama, ollama run, ollama pull, ollama serve, Modelfile, ollama python, ollama api, ollama docker. Also trigger when user wants to run LLMs locally, serve models on localhost, create custom model configurations, build local RAG pipelines with embeddings, use OpenAI-compatible local endpoints, or deploy self-hosted LLM inference. When in doubt about whether to use this skill for local LLM tasks, use it.
npx skillsauth add abhisheksharma-17/skills-graph ollamaInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Run large language models locally with a simple CLI and API.
Source: ollama.com/docs | Package: ollama v0.22.x | License: MIT
| Reference | File | Read When |
|-----------|------|-----------|
| Overview & Setup | references/00-overview.md | Getting started, installation, quickstart, what Ollama is |
| CLI Reference | references/01-cli-reference.md | run, pull, push, create, list, show, cp, rm, serve, ps commands |
| REST API | references/02-api-reference.md | /api/generate, /api/chat, /api/tags, /api/embeddings endpoints |
| Modelfile | references/03-modelfile.md | FROM, PARAMETER, SYSTEM, TEMPLATE, ADAPTER, custom models |
| Python Library | references/04-python-library.md | Client, AsyncClient, chat, generate, embeddings, streaming |
| OpenAI Compatibility | references/05-openai-compatibility.md | /v1/chat/completions, /v1/embeddings, drop-in replacement |
| Tool Calling | references/06-tool-calling.md | Function calling, tools field, streaming tool calls |
| Structured Output | references/07-structured-output.md | JSON mode, schema enforcement, format parameter |
| Vision & Multimodal | references/08-vision-multimodal.md | Image input, LLaVA, Gemma 4, Qwen2.5-VL, base64 images |
| Embeddings & RAG | references/09-embeddings-rag.md | nomic-embed-text, all-minilm, vector DB integration, RAG |
| GPU & Performance | references/10-gpu-performance.md | GPU layers, VRAM, multi-GPU, num_gpu, context sizing |
| Configuration | references/11-configuration.md | Environment variables, OLLAMA_HOST, networking, model storage |
| Docker & Deployment | references/12-docker-deployment.md | Docker setup, production deployment, scaling, monitoring |
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Python library
pip install ollama
# Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
development
High-throughput LLM inference and serving engine with PagedAttention, continuous batching, and OpenAI-compatible API. MANDATORY TRIGGERS: vLLM, vllm, LLM serving, LLM inference engine, PagedAttention. Also trigger when the user wants to serve LLMs in production, deploy models with tensor parallelism, use speculative decoding, quantize models for inference, build OpenAI-compatible API servers, or optimize LLM throughput and latency. When in doubt about whether to use this skill for LLM serving tasks, use it.
tools
Type-safe Python agent framework for building production-grade GenAI applications with Pydantic validation, structured outputs, and dependency injection. MANDATORY TRIGGERS: pydantic-ai, pydantic_ai, PydanticAI, pydantic ai agent. Also trigger when the user wants to build type-safe AI agents in Python, create structured LLM outputs with Pydantic models, implement dependency injection for agents, use tools/capabilities with LLMs, or build multi-agent systems with Python type safety. When in doubt about whether to use this skill for Python AI agent tasks, use it.
development
Durable execution platform for building fault-tolerant workflows, long-running processes, and resilient distributed applications. MANDATORY TRIGGERS: temporal, temporal.io, temporalio, durable execution, workflow orchestration engine. Also trigger when the user wants to build fault-tolerant workflows, implement saga patterns, create long-running distributed processes, orchestrate microservices with retries and timeouts, or build durable AI agent pipelines. When in doubt about whether to use this skill for workflow orchestration or durable execution tasks, use it.
tools
AI framework for building RAG pipelines, agents, workflows, and data-augmented LLM applications with 300+ integrations. MANDATORY TRIGGERS: llamaindex, llama-index, llama_index, LlamaIndex, VectorStoreIndex, SimpleDirectoryReader, LlamaHub, LlamaParse. Also trigger when the user wants to build RAG applications with LlamaIndex, create document indexing pipelines, build agentic workflows with tool calling, implement structured data extraction from documents, or connect LLMs to custom data sources. When in doubt about whether to use this skill for RAG, document indexing, or LLM data augmentation tasks, use it.