skills/ollama/SKILL.md
Local AI integration patterns using Ollama. Use when building features that need local LLM inference for the store AI server or other on-prem solutions.
Patterns for local AI with Ollama.
# Is Ollama running?
curl http://localhost:11434/api/tags
# List models
ollama list
# Pull a model
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
# Chat
ollama run llama3.2 "Hello, how are you?"
# From API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello",
"stream": false
}'
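The `/api/tags` check above can also be run from Python before any prompt is sent. A minimal sketch, assuming the response body has a top-level `models` array whose entries carry a `name` field; `parse_model_names` and `list_models` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.request
from typing import List, Optional

def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body (assumed shape)."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]

def list_models(base_url: str = "http://localhost:11434") -> Optional[List[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection refused and timeouts; ValueError covers bad JSON
        return None
```

A `None` result suggests the server is not running; an empty list suggests it is running but no model has been pulled yet.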
import requests

def ask_ollama(prompt: str, model: str = "llama3.2") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

# Usage
answer = ask_ollama("What is the capital of France?")
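When a request fails, for example because the model has not been pulled, Ollama replies with a JSON body that carries an `error` field instead of `response`. A hedged sketch of reading the body defensively, assuming that error shape; `extract_response` is a hypothetical helper, not an Ollama or `requests` function.

```python
def extract_response(payload: dict) -> str:
    """Return the completion text, or raise with Ollama's own error message."""
    if "error" in payload:
        # Assumed error shape: {"error": "<message>"}
        raise RuntimeError(f"Ollama error: {payload['error']}")
    if "response" not in payload:
        raise RuntimeError(f"Unexpected Ollama payload keys: {sorted(payload)}")
    return payload["response"]
```

Inside `ask_ollama`, this could replace the final `response.json()["response"]` lookup so that a missing model produces a readable message rather than a `KeyError`.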
With streaming:
import requests
import json

def stream_ollama(prompt: str, model: str = "llama3.2"):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True
    )
    for line in response.iter_lines():
        if line:
            data = json.loads(line)
            yield data.get("response", "")
            if data.get("done"):
                break

# Usage
for chunk in stream_ollama("Tell me a story"):
    print(chunk, end="", flush=True)
async function askOllama(prompt: string, model = "llama3.2"): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false })
  });
  const data = await response.json();
  return data.response;
}
With streaming:
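async function* streamOllama(prompt: string, model = "llama3.2") {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: true })
  });
  const reader = response.body?.getReader();
  if (!reader) return;
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial JSON line that spans two network chunks
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete last line for the next read
    for (const line of lines) {
      if (line.trim()) {
        const data = JSON.parse(line);
        yield data.response || "";
      }
    }
  }
  // Flush a final line that arrived without a trailing newline
  if (buffer.trim()) {
    yield JSON.parse(buffer).response || "";
  }
}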
import { useState, useCallback } from "react";

function useOllama(model = "llama3.2") {
  const [loading, setLoading] = useState(false);
  const [response, setResponse] = useState("");

  const ask = useCallback(async (prompt: string) => {
    setLoading(true);
    setResponse("");
    try {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, prompt, stream: true })
      });
      const reader = res.body?.getReader();
      if (!reader) return;
      const decoder = new TextDecoder();
      let buffer = ""; // carries a partial JSON line between reads
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? ""; // the last piece may be incomplete
        for (const line of lines) {
          if (line.trim()) {
            const data = JSON.parse(line);
            setResponse(prev => prev + (data.response || ""));
          }
        }
      }
    } finally {
      setLoading(false);
    }
  }, [model]);

  return { ask, response, loading };
}
For the store AI that answers questions about SOPs/documents:
from sentence_transformers import SentenceTransformer
import numpy as np
import requests

# 1. Embed your documents
embedder = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
    "Return policy: Items can be returned within 30 days...",
    "Repair intake: First, get customer name and phone...",
    # ... more SOPs
]
# Normalizing makes the dot product below equal to cosine similarity
embeddings = embedder.encode(documents, normalize_embeddings=True)

# 2. On query, find relevant docs
def find_relevant(query: str, top_k: int = 3):
    query_embedding = embedder.encode([query], normalize_embeddings=True)[0]
    scores = np.dot(embeddings, query_embedding)
    top_indices = np.argsort(scores)[-top_k:][::-1]  # highest scores first
    return [documents[i] for i in top_indices]

# 3. Ask Ollama with context (uses ask_ollama defined earlier)
def ask_with_context(question: str):
    relevant_docs = find_relevant(question)
    context = "\n\n".join(relevant_docs)
    prompt = f"""Use the following context to answer the question.

Context:
{context}

Question: {question}

Answer:"""
    return ask_ollama(prompt)
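Step 2 above ranks documents by how closely their embeddings match the query embedding. The sketch below shows the same ranking on toy vectors without any dependencies, using cosine similarity so the result does not depend on vector length. The names `cosine` and `top_k` are illustrative, and the vectors are made up rather than real embeddings.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Return the indices of the k most similar document vectors, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy example: document 1 points the same way as the query, document 0 is orthogonal
best = top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)
```

With real embeddings, the returned indices would be used to look up entries in `documents`, as `find_relevant` does.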
import chromadb

# Setup: PersistentClient stores the collection on disk.
# (Chroma 0.4+ rejects the older Settings(chroma_db_impl="duckdb+parquet") configuration.)
client = chromadb.PersistentClient(path="./store_knowledge")
collection = client.get_or_create_collection("sops")

# Add documents (Chroma embeds them with its default embedding function)
collection.add(
    documents=["Return policy...", "Repair intake..."],
    ids=["sop-1", "sop-2"],
    metadatas=[{"category": "policy"}, {"category": "repair"}]
)

# Query
results = collection.query(
    query_texts=["how do I process a return"],
    n_results=3
)
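The query result above is a dictionary of parallel lists with one inner list per query text, so the matched SOP text for a single query sits in `results["documents"][0]`. A small sketch of turning that into a context string for the prompt; `context_from_results` is a hypothetical helper and the sample dictionary is made up for illustration.

```python
def context_from_results(results: dict, separator: str = "\n\n") -> str:
    """Join the documents returned for the first query into one context string."""
    docs_per_query = results.get("documents") or [[]]
    return separator.join(docs_per_query[0])

# Made-up result in the shape returned for one query text
sample = {
    "ids": [["sop-1", "sop-2"]],
    "documents": [["Return policy...", "Repair intake..."]],
}
context = context_from_results(sample)
```

The resulting string can be dropped into the same `Context:` slot that the embedding-based example fills.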
| Model | Best For | Size |
|-------|----------|------|
| llama3.2 | General purpose, good balance | 3B |
| llama3.1:70b | Best quality (needs big GPU) | 70B |
| mistral | Fast, good for chat | 7B |
| codellama | Code generation | 7B-34B |
| phi3 | Small, fast, surprisingly good | 3.8B |
| mixtral | High quality, MoE | 8x7B |
For store AI POC: Start with llama3.2 or mistral. Upgrade if needed.
def ask_with_system(prompt: str, system: str, model: str = "llama3.2"):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "system": system,
            "stream": False
        }
    )
    return response.json()["response"]

# Store AI system prompt
STORE_SYSTEM = """You are the Computer Connection store assistant.
You help staff and customers with questions about store policies,
repair procedures, and product information.
Be concise and helpful. If you don't know, say so.
Never make up information about store policies."""
def chat(messages: list, model: str = "llama3.2"):
    """
    messages = [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there!"},
        {"role": "user", "content": "What's your return policy?"}
    ]
    """
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": messages,
            "stream": False
        }
    )
    return response.json()["message"]["content"]
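`/api/chat` receives the whole conversation in `messages` on every call, so the caller is responsible for keeping and resending the history. A sketch of assembling that list, assuming the store prompt is sent as a `system` role message and that dropping older turns is acceptable; `build_messages` and `max_turns` are hypothetical names.

```python
from typing import Dict, List

Message = Dict[str, str]

def build_messages(system: str, history: List[Message], user_input: str,
                   max_turns: int = 10) -> List[Message]:
    """Prepend the system prompt, keep the last max_turns exchanges, add the new question."""
    # One turn is a user message plus the assistant reply, hence 2 * max_turns
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    return [
        {"role": "system", "content": system},
        *recent,
        {"role": "user", "content": user_input},
    ]

history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "Do you repair laptops?"},
    {"role": "assistant", "content": "Yes, we do."},
]
messages = build_messages("You are the store assistant.", history,
                          "What's your return policy?", max_turns=1)
```

The result can be passed straight to `chat(messages)`; after each reply, both the user message and the assistant message would be appended to `history`.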
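Keep the model loaded: The first request is slow because Ollama has to load the model into memory, and by default it unloads the model again after about five minutes of inactivity. Set keep_alive to keep it warm.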
# Keep model in memory
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","keep_alive":"24h"}'
Use appropriate model size: Don't use 70B if 7B works.
Batch similar requests: If processing multiple docs, batch them.
Set the context length: A smaller num_ctx uses less memory and responds faster, as long as the prompt and context still fit.
json={"model": model, "prompt": prompt, "options": {"num_ctx": 2048}}
GPU vs CPU: Ensure Ollama is using the GPU
# Check
ollama ps
# The PROCESSOR column should report GPU (e.g. "100% GPU") rather than CPU
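The keep-alive and context-length tips above can be combined into a single request body. A minimal sketch, assuming `keep_alive` is sent as a top-level field and `num_ctx` under `options`, as in the snippets above; `build_generate_payload` is a hypothetical helper and its defaults are only examples.

```python
def build_generate_payload(model: str, prompt: str, num_ctx: int = 2048,
                           keep_alive: str = "24h", stream: bool = False) -> dict:
    """Build an /api/generate request body that keeps the model warm and caps context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "keep_alive": keep_alive,        # how long the model stays in memory after this call
        "options": {"num_ctx": num_ctx}, # smaller context window: less memory, faster
    }

payload = build_generate_payload("llama3.2", "Summarize the return policy.")
```

The dictionary can be passed as the `json=` argument of the `requests.post` calls shown earlier.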
# Restart (Linux install managed by systemd)
sudo systemctl restart ollama
# or run the server in the foreground
ollama serve