hermes-skills/ai-providers/ollama-on-vps/SKILL.md
Ollama LLM running on VPS as arifOS/A-FORGE fallback — models, endpoints, embedding setup
Container: ollama-engine-prod
Image: ollama/ollama:latest
Network: docker bridge / compose network
qwen2.5:7b — chat model (used by arifOS call_llm Tier 2 fallback)
bge-m3:latest — embedding model (used by A-FORGE LongTermMemory)
POST http://127.0.0.1:11434/api/generate
{
  "model": "qwen2.5:7b",
  "prompt": "...",
  "stream": false,
  "options": {"temperature": 0.3, "num_predict": 1200}
}
Response: {"response": "...", "done": true}
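The request above can be sketched in Python with only the standard library. This is a minimal illustration, not the arifOS client: the helper names `build_generate_payload` and `generate` are hypothetical, and the sketch assumes sampling parameters such as `temperature` belong inside `options`, which is where Ollama's generate API reads them.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://127.0.0.1:11434"  # host-side address used in this document


def build_generate_payload(prompt, model="qwen2.5:7b", temperature=0.3, max_tokens=1200):
    """Build a non-streaming /api/generate request body (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Sampling parameters go inside "options", not at the top level.
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }


def generate(prompt, base_url=OLLAMA_BASE_URL, timeout=120):
    """POST the payload and return the text from the "response" field."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

Calling `generate("Say hello")` on the VPS host should return the model's text, provided the container is up and `qwen2.5:7b` has been pulled.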
POST http://127.0.0.1:11434/api/embeddings
{
"model": "bge-m3:latest",
"prompt": "..."
}
Response: {"embedding": [...]}
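As a rough sketch of how the embedding endpoint might be used for retrieval, the snippet below fetches a vector and compares two vectors by cosine similarity. The helpers `embed` and `cosine_similarity` are hypothetical names, and cosine similarity is an assumption about the comparison metric; this document does not say how A-FORGE ranks memories.

```python
import json
import math
import urllib.request


def embed(text, base_url="http://127.0.0.1:11434", model="bge-m3:latest", timeout=60):
    """Return the embedding vector for `text` from /api/embeddings (hypothetical helper)."""
    request = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

The dimension check matters when the embedding model changes: vectors produced by different models generally have different lengths and cannot be compared with each other.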
GET http://127.0.0.1:11434/api/tags
Response: {"models": [{"name": "qwen2.5:7b"}, {"name": "bge-m3:latest"}]}
File: /root/arifOS/arifosmcp/runtime/llm_client.py
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL") or os.getenv("OLLAMA_URL", "http://ollama:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
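async def _call_ollama(system, user, response_schema, temperature, max_tokens=1200):
    # The system and user text are concatenated into one prompt.
    prompt = f"{system}\n\n{user}"
    payload = {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,
        # Ollama reads sampling parameters from "options"; a top-level "temperature" is ignored.
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }
    if response_schema:
        payload["format"] = "json"  # request JSON-formatted output
    # ... httpx POST to /api/generate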
arifOS calls Ollama when SEA-LION Tier 1 fails. All 3 LLM tools (mind_reason, heart_critique, reply_compose) fall back here.
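The tiered fallback described above can be illustrated with a small sketch. It is an assumption about the general shape only: `call_with_fallback` and the tier names are hypothetical and are not taken from `llm_client.py`.

```python
import asyncio


async def call_with_fallback(tiers, system, user):
    """Try each (name, async callable) in order and return the first success.

    Returns a (tier_name, result) pair so callers can log which tier answered.
    """
    errors = []
    for name, call in tiers:
        try:
            return name, await call(system, user)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM tiers failed: " + "; ".join(errors))
```

With SEA-LION listed first and Ollama second, an HTTP 401 from Tier 1 is recorded and the same prompt is then sent to Ollama. Returning the tier name makes it easy to see in logs how often the fallback is actually serving requests.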
File: /root/A-FORGE/src/memory/LongTermMemory.ts
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const response = await fetch(`${OLLAMA_URL}/api/embeddings`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "bge-m3:latest", prompt }),
});
From inside other containers (arifOS MCP, A-FORGE):
http://ollama:11434 (Docker compose service name)
From VPS host:
http://127.0.0.1:11434
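One way to keep the two addresses straight is to resolve the base URL in a single place. The sketch below is hypothetical: `resolve_ollama_url` and its `in_container` flag are illustrative names, and unlike the `os.getenv` chain in `llm_client.py` it also treats an empty `OLLAMA_URL` as unset.

```python
import os


def resolve_ollama_url(env=None, in_container=False):
    """Pick the Ollama base URL: explicit env vars first, then a context default."""
    env = os.environ if env is None else env
    # Compose service name inside the Docker network, loopback on the VPS host.
    default = "http://ollama:11434" if in_container else "http://127.0.0.1:11434"
    return env.get("OLLAMA_BASE_URL") or env.get("OLLAMA_URL") or default
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.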
Ollama not responding:
curl -s http://127.0.0.1:11434/api/tags
# Empty = container down
docker ps | grep ollama
Model not loaded:
"model 'qwen2.5:7b' not found"
→ docker exec ollama-engine-prod ollama pull qwen2.5:7b
arifOS still trying SEA-LION: check the logs. Repeated "SEA-LION HTTP 401" entries mean Tier 1 is failing and each call is falling through to Ollama. This is expected behavior when the SEA-LION key is invalid.
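The first two checks above (service unreachable, model not pulled) could be scripted roughly as follows. This is a sketch under stated assumptions: `fetch_tags` and `diagnose` are hypothetical helpers, and the returned messages simply echo the commands from this document.

```python
import json
import urllib.request


def fetch_tags(base_url="http://127.0.0.1:11434", timeout=5):
    """Return the parsed JSON body of GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as reply:
        return json.loads(reply.read())


def diagnose(model="qwen2.5:7b", fetch=fetch_tags):
    """Map the two common failure modes to the remediation commands listed above."""
    try:
        tags = fetch()
    except (OSError, ValueError) as exc:  # URLError is an OSError; invalid JSON is a ValueError
        return f"Ollama not responding ({exc}); check: docker ps | grep ollama"
    names = [entry["name"] for entry in tags.get("models", [])]
    if model not in names:
        return f"model missing; run: docker exec ollama-engine-prod ollama pull {model}"
    return "ok"
```

The `fetch` parameter exists so the logic can be exercised without a running container by passing a stub.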
docker exec -it ollama-engine-prod ollama pull <model>
# e.g.:
docker exec -it ollama-engine-prod ollama pull llama3:8b
docker exec -it ollama-engine-prod ollama pull nomic-embed-text
Update arifOS to use new model:
# In .env
OLLAMA_MODEL=llama3:8b
Update A-FORGE embedding model:
# In A-FORGE .env
OLLAMA_EMBEDDING_MODEL=nomic-embed-text