Skill: Genie NPU Node (ir20) – OpenAI-Compatible-ish API

This skill teaches GoatCitadel how to use a specific Windows ARM64 laptop node (Tailscale name ir20) running GenieAPIService (FastAPI/Uvicorn) that exposes an OpenAI-style HTTP API.

Use this node when you want:

Local / private inference (stays on your devices + mesh)
NPU-backed inference (when available)
A mesh compute target that other GoatCitadel nodes can call

Node identity

Node name: ir20
Service: GenieAPIService
Port: 8910
Base path: /v1
Observed models (from GET /v1/models):
- IBM-Granite
- IBM-Granite-v3.1-8B

Connection addresses (priority order)

Pick the first that works.

Tailscale IP (preferred for mesh):
http://100.64.0.4:8910/v1
Tailscale MagicDNS hostname (if enabled):
http://ir20:8910/v1
LAN IP (same Wi‑Fi/LAN only):
http://192.168.0.108:8910/v1

Notes

The service binds to 0.0.0.0:8910 on the laptop (so LAN + Tailscale can reach it).
Recommended Windows Firewall rule: allow inbound only from Tailscale range 100.64.0.0/10.

What the endpoints are (and what they’re for)

Text endpoints (the important ones)

GET /v1/models
Returns available model IDs.
POST /v1/chat/completions
Main “Chat Completions” endpoint (supports streaming via SSE).
POST /v1/completions
Sometimes exists as an alias/compat endpoint. Prefer /v1/chat/completions.

Image endpoints (likely “best effort”)

Swagger/OpenAPI may list:

POST /v1/images/generations
POST /images/generations

These are meant for image generation (like “DALL·E style” APIs). On this node, they may be present in OpenAPI but not actually wired to a working image model. Treat them as probe-only: try it, and if it errors, route image generation to another provider/node.

Health checks

1) Basic “alive” check (fast)

GET {base}/models

Expected:

HTTP 200
JSON body containing model list
Should respond quickly

2) Functional “can it generate text” check (strict)

POST {base}/chat/completions with a tiny request (max_tokens small).

Request format

Chat Completions request schema (practical)

Most requests look like this:

{
  "model": "IBM-Granite",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Reply with exactly: pong" }
  ],
  "temperature": 0,
  "max_tokens": 3
}

Parameters that typically work

model (string): one of the IDs from /v1/models
messages (array): {role, content} chat messages
- roles: system, user, assistant
stream (bool): true for SSE streaming, false for one-shot
temperature (number): randomness
max_tokens (int): keep small for control tests
top_p (number): nucleus sampling (if supported)
top_k (int): top‑k sampling (if supported)

Schema mismatch note: `temp` vs `temperature`

Some Genie/OpenAPI screens show temp while many OpenAI clients use temperature. You already verified temperature works via Invoke-RestMethod. If a client requires temp, use it as a fallback:

{ "temp": 0.2 }

If both are present, prefer temperature unless Genie errors.

Streaming format (SSE)

When stream: true, responses arrive like:

data: { ... "object":"chat.completion.chunk", ... }
data: { ... }
data: [DONE]

Each chunk contains partial text under something like:

choices[0].delta.content

Implementation tip:

Concatenate delta.content fields until you see [DONE].

PowerShell: common gotchas and working patterns

1) The colon `:` interpolation trap

In PowerShell, this can break:

$base = "http://$ip:8910/v1"   # ❌ sometimes parses weird

Use either:

$base = "http://$($ip):8910/v1"
# or
$base = "http://{0}:8910/v1" -f $ip

2) The “file didn’t get created” trap

A here-string by itself only prints. You must pipe it to a file:

@'
hello
'@ | Set-Content -Path .\skill.md -Encoding utf8

3) UTF-8 console output (fix weird characters like `âï¸`)

chcp 65001
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8

Working PowerShell examples

Models list

$base = "http://100.64.0.4:8910/v1"
curl.exe -s "$base/models"

One-shot chat completion (Invoke-RestMethod)

$base = "http://100.64.0.4:8910/v1"

$body = @{
  model="IBM-Granite"
  stream=$false
  messages=@(@{ role="user"; content="Reply with exactly: pong" })
  temperature=0
  max_tokens=3
} | ConvertTo-Json -Depth 10

$resp = Invoke-RestMethod -Method Post -Uri ($base + "/chat/completions") `
  -ContentType "application/json" -Body $body

$resp.choices[0].message.content

Streaming (curl SSE)

$base = "http://100.64.0.4:8910/v1"

$payloadPath = "$env:TEMP\genie_stream.json"
@{
  model="IBM-Granite"
  stream=$true
  messages=@(@{ role="user"; content="Count 1 to 10, each on a new line." })
  temperature=0.2
} | ConvertTo-Json -Depth 10 | Set-Content -Encoding utf8 -NoNewline $payloadPath

curl.exe -N -s -X POST ($base + "/chat/completions") `
  -H "Content-Type: application/json" `
  --data-binary "@$payloadPath"

Python examples (for GoatCitadel integrations / adapters)

A) Plain HTTP (no OpenAI SDK)

import requests

BASE = "http://100.64.0.4:8910/v1"

payload = {
  "model": "IBM-Granite",
  "stream": False,
  "messages": [{"role": "user", "content": "Reply with exactly: pong"}],
  "temperature": 0,
  "max_tokens": 3,
}

r = requests.post(f"{BASE}/chat/completions", json=payload, timeout=30)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])

B) OpenAI Python SDK pointing at Genie (works if the endpoint is compatible)

If GoatCitadel uses the OpenAI SDK pattern, set:

base_url to Genie
api_key to any dummy string (if Genie doesn’t enforce auth)

from openai import OpenAI

client = OpenAI(
  base_url="http://100.64.0.4:8910/v1",
  api_key="local-no-auth",
)

resp = client.chat.completions.create(
  model="IBM-Granite",
  messages=[{"role": "user", "content": "Reply with exactly: pong"}],
  temperature=0,
  max_tokens=3,
)
print(resp.choices[0].message.content)

If the SDK complains about unsupported fields, fall back to the plain HTTP method.

GoatCitadel usage guidance (how to route work here)

When GoatCitadel needs text generation, it should:

Choose base URL in this order:
- http://100.64.0.4:8910/v1
- http://ir20:8910/v1
- http://192.168.0.108:8910/v1
Call GET /models (cache for a short time, e.g., 60 seconds)
Prefer IBM-Granite unless explicitly asked for IBM-Granite-v3.1-8B
Use POST /chat/completions with:
- stream: false for short tool calls / structured answers
- stream: true for long-form interactive output

Recommended defaults for stable agent behavior

temperature: 0.0 to 0.3
max_tokens: small for “tool-ish” calls, larger for writing tasks
Consider enforcing “less chatty” behavior with a system message:
- “Follow instructions exactly. Output only what’s requested.”

Security posture (recommended)

Prefer allowing inbound 8910 only from Tailscale:
- RemoteAddress: 100.64.0.0/10
Avoid exposing 8910 to the public internet.
If you later want authenticated access:
- Put a reverse proxy in front (Tailscale Serve / Caddy / Traefik) with auth.
- Or implement an API key check in GenieAPIService.

Troubleshooting quick hits

Port bind error: WinError 10048

Meaning: something else already owns 8910.

Find it:

Get-NetTCPConnection -LocalPort 8910 -State Listen | Select-Object -First 1 | Format-List

Kill the owning PID (careful):

$pid = (Get-NetTCPConnection -LocalPort 8910 -State Listen | Select-Object -First 1).OwningProcess
Stop-Process -Id $pid -Force

`Invoke-RestMethod` returns weird truncated objects

PowerShell sometimes formats nested objects oddly. For inspection:

$resp | ConvertTo-Json -Depth 50

Capability summary

✅ Confirmed working:

/v1/models
/v1/chat/completions (non-stream)
/v1/chat/completions (stream via SSE)

⚠️ Present but not guaranteed:

image generation endpoints

Future extension (planned)

This skill currently hardcodes ir20 details. Later you can generalize by:

Discovering nodes via mesh registry
Probing /v1/models to detect capability
Selecting nodes by latency/cost/availability

Skill: Genie NPU Node (ir20) – OpenAI-Compatible-ish API

This skill teaches GoatCitadel how to use a specific Windows ARM64 laptop node (Tailscale name ir20) running GenieAPIService (FastAPI/Uvicorn) that exposes an OpenAI-style HTTP API.

Use this node when you want:

Local / private inference (stays on your devices + mesh)
NPU-backed inference (when available)
A mesh compute target that other GoatCitadel nodes can call

Node identity

Node name: ir20
Service: GenieAPIService
Port: 8910
Base path: /v1
Observed models (from GET /v1/models):
- IBM-Granite
- IBM-Granite-v3.1-8B

Connection addresses (priority order)

Pick the first that works.

Tailscale IP (preferred for mesh):
http://100.64.0.4:8910/v1
Tailscale MagicDNS hostname (if enabled):
http://ir20:8910/v1
LAN IP (same Wi‑Fi/LAN only):
http://192.168.0.108:8910/v1

Notes

The service binds to 0.0.0.0:8910 on the laptop (so LAN + Tailscale can reach it).
Recommended Windows Firewall rule: allow inbound only from Tailscale range 100.64.0.0/10.

What the endpoints are (and what they’re for)

Text endpoints (the important ones)

GET /v1/models
Returns available model IDs.
POST /v1/chat/completions
Main “Chat Completions” endpoint (supports streaming via SSE).
POST /v1/completions
Sometimes exists as an alias/compat endpoint. Prefer /v1/chat/completions.

Image endpoints (likely “best effort”)

Swagger/OpenAPI may list:

POST /v1/images/generations
POST /images/generations

Health checks

1) Basic “alive” check (fast)

GET {base}/models

Expected:

HTTP 200
JSON body containing model list
Should respond quickly

2) Functional “can it generate text” check (strict)

POST {base}/chat/completions with a tiny request (max_tokens small).

Request format

Chat Completions request schema (practical)

Most requests look like this:

{
  "model": "IBM-Granite",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Reply with exactly: pong" }
  ],
  "temperature": 0,
  "max_tokens": 3
}

Parameters that typically work

model (string): one of the IDs from /v1/models
messages (array): {role, content} chat messages
- roles: system, user, assistant
stream (bool): true for SSE streaming, false for one-shot
temperature (number): randomness
max_tokens (int): keep small for control tests
top_p (number): nucleus sampling (if supported)
top_k (int): top‑k sampling (if supported)

Schema mismatch note: `temp` vs `temperature`

{ "temp": 0.2 }

If both are present, prefer temperature unless Genie errors.

Streaming format (SSE)

When stream: true, responses arrive like:

data: { ... "object":"chat.completion.chunk", ... }
data: { ... }
data: [DONE]

Each chunk contains partial text under something like:

choices[0].delta.content

Implementation tip:

Concatenate delta.content fields until you see [DONE].

PowerShell: common gotchas and working patterns

1) The colon `:` interpolation trap

In PowerShell, this can break:

$base = "http://$ip:8910/v1"   # ❌ sometimes parses weird

Use either:

$base = "http://$($ip):8910/v1"
# or
$base = "http://{0}:8910/v1" -f $ip

2) The “file didn’t get created” trap

A here-string by itself only prints. You must pipe it to a file:

@'
hello
'@ | Set-Content -Path .\skill.md -Encoding utf8

3) UTF-8 console output (fix weird characters like `âï¸`)

chcp 65001
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8

Working PowerShell examples

Models list

$base = "http://100.64.0.4:8910/v1"
curl.exe -s "$base/models"

One-shot chat completion (Invoke-RestMethod)

$base = "http://100.64.0.4:8910/v1"

$body = @{
  model="IBM-Granite"
  stream=$false
  messages=@(@{ role="user"; content="Reply with exactly: pong" })
  temperature=0
  max_tokens=3
} | ConvertTo-Json -Depth 10

$resp = Invoke-RestMethod -Method Post -Uri ($base + "/chat/completions") `
  -ContentType "application/json" -Body $body

$resp.choices[0].message.content

Streaming (curl SSE)

$base = "http://100.64.0.4:8910/v1"

$payloadPath = "$env:TEMP\genie_stream.json"
@{
  model="IBM-Granite"
  stream=$true
  messages=@(@{ role="user"; content="Count 1 to 10, each on a new line." })
  temperature=0.2
} | ConvertTo-Json -Depth 10 | Set-Content -Encoding utf8 -NoNewline $payloadPath

curl.exe -N -s -X POST ($base + "/chat/completions") `
  -H "Content-Type: application/json" `
  --data-binary "@$payloadPath"

Python examples (for GoatCitadel integrations / adapters)

A) Plain HTTP (no OpenAI SDK)

import requests

BASE = "http://100.64.0.4:8910/v1"

payload = {
  "model": "IBM-Granite",
  "stream": False,
  "messages": [{"role": "user", "content": "Reply with exactly: pong"}],
  "temperature": 0,
  "max_tokens": 3,
}

r = requests.post(f"{BASE}/chat/completions", json=payload, timeout=30)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])

B) OpenAI Python SDK pointing at Genie (works if the endpoint is compatible)

If GoatCitadel uses the OpenAI SDK pattern, set:

base_url to Genie
api_key to any dummy string (if Genie doesn’t enforce auth)

from openai import OpenAI

client = OpenAI(
  base_url="http://100.64.0.4:8910/v1",
  api_key="local-no-auth",
)

resp = client.chat.completions.create(
  model="IBM-Granite",
  messages=[{"role": "user", "content": "Reply with exactly: pong"}],
  temperature=0,
  max_tokens=3,
)
print(resp.choices[0].message.content)

If the SDK complains about unsupported fields, fall back to the plain HTTP method.

GoatCitadel usage guidance (how to route work here)

When GoatCitadel needs text generation, it should:

Choose base URL in this order:
- http://100.64.0.4:8910/v1
- http://ir20:8910/v1
- http://192.168.0.108:8910/v1
Call GET /models (cache for a short time, e.g., 60 seconds)
Prefer IBM-Granite unless explicitly asked for IBM-Granite-v3.1-8B
Use POST /chat/completions with:
- stream: false for short tool calls / structured answers
- stream: true for long-form interactive output

Recommended defaults for stable agent behavior

temperature: 0.0 to 0.3
max_tokens: small for “tool-ish” calls, larger for writing tasks
Consider enforcing “less chatty” behavior with a system message:
- “Follow instructions exactly. Output only what’s requested.”

Security posture (recommended)

Prefer allowing inbound 8910 only from Tailscale:
- RemoteAddress: 100.64.0.0/10
Avoid exposing 8910 to the public internet.
If you later want authenticated access:
- Put a reverse proxy in front (Tailscale Serve / Caddy / Traefik) with auth.
- Or implement an API key check in GenieAPIService.

Troubleshooting quick hits

Port bind error: WinError 10048

Meaning: something else already owns 8910.

Find it:

Get-NetTCPConnection -LocalPort 8910 -State Listen | Select-Object -First 1 | Format-List

Kill the owning PID (careful):

$pid = (Get-NetTCPConnection -LocalPort 8910 -State Listen | Select-Object -First 1).OwningProcess
Stop-Process -Id $pid -Force

`Invoke-RestMethod` returns weird truncated objects

PowerShell sometimes formats nested objects oddly. For inspection:

$resp | ConvertTo-Json -Depth 50

Capability summary

✅ Confirmed working:

/v1/models
/v1/chat/completions (non-stream)
/v1/chat/completions (stream via SSE)

⚠️ Present but not guaranteed:

image generation endpoints

Future extension (planned)

This skill currently hardcodes ir20 details. Later you can generalize by:

Discovering nodes via mesh registry
Probing /v1/models to detect capability
Selecting nodes by latency/cost/availability

Adoption

spurnout/skills/genie-npu-ir20

$ install --global

Security Scan Results

SKILL.md

Skill: Genie NPU Node (ir20) – OpenAI-Compatible-ish API

Node identity

Connection addresses (priority order)

Notes

What the endpoints are (and what they’re for)

Text endpoints (the important ones)

Image endpoints (likely “best effort”)

Health checks

1) Basic “alive” check (fast)

2) Functional “can it generate text” check (strict)

Request format

Chat Completions request schema (practical)

Parameters that typically work

Schema mismatch note: temp vs temperature

Streaming format (SSE)

PowerShell: common gotchas and working patterns

1) The colon : interpolation trap

2) The “file didn’t get created” trap

3) UTF-8 console output (fix weird characters like âï¸)

Working PowerShell examples

Models list

One-shot chat completion (Invoke-RestMethod)

Streaming (curl SSE)

Python examples (for GoatCitadel integrations / adapters)

A) Plain HTTP (no OpenAI SDK)

B) OpenAI Python SDK pointing at Genie (works if the endpoint is compatible)

GoatCitadel usage guidance (how to route work here)

Recommended defaults for stable agent behavior

Security posture (recommended)

Troubleshooting quick hits

Port bind error: WinError 10048

Invoke-RestMethod returns weird truncated objects

Capability summary

Future extension (planned)

Related Skills

spurnout/templates/skills/bankr-optional

spurnout/sendclaw

spurnout/research

spurnout/qa

spurnout/skills/genie-npu-ir20

$ install --global

Security Scan Results

SKILL.md

Skill: Genie NPU Node (ir20) – OpenAI-Compatible-ish API

Node identity

Connection addresses (priority order)

Notes

What the endpoints are (and what they’re for)

Text endpoints (the important ones)

Image endpoints (likely “best effort”)

Health checks

1) Basic “alive” check (fast)

2) Functional “can it generate text” check (strict)

Request format

Chat Completions request schema (practical)

Parameters that typically work

Schema mismatch note: temp vs temperature

Streaming format (SSE)

PowerShell: common gotchas and working patterns

1) The colon : interpolation trap

2) The “file didn’t get created” trap

3) UTF-8 console output (fix weird characters like âï¸)

Working PowerShell examples

Models list

One-shot chat completion (Invoke-RestMethod)

Streaming (curl SSE)

Python examples (for GoatCitadel integrations / adapters)

A) Plain HTTP (no OpenAI SDK)

B) OpenAI Python SDK pointing at Genie (works if the endpoint is compatible)

GoatCitadel usage guidance (how to route work here)

Recommended defaults for stable agent behavior

Security posture (recommended)

Troubleshooting quick hits

Port bind error: WinError 10048

Invoke-RestMethod returns weird truncated objects

Schema mismatch note: `temp` vs `temperature`

1) The colon `:` interpolation trap

3) UTF-8 console output (fix weird characters like `âï¸`)

`Invoke-RestMethod` returns weird truncated objects

Schema mismatch note: `temp` vs `temperature`

1) The colon `:` interpolation trap

3) UTF-8 console output (fix weird characters like `âï¸`)

`Invoke-RestMethod` returns weird truncated objects