skills/genai-services/SKILL.md
Use when implementing OCI GenAI inference APIs, debugging rate limit (429) or token limit (400) errors, selecting between command-r vs command-r-plus, handling PHI/PII in prompts, or optimizing GenAI costs. Covers model cost trade-offs, token management, rate limit backoff, PHI redaction patterns, and response validation for healthcare.
❌ NEVER send PHI/PII identifiers to GenAI APIs
# WRONG - patient identifiers in external service logs
prompt = f"Transcribe note for patient {patient_name}, MRN {mrn}, SSN {ssn}: {note}"
# RIGHT - redact first, keep mapping in secure DB
prompt = f"Transcribe this medical note: {redacted_note}"
# phi_mapping stored locally: temp_id → real_id
GenAI service logs may retain data. Sending PHI violates HIPAA/GDPR regardless of Oracle BAA status.
❌ NEVER trust GenAI output without validation in critical systems
❌ NEVER exceed token limits silently
- command-r-plus: 128k context (input + output combined)
- command-r: 4k context

❌ NEVER call GenAI without rate limit handling: 429s are common and predictable; see backoff pattern below
❌ NEVER use GenAI for deterministic tasks
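As one illustration of a deterministic task, date normalization has a single correct answer and is better handled by ordinary code than by a model. This is a minimal sketch; the function name `normalize_date` and the assumed MM/DD/YYYY input format are hypothetical, not part of the skill.

```python
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Convert an MM/DD/YYYY date to ISO 8601; raises ValueError on malformed input."""
    return datetime.strptime(raw, "%m/%d/%Y").date().isoformat()


iso = normalize_date("03/07/2024")
print(iso)
```

The result is the same on every call, costs nothing per request, and fails loudly on bad input instead of guessing.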
| Model | Context (tokens) | Input Cost / 1M tokens | Output Cost / 1M tokens | Use For |
|-------|------------------|------------------------|-------------------------|---------|
| command-r-plus | 128k | ~$15 | ~$75 | Complex reasoning, long docs, RAG |
| command-r | 4k | ~$1.50 | ~$7.50 | Chat, short prompts, high volume |
| embed-english-v3 | 512 | ~$0.10 | N/A | Semantic search (1000x cheaper than generation) |
| llama-2-70b | 4k | ~$2 | ~$10 | Cost-effective, open weights |
Decision rule: Start with command-r for everything. Upgrade to command-r-plus only when reasoning quality is demonstrably insufficient.
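To make the decision rule concrete, the sketch below estimates monthly spend for the same workload on both models. The prices are the approximate figures from the table above, not official billing data; `PRICES_PER_1M`, `estimate_monthly_cost`, and the example workload are illustrative assumptions.

```python
# Approximate USD prices per 1M tokens, copied from the table above (assumed, not official)
PRICES_PER_1M = {
    "command-r-plus": {"input": 15.00, "output": 75.00},
    "command-r": {"input": 1.50, "output": 7.50},
}


def estimate_monthly_cost(model: str, requests_per_day: int,
                          input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate monthly cost in USD for a steady workload."""
    price = PRICES_PER_1M[model]
    total_requests = requests_per_day * days
    input_cost = total_requests * input_tokens / 1_000_000 * price["input"]
    output_cost = total_requests * output_tokens / 1_000_000 * price["output"]
    return round(input_cost + output_cost, 2)


# Hypothetical workload: 1,000 requests/day, 1,500 input and 300 output tokens each
cheap = estimate_monthly_cost("command-r", 1000, 1500, 300)
premium = estimate_monthly_cost("command-r-plus", 1000, 1500, 300)
print(cheap, premium)
```

With these assumed prices the upgrade multiplies spend by ten, which is why it is worth proving that command-r falls short before switching.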
Cost optimization: Use embeddings for retrieval/search before invoking generation — same semantic result at 1000x lower cost.
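A minimal sketch of the retrieval step, assuming document and query embeddings have already been computed by an embedding model. The three-dimensional vectors, document ids, and helper names here are toy placeholders, and no OCI call is shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


# Toy vectors standing in for real embeddings (hypothetical data)
doc_vecs = {
    "discharge_summary": [0.9, 0.1, 0.0],
    "billing_faq": [0.0, 0.2, 0.9],
    "medication_list": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

relevant = top_k(query_vec, doc_vecs, k=2)
print(relevant)
```

Only the documents returned by `top_k` would then be passed to a generation call, which keeps the expensive prompt short.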
| Model | Requests/Min | Requests/Day |
|-------|--------------|--------------|
| command-r-plus | 20 | 1,000 |
| command-r | 60 | 3,000 |
| Embeddings | 100 | 10,000 |
import random
import time

from oci.exceptions import ServiceError


def generate_with_backoff(genai_client, request, max_retries=5):
    """Call chat(), retrying 429 responses with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            response = genai_client.chat(request)
            return response.data.chat_response.text
        except ServiceError as e:
            if e.status == 429 and attempt < max_retries - 1:
                # With max_retries=5 the waits are ~1s, 2s, 4s, 8s (plus up to 1s of jitter)
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
            elif e.status == 400 and "token" in (e.message or "").lower():
                raise ValueError("Token limit exceeded: truncate input") from e
            else:
                # Other errors, and a 429 on the final attempt, propagate to the caller
                raise
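The backoff above reacts to a 429 after it happens. Because the per-minute limits are known in advance, a client-side throttle can avoid most 429s in the first place. This is a minimal sliding-window sketch; the `RequestThrottle` class and its injectable `clock` parameter are hypothetical and not part of the OCI SDK.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: allow at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a request may go now, else how long to wait."""
        now = self.clock()
        # Drop timestamps that have left the window
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self.sent[0])

    def record(self) -> None:
        """Record that a request was just sent."""
        self.sent.append(self.clock())


# Demonstration with a fake clock so nothing actually sleeps
fake_now = [0.0]
throttle = RequestThrottle(max_requests=2, window_seconds=60.0, clock=lambda: fake_now[0])
throttle.record()
throttle.record()
wait_when_full = throttle.seconds_until_allowed()
fake_now[0] = 61.0
wait_after_window = throttle.seconds_until_allowed()
```

In real use the caller would sleep for `seconds_until_allowed()`, send the request, then call `record()`. Keeping the backoff wrapper as well still makes sense, since other clients may share the same quota.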
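def truncate_for_model(text: str, model: str = "command-r-plus", max_output: int = 2000) -> str:
    """Keep the tail of text so that input plus max_output fits the model's context."""
    limits = {"command-r-plus": 128000, "command-r": 4000}
    max_input_tokens = limits.get(model, 2000) - max_output
    if max_input_tokens <= 0:
        # text[-0:] would return the whole string, so fail loudly instead
        raise ValueError("max_output leaves no room for input tokens")
    max_chars = max_input_tokens * 4  # rough heuristic: ~4 chars per token
    if len(text) <= max_chars:
        return text
    return "...[earlier content truncated]...\n" + text[-max_chars:]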
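Prompt token savings: Verbose system prompts waste tokens at scale. Replacing a 50-word instruction with "Summarize: diagnoses, meds, allergies, treatment plan." saves roughly 50 tokens per request. At 1,000 requests/day that is 50 × 1,000 × 30 = 1.5M input tokens/month, or about $22.50/month at the ~$15/1M command-r-plus input rate.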
import re


def redact_phi(text: str) -> tuple[str, dict]:
    """Remove PHI, return (redacted_text, mapping_to_restore)"""
    mapping = {}
    redacted = text
    # MRNs
    mrn_pattern = r'\b(MRN|Medical Record):?\s*([A-Z0-9]{6,10})\b'
    redacted = re.sub(mrn_pattern, r'\1: [REDACTED]', redacted)
    # SSNs
    redacted = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]', redacted)
    # Names: use NER library (spacy or similar) for accuracy
    # names = extract_names(text)
    # for i, name in enumerate(names):
    #     placeholder = f"[PATIENT_{i}]"
    #     mapping[placeholder] = name
    #     redacted = redacted.replace(name, placeholder)
    return redacted, mapping
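The function above returns a mapping, but it is only filled in once the name-extraction step is implemented. The sketch below shows one way a reversible mapping could work, using SSNs as the example. `redact_ssns` and `restore_phi` are hypothetical helpers, and the mapping is assumed to stay in local secure storage and never be sent to the GenAI service.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct SSN with a numbered placeholder and record the mapping."""
    mapping: dict[str, str] = {}        # placeholder -> real value (keep local)
    placeholders: dict[str, str] = {}   # real value -> placeholder, so repeats reuse one id

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in placeholders:
            placeholder = f"[SSN_{len(placeholders)}]"
            placeholders[value] = placeholder
            mapping[placeholder] = value
        return placeholders[value]

    return SSN_PATTERN.sub(substitute, text), mapping


def restore_phi(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into text that still contains placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


note = "Patient SSN 123-45-6789 confirmed; spouse SSN 987-65-4321."
redacted, mapping = redact_ssns(note)
restored = restore_phi(redacted, mapping)
print(redacted)
```

Numbered placeholders let the model's output refer to "[SSN_0]" consistently, so the real value can be substituted back after the response has been validated.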
def validate_medical_response(response: str) -> tuple[bool, list[str]]:
    issues = []
    if not response or len(response.strip()) < 10:
        issues.append("Response too short or empty")
    # Hallucination markers
    for marker in ["I don't have access", "I cannot", "As an AI", "[INSERT", "TODO"]:
        if marker.lower() in response.lower():
            issues.append(f"Hallucination marker: {marker}")
    # Expected structure (customize per use case)
    for section in ["Chief Complaint", "Assessment", "Plan"]:
        if section.lower() not in response.lower():
            issues.append(f"Missing section: {section}")
    # PII leak detection (if input was redacted)
    for pattern in [r'\b\d{3}-\d{2}-\d{4}\b', r'\b[A-Z]{2}\d{6,8}\b']:
        if re.search(pattern, response):
            issues.append(f"Potential PII in response: {pattern}")
    return len(issues) == 0, issues
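A validator is only useful if a failed check changes what happens next. The sketch below shows one possible gate: retry a limited number of times, then hand the issues back so the caller can route the case to human review. `generate_validated` is a hypothetical wrapper, and the stub generator and validator stand in for the real GenAI call and `validate_medical_response`.

```python
from typing import Callable, Optional


def generate_validated(
    generate: Callable[[], str],
    validate: Callable[[str], tuple[bool, list[str]]],
    max_attempts: int = 2,
) -> tuple[Optional[str], list[str]]:
    """Return (response, []) on success, or (None, issues) after max_attempts failures."""
    issues: list[str] = []
    for _ in range(max_attempts):
        response = generate()
        ok, issues = validate(response)
        if ok:
            return response, []
    return None, issues


# Stubs for illustration only: the first draft fails validation, the second passes
drafts = iter([
    "As an AI, I cannot help.",
    "Chief Complaint: cough. Assessment: viral URI. Plan: rest.",
])


def stub_validate(text: str) -> tuple[bool, list[str]]:
    issues = ["Hallucination marker: As an AI"] if "as an ai" in text.lower() else []
    return len(issues) == 0, issues


result, problems = generate_validated(lambda: next(drafts), stub_validate)
rejected, reasons = generate_validated(lambda: "As an AI", stub_validate, max_attempts=1)
```

Returning `None` rather than the last failed draft makes it harder for unvalidated text to reach a downstream system by accident.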
Before going live with PHI-adjacent GenAI:
Load references/oci-genai-reference.md when you need: