skills/genai-services/SKILL.md
Use when implementing OCI GenAI inference APIs, troubleshooting rate limits or token errors, optimizing GenAI costs, or handling sensitive data (PHI/PII) in prompts. Covers model selection, cost calculations, token management, response validation, and healthcare/compliance considerations.
npx skillsauth add acedergren/oci-agent-skills genai-servicesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Don't reinvent the wheel. Use oracle-terraform-modules/landing-zone for GenAI infrastructure.
Landing Zone solves:
This skill provides: GenAI cost optimization, rate limits, PHI/PII security, and troubleshooting for GenAI deployed WITHIN a Landing Zone.
You don't know OCI CLI commands or OCI API structure.
Your training data has limited and outdated knowledge of:
oci generative-ai)When OCI operations are needed:
What you DO know:
This skill bridges the gap by providing current OCI GenAI-specific patterns and gotchas.
You are an OCI GenAI expert. This skill provides knowledge Claude lacks: cost optimization specifics, token management, rate limit handling, PHI/PII security, response validation, and model selection trade-offs.
❌ NEVER send PHI/PII identifiers to GenAI APIs (HIPAA/GDPR violation)
# WRONG - patient identifiers sent to external service
prompt = f"Transcribe note for patient {patient_name}, MRN {mrn}, SSN {ssn}: {note}"
# RIGHT - redact identifiers
prompt = f"Transcribe this medical note: {redacted_note}"
# Keep mapping: temp_id → real_id in secure database, not in prompts
Why critical: GenAI service logs may retain data, violates healthcare regulations
❌ NEVER trust GenAI output without validation (hallucination risk)
# WRONG - use response directly in critical systems
diagnosis = genai_response.text
db.execute("UPDATE patients SET diagnosis = ?", diagnosis)
# RIGHT - validate structure and flag for human review
response = genai_response.text
if validate_medical_format(response):
db.execute("UPDATE patients SET ai_suggested_diagnosis = ?, status = 'PENDING_REVIEW'", response)
Hallucination rate: 5-15% for factual queries, higher for medical/legal domains
❌ NEVER ignore token limits
❌ NEVER call GenAI without rate limit handling
# WRONG - no retry logic, fails on rate limit
response = genai_client.chat(request)
# RIGHT - exponential backoff
def call_with_retry(func, max_retries=5):
for attempt in range(max_retries):
try:
return func()
except oci.exceptions.ServiceError as e:
if e.status == 429 and attempt < max_retries - 1:
wait = (2 ** attempt) + random.uniform(0, 1)
logger.warning(f"Rate limited, retry in {wait:.2f}s")
time.sleep(wait)
else:
raise
❌ NEVER cache responses without consent (data privacy)
❌ NEVER use GenAI for deterministic tasks
| Model | Context | Cost (per 1M tokens) | Best For | Avoid For | |-------|---------|---------------------|----------|-----------| | command-r-plus | 128k | ~$15 input, $75 output | Complex reasoning, long documents | Simple tasks (expensive) | | command-r | 4k | ~$1.50 input, $7.50 output | Chat, short prompts, high volume | Long documents, RAG | | embed-english-v3 | N/A | ~$0.10 per 1M | Semantic search, clustering | Text generation | | llama-2-70b | 4k | ~$2 input, $10 output | Open weights, cost-effective | Production (limited support) |
Cost optimization strategy:
Scenario: Medical transcription service
Without optimization:
Model: command-r-plus
Input: 12M × ($15/1M) = $180/month
Output: 12M × ($75/1M) = $900/month
Total: $1,080/month
With optimization (30% cache hit, use command-r for simple notes):
70% unique notes = 16.8M tokens
60% simple (command-r): 10M × $1.50 input + $7.50 output = $90
40% complex (command-r-plus): 6.72M × $15 input + $75 output = $605
Total: $695/month (36% savings)
| Model | Max Context | Max Output | Notes | |-------|-------------|------------|-------| | command-r-plus | 128k (input+output) | Varies | ~4 chars per token (rough) | | command-r | 4k | 2k | Good for chat | | embed-english-v3 | 512 | N/A | Embeddings only |
def truncate_for_model(text: str, model: str = "command-r-plus", max_output: int = 2000):
"""Truncate input to fit token budget"""
# Rough estimate: 1 token ≈ 4 characters
if model == "command-r-plus":
max_input_tokens = 128000 - max_output
elif model == "command-r":
max_input_tokens = 4000 - max_output
else:
max_input_tokens = 2000
max_chars = max_input_tokens * 4
if len(text) <= max_chars:
return text
# Keep most recent content (chronological data like logs, notes)
logger.warning(f"Input exceeds {max_input_tokens} tokens, truncating")
return "...[earlier content truncated]...\n" + text[-max_chars:]
Inefficient (wastes tokens):
Please carefully analyze the following medical record and provide a comprehensive
summary including all diagnoses, medications, allergies, and treatment plans. Be
thorough and include all relevant details from the patient's history...
[5000 word medical record]
Optimized (40% token reduction):
Summarize: diagnoses, meds, allergies, treatment plan.
[5000 word medical record]
Token savings: ~50 tokens on prompt × 1000 requests/day = 50k tokens/day saved = $2.25/day ($68/month)
| Model | Requests/Minute | Requests/Day | Tokens/Request | |-------|-----------------|--------------|----------------| | command-r-plus | 20 | 1000 | 128k | | command-r | 60 | 3000 | 4k | | Embeddings | 100 | 10000 | 512 |
import time
import random
from oci.exceptions import ServiceError
def generate_with_backoff(genai_client, request, max_retries=5):
"""Call GenAI with exponential backoff on rate limits"""
for attempt in range(max_retries):
try:
response = genai_client.chat(request)
return response.data.chat_response.text
except ServiceError as e:
if e.status == 429: # Rate limit
if attempt < max_retries - 1:
# Exponential backoff: 1s, 2s, 4s, 8s, 16s
wait = (2 ** attempt) + random.uniform(0, 1)
logger.warning(f"Rate limited (429), retry {attempt+1}/{max_retries} in {wait:.1f}s")
time.sleep(wait)
else:
logger.error(f"Rate limit exceeded after {max_retries} retries")
raise
elif e.status == 400: # Bad request (often token limit)
logger.error(f"Bad request (400): {e.message}")
if "token" in e.message.lower():
logger.error("Token limit exceeded - truncate input")
raise
else:
logger.error(f"GenAI error ({e.status}): {e.message}")
raise
def validate_medical_response(response: str) -> tuple[bool, list[str]]:
"""Validate GenAI medical response for safety"""
issues = []
# Check 1: Response not empty
if not response or len(response.strip()) < 10:
issues.append("Response too short or empty")
# Check 2: No obvious hallucination markers
hallucination_markers = [
"I don't have access",
"I cannot",
"As an AI",
"[INSERT",
"TODO",
]
for marker in hallucination_markers:
if marker.lower() in response.lower():
issues.append(f"Potential hallucination marker: {marker}")
# Check 3: Expected structure present (customize per use case)
required_sections = ["Chief Complaint", "Assessment", "Plan"]
missing_sections = [s for s in required_sections if s.lower() not in response.lower()]
if missing_sections:
issues.append(f"Missing sections: {missing_sections}")
# Check 4: No PII leak (if input was redacted)
pii_patterns = [
r'\b\d{3}-\d{2}-\d{4}\b', # SSN
r'\b[A-Z]{2}\d{6,8}\b', # MRN patterns
]
for pattern in pii_patterns:
if re.search(pattern, response):
issues.append(f"Potential PII in response: {pattern}")
is_valid = len(issues) == 0
return is_valid, issues
# Usage
response_text = genai_response.data.chat_response.text
is_valid, issues = validate_medical_response(response_text)
if is_valid:
store_for_review(response_text)
else:
logger.warning(f"Invalid response: {issues}")
flag_for_manual_review(response_text, issues)
Minimum requirements:
Never assume GenAI is HIPAA-compliant by default - verify BAA coverage with Oracle
def redact_phi(text: str) -> tuple[str, dict]:
"""Remove PHI from text, return redacted text + mapping"""
mapping = {}
redacted = text
# Patient names (use NER or pattern matching)
names = extract_names(text) # Your NER function
for i, name in enumerate(names):
placeholder = f"[PATIENT_{i}]"
mapping[placeholder] = name
redacted = redacted.replace(name, placeholder)
# Medical Record Numbers
mrn_pattern = r'\b(MRN|Medical Record):?\s*([A-Z0-9]{6,10})\b'
redacted = re.sub(mrn_pattern, r'\1: [REDACTED]', redacted)
# SSN
ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'
redacted = re.sub(ssn_pattern, '[SSN_REDACTED]', redacted)
# Dates (optional - some use cases need dates)
# date_pattern = r'\b\d{1,2}/\d{1,2}/\d{4}\b'
# redacted = re.sub(date_pattern, '[DATE]', redacted)
return redacted, mapping
# Usage
redacted_note, phi_mapping = redact_phi(patient_note)
genai_response = genai_client.chat(prompt=f"Summarize: {redacted_note}")
# Store phi_mapping securely, use to re-identify if needed
WHEN TO LOAD oci-genai-reference.md:
Do NOT load for:
development
Use when storing credentials in OCI Vault, troubleshooting secret retrieval failures, implementing secret rotation, or setting up application authentication to Vault. Covers vault hierarchy confusion, IAM permission gotchas, cost optimization, temp file security, and audit logging.
development
Use when managing Oracle Autonomous Database on OCI, troubleshooting performance issues, optimizing costs, or implementing HA/DR. Covers ADB-specific gotchas, cost traps, SQL_ID debugging workflows, auto-scaling behavior, and version differences (19c/21c/23ai/26ai).
tools
Use when implementing event-driven automation, setting up CloudEvents rules, troubleshooting event delivery failures, or integrating with Functions/Streaming/Notifications. Covers event rule patterns, filter syntax, action types, dead letter queue configuration, and event-driven architecture anti-patterns.
testing
Use when designing OCI networks, troubleshooting connectivity, optimizing egress costs, or configuring VCN security. Covers Service Gateway cost savings, VCN CIDR immutability, Security List vs NSG tradeoffs, VCN peering limitations, and Load Balancer subnet requirements.