cli-tool/components/skills/ai-research/prompt-engineering-instructor/SKILL.md
Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - battle-tested structured output library
npx skillsauth add davila7/claude-code-templates instructorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use Instructor when you need to:
GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers
# Base installation
pip install instructor
# With specific providers
pip install "instructor[anthropic]" # Anthropic Claude
pip install "instructor[openai]" # OpenAI
pip install "instructor[all]" # All providers
import instructor
from pydantic import BaseModel
from anthropic import Anthropic
# Define output structure
class User(BaseModel):
name: str
age: int
email: str
# Create instructor client
client = instructor.from_anthropic(Anthropic())
# Extract structured data
user = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "John Doe is 30 years old. His email is [email protected]"
}],
response_model=User
)
print(user.name) # "John Doe"
print(user.age) # 30
print(user.email) # "[email protected]"
from openai import OpenAI
client = instructor.from_openai(OpenAI())
user = client.chat.completions.create(
model="gpt-4o-mini",
response_model=User,
messages=[{"role": "user", "content": "Extract: Alice, 25, [email protected]"}]
)
Response models define the structure and validation rules for LLM outputs.
from pydantic import BaseModel, Field
class Article(BaseModel):
title: str = Field(description="Article title")
author: str = Field(description="Author name")
word_count: int = Field(description="Number of words", gt=0)
tags: list[str] = Field(description="List of relevant tags")
article = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Analyze this article: [article text]"
}],
response_model=Article
)
Benefits:
class Address(BaseModel):
street: str
city: str
country: str
class Person(BaseModel):
name: str
age: int
address: Address # Nested model
person = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "John lives at 123 Main St, Boston, USA"
}],
response_model=Person
)
print(person.address.city) # "Boston"
from typing import Optional
class Product(BaseModel):
name: str
price: float
discount: Optional[float] = None # Optional
description: str = Field(default="No description") # Default value
# LLM doesn't need to provide discount or description
from enum import Enum
class Sentiment(str, Enum):
POSITIVE = "positive"
NEGATIVE = "negative"
NEUTRAL = "neutral"
class Review(BaseModel):
text: str
sentiment: Sentiment # Only these 3 values allowed
review = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "This product is amazing!"
}],
response_model=Review
)
print(review.sentiment) # Sentiment.POSITIVE
Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.
from pydantic import Field, EmailStr, HttpUrl
class Contact(BaseModel):
name: str = Field(min_length=2, max_length=100)
age: int = Field(ge=0, le=120) # 0 <= age <= 120
email: EmailStr # Validates email format
website: HttpUrl # Validates URL format
# If LLM provides invalid data, Instructor retries automatically
from pydantic import field_validator
class Event(BaseModel):
name: str
date: str
attendees: int
@field_validator('date')
def validate_date(cls, v):
"""Ensure date is in YYYY-MM-DD format."""
import re
if not re.match(r'\d{4}-\d{2}-\d{2}', v):
raise ValueError('Date must be YYYY-MM-DD format')
return v
@field_validator('attendees')
def validate_attendees(cls, v):
"""Ensure positive attendees."""
if v < 1:
raise ValueError('Must have at least 1 attendee')
return v
from pydantic import model_validator
class DateRange(BaseModel):
start_date: str
end_date: str
@model_validator(mode='after')
def check_dates(self):
"""Ensure end_date is after start_date."""
from datetime import datetime
start = datetime.strptime(self.start_date, '%Y-%m-%d')
end = datetime.strptime(self.end_date, '%Y-%m-%d')
if end < start:
raise ValueError('end_date must be after start_date')
return self
Instructor retries automatically when validation fails, providing error feedback to the LLM.
# Retries up to 3 times if validation fails
user = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Extract user from: John, age unknown"
}],
response_model=User,
max_retries=3 # Default is 3
)
# If age can't be extracted, Instructor tells the LLM:
# "Validation error: age - field required"
# LLM tries again with better extraction
How it works:
Stream partial results for real-time processing.
from instructor import Partial
class Story(BaseModel):
title: str
content: str
tags: list[str]
# Stream partial updates as LLM generates
for partial_story in client.messages.create_partial(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Write a short sci-fi story"
}],
response_model=Story
):
print(f"Title: {partial_story.title}")
print(f"Content so far: {partial_story.content[:100]}...")
# Update UI in real-time
class Task(BaseModel):
title: str
priority: str
# Stream list items as they're generated
tasks = client.messages.create_iterable(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Generate 10 project tasks"
}],
response_model=Task
)
for task in tasks:
print(f"- {task.title} ({task.priority})")
# Process each task as it arrives
import instructor
from anthropic import Anthropic
client = instructor.from_anthropic(
Anthropic(api_key="your-api-key")
)
# Use with Claude models
response = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[...],
response_model=YourModel
)
from openai import OpenAI
client = instructor.from_openai(
OpenAI(api_key="your-api-key")
)
response = client.chat.completions.create(
model="gpt-4o-mini",
response_model=YourModel,
messages=[...]
)
from openai import OpenAI
# Point to local Ollama server
client = instructor.from_openai(
OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama" # Required but ignored
),
mode=instructor.Mode.JSON
)
response = client.chat.completions.create(
model="llama3.1",
response_model=YourModel,
messages=[...]
)
class CompanyInfo(BaseModel):
name: str
founded_year: int
industry: str
employees: int
headquarters: str
text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""
company = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"Extract company information from: {text}"
}],
response_model=CompanyInfo
)
class Category(str, Enum):
TECHNOLOGY = "technology"
FINANCE = "finance"
HEALTHCARE = "healthcare"
EDUCATION = "education"
OTHER = "other"
class ArticleClassification(BaseModel):
category: Category
confidence: float = Field(ge=0.0, le=1.0)
keywords: list[str]
classification = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Classify this article: [article text]"
}],
response_model=ArticleClassification
)
class Person(BaseModel):
name: str
role: str
class Organization(BaseModel):
name: str
industry: str
class Entities(BaseModel):
people: list[Person]
organizations: list[Organization]
locations: list[str]
text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."
entities = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"Extract all entities from: {text}"
}],
response_model=Entities
)
for person in entities.people:
print(f"{person.name} - {person.role}")
class SentimentAnalysis(BaseModel):
overall_sentiment: Sentiment
positive_aspects: list[str]
negative_aspects: list[str]
suggestions: list[str]
score: float = Field(ge=-1.0, le=1.0)
review = "The product works well but setup was confusing..."
analysis = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"Analyze this review: {review}"
}],
response_model=SentimentAnalysis
)
def extract_person(text: str) -> Person:
return client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[{
"role": "user",
"content": f"Extract person from: {text}"
}],
response_model=Person
)
texts = [
"John Doe is a 30-year-old engineer",
"Jane Smith, 25, works in marketing",
"Bob Johnson, age 40, software developer"
]
people = [extract_person(text) for text in texts]
from typing import Union
class TextContent(BaseModel):
type: str = "text"
content: str
class ImageContent(BaseModel):
type: str = "image"
url: HttpUrl
caption: str
class Post(BaseModel):
title: str
content: Union[TextContent, ImageContent] # Either type
# LLM chooses appropriate type based on content
from pydantic import create_model
# Create model at runtime
DynamicUser = create_model(
'User',
name=(str, ...),
age=(int, Field(ge=0)),
email=(EmailStr, ...)
)
user = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[...],
response_model=DynamicUser
)
# For providers without native structured outputs
client = instructor.from_anthropic(
Anthropic(),
mode=instructor.Mode.JSON # JSON mode
)
# Available modes:
# - Mode.ANTHROPIC_TOOLS (recommended for Claude)
# - Mode.JSON (fallback)
# - Mode.TOOLS (OpenAI tools)
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
result = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[...],
response_model=YourModel
)
# Client closed automatically
from pydantic import ValidationError
try:
user = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=1024,
messages=[...],
response_model=User,
max_retries=3
)
except ValidationError as e:
print(f"Failed after retries: {e}")
# Handle gracefully
except Exception as e:
print(f"API error: {e}")
class ValidatedUser(BaseModel):
name: str = Field(description="Full name, 2-100 characters")
age: int = Field(description="Age between 0 and 120", ge=0, le=120)
email: EmailStr = Field(description="Valid email address")
class Config:
# Custom error messages
json_schema_extra = {
"examples": [
{
"name": "John Doe",
"age": 30,
"email": "[email protected]"
}
]
}
# ❌ Bad: Vague
class Product(BaseModel):
name: str
price: float
# ✅ Good: Descriptive
class Product(BaseModel):
name: str = Field(description="Product name from the text")
price: float = Field(description="Price in USD, without currency symbol")
# ✅ Good: Constrain values
class Rating(BaseModel):
score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
review: str = Field(min_length=10, description="Review text, at least 10 chars")
messages = [{
"role": "user",
"content": """Extract person info from: "John, 30, engineer"
Example format:
{
"name": "John Doe",
"age": 30,
"occupation": "engineer"
}"""
}]
# ✅ Good: Enum ensures valid values
class Status(str, Enum):
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"
class Application(BaseModel):
status: Status # LLM must choose from enum
class PartialData(BaseModel):
required_field: str
optional_field: Optional[str] = None
default_field: str = "default_value"
# LLM only needs to provide required_field
| Feature | Instructor | Manual JSON | LangChain | DSPy | |---------|------------|-------------|-----------|------| | Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes | | Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited | | Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes | | Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No | | Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes | | Learning Curve | Low | Low | Medium | High |
When to choose Instructor:
When to choose alternatives:
references/validation.md - Advanced validation patternsreferences/providers.md - Provider-specific configurationreferences/examples.md - Real-world use casestools
No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity - these platforms have their own patterns, pitfalls, and breaking points. This skill covers when to use which platform, how to build reliable automations, and when to graduate to code-based solutions. Key insight: Zapier optimizes for simplicity and integrations (7000+ apps), Make optimizes for power
tools
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
tools
Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost money and angry customers. With it, workflows resume exactly where they left off. This skill covers the platforms (n8n, Temporal, Inngest) and patterns (sequential, parallel, orchestrator-worker) that turn brittle scripts into production-grade automation. Key insight: The platforms make different tradeoffs. n8n optimizes for accessibility
development
Trigger.dev expert for background jobs, AI workflows, and reliable async execution with excellent developer experience and TypeScript-first design. Use when: trigger.dev, trigger dev, background task, ai background job, long running task.