davila7/llm-ops

Name: llm-ops
Author: davila7

cli-tool/components/skills/ai-research/llm-ops/SKILL.md

npx skillsauth add davila7/claude-code-templates llm-ops

LLM-OPS -- Production AI

Overview

LLM Operations -- RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and AI architectures for production. Activate for: implementing RAG, building an embeddings pipeline, Pinecone/Chroma/pgvector, fine-tuning, prompt engineering, LLM cost reduction, evals, semantic caching, streaming, agents.

When to Use This Skill

When you need specialized help with RAG, embeddings, vector databases, fine-tuning, prompt engineering, LLM cost reduction, or evals

Do Not Use This Skill When

The task is unrelated to LLM ops
A simpler, more specific tool can handle the request
The user needs general-purpose assistance without domain expertise

How It Works

The difference between an AI prototype and an AI product is operability. LLM-Ops is the engineering that makes AI reliable, scalable, and economical.

Complete RAG Architecture

Indexing: [Documents] -> [Chunking] -> [Embeddings] -> [Vector DB]
Retrieval: [Query] -> [Embed query] -> [Semantic Search] -> [Top K chunks]
Generation: [LLM + Context] -> [Response]

Indexing Pipeline

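import chromadb
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
# The source uses `collection` without creating it; the name "knowledge_base" is an assumption.
collection = chroma.get_or_create_collection("knowledge_base")

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

def index_document(doc_id, content_text, metadata=None):
    """Chunk a document and upsert the chunks; Chroma embeds them with its default embedding function."""
    chunks = chunk_text(content_text)
    if not chunks:
        return 0
    ids = [f"{doc_id}_chunk_{i}" for i in range(len(chunks))]
    # Attach the same metadata to every chunk so queries can report the source.
    metadatas = [metadata] * len(chunks) if metadata else None
    collection.upsert(ids=ids, documents=chunks, metadatas=metadatas)
    return len(chunks)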
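To see what the word-window overlap does in practice, here is a minimal sketch that runs the same sliding-window idea on a synthetic 1,200-word text. The name `chunk_words` and the sample text are illustrative assumptions, and no vector database is involved.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word windows; consecutive windows share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# Synthetic text: w0 w1 ... w1199
sample = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(sample)
sizes = [len(c.split()) for c in chunks]
print(sizes)
```

With a step of 450 words, the last 50 words of each chunk reappear at the start of the next one. For some input lengths the trailing chunk can be fully contained in the previous chunk, so deduplicating or dropping very short tails may be worth considering.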

RAG Query Pipeline

def rag_query(query, top_k=5, system=None):
    results = collection.query(
        query_texts=[query],
        n_results=top_k,
        include=["documents", "metadatas", "distances"],
    )
    context_parts = []
    for doc, meta, dist in zip(results["documents"][0],
                               results["metadatas"][0],
                               results["distances"][0]):
        # Distance cutoff from the source; its scale depends on the collection's distance metric.
        if dist < 1.5:
            src = (meta or {}).get("source", "doc")  # meta is None for chunks stored without metadata
            context_parts.append(f"[Source: {src}] {doc}")
    context = "\n\n".join(context_parts)
    response = client.messages.create(
        model="claude-opus-4-20250805",  # model ID as given in the source; verify against the current model list
        max_tokens=1024,
        system=system or "Answer based on the context.",
        messages=[{"role": "user", "content": f"Context: {context}\n\n{query}"}],
    )
    return response.content[0].text
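The context-assembly step of a RAG query can be exercised without a vector database or an API key. The sketch below is a hedged illustration: `build_context` and `fake_results` are hypothetical names, and the nested-list shape simply mirrors how the query code above indexes `results["documents"][0]`.

```python
def build_context(results, max_distance=1.5):
    """Keep hits under the distance cutoff and label each with its source."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    parts = []
    for doc, meta, dist in zip(docs, metas, dists):
        if dist < max_distance:
            # A hit stored without metadata falls back to the generic label "doc".
            src = (meta or {}).get("source", "doc")
            parts.append(f"[Source: {src}] {doc}")
    return "\n\n".join(parts)

# Hand-written stand-in for a query result (one query, three hits).
fake_results = {
    "documents": [["Refunds take 5 days.", "Shipping is free.", "Unrelated text."]],
    "metadatas": [[{"source": "faq.md"}, None, {"source": "blog.md"}]],
    "distances": [[0.4, 1.1, 1.9]],
}
context = build_context(fake_results)
print(context)
```

The third hit is dropped because its distance exceeds the cutoff. Whether 1.5 is a sensible cutoff depends on the distance metric and the embedding model, so it is best treated as a tunable parameter.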

Choosing a Vector DB

| DB | Best For | Hosting | Cost |
|----|----------|---------|------|
| Chroma | Development, local | Self-hosted | Free |
| pgvector | Already using PostgreSQL | Self/Cloud | Free |
| Pinecone | Managed production | Cloud | USD 70+/month |
| Weaviate | Multi-modal | Self/Cloud | Free+ |
| Qdrant | High performance | Self/Cloud | Free+ |

pgvector

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON knowledge_embeddings
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- QUERY_VECTOR is a placeholder for the query embedding.
-- Ordering by the distance expression itself lets the ivfflat index be used.
SELECT content, 1 - (embedding <=> QUERY_VECTOR) AS similarity
FROM knowledge_embeddings
ORDER BY embedding <=> QUERY_VECTOR
LIMIT 5;
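From application code, the query vector is usually passed as a parameter rather than pasted into the SQL. The sketch below only prepares the statement and parameters. It assumes a DB-API driver that uses `%s` placeholders (psycopg is one such driver) and leaves connection handling out. `to_vector_literal` and `top_k_params` are hypothetical helper names.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Parameterized top-k query against the knowledge_embeddings table.
TOP_K_SQL = (
    "SELECT content, 1 - (embedding <=> %s::vector) AS similarity "
    "FROM knowledge_embeddings "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)

def top_k_params(embedding, k=5):
    """Return the parameter tuple matching the three placeholders in TOP_K_SQL."""
    literal = to_vector_literal(embedding)
    return (literal, literal, k)

params = top_k_params([0.1, 0.25, -1], k=3)
print(params)
```

With a live connection, the pair would be handed to the driver as `cursor.execute(TOP_K_SQL, params)`.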

Elite Prompt Structure

Components of the Auri system prompt:

Identity: Name (Auri), Tone (natural, warm, direct), Platform (Amazon Alexa)
Rules: Maximum of 3 short paragraphs, no markdown, conversational language
Capabilities: business analysis, data-driven advice, creativity
Limitations: no real-time internet access, no financial transactions
Personalization: {user_name}, {user_preferences}, {relevant_history}
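One way to turn these components into an actual system prompt is a template with the three personalization placeholders filled in per request. This is a sketch under assumptions: the exact wording of the template, the `build_system_prompt` name, and the fallback strings are illustrative and not taken from the skill.

```python
SYSTEM_TEMPLATE = """You are Auri, a voice assistant on Amazon Alexa.
Tone: natural, warm, direct.
Rules: at most 3 short paragraphs, no markdown, conversational language.
Capabilities: business analysis, data-driven advice, creativity.
Limitations: no real-time internet access, no financial transactions.
User name: {user_name}
Preferences: {user_preferences}
Relevant history: {relevant_history}"""

def build_system_prompt(user_name, user_preferences="", relevant_history=""):
    """Fill the personalization slots, using explicit fallbacks for missing data."""
    return SYSTEM_TEMPLATE.format(
        user_name=user_name,
        user_preferences=user_preferences or "none recorded",
        relevant_history=relevant_history or "none",
    )

prompt = build_system_prompt("Ana", user_preferences="short answers")
print(prompt)
```

Explicit fallbacks keep unfilled `{placeholders}` from leaking into the prompt when a user has no stored preferences or history.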

Chain-of-Thought

def cot_analysis(problem: str) -> str:
    steps = [
        "1. What exactly is being asked?",
        "2. What information is critical to solving it?",
        "3. What possible approaches exist?",
        "4. Which approach is best, and why?",
        "5. What risks or limitations exist?",
    ]
    prompt = f"Analyze step by step:\n\nPROBLEM: {problem}\n\n"
    prompt += "\n".join(steps) + "\n\nFinal answer (concise, for voice):"
    # call_claude is not defined in the source; it is assumed to send the prompt and return the reply text.
    return call_claude(prompt)

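Semantic Cache

import math

def cosine_similarity(a, b):
    # Helper added here: the source calls cosine_similarity without defining it.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, similarity_threshold=0.95):
        self.threshold = similarity_threshold
        self.cache = {}  # maps tuple(embedding) -> (response, query)

    def get_cached(self, query, embedding):
        # Linear scan: returns the first cached response at or above the threshold.
        for cached_emb, (response, _) in self.cache.items():
            if cosine_similarity(embedding, cached_emb) >= self.threshold:
                return response
        return None

    def set_cache(self, query, embedding, response):
        self.cache[tuple(embedding)] = (response, query)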
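The behavior of a semantic cache is easiest to check with toy vectors. The following self-contained sketch is a simplified variant, not the class above: it returns the best match instead of the first one over the threshold. The 3-dimensional "embeddings" are made up for illustration, and `TinySemanticCache` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinySemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, embedding):
        """Return the response of the most similar entry, or None below the threshold."""
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

cache = TinySemanticCache()
cache.put([1.0, 0.0, 0.0], "Opening hours are 9-18.")
hit = cache.get([0.99, 0.05, 0.0])   # nearly the same direction as the cached vector
miss = cache.get([0.0, 1.0, 0.0])    # orthogonal to the cached vector
print(hit, miss)
```

The lookup is a linear scan, which is fine for a small cache. At larger sizes, the same check could be delegated to the vector database already used for retrieval.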

Claude Cost Estimate

# USD per million tokens; values and model IDs as given in the source, verify against current pricing.
PRICING = {
    "claude-opus-4-20250805": {"input": 15.00, "output": 75.00},
    "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
    "claude-haiku-3-5": {"input": 0.80, "output": 4.00},
}

def estimate_monthly_cost(model, avg_input, avg_output, req_per_day):
    p = PRICING[model]
    # Price input and output tokens separately; they are billed at different rates.
    daily = (avg_input * p["input"] + avg_output * p["output"]) * req_per_day / 1e6
    monthly = daily * 30
    return {"model": model, "monthly_cost": "USD %.2f" % monthly}
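A worked example makes the arithmetic concrete. Input and output tokens are typically billed at different per-million rates, so each count is multiplied by its own price before summing. The sketch below uses the 3.00 / 15.00 pair from the table above purely as illustrative numbers, and `monthly_cost` is a hypothetical name.

```python
# Illustrative prices in USD per million tokens; check current pricing before relying on them.
EXAMPLE_PRICE = {"input": 3.00, "output": 15.00}

def monthly_cost(price, avg_input_tokens, avg_output_tokens, requests_per_day, days=30):
    """Estimate monthly spend from average tokens per request and daily volume."""
    input_cost = avg_input_tokens * requests_per_day / 1e6 * price["input"]
    output_cost = avg_output_tokens * requests_per_day / 1e6 * price["output"]
    return round((input_cost + output_cost) * days, 2)

cost = monthly_cost(
    EXAMPLE_PRICE,
    avg_input_tokens=1000,
    avg_output_tokens=500,
    requests_per_day=1000,
)
print(f"USD {cost:.2f}")  # prints "USD 315.00"
```

Here 1,000 requests per day consume 1M input tokens (USD 3.00) and 0.5M output tokens (USD 7.50), which gives USD 10.50 per day and USD 315.00 over 30 days.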

Evaluation Framework

import json
from anthropic import Anthropic

client = Anthropic()

def evaluate_response(question, expected, actual, criteria):
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    eval_prompt = (
        f"Evaluate the AI assistant's response.\n\n"
        f"QUESTION: {question}\nEXPECTED RESPONSE: {expected}\n"
        f"ACTUAL RESPONSE: {actual}\n\nCriteria:\n{criteria_text}\n\n"
        "Give a 0-10 score and a justification for each criterion. JSON format."
    )
    response = client.messages.create(
        model="claude-haiku-3-5",  # model ID as given in the source; verify against the current model list
        max_tokens=1024,
        messages=[{"role": "user", "content": eval_prompt}],
    )
    # json.loads raises if the reply contains anything besides the JSON object.
    return json.loads(response.content[0].text)

AURI_EVALS = [
    {
        "question": "What are the main risks of launching a startup right now?",
        "criteria": ["factual_accuracy", "relevance", "clarity_for_voice"]
    },
]
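Judge models do not always return bare JSON, and an eval suite usually needs a single number per case. The sketch below shows one possible way to handle both. The reply shape (`{"criterion": {"score": n, ...}}`) is an assumption, since the skill only asks for "JSON format", and `parse_judge_json` and `mean_score` are hypothetical helpers.

```python
import json

def parse_judge_json(text):
    """Parse the outermost {...} span of a judge reply that may include extra prose."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge reply")
    return json.loads(text[start:end + 1])

def mean_score(judgement):
    """Average the per-criterion scores; assumes {"criterion": {"score": n, ...}}."""
    scores = [entry["score"] for entry in judgement.values()]
    return sum(scores) / len(scores)

# Hand-written example reply with a sentence of prose before the JSON.
reply = (
    "Here is my evaluation:\n"
    '{"relevance": {"score": 8, "why": "on topic"}, '
    '"clarity": {"score": 6, "why": "long sentences"}}'
)
judgement = parse_judge_json(reply)
average = mean_score(judgement)
print(average)
```

Tracking the averaged score per eval case over time gives a simple regression signal when prompts or models change.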

Commands

| Command | Action |
|---------|--------|
| /rag-setup | Sets up a complete RAG pipeline |
| /embed-docs | Indexes documents into the vector DB |
| /prompt-optimize | Optimizes a prompt for quality and cost |
| /cost-estimate | Estimates the monthly LLM cost |
| /eval-run | Runs the quality eval suite |
| /cache-setup | Sets up the semantic cache |
| /model-select | Chooses the ideal model for the use case |

Best Practices

Provide clear, specific context about your project and requirements
Review all suggestions before applying them to production code
Combine with other complementary skills for comprehensive analysis

Common Pitfalls

Using this skill for tasks outside its domain expertise
Applying recommendations without understanding your specific context
Not providing enough project context for accurate analysis

davila7/llm-ops

Stars: 24,567
Category: data-ai
Updated: Apr 14, 2026

npx skillsauth add davila7/claude-code-templates llm-ops

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner. Clean (95%)
Semgrep: static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): open source security scanning. Skipped (50%)
Socket.dev: supply chain security analysis. Skipped (50%)
VirusTotal: multi-engine malware detection. Skipped (50%)
CrowdStrike: advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 14, 2026, 3:23 AM (71.0 s, 1 file scanned)

SKILL.md

name: llm-ops
description: LLM Operations -- RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and AI architectures for production.
risk: safe
source: community
date_added: 2026-03-06
author: renat

Install manually

# Clone the repo
git clone https://github.com/davila7/claude-code-templates.git

# Copy into Claude Code skills folder (global)
mkdir -p ~/.claude/skills
cp -r claude-code-templates/cli-tool/components/skills/ai-research/llm-ops ~/.claude/skills/

Repository

davila7/claude-code-templates

24,567 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT