skills/llm-ops/SKILL.md
LLM Operations -- RAG, embeddings, vector databases, fine-tuning, prompt engineering avancado, custos de LLM, evals de qualidade e arquiteturas de IA para producao.
npx skillsauth add ranbot-ai/awesome-skills llm-opsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
LLM Operations -- RAG, embeddings, vector databases, fine-tuning, prompt engineering avancado, custos de LLM, evals de qualidade e arquiteturas de IA para producao. Ativar para: implementar RAG, criar pipeline de embeddings, Pinecone/Chroma/pgvector, fine-tuning, prompt engineering, reducao de custos de LLM, evals, cache semantico, streaming, agents.
A diferenca entre um prototipo de IA e um produto de IA e operabilidade. LLM-Ops e a engenharia que torna IA confiavel, escalavel e economica.
[Documentos] -> [Chunking] -> [Embeddings] -> [Vector DB] | [Query] -> [Embed query] -> [Semantic Search] -> [Top K chunks] | [LLM + Context] -> [Resposta]
from anthropic import Anthropic import chromadb
client = Anthropic()
chroma = chromadb.PersistentClient(path="./chroma_db")
def chunk_text(text, chunk_size=500, overlap=50):
words = text.split()
chunks = []
for i in range(0, len(words), chunk_size - overlap):
chunk = " ".join(words[i:i + chunk_size])
if chunk: chunks.append(chunk)
return chunks
def index_document(doc_id, content_text, metadata=None):
chunks = chunk_text(content_text)
ids = [f"{doc_id}_chunk_{i}" for i in range(len(chunks))]
collection.upsert(ids=ids, documents=chunks)
return len(chunks)
def rag_query(query, top_k=5, system=None): results = collection.query( query_texts=[query], n_results=top_k, include=["documents", "metadatas", "distances"]) context_parts = [] for doc, meta, dist in zip(results["documents"][0], results["metadatas"][0], results["distances"][0]): if dist < 1.5: src = meta.get("source", "doc") context_parts.append(f"[Fonte: {src}] {doc}") context = "
".join(context_parts) response = client.messages.create( model="claude-opus-4-20250805", max_tokens=1024, system=system or "Responda baseado no contexto.", messages=[{"role": "user", "content": f"Contexto: {context}
{query}"}]) return response.content[0].text
| DB | Melhor Para | Hosting | Custo | |----|------------|---------|-------| | Chroma | Desenvolvimento, local | Self-hosted | Gratis | | pgvector | Ja usa PostgreSQL | Self/Cloud | Gratis | | Pinecone | Producao gerenciada | Cloud | USD 70+/mes | | Weaviate | Multi-modal | Self/Cloud | Gratis+ | | Qdrant | Alta performance | Self/Cloud | Gratis+ |
CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE knowledge_embeddings ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), content TEXT NOT NULL, embedding vector(1536), metadata JSONB, created_at TIMESTAMPTZ DEFAULT NOW() ); CREATE INDEX ON knowledge_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100); SELECT content, 1 - (embedding <=> QUERY_VECTOR) AS similarity FROM knowledge_embeddings ORDER BY similarity DESC LIMIT 5;
Componentes do system prompt Auri:
def cot_analysis(problem: str) -> str: steps = [ "1. O que exatamente esta sendo pedido?", "2. Que informacoes sao criticas para resolver?", "3. Quais abordagens possiveis existem?", "4. Qual abordagem e melhor e por que?", "5. Quais riscos ou limitacoes existem?", ] prompt = f"Analise passo a passo:
PROBLEMA: {problem}
" prompt += " ".join(steps) + "
Resposta final (concisa, para voz):" return call_claude(prompt)
class SemanticCache: def init(self, similarity_threshold=0.95): self.threshold = similarity_threshold self.cache = {}
def get_cached(self, query, embedding):
for cached_emb, (response, _) in self.cache.items():
development
Production-grade Android app development guide covering native (Kotlin/Java), cross-platform (Flutter, RN, KMM), and hybrid architectures.
testing
Plan, orchestrate, and adversarially verify parallel AI coding agents with a dynamic multi-agent workflow engine.
development
Generate professional, ATS-optimized CVs for FlowCV, Canva, Google Docs, or Word. Handles multi-source merging, JD targeting, seniority adaptation, and humanized rewriting. Outputs paste-ready text wi
tools
Generate hand-drawn 16:9 article illustrations with the Grav character IP, sparse annotations, and absurd but clear visual metaphors.