engineering-team/skills/senior-ml-engineer/SKILL.md
ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.
Install this skill globally with one command: `npx skillsauth add alirezarezvani/claude-skills senior-ml-engineer`. Works with Claude Code, Cursor, and Windsurf.
Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.
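Deploy a trained model to production with monitoring:

    FROM python:3.11-slim
    # Work from /app so that "src.server:app" resolves when uvicorn starts
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY model/ /app/model/
    COPY src/ /app/src/
    # python:3.11-slim does not ship curl, so probe /health with the standard library
    HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
    EXPOSE 8080
    CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]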
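The Dockerfile above starts uvicorn with `src.server:app` and probes `/health`, but the server module itself is not shown. As a rough illustration only, the sketch below is a dependency-free ASGI callable that answers the health probe. In practice a framework such as FastAPI would normally provide `app`; the `call_app` helper and the response payloads are hypothetical and exist only to exercise the handler without a running server.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Bare ASGI application: answers GET /health, returns 404 for any other HTTP request."""
    if scope["type"] != "http":
        return  # lifespan and websocket events are ignored in this sketch
    if scope["method"] == "GET" and scope["path"] == "/health":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"detail": "not found"}
    body = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def call_app(method, path):
    """Hypothetical test helper: drive the app once in-process and return (status, json_body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": method, "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"], json.loads(sent[1]["body"])
```

Calling `call_app("GET", "/health")` exercises the same code path the container health check would hit, which makes the endpoint easy to cover in a unit test before the image is built.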
| Option | Latency | Throughput | Use Case |
|--------|---------|------------|----------|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
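Establish automated training and deployment:

    # Written against the legacy Feast API (Feature / ValueType);
    # newer releases replace these with Field and schema=.
    from datetime import timedelta

    from feast import Entity, Feature, FeatureView, FileSource, ValueType

    # Entity: the key that feature rows are joined on.
    user = Entity(name="user_id", value_type=ValueType.INT64)

    # Feature view: two per-user features read from a Parquet file.
    user_features = FeatureView(
        name="user_features",
        entities=["user_id"],
        ttl=timedelta(days=1),  # look back at most one day when retrieving values
        features=[
            Feature(name="purchase_count_30d", dtype=ValueType.INT64),
            Feature(name="avg_order_value", dtype=ValueType.FLOAT),
        ],
        online=True,  # make the view available in the online store
        source=FileSource(path="data/user_features.parquet"),
    )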
| Trigger | Detection | Action |
|---------|-----------|--------|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
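The trigger table can be read as a small decision function. The sketch below is one possible encoding, not a prescribed implementation: the `ModelStatus` fields, the default accuracy threshold of 0.90, the sample threshold of 10,000, and the order in which triggers are checked are all assumptions made for illustration. Only the PSI cutoff of 0.2 comes from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelStatus:
    accuracy: float          # latest evaluation accuracy, between 0 and 1
    psi: float               # population stability index against the training data
    new_samples: int         # labelled samples collected since the last training run
    scheduled_run_due: bool  # True when the cron schedule says a run is due


def choose_retraining_action(
    status: ModelStatus,
    accuracy_threshold: float = 0.90,  # hypothetical value; the table only says "threshold"
    psi_threshold: float = 0.2,        # from the table
    sample_threshold: int = 10_000,    # hypothetical value for "X new samples"
) -> Optional[str]:
    """Return the action for the first matching trigger, or None if nothing fires.

    The check order (performance, drift, schedule, volume) is an assumption.
    """
    if status.accuracy < accuracy_threshold:
        return "immediate_retrain"
    if status.psi > psi_threshold:
        return "evaluate_then_retrain"
    if status.scheduled_run_due:
        return "full_retrain"
    if status.new_samples >= sample_threshold:
        return "incremental_update"
    return None
```

Checking the performance drop first reflects the table's "immediate" wording; a team that prefers to honor the schedule first could reorder the branches without changing the rest.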
Integrate LLM APIs into production applications:

    from abc import ABC, abstractmethod
    from tenacity import retry, stop_after_attempt, wait_exponential

    class LLMProvider(ABC):
        """Common interface so providers can be swapped without touching call sites."""
        @abstractmethod
        def complete(self, prompt: str, **kwargs) -> str:
            pass

    # Up to 3 attempts, with exponential backoff bounded between 1 and 10 seconds.
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
    def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
        return provider.complete(prompt)
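The provider abstraction above has no concrete implementation, so it may help to see one exercised. The sketch below is self-contained and uses only the standard library: `FlakyEchoProvider` is a hypothetical stand-in for a real API client, and `complete_with_retry` is a hand-rolled loop that only approximates the tenacity decorator (three attempts, doubling delay capped at ten seconds) so the example runs without third-party packages.

```python
import time
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        ...


class FlakyEchoProvider(LLMProvider):
    """Hypothetical provider: fails a fixed number of times, then echoes the prompt."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def complete(self, prompt: str, **kwargs) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated transient failure")
        return f"echo: {prompt}"


def complete_with_retry(provider, prompt, attempts=3, base_delay=1.0, max_delay=10.0):
    """Try up to `attempts` times, doubling the wait after each failure (capped at max_delay)."""
    for attempt in range(1, attempts + 1):
        try:
            return provider.complete(prompt)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))


if __name__ == "__main__":
    provider = FlakyEchoProvider(failures_before_success=2)
    # base_delay=0 keeps the demo instant; a real client would keep the backoff
    print(complete_with_retry(provider, "hello", base_delay=0))
    print(provider.calls)
```

Because call sites depend only on `LLMProvider.complete`, swapping the stand-in for a real client should not require changes to the retry wrapper.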
| Provider | Input Cost (USD per 1K tokens) | Output Cost (USD per 1K tokens) |
|----------|--------------------------------|---------------------------------|
| GPT-4 | $0.03 | $0.06 |
| GPT-3.5 | $0.0005 | $0.0015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
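To turn the price table into a cost control, per-request cost can be estimated from token counts and checked against a budget. The following is a minimal sketch under stated assumptions: the prices are copied from the table above, while `estimate_cost` and `BudgetGuard` are hypothetical names introduced here, and token counts are assumed to be known to the caller.

```python
# USD per 1K tokens as (input, output), copied from the table above.
PRICES_PER_1K = {
    "GPT-4": (0.03, 0.06),
    "GPT-3.5": (0.0005, 0.0015),
    "Claude 3 Opus": (0.015, 0.075),
    "Claude 3 Haiku": (0.00025, 0.00125),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    input_price, output_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


class BudgetGuard:
    """Tracks cumulative spend and rejects requests that would exceed a fixed budget."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, model: str, input_tokens: int, output_tokens: int) -> float:
        cost = estimate_cost(model, input_tokens, output_tokens)
        if self.spent_usd + cost > self.budget_usd:
            raise RuntimeError(
                f"budget exceeded: {self.spent_usd + cost:.4f} > {self.budget_usd:.4f} USD"
            )
        self.spent_usd += cost
        return cost
```

For example, a GPT-4 request with 1,000 input tokens and 500 output tokens is estimated at 0.03 + 0.03 = 0.06 USD, so a guard with a 0.10 USD budget would accept one such request and reject the second.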
Build a retrieval-augmented generation (RAG) pipeline:
| Database | Hosting | Scale | Latency | Best For |
|----------|---------|-------|---------|----------|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |
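Whichever store is chosen, the retrieval step answers the same question: which stored embeddings are closest to the query embedding? The brute-force sketch below shows that operation in plain Python for intuition only. Production vector stores typically rely on approximate nearest-neighbour indexes rather than a full scan, and the `top_k` name and the `(doc_id, vector)` index layout are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Score every (doc_id, vector) pair in `index` and return the k best (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vector, vector)) for doc_id, vector in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(top_k([1.0, 0.0], toy_index, k=2))
```

A full scan like this costs time linear in the number of stored vectors, which is the main reason dedicated stores become attractive as the corpus grows.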
| Strategy | Chunk Size | Overlap | Best For |
|----------|------------|---------|----------|
| Fixed | 500-1000 tokens | 50-100 tokens | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
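The "Fixed" strategy is the simplest to illustrate: slide a window of `chunk_size` tokens forward by `chunk_size - overlap` each step. The sketch below assumes whitespace-separated words are an acceptable approximation of tokens; a real pipeline would more likely count tokens with the embedding model's own tokenizer. The function name `chunk_fixed` is hypothetical.

```python
def chunk_fixed(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size tokens, sharing `overlap` tokens between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    tokens = text.split()  # assumption: whitespace-separated words stand in for tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_fixed(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.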
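Monitor production models for drift and degradation:

    from scipy.stats import ks_2samp

    def detect_drift(reference, current, threshold=0.05):
        """Two-sample Kolmogorov-Smirnov test on one numeric feature.

        Drift is flagged when the p-value falls below `threshold`.
        """
        statistic, p_value = ks_2samp(reference, current)
        return {
            "drift_detected": bool(p_value < threshold),  # plain bool, safe to serialize as JSON
            "ks_statistic": float(statistic),
            "p_value": float(p_value),
        }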
| Metric | Warning | Critical |
|--------|---------|----------|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
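The alert thresholds refer to PSI (population stability index), which the drift snippet does not compute. A minimal sketch follows, assuming equal-width bins over the reference sample's range and a small `eps` floor for empty bins; both are common choices rather than anything this document specifies. The 0.1 and 0.2 cutoffs in `psi_severity` come from the table; the function names are hypothetical.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two numeric samples: sum over bins of (cur - ref) * ln(cur / ref)."""
    if len(reference) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")
    low, high = min(reference), max(reference)
    width = (high - low) / bins
    if width == 0:
        width = 1.0  # constant reference sample: fall back to unit-width bins

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values to the edge bins
        # eps keeps empty bins from producing log(0) or a division by zero
        return [max(count / len(values), eps) for count in counts]

    ref_share = proportions(reference)
    cur_share = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


def psi_severity(psi):
    """Map a PSI value to the alert levels in the table above."""
    if psi > 0.2:
        return "critical"
    if psi > 0.1:
        return "warning"
    return "ok"
```

Identical samples give a PSI of zero, and the value grows as mass moves between bins, so the same function can feed both the warning and critical alerts.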
references/mlops_production_patterns.md contains:
references/llm_integration_guide.md contains:
references/rag_system_architecture.md contains:
    python scripts/model_deployment_pipeline.py --model model.pkl --target staging

Generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.

    python scripts/rag_system_builder.py --config rag_config.yaml --analyze

Scaffolds a RAG pipeline with vector store integration and retrieval logic.

    python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy

Sets up drift detection, alerting, and performance dashboards.
| Category | Tools |
|----------|-------|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |