
Deploy, manage, and optimize vector databases for AI applications. Covers Qdrant, Weaviate, pgvector, and Pinecone — collection management, indexing strategies, backup, and performance tuning for production RAG and semantic search workloads.
Orchestrate AI/ML pipelines for data ingestion, model training, batch inference, and RAG indexing using Prefect, Airflow, or Dagster. Build reliable, observable, and retriable workflows for production AI systems.
Implement multi-layer LLM caching with exact match, semantic similarity, and provider-side prompt caching. Reduce API costs by 30–70%, cut latency, and improve throughput using Redis, GPTCache, and provider caching APIs.
Reduce LLM API and infrastructure costs through model selection, prompt caching, response caching, batching, quantization, and self-hosting strategies. Track spend by team and model, set budgets, and implement cost-aware routing.
Design and operationalize SRE dashboards that surface reliability, latency, error, saturation, and capacity signals across services. Use when building observability views for SLOs, incident response, and executive reliability reporting.
Deploy ML models on Kubernetes with KServe (formerly KFServing) and NVIDIA Triton Inference Server. Includes canary deployments, autoscaling, model versioning, A/B testing, and GPU resource management for production model serving.
Harden OpenClaw self-hosted environments with baseline host controls, auth tightening, secret handling, network segmentation, and safe update/rollback workflows. Use when deploying OpenClaw in home labs, startups, or production-like local AI infrastructure.
Configure GitLab CI/CD pipelines and runners for automated building, testing, and deployment. Create .gitlab-ci.yml configurations, manage runners, and implement DevOps workflows. Use when working with GitLab repositories or self-hosted GitLab instances.
Use eBPF for deep kernel-level observability — trace syscalls, network flows, and application behavior without code changes using Cilium, Tetragon, and bpftrace.
Deploy, scale, and manage Kubernetes workloads. Create deployments, services, and configurations, manage cluster resources, troubleshoot pods, and implement production-ready Kubernetes patterns. Use when working with Kubernetes clusters, K8s deployments, or container orchestration.
Build internal developer platforms (IDPs) with self-service infrastructure, golden paths, and developer portals using Backstage, Crossplane, and Score.
Deploy and manage Azure Kubernetes Service clusters. Configure node pools, networking, and integrations. Use when running Kubernetes workloads on Azure.
Configure Azure VNets, NSGs, and Azure Firewall. Implement hub-spoke topology and private endpoints. Use when designing Azure network infrastructure.
Provision Azure infrastructure with Terraform. Configure providers, manage state, and deploy resources. Use when implementing IaC for Azure.
Deploy and manage Google Kubernetes Engine clusters. Configure node pools, networking, and workload identity. Use when running Kubernetes on GCP.
Protect internal apps with Cloudflare Access, device posture, and Zero Trust policies.
Administer MongoDB databases. Configure replica sets, sharding, and backups. Use when managing MongoDB deployments.
Build and operate apps on Firebase using Auth, Firestore, Cloud Functions, and Hosting. Use when building mobile/web backends with managed services, real-time data sync, or serverless APIs.
Create and manage systemd services and timers. Configure service dependencies and resource limits. Use when managing system services.
Configure object storage with S3, GCS, and MinIO. Implement lifecycle policies and access controls. Use when managing object storage.
Harden AI/LLM deployments against prompt injection, data exfiltration, model theft, and supply chain attacks. Covers input validation, output filtering, access control, model API security, and compliance controls for production AI systems.
Secure LLM-powered applications with input validation, output controls, tenant isolation, and abuse prevention.
Apply CIS benchmarks and secure Linux servers. Configure SSH, manage users, implement firewall rules, and enable security features. Use when hardening Linux systems for production or meeting security compliance requirements.
Manage SSL/TLS certificates with Let's Encrypt and internal PKI. Configure secure HTTPS, certificate renewal, and cipher suites. Use when implementing secure communications.
Implement zero-trust network architecture. Configure identity-based access, micro-segmentation, and continuous verification. Use when implementing modern security architectures.
Perform static application security testing with tools like Semgrep, CodeQL, and SonarQube. Identify security vulnerabilities in source code before deployment. Use when implementing secure SDLC, code review automation, or security gates in CI/CD pipelines.
Encrypt files and configs with Mozilla SOPS. Integrate with AWS KMS, GCP KMS, or PGP for key management. Use when encrypting configuration files, Kubernetes secrets, or implementing GitOps with encrypted secrets.
Conduct periodic access reviews and certifications. Implement access governance and recertification workflows. Use when managing access compliance.
Instrument AI agents with tracing, token metrics, latency, and cost visibility. Use for reliability and debugging.
Secure AI agents against prompt injection, tool abuse, and data exfiltration with defense-in-depth controls. Use when building, deploying, or hardening agentic AI systems that invoke tools, access data, or interact with production infrastructure.
Secure AI coding agents (Claude Code, Cursor, Codex, Copilot) with permission boundaries, secret protection, code review gates, and safe sandbox configurations for team environments.
Use service mesh patterns for AI inference traffic management, mTLS, canary releases, policy enforcement, and cross-cluster resilience.
Run structured AI red team exercises for jailbreak resistance, data exfiltration risk, harmful output controls, and agent tool abuse resilience.
Set up alerting rules, configure on-call rotations, and manage incident response workflows. Integrate with PagerDuty, Opsgenie, or Grafana OnCall for alert routing and escalation. Use when implementing alerting strategies and on-call management for production systems.
Implement GitOps with ArgoCD for declarative Kubernetes deployments. Configure applications, manage sync policies, implement progressive delivery, and automate deployments from Git repositories. Use when implementing GitOps workflows or continuous deployment to Kubernetes.
Deploy Azure resources with ARM templates and Bicep. Create modular deployments and manage dependencies. Use when deploying Azure-native IaC.
Maintain an IT asset inventory and configuration management database (CMDB). Track hardware, software, and cloud resources. Use when managing IT assets.
Implement centralized audit logging and SIEM integration. Configure log retention and security monitoring. Use when implementing audit trail requirements.
Reduce AWS spend with rightsizing, autoscaling, commitment planning, and storage lifecycle policies. Use when running FinOps reviews, lowering cloud bills, or improving cost-per-request metrics.
Manage EC2 instances, AMIs, and auto-scaling groups. Configure security groups, key pairs, and instance types. Use when deploying compute resources on AWS.
Deploy containers on ECS and Fargate. Configure task definitions, services, and load balancing. Use when running containerized workloads on AWS.
Manage IAM users, roles, and policies. Implement least-privilege access and security best practices. Use when configuring AWS identity and access management.
Build and deploy serverless functions on AWS Lambda. Configure triggers, manage permissions, and optimize performance. Use when implementing serverless applications.
Provision and manage RDS databases. Configure backups, replication, and security. Use when deploying managed relational databases on AWS.
Configure S3 buckets, policies, and lifecycle rules. Implement versioning, replication, and security. Use when managing object storage on AWS.
Store and rotate secrets in AWS Secrets Manager. Configure automatic rotation, access policies, and application integration. Use when managing secrets in AWS environments or requiring automatic credential rotation.
Design and implement VPCs and networking. Configure subnets, route tables, and security groups. Use when setting up AWS network infrastructure.
Set up Azure Pipelines for CI/CD, configure build and release pipelines, manage Azure DevOps projects, and integrate with Azure services. Use when working with Azure DevOps Services or Server for enterprise DevOps workflows.
Build serverless applications on Azure Functions. Configure triggers, bindings, and deployment. Use when implementing serverless workloads on Azure.
Configure Azure Monitor and Activity Log for auditing. Set up diagnostic settings and log analytics. Use when auditing Azure activity.
Provision Azure SQL Database and Cosmos DB. Configure security, backups, and replication. Use when deploying managed databases on Azure.
Manage Azure Virtual Machines and scale sets. Configure availability sets and managed disks. Use when deploying compute resources on Azure.
Implement backup and recovery strategies. Configure rsync, Restic, and cloud backups. Use when designing data protection solutions.
Configure zero-downtime deployment strategies including blue-green, canary, and rolling deployments. Implement traffic shifting, health checks, and rollback procedures. Use when implementing production deployment strategies or zero-downtime releases.
Develop business continuity plans and impact analysis. Implement BCP testing and communication procedures. Use when building organizational resilience.
Configure CDNs for content delivery. Set up CloudFront, Cloudflare, and Fastly. Use when optimizing global content delivery.
Implement change management processes. Configure CAB reviews, change windows, and rollback procedures. Use when managing production changes.
Configure CircleCI workflows and orbs for continuous integration and deployment. Create config.yml pipelines, use orbs for reusable configurations, and optimize build performance. Use when working with CircleCI for CI/CD automation.
Deploy static sites and full-stack apps on Cloudflare Pages with previews, functions, and custom domains.
Manage Cloudflare R2 buckets, lifecycle, and signed URLs. Use for low-egress object storage and media delivery.
Build and deploy edge functions with Cloudflare Workers and Wrangler. Use for APIs, cron jobs, and edge middleware.
Secure Docker images and container runtime configurations. Implement non-root users, read-only filesystems, and security contexts. Use when building secure container images or hardening container deployments.
Manage container registries including ECR, ACR, GCR, and Docker Hub. Push and pull images, configure authentication, set up repository policies, and implement image lifecycle management. Use when working with container image storage and distribution.
Scan container images for vulnerabilities using Trivy, Grype, and cloud-native tools. Identify security issues in base images, packages, and configurations. Use when implementing container security, building secure images, or meeting compliance requirements.
Build reactive backends with Convex functions, schema validation, auth integration, and deployment workflows. Use when building real-time apps with type-safe server functions and automatic caching.
Perform dynamic application security testing with OWASP ZAP, Burp Suite, and Nikto. Test running applications for security vulnerabilities through automated and manual testing. Use when testing web applications, APIs, or performing penetration testing.
Implement database backup strategies. Configure automated backups, retention, and recovery testing. Use when designing backup and recovery procedures.
Scan package dependencies for known vulnerabilities using Snyk, Dependabot, and OWASP Dependency-Check. Identify and remediate vulnerable libraries in your software supply chain. Use when managing third-party dependencies or implementing software composition analysis.
Create reproducible development environments with Dev Containers, Nix flakes, and Devbox for consistent toolchains across teams. Use when onboarding developers, standardizing build environments, or eliminating "works on my machine" problems.
Implement disaster recovery strategies and runbooks. Configure RPO/RTO targets and failover procedures. Use when planning for business continuity.
Configure DNS zones and records. Manage Route53, Cloud DNS, and self-hosted DNS. Use when setting up DNS infrastructure.
Define and run multi-container Docker applications using Docker Compose. Create compose files, manage service dependencies, configure networks and volumes, and orchestrate local development environments. Use when setting up multi-service applications or development environments.
Build, optimize, and troubleshoot Docker containers and images. Create efficient Dockerfiles, manage container lifecycle, configure networking and volumes, and debug container issues. Use when working with Docker, containerization, or container troubleshooting.
Deploy and manage the ELK Stack (Elasticsearch, Logstash, Kibana) for log aggregation and analysis. Configure log pipelines, create visualizations, and implement log-based monitoring. Use when centralizing logs, implementing search functionality, or building log analytics platforms.
Configure iptables, nftables, and cloud firewalls. Implement network segmentation and traffic filtering. Use when securing network perimeters or implementing security zones.
Deploy serverless functions on Google Cloud Functions. Configure triggers and manage deployments. Use when implementing serverless workloads on GCP.
Manage Compute Engine instances and instance templates. Configure managed instance groups and preemptible VMs. Use when deploying compute resources on GCP.
Configure VPCs, firewall rules, and Cloud NAT. Implement shared VPC and private service connect. Use when designing GCP network infrastructure.
Secure secrets in Google Cloud Secret Manager. Configure IAM policies, integrate with GKE, and manage secret versions. Use when managing secrets in GCP environments.
Build, test, and deploy applications using GitHub Actions workflows. Create CI/CD pipelines, configure runners, manage secrets, and automate software delivery. Use when working with GitHub repositories, automating builds, running tests, or deploying applications.
Implement Git branching strategies, PR workflows, and release management patterns. Configure GitFlow, trunk-based development, or GitHub Flow for team collaboration. Use when establishing version control workflows or improving development team collaboration.
Operate GPU-backed Kubernetes clusters for AI inference and training with scheduling, autoscaling, node health, MIG partitioning, and cost controls.
Set up and manage NVIDIA GPU servers for AI workloads — driver installation, CUDA toolkit, container toolkit, MIG partitioning, GPU health monitoring, and multi-GPU configuration for LLM inference and training.
Implement HIPAA security and privacy rules. Configure PHI protections and BAA requirements. Use when handling healthcare data.
Set up and manage SSO, SCIM provisioning, and MFA for startup teams using Google Workspace, Okta, or Microsoft Entra ID (formerly Azure AD). Use when centralizing authentication, onboarding SSO, or meeting compliance requirements.
Implement incident management processes and escalation procedures. Configure on-call schedules and post-incident reviews. Use when managing production incidents.
Handle security incidents with IR playbooks and procedures. Implement detection, containment, eradication, and recovery processes. Use when responding to security events or building incident response capabilities.
Implement Kubernetes security contexts, Pod Security Standards, and network policies. Secure cluster components and workloads. Use when hardening Kubernetes deployments or meeting security compliance.
Customize Kubernetes manifests without templating using Kustomize. Create base configurations with environment overlays, manage configuration variants, and patch resources declaratively. Use when managing Kubernetes configurations across multiple environments without Helm.
Administer Linux servers. Manage packages, services, and system configuration. Use when administering Linux systems.
Implement GDPR data protection requirements. Configure consent management, data subject rights, and privacy by design. Use when processing EU personal data.
Implement an ISO 27001 Information Security Management System (ISMS). Configure ISMS controls and risk management. Use when implementing enterprise security frameworks.
Implement policy as code with OPA, Sentinel, and Kyverno. Automate policy enforcement in CI/CD and infrastructure. Use when enforcing compliance through automation.
Create and manage Jenkins CI/CD pipelines, configure agents, manage plugins, and automate builds. Use when working with Jenkins servers, creating Jenkinsfiles, or setting up build automation for enterprise environments.
Manage containers using Podman, the daemonless container engine. Run rootless containers, create pods, manage images, and use Docker-compatible commands. Use when working with Podman or requiring rootless container operations.
Implement Datadog monitoring and APM for infrastructure and applications. Configure agents, create dashboards, set up alerts, and implement distributed tracing. Use when implementing enterprise monitoring, APM, or unified observability platforms.
Configure Grafana Loki for log aggregation and analysis. Set up Promtail for log collection, write LogQL queries, and integrate with Grafana for visualization. Use when implementing lightweight log aggregation, especially in Kubernetes environments.
Create, manage, and deploy Helm charts for Kubernetes package management. Build reusable chart templates, manage releases, configure values, and use Helm repositories. Use when packaging Kubernetes applications or managing K8s deployments with Helm.
Manage Red Hat OpenShift clusters and deployments. Configure projects, routes, builds, and deploy applications using OpenShift-specific features. Use when working with OpenShift Container Platform or OKD for enterprise Kubernetes.
Implement feature flags for progressive feature rollout using LaunchDarkly, Unleash, or custom solutions. Control feature visibility, perform A/B testing, and enable trunk-based development. Use when implementing gradual rollouts, feature toggles, or experimentation platforms.
Automate versioning and changelog generation using semantic versioning principles. Configure release automation, version bumping, and changelog tools. Use when implementing version management or automating release processes.
Deploy AWS resources with CloudFormation templates. Create stacks, use nested stacks, and implement drift detection. Use when deploying AWS-native IaC.
Provision Cloud SQL and Spanner databases. Configure high availability, backups, and security. Use when deploying managed databases on GCP.
Administer PostgreSQL databases. Configure replication, backups, and performance tuning. Use when managing PostgreSQL deployments.
Migrate from Terraform to OpenTofu with state compatibility, provider registry setup, and CI/CD pipeline updates. Use when adopting the open-source Terraform fork or evaluating IaC tooling under an open-source license.
Practical IT troubleshooting playbooks for small teams without dedicated IT staff.
Auto-scale LLM inference clusters on Kubernetes using KEDA, custom GPU metrics, and horizontal pod autoscaling. Handle traffic spikes, implement queue-based scaling, and optimize cost with spot instances for AI workloads.
Design secure, multi-tenant LLM hosting platforms with tenant isolation, quotas, billing attribution, noisy-neighbor protection, and per-tenant policy controls.
Deploy and manage vLLM for high-throughput LLM inference. Configure continuous batching, tensor parallelism, quantization, and OpenAI-compatible API endpoints for production LLM serving.
Configure nginx and Traefik as reverse proxies. Implement SSL termination and routing. Use when setting up application gateways.
Deploy frontend and full-stack apps on Vercel with previews, edge functions, environment promotion, and production guardrails. Use when shipping Next.js, SvelteKit, or static sites with zero-config CI/CD.
Manage block storage volumes and LVM. Configure cloud block storage and local disks. Use when managing disk storage.
Defend AI systems against prompt injection and indirect prompt attacks using input controls, tool permissions, output validation, and isolation boundaries.
Audit and remediate CIS benchmark violations. Use automated tools to assess compliance and implement hardening recommendations. Use when meeting compliance requirements or implementing security baselines.
Deploy and tune Web Application Firewalls. Configure rules for OWASP Top 10 protection. Use when protecting web applications from common attacks.
Generate, sign, and verify SBOMs and provenance attestations to secure the software supply chain. Use when implementing SLSA controls, artifact trust policies, or compliance evidence for releases.
Harden Windows servers per security baselines and CIS benchmarks. Configure Group Policy, Windows Defender, and security features. Use when securing Windows Server environments.
Secure OpenClaw deployments with preflight hardening checks, CI/CD guardrails, container runtime restrictions, and post-deploy verification. Use when shipping OpenClaw with Docker, Kubernetes, or automated release pipelines.
Detect, respond to, and prevent software supply chain attacks on package registries, container images, and CI/CD pipelines with lockfile auditing, provenance verification, and emergency response playbooks.
Manage secrets and certificates in Azure Key Vault. Configure access policies, integrate with Azure services, and implement secure secret management. Use when managing secrets in Azure environments.
Set up infrastructure for fine-tuning LLMs with QLoRA, LoRA, and full fine-tuning using Hugging Face TRL, Axolotl, and distributed training with DeepSpeed or FSDP. Covers dataset prep, training runs, and model export.
Deploy an API gateway for LLM traffic with load balancing, rate limiting, key management, semantic caching, fallback routing, and cost tracking. Covers LiteLLM Proxy, OpenRouter-compatible setup, and custom Nginx/Traefik patterns.
Build production LLMOps platforms with CI/CD, model promotion workflows, evaluation gates, rollback, and governance across cloud and self-hosted inference.
Configure load balancers and traffic distribution. Implement health checks and SSL termination. Use when distributing traffic across servers.
Configure a Mac mini as a reliable local LLM server with remote access, observability, and power-safe operation. Use when building an always-on private AI inference server on Apple Silicon.
Secure Model Context Protocol (MCP) servers with transport encryption, tool authorization, input validation, and audit logging for safe AI agent integrations.
Manage and secure company devices with MDM solutions — enroll macOS, Windows, iOS, and Android devices, enforce security policies, and automate software deployment. Use when setting up device management for a growing team.
Establish model registry standards, governance controls, metadata schemas, approvals, and lifecycle policies for enterprise AI deployments.
Secure the AI model supply chain with artifact signing, provenance attestation, SBOM workflows, dependency controls, and trusted model promotion.
Administer MySQL/MariaDB databases. Configure replication and optimize performance. Use when managing MySQL deployments.
Configure New Relic observability platform for infrastructure and application monitoring. Set up APM agents, create dashboards, configure alerts, and implement distributed tracing. Use when implementing full-stack observability with New Relic One.
Configure NFS servers and clients. Implement network file sharing for Linux systems. Use when setting up shared storage.
Run local LLM workloads with Ollama, Open WebUI, and GPU-aware tuning for private development environments. Use when setting up private inference, local AI dev environments, or air-gapped LLM deployments.
Set up OpenClaw locally and run it reliably on a Mac mini for private, always-on local agent workflows.
Instrument applications and infrastructure with OpenTelemetry for unified traces, metrics, and logs. Use when implementing distributed tracing, service-level troubleshooting, or vendor-neutral observability.
Implement PCI DSS requirements for payment card data. Configure cardholder data environment and security controls. Use when processing payment cards.
Perform basic penetration testing and security assessments. Use reconnaissance, vulnerability discovery, and exploitation techniques. Use when validating security controls or assessing system security.
Optimize Linux system performance. Configure kernel parameters, analyze bottlenecks, and tune resources. Use when improving system performance.
Operate MySQL-compatible databases on PlanetScale with branching workflows, safe migrations, and production rollouts.
Set up metrics collection and visualization with Prometheus and Grafana. Configure scrape targets, create PromQL queries, build dashboards, and implement alerting. Use when implementing monitoring, metrics collection, or visualization for applications and infrastructure.
Build and operate Retrieval-Augmented Generation (RAG) infrastructure with vector stores, embedding pipelines, and hybrid search. Covers ingestion, chunking strategies, reranking, and production deployment patterns.
Configure AWS CloudTrail for audit logging. Set up organization trails and event analysis. Use when auditing AWS activity.
Create operational runbooks and standard operating procedures. Document troubleshooting guides and recovery procedures. Use when documenting operational knowledge.
Audit and harden your SaaS tool stack — enforce SSO, review OAuth grants, manage shadow IT, and secure admin accounts across Slack, GitHub, Google Workspace, and AWS. Use when tightening security across company SaaS tools.
Automate security workflows and remediation. Build security pipelines, automate compliance checks, and implement SOAR capabilities. Use when scaling security operations or implementing DevSecOps.
Implement Istio and Linkerd service meshes. Configure mTLS, traffic management, and observability. Use when managing microservices communication.
Configure SSH servers and clients securely. Manage keys, tunnels, and config files. Use when setting up secure remote access.
Provision AWS infrastructure with Terraform. Create modules, manage state, and implement IaC best practices. Use when deploying AWS resources declaratively.
Provision GCP infrastructure with Terraform. Configure providers and deploy Google Cloud resources. Use when implementing IaC for GCP.
Conduct threat modeling using STRIDE methodology. Identify threats, assess risks, and design security controls. Use when designing secure systems or assessing application security.
Manage users, groups, and permissions on Linux systems. Configure sudo and access controls. Use when managing system access.
Implement vendor risk management programs. Assess third-party security and maintain vendor inventory. Use when managing supplier security.
Configure WireGuard, OpenVPN, and cloud VPNs. Implement secure remote access and site-to-site connectivity. Use when setting up secure network tunnels.
Scan systems and dependencies for CVEs and security vulnerabilities. Use tools like Nessus, OpenVAS, and Qualys to identify and prioritize vulnerabilities. Use when performing security assessments, compliance scanning, or vulnerability management.
Administer Windows Server systems. Manage IIS, Active Directory, and PowerShell automation. Use when administering Windows infrastructure.
Configure Redis for caching and data storage. Set up clustering, persistence, and Sentinel. Use when implementing Redis caching or queues.
Configure GCP Cloud Audit Logs for compliance. Set up log routing and BigQuery analysis. Use when auditing GCP activity.
Implement FedRAMP requirements for federal cloud services. Configure NIST 800-53 controls and continuous monitoring. Use when providing cloud services to US federal agencies.
Implement SOC 2 Trust Services Criteria. Configure security, availability, and processing integrity controls. Use when preparing for a SOC 2 attestation report.
Build automated evaluation suites for AI agents using golden datasets, rubrics, and regression gates. Use when shipping agent features, validating prompt changes, or gating deployments on quality.
Build AI-focused SRE incident response practices for LLM outages, degraded quality, runaway cost events, and safety regressions.
Monitor and evaluate RAG systems with retrieval quality metrics, groundedness checks, hallucination detection, and continuous regression testing.