.claude/skills/posthog-selfhosted-diagnosis/SKILL.md
Diagnose PostHog self-hosted deployment issues including Docker Compose and Helm/Kubernetes problems, ClickHouse and Postgres failures, resource limits, reverse proxy misconfigurations, and data ingestion gaps. Clearly distinguishes PostHog bugs from customer infrastructure issues.
docs-search("self-hosted troubleshooting {specific symptom}")
docs-search("self-hosted deployment {docker OR helm OR kubernetes}")
gh search issues --repo PostHog/posthog "self-hosted {symptom}"
PostHog support helps with PostHog configuration. Customer infrastructure (Kubernetes clusters, Docker hosts, cloud provider settings, networking) is the customer's responsibility.
Be helpful and point them in the right direction, but be clear about boundaries. Don't guess at infrastructure fixes you can't verify.
| Signal | Deployment type |
|--------|----------------|
| "docker compose" or "docker-compose.yml" | Docker Compose (hobby/small) |
| "helm", "kubernetes", "k8s", "EKS", "GKE", "AKS" | Kubernetes / Helm |
| "posthog cloud" or no mention of self-hosting | Cloud: wrong skill, use standard diagnosis |
If unclear, ask: "Are you using PostHog Cloud (app.posthog.com / eu.posthog.com) or a self-hosted deployment?"
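The signal table above can be sketched as a small shell helper. This is only an illustration: the function name `classify_deployment` and its output labels are hypothetical, and returning `unclear` when a ticket mentions both Docker Compose and Kubernetes is an assumption about when to ask the clarifying question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map free-form ticket text to a deployment type
# using the signal phrases from the table above.
classify_deployment() {
  local text compose=0 k8s=0
  # Lowercase the input so matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  if printf '%s' "$text" | grep -Eq 'docker compose|docker-compose\.yml'; then
    compose=1
  fi
  # -w matches whole words only, so "weeks" does not count as "eks".
  if printf '%s' "$text" | grep -Eqw 'helm|kubernetes|k8s|eks|gke|aks'; then
    k8s=1
  fi

  if [ "$compose" -eq 1 ] && [ "$k8s" -eq 1 ]; then
    echo "unclear"          # conflicting signals (assumption): ask the customer
  elif [ "$compose" -eq 1 ]; then
    echo "docker-compose"
  elif [ "$k8s" -eq 1 ]; then
    echo "kubernetes"
  else
    echo "cloud"            # "posthog cloud" or no mention of self-hosting
  fi
}
```

For example, `classify_deployment "Helm chart on EKS keeps crashing"` prints `kubernetes`. Keyword matching like this is only a first pass; the ticket text should still be read before choosing a diagnostic path.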
Ask the customer to check:
{their-posthog-url}/_health or /api/projects/
For Docker Compose:
docker compose ps # All containers should be "Up"
docker compose logs -f # Check for error patterns
For Kubernetes:
kubectl get pods -n posthog # All pods should be Running/Ready
kubectl logs -n posthog <pod> # Check for errors
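If the customer is comfortable running a script, the health-endpoint check can be sketched as below. The function names and the labels `healthy`, `unreachable`, `app-error`, and `unexpected` are hypothetical; the only thing taken from the text above is the `/_health` path. How each label maps to a next step is an assumption, not documented PostHog behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an HTTP status code into a coarse triage label.
interpret_status() {
  case "$1" in
    2??) echo "healthy" ;;       # endpoint answered successfully
    000) echo "unreachable" ;;   # curl prints 000 when no HTTP response arrived
    5??) echo "app-error" ;;     # server-side failure: look at container/pod logs
    *)   echo "unexpected" ;;    # redirects, auth errors, etc. need a closer look
  esac
}

# Query {their-posthog-url}/_health and print the label.
# $1 is the base URL of the customer's instance (placeholder, supplied by them).
check_health() {
  local base_url="${1%/}" status
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base_url/_health")
  interpret_status "$status"
}
```

A result of `unreachable` usually points toward networking, DNS, or the reverse proxy rather than PostHog itself, while `app-error` is a reason to go back to `docker compose logs` or `kubectl logs`.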
| Symptom | Likely cause | Category |
|---------|-------------|----------|
| "No events coming in" | Ingestion pipeline, reverse proxy, or SDK config | PostHog + Infra |
| "Events arrive but queries are slow" | ClickHouse resource limits or schema issues | Infra |
| "Session replay not working" | Recording config, MinIO/object storage, or CSP | PostHog + Infra |
| "Feature flags not evaluating" | API endpoint not reachable, or /decide blocked | PostHog + Infra |
| "Can't log in / SSO broken" | Auth config, Postgres, or network issues | Infra |
| "Upgrade failed" | Migration issues, version compatibility | PostHog + Infra |
| "Out of disk / memory" | Resource limits, ClickHouse data growth | Infra |
| "CORS / CSP errors" | Reverse proxy misconfiguration | Infra |
| "SSL certificate errors" | Certificate config, proxy termination | Infra |
Reverse proxy misconfiguration is the most common self-hosted issue. Run:
docs-search("self-hosted reverse proxy CORS configuration")
Key checks:
Host header being passed through correctly?
ClickHouse is the most resource-hungry component.
Common patterns:
system.query_log for long-running queries
Ask:
df -h on the ClickHouse data volume"
If events aren't arriving:
app.posthog.com)
Run: docs-search("self-hosted upgrade migration troubleshooting")
Common issues:
gh search issues --repo PostHog/posthog "self-hosted {symptom}" --limit 10
gh search issues --repo PostHog/posthog "self hosted {alternate terms}" --limit 10
Also check the self-hosted docs:
gh api repos/PostHog/posthog.com/contents/contents/docs/self-host --jq '.[].name'
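The two `gh search issues` commands above differ only in how "self-hosted" is spelled, so they can be wrapped in a loop. This is a minimal sketch: `issue_queries` and `search_selfhosted_issues` are hypothetical names, and it assumes the `gh` CLI is installed and authenticated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one search query per spelling variant of "self-hosted".
issue_queries() {
  local symptom="$1" prefix
  for prefix in "self-hosted" "self hosted"; do
    printf '%s %s\n' "$prefix" "$symptom"
  done
}

# Run each query against the PostHog repo using the same flags as above.
search_selfhosted_issues() {
  local query
  issue_queries "$1" | while IFS= read -r query; do
    echo "== $query =="
    # </dev/null keeps gh from consuming the loop's piped input.
    gh search issues --repo PostHog/posthog "$query" --limit 10 </dev/null
  done
}
```

Calling `search_selfhosted_issues "clickhouse out of memory"` would run both spellings in turn; the symptom string here is only an example.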
DO: "PostHog needs at least 4GB of memory for ClickHouse to run smoothly. It looks like your deployment might be hitting resource limits. Here's our infrastructure requirements doc: [link]. Your DevOps team can use this to right-size the deployment."
DON'T: "That's an infrastructure issue, not our problem." or attempting to debug their Kubernetes cluster configuration.
If the customer is struggling with self-hosted complexity, it's appropriate to gently suggest PostHog Cloud:
"If managing the infrastructure is becoming a challenge, PostHog Cloud handles all of this automatically and includes the same features. Happy to help you evaluate whether Cloud might be a better fit — you can migrate your data over."
Only suggest this when:
Escalate to engineering when: