skills/managing-path-cleaning-rules/SKILL.md
Inspects URL paths and proposes, tests, orders, and applies project-level path cleaning rules so dynamic segments (numeric IDs, UUIDs, slugs, dates) collapse into readable aliases. Use when the user says "clean the paths", "normalize URLs", "group similar pages", "too many distinct paths", "/users/123 and /users/456 are the same page", "set up path cleaning", or asks why a Web analytics or Paths breakdown is fragmented across thousands of nearly-identical URLs. Covers regex syntax (re2), alias placeholder convention, rule ordering, the test workflow, and applying rules via the project-settings-update MCP tool.
Path cleaning rules normalize $pathname and $entry_pathname so that pages
sharing the same template (/users/123/profile, /users/456/profile, …) collapse
into one row (/users/<id>/profile) in Web analytics tiles, Paths insights, and
any HogQL query that calls apply_path_cleaning. They are the right answer when
a breakdown is fragmented across thousands of near-identical URLs.
This skill teaches you how to:
- Write regex + alias rules in re2 syntax with the project's placeholder convention

Team.path_cleaning_filters is a JSON list of PathCleaningFilter objects:
{
"regex": "/users/\\d+/profile",
"alias": "/users/<id>/profile",
"order": 0
}
- regex: a re2 pattern. No need to escape /. Anchor with ^ / $ when you mean it.
- alias: the literal replacement. Use angle-bracket placeholders (<id>, <slug>, <uuid>, <date>) by convention so the cleaned path stays human-readable. The alias is not a regex template; backreferences are not supported.
- order: integer. Rules apply sequentially by ascending order value, and each rule's output feeds the next.

Application is replaceRegexpAll(pathname, regex, alias) per rule, chained.
Source: posthog/hogql/property.py:613.
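The chained application described above can be approximated locally before touching project settings. The following is a rough Python sketch, not PostHog's implementation: `clean_path` is a hypothetical helper, the rule list is illustrative, and Python's `re` module stands in for re2 (the two agree on simple patterns like these but differ on advanced syntax).

```python
import json
import re

# Rules in the same shape as Team.path_cleaning_filters (values are illustrative).
RULES_JSON = r"""
[
  {"regex": "/orders/\\d+", "alias": "/orders/<id>", "order": 1},
  {"regex": "/users/\\d+/profile", "alias": "/users/<id>/profile", "order": 0}
]
"""

def clean_path(pathname: str, rules: list[dict]) -> str:
    """Apply each rule by ascending `order`; each output feeds the next rule."""
    for rule in sorted(rules, key=lambda r: r["order"]):
        # A callable replacement keeps the alias literal, so Python does not
        # interpret any backslashes it might contain.
        pathname = re.sub(rule["regex"], lambda _m, a=rule["alias"]: a, pathname)
    return pathname

rules = json.loads(RULES_JSON)
print(clean_path("/users/123/profile", rules))  # /users/<id>/profile
print(clean_path("/orders/4242/items", rules))  # /orders/<id>/items
print(clean_path("/pricing", rules))            # /pricing
```

A local check like this is only a first pass; the settings page tester remains the reference for how the real chain behaves.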
Ask yourself: is the user complaining about cardinality (too many distinct paths
in a chart), or do they want a per-URL drill-down? Path cleaning is for the
former. If they want per-URL data, suggest a property filter on $pathname
instead.
Don't guess at patterns — query them. With the execute-sql MCP tool:
SELECT properties.$pathname AS path, count() AS views
FROM events
WHERE event = '$pageview'
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY path
ORDER BY views DESC
LIMIT 200
Scan the result for:
- Numeric IDs: /users/123, /orders/4242
- UUIDs: /sessions/8f3c1a3b-…
- Slugs: /posts/why-i-love-posthog
- Dates: /archive/2024-09-12
- Locale prefixes: /en-US/, /fr-FR/
- Pagination: ?page=3, /page/3/

| Pattern             | Example match          | regex                        | alias                  |
| ------------------- | ---------------------- | ---------------------------- | ---------------------- |
| Numeric segment | /users/123/profile | /users/\d+/profile | /users/<id>/profile |
| UUID v4 | /sessions/8f3c1a3b-… | /sessions/[0-9a-f-]{36} | /sessions/<uuid> |
| Slug | /posts/why-posthog | /posts/[a-z0-9-]+$ | /posts/<slug> |
| ISO date | /archive/2024-09-12 | /archive/\d{4}-\d{2}-\d{2} | /archive/<date> |
| Locale prefix | /en-US/about | ^/[a-z]{2}-[A-Z]{2}/ | /<locale>/ |
| Trailing query/page | /blog?page=3 | \?page=\d+$ | (empty alias drops it) |
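The table rows can be sanity-checked against sample paths before any rule is saved. This is a minimal sketch under stated assumptions: `clean` and `CASES` are hypothetical names, the UUID is a made-up sample, and Python's `re` is used as a stand-in for re2.

```python
import re

# (regex, alias, sample path, expected result). The UUID and the expected
# values are illustrative, not taken from a real project.
CASES = [
    (r"/users/\d+/profile", "/users/<id>/profile",
     "/users/123/profile", "/users/<id>/profile"),
    (r"/sessions/[0-9a-f-]{36}", "/sessions/<uuid>",
     "/sessions/123e4567-e89b-42d3-a456-426614174000", "/sessions/<uuid>"),
    (r"/posts/[a-z0-9-]+$", "/posts/<slug>",
     "/posts/why-posthog", "/posts/<slug>"),
    (r"/archive/\d{4}-\d{2}-\d{2}", "/archive/<date>",
     "/archive/2024-09-12", "/archive/<date>"),
    (r"^/[a-z]{2}-[A-Z]{2}/", "/<locale>/",
     "/en-US/about", "/<locale>/about"),
    (r"\?page=\d+$", "",
     "/blog?page=3", "/blog"),
]

def clean(regex: str, alias: str, path: str) -> str:
    """Replace every match of regex with the literal alias."""
    return re.sub(regex, lambda _m: alias, path)

for regex, alias, sample, expected in CASES:
    result = clean(regex, alias, sample)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{status}: {sample} -> {result}")
```

Adding a few paths that should stay untouched (for example a nested route under /posts/) to the same list helps catch rules that match more than intended.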
Anchoring rules of thumb:
- Use ^ only when the segment must be at the beginning of the path
- Use $ to keep a generic rule (e.g. \d+$) from matching mid-path segments

Three options for testing a rule, pick one:
1. Settings page tester: /settings/project#path_cleaning has a built-in "test path" input that replays the full ordered chain.
2. Project HogQL (via execute-sql):

       SELECT replaceRegexpAll('/users/42/profile', '/users/\\d+/profile', '/users/<id>/profile')

   Chain replaceRegexpAll calls in the same order the rules will run if you want to verify multi-rule interaction.
3. Built-in AI helper: there is already an AiRegexHelper modal accessible from the rule editor (Help me with Regex button) that turns natural language into a regex. Suggest it to the user when they say "I don't know regex", but always validate the output against real paths via the tester.
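Writing the nested replaceRegexpAll calls by hand gets error-prone beyond two rules, so the query can be generated instead. The sketch below is one possible approach: `sql_literal` and `chained_sql` are hypothetical helpers, and the quoting (backslashes doubled, single quotes backslash-escaped) is an assumption about what the SQL string parser accepts, based on this document's note that backslashes must be doubled in SQL string literals.

```python
def sql_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling backslashes and escaping quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def chained_sql(path: str, rules: list[dict]) -> str:
    """Nest one replaceRegexpAll call per rule; the lowest order is innermost."""
    expr = sql_literal(path)
    for rule in sorted(rules, key=lambda r: r["order"]):
        regex = sql_literal(rule["regex"])
        alias = sql_literal(rule["alias"])
        expr = f"replaceRegexpAll({expr}, {regex}, {alias})"
    return "SELECT " + expr

rules = [
    {"regex": r"/users/\d+/profile", "alias": "/users/<id>/profile", "order": 0},
    {"regex": r"/users/[a-z0-9-]+", "alias": "/users/<slug>", "order": 1},
]
print(chained_sql("/users/42/profile", rules))
```

The printed statement can be pasted into execute-sql; the innermost call corresponds to the rule that runs first.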
Sequential application means a generic rule placed first will swallow everything that should have hit a specific rule.
order=0 /users/me/profile → /users/me/profile (specific, runs first)
order=1 /users/\d+/profile → /users/<id>/profile
order=2 /users/[a-z0-9-]+ → /users/<slug> (catch-all, runs last)
If /users/[a-z0-9-]+ ran first it would also match /users/me/profile and
make the more specific rule unreachable.
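The effect of ordering can be reproduced with a small local simulation. This is a sketch, assuming rules behave as a plain chain of replace-all steps; `apply_rules`, `SPECIFIC`, and `CATCH_ALL` are hypothetical names, and Python's `re` stands in for re2.

```python
import re

def apply_rules(path: str, ordered_rules: list[tuple[str, str]]) -> str:
    """Run (regex, alias) pairs in the given order, feeding each output to the next."""
    for regex, alias in ordered_rules:
        path = re.sub(regex, lambda _m, a=alias: a, path)
    return path

SPECIFIC = (r"/users/\d+/profile", "/users/<id>/profile")
CATCH_ALL = (r"/users/[a-z0-9-]+", "/users/<slug>")

# Specific rule first: the alias contains "<", which the catch-all cannot match.
print(apply_rules("/users/42/profile", [SPECIFIC, CATCH_ALL]))  # /users/<id>/profile
# Catch-all first: it rewrites the ID before the specific rule can see it.
print(apply_rules("/users/42/profile", [CATCH_ALL, SPECIFIC]))  # /users/<slug>/profile
print(apply_rules("/users/jane-doe", [SPECIFIC, CATCH_ALL]))    # /users/<slug>
```

In this sketch every rule still runs on every path, so a pass-through rule such as /users/me/profile does not by itself shield that path from a later catch-all; here it still ends up as /users/<slug>/profile. Whether production behaves the same way is worth confirming in the settings page tester, and anchoring the catch-all (for example with $) is one way to keep the two apart.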
Use the project-settings-update tool with the full list (the field is
replaced, not merged):
{
"path_cleaning_filters": [
{ "regex": "/users/me/profile", "alias": "/users/me/profile", "order": 0 },
{ "regex": "/users/\\d+/profile", "alias": "/users/<id>/profile", "order": 1 },
{ "regex": "/users/[a-z0-9-]+", "alias": "/users/<slug>", "order": 2 }
]
}
Always read the existing rules first (project settings include
path_cleaning_filters) and merge — overwriting silently destroys whatever the
team has already configured.
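The read-then-merge step can be sketched as a small function that builds the full payload. `merge_rules` is a hypothetical helper, and its policy (keep existing rules in their current order, update the alias when a regex already exists, append genuinely new rules at the end, renumber from 0) is an assumption rather than anything the tool enforces.

```python
import json

def merge_rules(existing: list[dict], new: list[dict]) -> list[dict]:
    """Return existing rules plus new ones, deduplicated by regex and renumbered."""
    merged: dict[str, str] = {}
    for rule in sorted(existing, key=lambda r: r.get("order", 0)):
        merged[rule["regex"]] = rule["alias"]
    for rule in new:
        # Re-assigning an existing key keeps its original position in the dict.
        merged[rule["regex"]] = rule["alias"]
    return [
        {"regex": regex, "alias": alias, "order": index}
        for index, (regex, alias) in enumerate(merged.items())
    ]

existing = [{"regex": r"/users/\d+/profile", "alias": "/users/<id>/profile", "order": 0}]
new = [{"regex": r"/orders/\d+", "alias": "/orders/<id>", "order": 0}]

payload = {"path_cleaning_filters": merge_rules(existing, new)}
print(json.dumps(payload, indent=2))
```

Appending is only a default. When a new rule is more specific than an existing catch-all, reorder the merged list by hand before sending it.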
When the user (or a HogQL query) opts in:
- In the UI, through the path cleaning toggle (PathCleaningToggle.tsx)
- In HogQL, by calling apply_path_cleaning(path_expr, team)

The rules are stored once per project; they are not insight-scoped.
- Backslashes in alias need double-escaping: ClickHouse's replaceRegexpAll supports \0 (whole match) and \1–\9 (capture groups). In a JSON field or SQL string literal the backslash must be doubled, so use \\1 in path_cleaning_filters / HogQL to get the \1 backreference at the ClickHouse layer.
- Forgetting $: \d+ without an end anchor matches every numeric run in any path, so /blog/2024-09-12/post becomes /blog/<num>-<num>-<num>/post when you only meant to match the year segment. Use \d+$ or \d+(/|$) depending on intent.
- Escaping /: re2 does not require it. \/ works but adds noise.
- Case sensitivity: use (?i) at the start of the pattern for case-insensitive matching, e.g. (?i)/users/\d+.
- Updating path_cleaning_filters is an overwrite, not an append. Always start from the current list.
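The anchoring and case-sensitivity points above are easy to confirm with a few one-liners. This is a quick illustration using Python's `re` as a stand-in for re2; the `<num>` alias mirrors the example in the text, and the variable names are made up for the sketch.

```python
import re

# Unanchored: every numeric run in the path is replaced.
unanchored = re.sub(r"\d+", "<num>", "/blog/2024-09-12/post")
# End-anchored: nothing matches because the path does not end in digits.
anchored_miss = re.sub(r"\d+$", "<num>", "/blog/2024-09-12/post")
# End-anchored: a trailing numeric segment is still cleaned.
anchored_hit = re.sub(r"\d+$", "<num>", "/page/3")
# (?i) makes the literal part of the pattern case-insensitive.
case_insensitive = re.sub(r"(?i)/users/\d+", "/users/<id>", "/Users/42")

print(unanchored)        # /blog/<num>-<num>-<num>/post
print(anchored_miss)     # /blog/2024-09-12/post
print(anchored_hit)      # /page/<num>
print(case_insensitive)  # /users/<id>
```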