Regular expression mastery with syntax, common patterns, and language-specific usage. Use when user asks to "write a regex", "match pattern", "validate email", "extract data with regex", "parse string", "regex for phone number", "lookahead", "backreference", "capture group", "regex replace", "grep pattern", "sed substitute", "catastrophic backtracking", or any regular expression tasks.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add 1mangesh1/dev-skills-collection regex
Regex syntax, patterns, recipes, and cross-language reference.
. Any character (except newline, unless s/dotall flag)
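\d \D Digit [0-9] / Non-digit (Python 3, Rust and .NET also match other Unicode digits)
\w \W Word char [a-zA-Z0-9_] / Non-word (Python 3, Rust and .NET also match Unicode letters)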
\s \S Whitespace [ \t\n\r\f\v] / Non-whitespace
[abc] Character set (a, b, or c)
[^abc] Negated set (not a, b, or c)
[a-z] Range (a through z)
[a-zA-Z] Multiple ranges
\p{L} Unicode letter (PCRE/Java/JS with u flag/Rust)
\p{N} Unicode number
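A minimal Python sketch of the ASCII vs Unicode difference for \d and \w. Python 3 str patterns are Unicode-aware by default, and the re.ASCII flag restores the [0-9] / [a-zA-Z0-9_] behavior; the sample strings are illustrative.

```python
import re

# "42" in ASCII digits, then U+0664 U+0662 (Arabic-Indic digits four and two).
text = "order 42 / \u0664\u0662"

unicode_digits = re.findall(r"\d+", text)          # ['42', '\u0664\u0662']
ascii_digits = re.findall(r"\d+", text, re.ASCII)  # ['42']: \d is [0-9] only

# \w follows the same rule; "na\u00efve" uses the precomposed letter i-diaeresis.
unicode_words = re.findall(r"\w+", "na\u00efve")          # one word
ascii_words = re.findall(r"\w+", "na\u00efve", re.ASCII)  # ['na', 've']
```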
* + ? 0+, 1+, 0-1 (greedy)
{3} {3,} {3,5} Exact, min, range
*? +? ?? Lazy (non-greedy) versions
*+ ++ Possessive (PCRE/Java/Python 3.11+) - no backtracking
Greedy: match max, then backtrack. Lazy: match min, then expand. Possessive: match max, never backtrack.
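The three behaviors can be compared in Python. Python's re only gained native possessive quantifiers in 3.11, so this sketch uses a common emulation instead: a capturing lookahead plus a backreference, (?=(a+))\1, which behaves like a++ because Python's lookarounds are atomic (the engine never re-enters them to try a shorter run). Treat it as an emulation, not the native syntax.

```python
import re

# Greedy a+ takes "aaa", then gives one "a" back so the final literal can match.
greedy = re.fullmatch(r"a+a", "aaa")

# Emulated possessive a++: the lookahead captures every "a" into group 1 and
# is not re-entered on failure, so \1 always consumes the full run.
possessive = re.fullmatch(r"(?=(a+))\1a", "aaa")  # None: nothing left for the last "a"

# Lazy a+? matches a single "a" and would only expand if forced to.
lazy = re.match(r"a+?", "aaa")
```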
^ $ Start/end of string (or line with m flag)
\b \B Word boundary / Non-word boundary
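\A \Z Start/end of string, ignoring the m flag. Python: \Z is the absolute end. PCRE/Java: \Z also matches before a final newline, \z is the absolute end. Go: \A and \z only. JS: not supported.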
(abc) Capture group
(?:abc) Non-capturing group
(?<name>abc) Named capture group (JS/PCRE/.NET)
(?P<name>abc) Named capture group (Python)
\1 Backreference to group 1
\k<name> Backreference to named group (JS/PCRE)
(?P=name) Backreference to named group (Python)
(?>abc) Atomic group (PCRE/Java/Python 3.11+) - no backtracking into group
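A short Python sketch of capture groups, backreferences (numbered and Python's named (?P=name) form) and non-capturing groups; the sample text is illustrative.

```python
import re

text = "the the cat sat sat down"

# \1 must repeat exactly what group 1 captured: finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", text)  # ['the', 'sat']

# Same idea with a named group and a named backreference.
doubled_named = [
    m.group("word") for m in re.finditer(r"\b(?P<word>\w+) (?P=word)\b", text)
]  # ['the', 'sat']

# (?:...) groups for repetition without capturing,
# so findall returns whole matches rather than group contents.
runs = re.findall(r"(?:ab)+", "ababx ab")  # ['abab', 'ab']
```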
Zero-width assertions: match a position without consuming characters.
(?=abc) Positive lookahead: followed by abc
(?!abc) Negative lookahead: NOT followed by abc
(?<=abc) Positive lookbehind: preceded by abc
(?<!abc) Negative lookbehind: NOT preceded by abc
^(?=.*\d)(?=.*[A-Z]).{8,}$ # Password: 8+ chars, digit + uppercase
\d+(?![\d%]) # Number NOT followed by % (plain \d+(?!%) still returns "5" from "50%")
(?<=\$)\w+ # Word preceded by $
(?<!bar)foo # foo not preceded by bar
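Lookbehind limits: JS (ES2018+) and .NET accept variable-length lookbehind. Python re requires a fixed-width pattern, so every alternative must have the same length. PCRE traditionally requires each top-level alternative to be fixed-length (lengths may differ between alternatives); for a prefix of arbitrary length, match it and discard it with \K, e.g. foo\s*\Kbar.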
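The lookaround patterns can be checked in Python. This sketch shows the password lookaheads, why a bare negative lookahead after \d+ backtracks into a shorter number, a lookbehind, and Python's fixed-width lookbehind rule; the inputs are illustrative.

```python
import re

# Lookaheads as independent checks: 8+ chars, at least one digit and one uppercase.
password = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")
strong = bool(password.match("Secret123"))  # True
weak = bool(password.match("secret123"))    # False: no uppercase letter

# Pitfall: \d+(?!%) backtracks to "5" in "50%", where "5" is followed by "0".
naive = re.findall(r"\d+(?!%)", "50% 30")     # ['5', '30']
# Rejecting a following digit too stops the engine from shortening the number.
fixed = re.findall(r"\d+(?![\d%])", "50% 30")  # ['30']

# Lookbehind: the "$" must precede the digits but is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", "$100 €200 $7")  # ['100', '7']

# Python rejects lookbehind alternatives of different lengths.
try:
    re.compile(r"(?<=a|bc)x")
    variable_ok = True
except re.error:
    variable_ok = False  # "look-behind requires fixed-width pattern"
```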
g Global (all matches) i Case-insensitive
m Multiline (^/$ per line) s Dotall (. matches \n)
u Unicode x Extended (whitespace ignored, # comments)
Inline: (?i)pattern or scoped (?i:abc).
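In Python the same flags are re.M, re.S and re.I (or inline). A small sketch on an illustrative three-line string:

```python
import re

text = "one\ntwo\nThree"

# re.M: ^ anchors at every line start, not just the start of the string.
line_starts = re.findall(r"^\w+", text, re.M)  # ['one', 'two', 'Three']

# Without re.S, "." stops at newlines; with it, ".*" can span lines.
no_dotall = re.search(r"one.*two", text)     # None
dotall = re.search(r"one.*two", text, re.S)  # matches 'one\ntwo'

# Inline flag for the whole pattern vs. a scoped flag for one part.
inline = re.findall(r"(?i)three", text)          # ['Three']
scoped = re.findall(r"(?i:a)b", "AB Ab ab aB")   # ['Ab', 'ab']: only "a" ignores case
```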
<.*> Greedy: <b>bold</b> → "<b>bold</b>" (first < to LAST >)
<.*?> Lazy: <b>bold</b> → "<b>" then "</b>" (first < to NEXT >)
<[^>]*> Negated class: same as lazy, faster (no backtracking)
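The three tag patterns above, run through Python's re.findall as a quick check:

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)      # ['<b>bold</b>']: first < to LAST >
lazy = re.findall(r"<.*?>", html)       # ['<b>', '</b>']: first < to NEXT >
negated = re.findall(r"<[^>]*>", html)  # ['<b>', '</b>']: can never run past a >
```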
# Email (simplified)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# URL
^https?://[^\s/$.?#].[^\s]*$
# Phone (US)
^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
# IPv4 (each octet 0-255; leading zeros such as 01 still pass)
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
# Date (YYYY-MM-DD; checks format only, so 2024-02-31 passes)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
# UUID v4 (lowercase hex; add the i flag to accept uppercase)
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
# Semver
^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$
# Hex color
^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$
# Strong password (8+ chars, upper, lower, digit, special)
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
# Slug
^[a-z0-9]+(?:-[a-z0-9]+)*$
# JWT structure
^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$
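A sketch of using a few of these validators from Python. PATTERNS and is_valid are illustrative names, not part of any library; the point is that these patterns check shape, not meaning (leading-zero octets and impossible dates both pass).

```python
import re

# Hypothetical registry of a few patterns from the list above, compiled once.
PATTERNS = {
    "ipv4": re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
    ),
    "date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "slug": re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$"),
}


def is_valid(kind: str, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the named pattern."""
    return PATTERNS[kind].fullmatch(value) is not None


checks = {
    ("ipv4", "192.168.0.1"): is_valid("ipv4", "192.168.0.1"),  # True
    ("ipv4", "256.1.1.1"): is_valid("ipv4", "256.1.1.1"),      # False
    ("ipv4", "01.2.3.4"): is_valid("ipv4", "01.2.3.4"),        # True: leading zero
    ("date", "2024-13-01"): is_valid("date", "2024-13-01"),    # False: no month 13
    ("date", "2024-02-31"): is_valid("date", "2024-02-31"),    # True: not calendar-aware
    ("slug", "a--b"): is_valid("slug", "a--b"),                # False: empty segment
}
```

For real calendar checks, parse the string with a date library after the regex passes.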
https?://([^/\s]+) # Domain from URL
[^/\\]+$ # Filename from path
-?\d+(?:\.\d+)? # Numbers (int or decimal, no trailing dot)
(\w+)=([^\s&]+) # Key=value pairs
\[([^\]]+)\]\(([^)]+)\) # Markdown links
^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.+)$ # Log line
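Three of the extraction patterns applied with Python's re; the inputs are made-up samples. With capture groups, findall returns tuples of group values.

```python
import re

query = "q=regex&page=2 lang=py"
pairs = re.findall(r"(\w+)=([^\s&]+)", query)
# [('q', 'regex'), ('page', '2'), ('lang', 'py')]

md = "See [docs](https://example.com/docs) and [home](/)."
links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)", md)
# [('docs', 'https://example.com/docs'), ('home', '/')]

log = "[2024-01-15 10:30:00] [ERROR] Disk full"
m = re.match(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.+)$", log)
timestamp, level, message = m.groups()
# ('2024-01-15 10:30:00', 'ERROR', 'Disk full')
```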
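$0 / $& / \g<0> Entire match (Java, .NET, Go, Rust / JS / Python)
$1 or \1 First capture group ($1 in JS, Java, Go, Rust; \1 in Python and sed)
$<name> Named group (JS); ${name} in Java, .NET, Go, Rust
\g<name> Named group (Python)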
# Swap first/last name
Find: (\w+) (\w+) Replace: $2, $1
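# camelCase to snake_case (\l lowercases the next char; needs case-modifier support such as VS Code, Perl or GNU sed, not JS replace())
Find: ([a-z])([A-Z]) Replace: $1_\l$2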
# Remove console.log lines (single-line calls without nested parentheses)
Find: ^[ \t]*console\.log\([^)]*\);?[ \t]*\n Replace: (empty)
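The same replacements in Python's re.sub, as a sketch: Python writes the whole match as \g<0> and named groups as \g<name>, and since it has no \l case modifier, the camelCase conversion uses a replacement function instead. to_snake is an illustrative name.

```python
import re

bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")  # 'a[1]b[22]'

swapped = re.sub(
    r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace"
)  # 'Lovelace, Ada'


def to_snake(name: str) -> str:
    """Hypothetical helper: lower-to-upper boundary becomes '_' + lowercase."""
    return re.sub(r"([a-z])([A-Z])", lambda m: f"{m.group(1)}_{m.group(2).lower()}", name)


snake = to_snake("camelCaseString")  # 'camel_case_string'
```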
const re = /\d+/g; // Literal
const re2 = new RegExp('\\d+', 'g'); // Constructor
/^\d+$/.test('123'); // true
'abc 123 def 456'.match(/\d+/g); // ['123', '456']
for (const m of 'a1 b2'.matchAll(/([a-z])(\d)/g)) {
console.log(m[1], m[2]); // a 1, b 2
}
'hello'.replace(/[aeiou]/g, c => c.toUpperCase()); // "hEllO"
// Named groups
const m = '2024-01-15'.match(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/);
m.groups.y; // '2024'
'$100 €200'.match(/(?<=\$)\d+/); // ['100'] (ES2018+)
import re
re.match(r'^\d+', '123abc') # Match from start
re.search(r'\d+', 'abc123') # Search anywhere
re.findall(r'\d+', 'a1 b2 c3') # ['1', '2', '3']
re.sub(r'\d+', 'X', 'a1b2') # 'aXbX'
pat = re.compile(r'\b\w{5}\b') # Compiled for reuse: 5-letter words
pat.findall('the quick brown fox') # ['quick', 'brown']
# Named groups (Python syntax)
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', '2024-01')
m.group('year') # '2024'
m.groupdict() # {'year': '2024', 'month': '01'}
# Verbose mode
pat = re.compile(r'''
(?P<year>\d{4}) # year
-(?P<month>\d{2}) # month
-(?P<day>\d{2}) # day
''', re.VERBOSE)
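A runnable sketch of how match(), search() and fullmatch() differ, plus finditer() positions and the verbose date pattern above applied to a sample date:

```python
import re

starts = re.match(r"\d+", "abc123")      # None: match() only tries position 0
anywhere = re.search(r"\d+", "abc123")   # finds '123'
whole = re.fullmatch(r"\d+", "123abc")   # None: must consume the entire string

# finditer yields Match objects, so positions are available too.
spans = [m.span() for m in re.finditer(r"\d+", "a1 b22")]  # [(1, 2), (4, 6)]

# Verbose mode: whitespace and # comments inside the pattern are ignored.
date = re.compile(r'''
    (?P<year>\d{4})    # year
    -(?P<month>\d{2})  # month
    -(?P<day>\d{2})    # day
''', re.VERBOSE)
parts = date.fullmatch("2024-01-15").groupdict()
# {'year': '2024', 'month': '01', 'day': '15'}
```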
import "regexp"
re := regexp.MustCompile(`\d+`) // Panics on bad pattern
re.MatchString("abc123") // true
re.FindString("abc 123 def") // "123"
re.FindAllString("a1 b2", -1) // ["1", "2"]
re.ReplaceAllString("a1b2", "X") // "aXbX"
// Named groups
dateRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})`) // new name: re is already declared above
match := dateRe.FindStringSubmatch("2024-01")
// match[dateRe.SubexpIndex("year")] == "2024"
Go uses RE2: no backreferences, no lookahead/lookbehind. Guarantees linear-time matching.
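Without backreferences, a task like "find doubled words" moves part of the logic out of the regex: match simple tokens, then compare them in code. A minimal sketch of that workaround, written in Python for consistency with the other examples (the same structure maps onto Go's FindAllString); repeated_words is an illustrative name, and unlike \b(\w+) \1\b it treats any non-word separator as a gap.

```python
import re


def repeated_words(text: str) -> list[str]:
    """Hypothetical helper: words equal to the word right before them.

    The regex only tokenizes; the "same as previous" comparison that a
    backreference would do happens in plain code instead.
    """
    words = re.findall(r"\w+", text)
    return [cur for prev, cur in zip(words, words[1:]) if cur == prev]


found = repeated_words("the the cat sat sat down")  # ['the', 'sat']
```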
use regex::Regex;
let re = Regex::new(r"\d+").unwrap();
re.is_match("abc123"); // true
re.find("abc 123").unwrap().as_str(); // "123"
// Named captures
let re = Regex::new(r"(?P<y>\d{4})-(?P<m>\d{2})").unwrap();
let caps = re.captures("2024-01").unwrap();
&caps["y"]; // "2024"
The Rust regex crate uses RE2-like semantics (no lookaround or backreferences, linear-time matching). Use the fancy-regex crate when you need lookaround or backreferences.
| Feature                | PCRE          | ECMAScript (JS) | POSIX ERE | RE2 (Go/Rust)         | Python re   |
|------------------------|---------------|-----------------|-----------|-----------------------|-------------|
| Lookahead              | Yes           | Yes             | No        | No                    | Yes         |
| Lookbehind             | Fixed per alt | Variable-len    | No        | No                    | Fixed-len   |
| Named groups           | (?<n>)        | (?<n>)          | No        | (?P<n>)               | (?P<n>)     |
| Backreferences         | \1            | \1              | \1 (BRE)  | No                    | \1          |
| Atomic groups          | (?>)          | No              | No        | N/A (no backtracking) | (?>) 3.11+  |
| Possessive quantifiers | ++ *+         | No              | No        | No                    | ++ *+ 3.11+ |
| Unicode properties     | \p{L}         | \p{L} (u flag)  | No        | \p{L}                 | No          |
| Recursive patterns     | (?R)          | No              | No        | No                    | No          |
# grep
grep 'pattern' file.txt # Basic regex (BRE)
grep -E 'extended|regex' file.txt # Extended regex (ERE)
grep -P '\d{3}' file.txt # Perl regex (GNU grep only)
grep -oirn 'pattern' src/ # -o only match, -i case, -r recursive, -n line nums
# sed
sed 's/old/new/g' file.txt # Replace all
sed -E 's/([0-9]+)/[\1]/g' file.txt # Extended regex with groups
sed '/pattern/d' file.txt # Delete matching lines
# awk
awk '/pattern/ {print $0}' file.txt # Print matching lines
awk '$3 ~ /^[0-9]+$/ {print $1}' file.txt # Field regex match
# ripgrep
rg '\d{3}-\d{4}' src/ # Recursive, fast, .gitignore-aware
rg -t py 'import' . # Filter by file type
rg -U 'struct.*\n.*field' . # Multiline match
rg --pcre2 '(?<=\$)\d+' . # PCRE2 features
rg -r '$1' '(\w+)@\w+' emails.txt # Replace (output only)
# VS Code find/replace
# Enable regex: click the .* button or press Alt+R
# Capture groups in replace: $1, $2, etc.
Find: (\w+):\s*(\w+) Replace: $2: $1
Find: console\.log\(.*?\) Replace: (empty)
# Case modifiers in replacement
\u Next char uppercase \U Uppercase until \E
\l Next char lowercase \L Lowercase until \E
Find: const (\w+) Replace: const \l$1
Nested quantifiers cause exponential time on non-matching input:
# DANGEROUS # SAFE alternative
(a+)+b a+b
(a|a)+b [a]+b
(\w+)*@ \w*@
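Mitigations: flatten nested quantifiers as above; use atomic groups or possessive quantifiers where the engine supports them; prefer a linear-time engine (RE2, Go regexp, Rust regex) for untrusted input; cap input length before matching.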
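Before swapping a dangerous pattern for a flat one, it helps to confirm the two accept the same strings. This Python sketch brute-forces every string up to a small length over a tiny alphabet; same_language is an illustrative name, and agreement on short strings is evidence, not a proof. It also shows why (\w+)*@ corresponds to \w*@ rather than \w+@: zero repetitions let the nested form match a bare "@".

```python
import itertools
import re

# Each nested pattern paired with a flat rewrite.
PAIRS = [
    (r"(a+)+b", r"a+b"),
    (r"(a|a)+b", r"[a]+b"),
    (r"(\w+)*@", r"\w*@"),
]


def same_language(slow: str, fast: str, alphabet: str = "ab@", max_len: int = 6) -> bool:
    """Hypothetical helper: do both patterns fullmatch exactly the same short strings?"""
    slow_re, fast_re = re.compile(slow), re.compile(fast)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(slow_re.fullmatch(s)) != bool(fast_re.fullmatch(s)):
                return False
    return True


results = {slow: same_language(slow, fast) for slow, fast in PAIRS}  # all True
not_equivalent = same_language(r"(\w+)*@", r"\w+@")  # False: they differ on "@"
```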
\p{L}+ # Any Unicode letter (PCRE, Java, Rust, JS with u flag)
[\u4e00-\u9fff]+ # CJK Unified Ideographs
\p{Emoji} # Emoji (modern engines); also matches 0-9, # and *, so \p{Extended_Pictographic} is often the better choice
# JS: always use u flag for Unicode
/\p{L}+/u # Correct - \w is still ASCII-only even with u
# Python 3: re handles Unicode by default
re.findall(r'\w+', 'café naïve') # ['café', 'naïve'] (precomposed é, ï)
re.findall(r'\w+', 'cafe\u0301') # ['cafe']: combining marks are not \w, so NFC-normalize first
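Whether \w sees an accented word as one token depends on how the text is encoded: combining marks (Unicode category Mn) are not word characters in Python's re. A sketch of normalizing to NFC with the standard unicodedata module before matching:

```python
import re
import unicodedata

# Decomposed input: "e" + COMBINING ACUTE, "i" + COMBINING DIAERESIS.
decomposed = "cafe\u0301 nai\u0308ve"

# The combining marks are not \w, so the words split around them.
raw = re.findall(r"\w+", decomposed)  # ['cafe', 'nai', 've']

# NFC composes each letter + mark into one code point (U+00E9, U+00EF), which \w matches.
composed = unicodedata.normalize("NFC", decomposed)
normalized = re.findall(r"\w+", composed)  # ['café', 'naïve']
```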
/^start/m # Multiline: ^ matches start of any line
/start.*end/s # Dotall: . matches \n
/start[\s\S]*?end/ # Cross-line match without s flag
re.compile(pattern, re.DEBUG) # Python: prints the parse tree
JSON: use JSON.parse() or equivalent, not regex
For more recipes and testing tips: references/recipes.md