aradotso/privacy-parser-pii-extraction

Name: privacy-parser-pii-extraction
Author: aradotso

skills/privacy-parser-pii-extraction/SKILL.md

npx skillsauth add aradotso/trending-skills privacy-parser-pii-extraction

Privacy Parser — PII Span Extraction

Skill by ara.so — Daily 2026 Skills collection.

privacy-parser is the inverse of OpenAI's Privacy Filter. Where the filter masks PII with <REDACTED>, this library returns structured spans — label, text, and character offsets — using the same 1.5B opf model weights and label taxonomy.

Installation

# Clone the repo (includes both subpackages)
git clone https://github.com/chiefautism/privacy-parser
cd privacy-parser

uv venv
uv pip install -e ./privacy-filter   # installs the opf model + weights loader
uv pip install -e ./pii_parser       # installs the parser library

First run downloads the opf 1.5B checkpoint (~3 GB) to ~/.opf/privacy_filter/.

Quick Start

from pii_parser.hybrid import HybridPIIParser

parser = HybridPIIParser(device="cpu")  # or "cuda" / "mps"
result = parser.parse(
    "Hi Quindle Testwick ([email protected] / +1-415-555-0102), "
    "account 40702810500001234567, 14 Beautiful Ct, Anytown USA, "
    "password Priv4cy-Filt3r-2026."
)

for span in result.spans:
    print(f"{span.label:18}  {span.text}")

Output:

private_person      Quindle Testwick
private_email       [email protected]
private_phone       +1-415-555-0102
account_number      40702810500001234567
private_address     14 Beautiful Ct, Anytown USA
secret              Priv4cy-Filt3r-2026

Three Backends

Choose the backend based on your speed/accuracy tradeoff:

| Backend | Weights | Speed | F1 | When to use |
|-------------------|---------|------------|-------|-------------------------------------|
| PIIParser | none | µs | 1.000 | Tests, known-format structured data |
| ModelPIIParser | 1.5B | ~500ms CPU | 0.733 | Model-only, no post-processing |
| HybridPIIParser | 1.5B | ~600ms CPU | 0.929 | Production; ship this one |

# Regex-only (no model, instant, high precision on structured formats)
from pii_parser import PIIParser
parser = PIIParser()

# Model-only (raw BIOES logits → Viterbi → spans)
from pii_parser.model import ModelPIIParser
parser = ModelPIIParser(device="cpu")

# Hybrid: model + span-merge + regex backstop (recommended)
from pii_parser.hybrid import HybridPIIParser
parser = HybridPIIParser(device="cpu")

Span Object

Each span in result.spans has:

span.label    # str — one of the 8 label types
span.text     # str — the extracted substring
span.start    # int — char offset in original string
span.end      # int — char offset (exclusive)
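
A quick way to sanity-check these offset semantics is to confirm that slicing the original string with start and end reproduces text. The sketch below uses a stand-in Span dataclass (hypothetical, mirroring only the four documented fields) so it runs without the model; spans from result.spans could presumably be passed in the same way.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in with the four documented fields; not the library's own class."""
    label: str
    text: str
    start: int
    end: int


def check_offsets(source: str, spans) -> list:
    """Return the spans whose [start:end) slice does not equal span.text."""
    return [s for s in spans if source[s.start:s.end] != s.text]


source = "Call Alice at 555-0100."
spans = [
    Span("private_person", "Alice", 5, 10),
    Span("private_phone", "555-0100", 14, 22),
]
bad = check_offsets(source, spans)
print(bad)  # an empty list means every span lines up with the source string
```

A non-empty result usually means the string was modified (stripped, normalized, re-encoded) between parsing and slicing.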

Label Taxonomy (opf v2)

private_person    — full names of individuals
private_email     — email addresses
private_phone     — phone numbers (any format)
private_address   — street/postal addresses
private_url       — personal/private URLs
private_date      — dates tied to individuals
account_number    — bank/card/account identifiers
secret            — passwords, tokens, API keys

Common Patterns

Batch processing

from pii_parser.hybrid import HybridPIIParser

parser = HybridPIIParser(device="cpu")

texts = [
    "Email Bob at [email protected]",
    "SSN: 123-45-6789, DOB: 1990-03-15",
    "Token: ghp_abc123XYZ",
]

for text in texts:
    result = parser.parse(text)
    if result.spans:
        print(f"Text: {text!r}")
        for s in result.spans:
            print(f"  [{s.start}:{s.end}] {s.label} → {s.text!r}")
        print()

Filter by label type

result = parser.parse(long_document)

emails   = [s for s in result.spans if s.label == "private_email"]
phones   = [s for s in result.spans if s.label == "private_phone"]
secrets  = [s for s in result.spans if s.label == "secret"]
accounts = [s for s in result.spans if s.label == "account_number"]
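
When several label types are needed at once, grouping spans by label in a single pass avoids rescanning result.spans for each type. A minimal sketch, using a hypothetical stand-in Span tuple in place of real parser output:

```python
from collections import defaultdict, namedtuple

# Stand-in for the library's span objects (hypothetical; same four fields).
Span = namedtuple("Span", "label text start end")


def group_by_label(spans):
    """Map each label to the list of spans carrying it, in document order."""
    groups = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.start):
        groups[span.label].append(span)
    return dict(groups)


spans = [
    Span("private_phone", "555-0100", 14, 22),
    Span("private_person", "Alice", 5, 10),
    Span("private_phone", "555-0199", 26, 34),
]
groups = group_by_label(spans)
print({label: [s.text for s in found] for label, found in groups.items()})
```

Labels with no matches are simply absent from the result, so groups.get("secret", []) is a safe way to read them.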

Redact after inspection

def redact(text: str, spans) -> str:
    """Replace extracted PII with [LABEL] tokens."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.label.upper()}]" + text[span.end:]
    return text

text = "Call Alice at 555-0100 re: account 9988776655."
result = parser.parse(text)
clean = redact(text, result.spans)
# "Call [PRIVATE_PERSON] at [PRIVATE_PHONE] re: account [ACCOUNT_NUMBER]."

Export to JSON

import json

result = parser.parse("Jane Doe, [email protected], +44 20 7946 0958")
payload = [
    {"label": s.label, "text": s.text, "start": s.start, "end": s.end}
    for s in result.spans
]
print(json.dumps(payload, indent=2))

GPU acceleration

import torch
from pii_parser.hybrid import HybridPIIParser

# Fall back to CPU when no CUDA device is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
parser = HybridPIIParser(device=device)

CLI

# Parse a string directly
python -m pii_parser.cli_model "Alice paid 40702810500001234567 on 2026-05-17."

# Pipe text from a file
cat dump.txt | python -m pii_parser.cli_model -

Architecture

text
  ↓
opf 1.5B → BIOES logits → Viterbi (tuned transitions) → char spans
  ↓
span-merge  (glues multi-token names: "Quindle" + "Testwick" → one span)
  ↓
regex backstop  (URL, secret, account_number — fills model gaps)
  ↓
result.spans[]
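
The two post-processing stages in the diagram above can be sketched in a few lines. This is an illustration under assumptions, not the library's code: Span is a hypothetical stand-in, and the two regexes in BACKSTOP are made-up examples, since the actual patterns are not shown here.

```python
import re
from collections import namedtuple

Span = namedtuple("Span", "label text start end")  # hypothetical stand-in


def merge_adjacent(text, spans):
    """Join neighbouring same-label spans separated by at most whitespace."""
    merged = []
    for s in sorted(spans, key=lambda s: s.start):
        if merged:
            prev = merged[-1]
            gap = text[prev.end:s.start]
            if prev.label == s.label and s.start >= prev.end and gap.strip() == "":
                merged[-1] = Span(prev.label, text[prev.start:s.end], prev.start, s.end)
                continue
        merged.append(s)
    return merged


# Illustrative patterns only (assumptions, not the library's regexes).
BACKSTOP = {
    "private_url": re.compile(r"https?://\S+"),
    "secret": re.compile(r"\bghp_[A-Za-z0-9]+\b"),
}


def regex_backstop(text, spans):
    """Add regex matches that do not overlap any span already found."""
    out = list(spans)
    for label, pattern in BACKSTOP.items():
        for m in pattern.finditer(text):
            overlaps = any(m.start() < s.end and s.start < m.end() for s in out)
            if not overlaps:
                out.append(Span(label, m.group(), m.start(), m.end()))
    return sorted(out, key=lambda s: s.start)


text = "Quindle Testwick pushed ghp_abc123XYZ"
model_spans = [
    Span("private_person", "Quindle", 0, 7),
    Span("private_person", "Testwick", 8, 16),
]
final = regex_backstop(text, merge_adjacent(text, model_spans))
for s in final:
    print(s)
```

Running the merge before the backstop means regex matches are compared against whole names rather than fragments, which keeps the overlap test simple.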

BIOES tagging: Beginning / Inside / Outside / End / Single — standard NER scheme
Viterbi: enforces valid tag transitions (no I- without B-)
Span-merge: heuristic that joins adjacent same-label spans separated only by whitespace
Regex backstop: high-precision patterns for labels the 1.5B model under-predicts (secrets, account numbers, URLs)
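
To see why the Viterbi step matters, the sketch below decodes BIOES tags for a single label type and compares the result with a per-token argmax. It is a simplified illustration: the transition table is a set of hard allow/deny rules rather than the tuned transition scores mentioned above, and the per-token scores are invented for the example.

```python
import math

TAGS = ("O", "B", "I", "E", "S")
# Hard BIOES constraints for one label type (an assumption for illustration).
ALLOWED = {
    "O": {"O", "B", "S"},
    "B": {"I", "E"},
    "I": {"I", "E"},
    "E": {"O", "B", "S"},
    "S": {"O", "B", "S"},
}
START = ("O", "B", "S")  # a sequence cannot open inside a span
END = ("O", "E", "S")    # and cannot stop before a span is closed


def viterbi(scores):
    """scores: one {tag: log-score} dict per token. Returns the best valid path."""
    # best[t][tag] = (score of the best valid path ending in tag at t, previous tag)
    best = [{t: (scores[0][t] if t in START else -math.inf, None) for t in TAGS}]
    for tok in scores[1:]:
        col = {}
        for cur in TAGS:
            prev_tag, prev_score = max(
                ((p, s) for p, (s, _) in best[-1].items() if cur in ALLOWED[p]),
                key=lambda ps: ps[1],
            )
            col[cur] = (prev_score + tok[cur], prev_tag)
        best.append(col)
    # Pick the best legal final tag, then follow the back-pointers.
    tag = max(END, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


scores = [
    {"O": -0.5, "B": -1.0, "I": -5.0, "E": -5.0, "S": -5.0},
    {"O": -3.0, "B": -2.0, "I": -0.2, "E": -3.0, "S": -3.0},
    {"O": -3.0, "B": -5.0, "I": -5.0, "E": -0.1, "S": -3.0},
]
greedy = [max(tok, key=tok.get) for tok in scores]
decoded = viterbi(scores)
print(greedy)   # ['O', 'I', 'E'], which starts a span with I and is invalid
print(decoded)  # ['B', 'I', 'E'], the best path that respects the constraints
```

The greedy path is locally optimal at every token but cannot be turned into a span; the constrained path gives up a little score on the first token to produce a well-formed B, I, E sequence.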

Running Tests / Benchmarks

# Full fixture suite + latency benchmark
python pii_parser/tests/test_hybrid.py

Expected output:

Fixture F1:  0.929
Scenarios:   8/8 passed
Latency:     ~600 ms CPU
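
For readers unfamiliar with how a span-level F1 such as the one above is typically computed, here is a minimal sketch. It assumes exact matching on (label, start, end) triples; the fixture suite's actual matching criterion is not stated in this document and may differ.

```python
def span_f1(predicted, gold):
    """Exact-match span F1 over (label, start, end) triples."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # spans that agree on label and both offsets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up spans for illustration: two of three predictions match the gold set.
gold = [("private_person", 5, 10), ("private_phone", 14, 22), ("secret", 30, 40)]
predicted = [("private_person", 5, 10), ("private_phone", 14, 22), ("private_url", 50, 60)]
print(round(span_f1(predicted, gold), 3))  # 0.667
```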

Troubleshooting

Slow first run — The checkpoint (~3 GB) downloads to ~/.opf/privacy_filter/ on first use. Subsequent runs load from cache.

CUDA out of memory — Use device="cpu" or reduce batch size; the 1.5B model requires ~3 GB VRAM on GPU.

Low recall on secrets/URLs — Use HybridPIIParser (not ModelPIIParser); the regex backstop specifically covers these labels.

Span text doesn't match offsets — Offsets are byte-safe character indices into the original string passed to parse(). Do not preprocess/strip the string before parsing if you need offsets to remain valid.

Import error on privacy_filter — Ensure you installed both packages: uv pip install -e ./privacy-filter AND uv pip install -e ./pii_parser.

Model not found — Delete ~/.opf/privacy_filter/ and re-run to trigger a fresh download.

Extract structured PII spans from text using the OpenAI Privacy Filter 1.5B model reversed — returns what, where, and which type instead of masking.

36 stars

data-ai

Updated Apr 27, 2026

npx skillsauth add aradotso/trending-skills privacy-parser-pii-extraction

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|-----|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 27, 2026, 4:50 AM (68.5 s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/aradotso/trending-skills.git

# Copy into Claude Code skills folder (global)
cp -r trending-skills/skills/privacy-parser-pii-extraction ~/.claude/skills/

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT