# JD Analyzer - Career Transition Intelligence

## Quick Reference
- Purpose: Automate AI Product Engineer career transition with intelligent JD analysis
- Primary Use: Collect 100+ JDs, extract skills, match profile, generate actionable insights
- Key Features: Playwright automation, spaCy NLP, weighted scoring (10pt/3pt), Jinja2 reports
- Performance: < 10 min full pipeline, < 2 min skill extraction
## Execution Algorithm

### Step 1: Environment Validation

Goal: Ensure all dependencies and configurations are ready
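Process:
- Check that Python 3.9+ is installed
- Check for missing dependencies and suggest a pip install
- Check the spaCy model (download en_core_web_sm if missing)
- Check that the Playwright browser is installed
- Ensure ~/.jd-analyzer/ exists

Error Handling:
- Missing spaCy model: python -m spacy download en_core_web_sm
- Missing Playwright browser: playwright install chromium

Pseudocode: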
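```python
import sys

def validate_environment():
    # The helper functions are placeholders for the skill's own checks
    if sys.version_info < (3, 9):
        raise RuntimeError("Python 3.9+ required")
    missing = check_dependencies()
    if missing:
        suggest_pip_install(missing)
    if not spacy_model_exists("en_core_web_sm"):
        auto_download_spacy_model()  # python -m spacy download en_core_web_sm
    if not playwright_browser_installed():
        auto_install_playwright()  # playwright install chromium
    ensure_config_directory()  # ~/.jd-analyzer/
```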
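The dependency check in the pseudocode above could be sketched with the standard library alone. This is a minimal sketch, not the skill's actual implementation: the module list in `REQUIRED_MODULES` is an assumption inferred from the tools named in this document, and `python_version_ok` is a hypothetical helper.

```python
import importlib.util
import sys

# Assumed import names, inferred from the tools this document mentions.
REQUIRED_MODULES = ["spacy", "playwright", "yaml", "jinja2"]

def check_dependencies(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

def python_version_ok(minimum=(3, 9)):
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

Returning the list of missing names, rather than raising on the first one, lets the caller suggest a single pip install command for everything at once.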
Goal: Load or create user profile for matching
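Process:
- Check whether ~/.jd-analyzer/profile.yaml exists
- If it is missing, create a profile template and ask the user to fill it in and re-run
- Otherwise load the YAML and validate the profile structure

Error Handling:
- YAML parse error: show the line and suggest a fix
- Invalid profile: list the missing fields and exit
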
Profile Template Structure:
```yaml
personal:
  name: "Your Name"
  location: "City, Country"
experience:
  total_years: 0
  frontend_years: 0
  backend_years: 0
  ai_ml_years: 0
skills:
  frontend:
    expert: ["React", "TypeScript"]
    advanced: ["Next.js"]
    learning: ["Vue.js"]
  backend:
    expert: ["Python"]
    advanced: ["FastAPI"]
  ai_ml:
    advanced: ["Claude AI", "Prompt Engineering"]
    learning: ["LangChain", "RAG"]
preferences:
  remote_only: true
  visa_required: false
  min_match_score: 70
```
Pseudocode:
```python
import sys
from pathlib import Path

import yaml

def load_or_create_profile():
    profile_path = Path.home() / ".jd-analyzer" / "profile.yaml"
    if not profile_path.exists():
        # First run: write the template and stop so the user can fill it in
        create_profile_template(profile_path)
        print("Profile template created. Please fill and re-run.")
        sys.exit(0)
    try:
        profile = yaml.safe_load(profile_path.read_text())
        validate_profile_structure(profile)
        return profile
    except yaml.YAMLError as e:
        raise RuntimeError(f"YAML syntax error: {e}") from e
```
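The structure check that `validate_profile_structure` performs is not spelled out here. The following is a minimal sketch of one way to find missing fields, assuming the required layout mirrors the profile template; `REQUIRED_KEYS` and `find_missing_fields` are hypothetical names, not part of the skill.

```python
# Hypothetical required layout, inferred from the profile template.
REQUIRED_KEYS = {
    "personal": ["name", "location"],
    "experience": ["total_years"],
    "skills": [],
    "preferences": ["min_match_score"],
}

def find_missing_fields(profile):
    """Return dotted names of required sections or fields absent from the profile."""
    if not isinstance(profile, dict):
        # An empty YAML file loads as None, so every section is missing
        return list(REQUIRED_KEYS)
    missing = []
    for section, fields in REQUIRED_KEYS.items():
        block = profile.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        missing.extend(f"{section}.{name}" for name in fields if name not in block)
    return missing
```

A caller could join the returned names into the "Profile incomplete. Missing: {fields}" message and exit when the list is non-empty.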
Goal: Determine which analysis mode to run
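Modes:
- Analyze existing JDs in the JDs/ folder (Quick Win, 30-60 sec)
- Search new JDs (LinkedIn + Wellfound automation)
- Add a single URL
- Full analysis of all collected JDs

User Interaction: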
Use AskUserQuestion to present mode menu:
```
Select mode:
1. Analyze existing JDs (24 files in JDs/ folder) - Quick Win!
2. Search new JDs (LinkedIn + Wellfound automation)
3. Add single URL
4. Full analysis of all collected JDs
Choice (1-4):
```
Routing Logic:
Pseudocode:
```python
def select_mode():
    jds_count = count_existing_jds()
    menu = f"""
    1. Analyze existing JDs ({jds_count} files)
    2. Search new JDs (LinkedIn + Wellfound)
    3. Add single URL
    4. Full analysis
    """
    choice = AskUserQuestion(menu, type="choice")
    return choice
```
Goal: Parse markdown JDs from JDs/ folder
Process:
- Scan JDs/**/*.md for all markdown files
- Parse each file into a JobDescription dataclass
- Save the results to ~/.jd-analyzer/data/jds.json

Regex Patterns:
```python
COMPANY_PATTERN = r"^#\s+(.+?)(?:\s+-\s+|\s+\|)"
TITLE_PATTERN = r"(?:Position|Role|Title):\s*(.+)"
LOCATION_PATTERN = r"(?:Location|Based):\s*(.+)"
REMOTE_KEYWORDS = ["remote", "anywhere", "distributed", "work from home"]
VISA_KEYWORDS = ["visa sponsorship", "visa support", "work permit"]
```
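To see what these patterns capture, the sketch below applies them to a small invented JD header. The sample text and the helper `first_group` are assumptions for illustration; `detect_remote` follows the name used in the pseudocode, but its body here is only one plausible reading of a keyword check.

```python
import re

COMPANY_PATTERN = r"^#\s+(.+?)(?:\s+-\s+|\s+\|)"
TITLE_PATTERN = r"(?:Position|Role|Title):\s*(.+)"
LOCATION_PATTERN = r"(?:Location|Based):\s*(.+)"
REMOTE_KEYWORDS = ["remote", "anywhere", "distributed", "work from home"]

# Invented sample, for illustration only.
SAMPLE = """# Acme AI - Product Engineer
Position: AI Product Engineer
Location: Berlin, Germany (Remote)
"""

def first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None

def detect_remote(text):
    """True when any remote keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in REMOTE_KEYWORDS)

company = first_group(COMPANY_PATTERN, SAMPLE)
title = first_group(TITLE_PATTERN, SAMPLE)
location = first_group(LOCATION_PATTERN, SAMPLE)
is_remote = detect_remote(SAMPLE)
```

Note that `COMPANY_PATTERN` needs a ` - ` or ` |` separator after the company name, so a heading that contains only the company name yields `None` and needs a fallback.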
Error Handling:
- Files that fail to parse are logged as warnings and skipped
Pseudocode:
```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def parse_existing_jds(jds_folder: Path):
    jd_files = list(jds_folder.rglob("*.md"))
    parsed = []
    for file in jd_files:
        try:
            content = file.read_text(encoding="utf-8")
            jd = parse_markdown_jd(content, file)
            parsed.append(jd)
        except Exception as e:
            # One bad file should not abort the whole batch
            logger.warning(f"Failed to parse {file}: {e}")
    return parsed

def parse_markdown_jd(content: str, file: Path):
    company = extract_company(content)
    title = extract_title(content)
    location = extract_location(content)
    is_remote = detect_remote(content)
    return JobDescription(
        id=generate_id(company, title),
        company=company,
        title=title,
        location=location,
        is_remote=is_remote,
        description=content,
        # ...
    )
```
Goal: Collect 100 JDs (LinkedIn 50 + Wellfound 50)
Phase 1: LinkedIn Collection (3-5 min)
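- Load the saved session cookie from ~/.jd-analyzer/cookies/linkedin.enc
- If no cookie is saved, log in at linkedin.com/login
- Search /jobs/search/?keywords={query}&location=remote

Phase 2: Wellfound Collection (2-3 min)
- Browse wellfound.com/jobs

Error Handling:
- Login failure: re-prompt for credentials (3x)
- CAPTCHA detected: pause and wait for the user to solve it
- Session expired: delete the cookie and log in again
- Browser crash: save progress and offer to resume
- Rate limit (429): wait 5 min, then retry

Security:
- Credentials are read from the system keyring
- Session cookies are stored encrypted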
Pseudocode:
```python
import random
import re
import time
from urllib.parse import quote_plus

def collect_linkedin_jds(query: str, count: int = 50):
    # `playwright` is assumed to be an already-started sync Playwright instance
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Session management
    if has_saved_cookie():
        cookies = decrypt_cookie(load_cookie())
        page.context.add_cookies(cookies)
    else:
        login_to_linkedin(page)
        save_encrypted_cookie(page.context.cookies())
    # Search (goto needs an absolute URL; the query is URL-encoded)
    page.goto(
        "https://www.linkedin.com/jobs/search/"
        f"?keywords={quote_plus(query)}&location=remote"
    )
    jd_urls = scroll_and_extract_urls(page, count)
    # Parse each JD
    jds = []
    for i, url in enumerate(jd_urls):
        print(f"✓ LinkedIn: Collecting JD {i+1}/{len(jd_urls)}...")
        jd = parse_linkedin_jd(page, url)
        jds.append(jd)
        time.sleep(random.uniform(1, 3))  # Rate limit
    browser.close()
    return jds

def login_to_linkedin(page):
    page.goto("https://www.linkedin.com/login")
    email, password = get_credentials_from_keyring()
    page.fill("#username", email)
    page.fill("#password", password)
    page.click("button[type=submit]")
    if detect_captcha(page):
        input("⚠️ CAPTCHA detected. Solve manually, press Enter...")
    page.wait_for_url(re.compile(r"linkedin\.com/feed"))
```
Goal: Parse one JD URL and add to collection
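Supported Platforms:
- LinkedIn
- Wellfound (including angel.co URLs)
- Lever
- Greenhouse

Process:
- Detect the platform from the URL
- Parse LinkedIn and Wellfound pages with Playwright, other platforms with BeautifulSoup
- Save the JD to the collection
- Ask whether to re-analyze all JDs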
Platform Detection:
```python
PLATFORM_PATTERNS = {
    "linkedin": r"linkedin\.com/jobs",
    "wellfound": r"wellfound\.com|angel\.co",
    "lever": r"\.lever\.co",
    "greenhouse": r"boards\.greenhouse\.io",
}
```
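The patterns above can drive a small lookup. This is a minimal sketch, assuming first-match-wins ordering and case-insensitive matching; `detect_platform` is the name the pseudocode uses, but its real implementation is not shown in this document.

```python
import re
from typing import Optional

PLATFORM_PATTERNS = {
    "linkedin": r"linkedin\.com/jobs",
    "wellfound": r"wellfound\.com|angel\.co",
    "lever": r"\.lever\.co",
    "greenhouse": r"boards\.greenhouse\.io",
}

def detect_platform(url: str) -> Optional[str]:
    """Return the first platform whose pattern matches the URL, else None."""
    for name, pattern in PLATFORM_PATTERNS.items():
        if re.search(pattern, url, flags=re.IGNORECASE):
            return name
    return None
```

Returning `None` for unknown domains lets the caller decide how to handle an unsupported platform, for example by prompting for manual input.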
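Error Handling:
- 404 on JD URL: skip and log
- Unsupported platform: prompt for manual input
- Duplicate JD (URL already exists): skip and log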
Pseudocode:
```python
def add_single_url(url: str):
    platform = detect_platform(url)
    if platform in ["linkedin", "wellfound"]:
        jd = parse_with_playwright(url, platform)
    else:
        jd = parse_with_beautifulsoup(url, platform)
    save_jd(jd)
    if AskUserQuestion("Re-analyze all JDs?", type="yes/no"):
        analyze_all_jds()
```
Goal: Extract and categorize skills from JD descriptions
Taxonomy-Driven Extraction:
Load skill taxonomy (~/.jd-analyzer/skill_taxonomy.yaml):
```yaml
categories:
  frontend:
    keywords: ["React", "Vue", "Angular", "Next.js"]
    aliases:
      React: ["ReactJS", "React.js"]
  ai_ml:
    keywords: ["LLM", "RAG", "LangChain", "Prompt Engineering"]
    aliases:
      LLM: ["Large Language Model", "GPT"]
```
spaCy NLP Processing:
- Load the en_core_web_sm model

Categorization:
Required vs Nice-to-have Detection:
```python
REQUIRED_SECTIONS = [
    "requirements", "must have", "required skills",
    "qualifications", "what you need"
]
NICE_TO_HAVE_SECTIONS = [
    "nice to have", "preferred", "bonus", "plus"
]
```
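How a JD gets split along these section names is not shown. Below is a minimal sketch of one approach, assuming section headings are lines that start with `#` or end with `:`; `classify_heading` is a hypothetical helper, and `split_by_sections` only borrows the name used in the pseudocode.

```python
REQUIRED_SECTIONS = [
    "requirements", "must have", "required skills",
    "qualifications", "what you need",
]
NICE_TO_HAVE_SECTIONS = ["nice to have", "preferred", "bonus", "plus"]

def classify_heading(line):
    """Return 'required', 'nice', or None for a candidate heading line."""
    stripped = line.strip()
    # Assumption: only lines shaped like headings can switch sections
    if not (stripped.startswith("#") or stripped.endswith(":")):
        return None
    cleaned = stripped.strip("#*: ").lower()
    # Check nice-to-have first so "Preferred Qualifications" is not
    # captured by the "qualifications" entry in REQUIRED_SECTIONS
    if any(phrase in cleaned for phrase in NICE_TO_HAVE_SECTIONS):
        return "nice"
    if any(phrase in cleaned for phrase in REQUIRED_SECTIONS):
        return "required"
    return None

def split_by_sections(text):
    """Return (required_text, nice_text) gathered under matching headings."""
    buckets = {"required": [], "nice": []}
    current = None
    for line in text.splitlines():
        label = classify_heading(line)
        if label is not None:
            current = label
            continue
        if current is not None:
            buckets[current].append(line)
    return "\n".join(buckets["required"]), "\n".join(buckets["nice"])
```

In this sketch, text that appears before the first recognized heading is dropped, and a section stays open until the next recognized heading, so unrelated sections that follow would be absorbed; a real implementation would need rules for both cases.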
Error Handling:
- No skills extracted: log a warning and continue
- Missing spaCy model: download it automatically
Pseudocode:
```python
from pathlib import Path
from typing import Dict, List

import spacy
import yaml

class SkillExtractor:
    def __init__(self, taxonomy_path: Path):
        self.nlp = spacy.load("en_core_web_sm")
        self.taxonomy = yaml.safe_load(taxonomy_path.read_text())

    def extract(self, text: str) -> Dict[str, List[str]]:
        doc = self.nlp(text.lower())  # parsed doc; the matching below uses substring checks
        skills = {"required": set(), "nice_to_have": set()}
        # Detect sections and lowercase them once, outside the loops
        required_text, nice_text = split_by_sections(text)
        required_text, nice_text = required_text.lower(), nice_text.lower()
        # Match every keyword and its aliases against each section
        for category, config in self.taxonomy["categories"].items():
            for keyword in config["keywords"]:
                aliases = config.get("aliases", {}).get(keyword, [])
                for variant in [keyword] + aliases:
                    needle = variant.lower()
                    if needle in required_text:
                        skills["required"].add(keyword)
                    elif needle in nice_text:
                        skills["nice_to_have"].add(keyword)
        # A skill found in both sections counts as required only
        skills["nice_to_have"] -= skills["required"]
        return {k: sorted(v) for k, v in skills.items()}
```
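Plain substring checks like the ones in `extract` can over-match: "rag" occurs inside "storage" and "react" inside "reactive". The sketch below shows a stricter whole-token check that could replace the `in` test. It is an alternative technique, not what the pseudocode does, and `contains_skill` is a hypothetical helper name.

```python
import re

def contains_skill(variant: str, text: str) -> bool:
    """True if variant occurs in text as a whole token, ignoring case."""
    # Lookarounds instead of \b so names ending in punctuation-adjacent
    # characters, such as "Next.js", are handled consistently
    pattern = r"(?<![A-Za-z0-9])" + re.escape(variant) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

The trade-off is that plural or suffixed forms such as "LLMs" no longer match "LLM", so they would need to be listed as aliases in the taxonomy.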
Goal: Calculate match score using weighted algorithm
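Algorithm (per spec):
- Each required skill is worth 10 points
- Each nice-to-have skill is worth 3 points
- Match score = earned points / total possible points × 100

Process:
- Flatten the profile skills into a single set
- Compute total possible points from the JD's required and nice-to-have skills
- Compute earned points from the skills the user matches
- Report the score together with the missing skills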
Example Calculation:
```
JD Requirements:
  Required: [React, TypeScript, Python, Docker] → 4 × 10 = 40 pts
  Nice-to-have: [AWS, GraphQL, Redis] → 3 × 3 = 9 pts
  Total possible: 49 pts

User Skills: [React, TypeScript, AWS, PostgreSQL]
  Matched Required: React (10), TypeScript (10) → 20 pts
  Matched Nice-to-have: AWS (3) → 3 pts
  Earned: 23 pts

Match Score: (23 / 49) × 100 = 46.9%
Missing Skills: [Python, Docker, GraphQL, Redis]
```
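The arithmetic in this example can be reproduced in a few lines. This is a small check of the numbers above using the 10pt/3pt weights; the function name `match_score` and the constant names are assumptions for illustration.

```python
REQUIRED_WEIGHT = 10
NICE_WEIGHT = 3

def match_score(required, nice_to_have, user_skills):
    """Weighted percentage of JD skills that the user covers."""
    total = len(required) * REQUIRED_WEIGHT + len(nice_to_have) * NICE_WEIGHT
    earned = (len(required & user_skills) * REQUIRED_WEIGHT
              + len(nice_to_have & user_skills) * NICE_WEIGHT)
    return earned / total * 100 if total else 0.0

required = {"React", "TypeScript", "Python", "Docker"}
nice = {"AWS", "GraphQL", "Redis"}
user = {"React", "TypeScript", "AWS", "PostgreSQL"}

score = match_score(required, nice, user)   # 23 / 49 * 100
missing = sorted((required | nice) - user)
```

PostgreSQL does not affect the result because skills outside the JD earn no points and carry no penalty.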
Error Handling:
- A JD with no extracted skills has zero possible points and scores 0
Pseudocode:
```python
from typing import Dict

class ProfileMatcher:
    def __init__(self, profile: Dict):
        self.user_skills = self._flatten_skills(profile["skills"])

    def match(self, jd: JobDescription) -> MatchResult:
        required = set(jd.skills.get("required", []))
        nice_to_have = set(jd.skills.get("nice_to_have", []))
        # Calculate points
        total_points = len(required) * 10 + len(nice_to_have) * 3
        matched_required = required & self.user_skills
        matched_nice = nice_to_have & self.user_skills
        earned_points = len(matched_required) * 10 + len(matched_nice) * 3
        # Compute score (0 when the JD lists no skills)
        score = (earned_points / total_points * 100) if total_points > 0 else 0
        missing = (required | nice_to_have) - self.user_skills
        return MatchResult(
            jd=jd,  # rank_companies reads m.jd
            score=score,
            matched_required=matched_required,
            matched_nice=matched_nice,
            missing_skills=missing
        )
```
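`_flatten_skills` is referenced but not defined. A minimal sketch, assuming the profile's `skills` block has the category → proficiency level → list shape of the profile template; the standalone name `flatten_skills` is used here for illustration.

```python
def flatten_skills(skills: dict) -> set:
    """Collect every skill name across categories and proficiency levels."""
    flat = set()
    for levels in skills.values():
        for names in levels.values():
            flat.update(names)
    return flat

profile_skills = {
    "frontend": {"expert": ["React", "TypeScript"], "learning": ["Vue.js"]},
    "backend": {"expert": ["Python"]},
}
user_skills = flatten_skills(profile_skills)
```

Two choices in this sketch are assumptions: skills listed under `learning` count as matches, and names are compared exactly, so a profile entry "Vue.js" would not match a taxonomy keyword "Vue" unless the two are normalized first.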
Goal: Sort companies by match score and preferences
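Filters:
- Keep matches with a score at or above profile.preferences.min_match_score (default 70%)
- If remote_only=true, filter is_remote=true
- If visa_required=true, filter visa_sponsor=true

Sorting Priority:
- Match score (highest first)
- Remote positions
- Posted date (most recent first)
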
Output: Top 50 companies with metadata
Pseudocode:
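```python
from typing import Dict, List

def rank_companies(matches: List[MatchResult], profile: Dict):
    prefs = profile["preferences"]
    # Filter
    min_score = prefs.get("min_match_score", 70)
    filtered = [m for m in matches if m.score >= min_score]
    if prefs.get("remote_only"):
        filtered = [m for m in filtered if m.jd.is_remote]
    if prefs.get("visa_required"):
        filtered = [m for m in filtered if m.jd.visa_sponsor]
    # Sort by score, then remote, then most recent posting
    sorted_matches = sorted(
        filtered,
        key=lambda m: (m.score, m.jd.is_remote, m.jd.posted_date),
        reverse=True,
    )
    return sorted_matches[:50]
```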
Goal: Compute market trends and insights
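Metrics:
- Top 20 most frequent skills across all JDs
- Top 5 missing skills to learn
- Share of remote positions
- Match score distribution (80-100%, 60-79%, 40-59%, <40%)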
Pseudocode:
```python
from collections import Counter
from typing import List

class TrendAnalyzer:
    def analyze(self, jds: List[JobDescription], matches: List[MatchResult]):
        # Skill frequency
        all_skills = Counter()
        for jd in jds:
            all_skills.update(jd.skills["required"])
            all_skills.update(jd.skills["nice_to_have"])
        top_skills = all_skills.most_common(20)
        # Skills to learn
        missing_skills = Counter()
        for match in matches:
            missing_skills.update(match.missing_skills)
        skills_to_learn = missing_skills.most_common(5)
        # Remote stats (guard against an empty JD list)
        remote_count = sum(1 for jd in jds if jd.is_remote)
        remote_pct = remote_count / len(jds) * 100 if jds else 0.0
        # Match distribution
        buckets = {
            "80-100%": 0,
            "60-79%": 0,
            "40-59%": 0,
            "<40%": 0
        }
        for match in matches:
            if match.score >= 80:
                buckets["80-100%"] += 1
            elif match.score >= 60:
                buckets["60-79%"] += 1
            elif match.score >= 40:
                buckets["40-59%"] += 1
            else:
                buckets["<40%"] += 1
        return TrendReport(
            top_skills=top_skills,
            skills_to_learn=skills_to_learn,
            remote_pct=remote_pct,
            buckets=buckets,
            # ...
        )
```
Goal: Create actionable markdown report
Template Structure:
```markdown
# JD Analysis Report - {date}

## Executive Summary
- **Total JDs Analyzed**: {count}
- **Average Match Score**: {avg_score}%
- **Top Match**: {best_company} ({best_score}%)

## 🎯 Actionable Insights

### Top 5 Skills to Learn
1. **Python** - Required in 78 JDs (78%)
2. **Docker** - Required in 65 JDs (65%)
...

### Top 10 Companies to Apply
1. **{company}** - {match_score}% match
   - Missing: {missing_skills}
   - Remote: {remote}
   - URL: {url}
...

## 📊 Market Trends

### Top 20 Most Required Skills
| Skill | Frequency | Percentage |
|-------|-----------|------------|
| React | 85 | 85% |
...

### Remote Work Statistics
- Fully Remote: {remote_count} ({remote_pct}%)
- Hybrid: {hybrid_count}
- On-site: {onsite_count}

### Match Score Distribution
- 80-100%: {count} JDs
- 60-79%: {count} JDs
...

## 💼 Your Profile Match

### Strong Matches (70%+)
- {company} - {score}%
...

### Skill Gap Analysis
Top 5 missing skills:
1. Python (needed for 78 JDs)
...

## 📈 Next Steps
1. Learn {top_skill} (highest ROI)
2. Apply to {top_company}
...
```
Jinja2 Implementation:
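```python
from pathlib import Path
from typing import Dict

import jinja2

class MarkdownReportGenerator:
    def __init__(self, template_path: Path):
        # Load templates from the directory that holds the report template
        env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_path.parent))
        self.template = env.get_template(template_path.name)

    def generate(self, data: Dict) -> str:
        return self.template.render(**data)
```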
Goal: Save outputs and prompt next steps
Outputs:
- jd-analysis-report-{date}.md in current directory
- ~/.jd-analyzer/data/jds.json (all JDs)
- ~/.jd-analyzer/data/matches.json (scores)

User Prompts:
- View report now?
Pseudocode:
```python
import json
from datetime import date
from pathlib import Path
from typing import Dict

def finalize(report: str, data: Dict):
    # Save report
    report_path = Path.cwd() / f"jd-analysis-report-{date.today()}.md"
    report_path.write_text(report, encoding="utf-8")
    # Save data (values must already be JSON-serializable)
    data_dir = Path.home() / ".jd-analyzer" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / "jds.json").write_text(json.dumps(data["jds"]))
    (data_dir / "matches.json").write_text(json.dumps(data["matches"]))
    print(f"\n✅ Report saved: {report_path}")
    if AskUserQuestion("View report now?", type="yes/no"):
        print(report)
```
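`json.dumps` cannot serialize dataclass instances or sets directly, so the JDs and match results need converting before they are saved. A minimal sketch, assuming `JobDescription` is a dataclass; the fields shown are a hypothetical subset, and `jds_to_json` is an illustrative helper name.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class JobDescription:
    # Hypothetical subset of the real fields
    id: str
    company: str
    title: str
    is_remote: bool = False
    skills: dict = field(default_factory=dict)

def jds_to_json(jds):
    """Serialize a list of JobDescription objects to a JSON string."""
    return json.dumps([asdict(jd) for jd in jds], indent=2, ensure_ascii=False)

example = JobDescription("acme-engineer", "Acme", "Engineer", True, {"required": ["React"]})
payload = jds_to_json([example])
```

Match results that hold sets, such as the matched and missing skills, would need those sets turned into sorted lists first, since JSON has no set type.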
| Error | Detection | Recovery | User Message |
|-------|-----------|----------|--------------|
| LinkedIn login failure | HTTP 401 after submit | Re-prompt credentials 3x | "⚠️ Login failed. Check email/password" |
| CAPTCHA detected | Detect CAPTCHA iframe | Pause, wait for user | "⚠️ CAPTCHA detected. Solve manually, press Enter..." |
| Session expired | Redirect to login | Delete cookie, re-login | "⚠️ Session expired. Re-logging in..." |
| Browser crash | Process exit code | Save progress, resume | "⚠️ Browser crashed. Progress saved ({count}/100). Resume? [Y/n]" |
| Rate limit (429) | HTTP status 429 | Wait 5 min, retry | "⚠️ Rate limited. Waiting 5 min..." |
| Network timeout | Request timeout | 3 retries, exponential backoff | "⚠️ Network error. Retrying..." |
| 404 on JD URL | HTTP 404 | Skip, log | "⚠️ JD deleted: {url}" |
| Unsupported platform | Unknown domain | Prompt for manual input | "⚠️ Platform not supported: {platform}" |
| YAML parse error | YAMLError exception | Show line, suggest fix | "⚠️ YAML syntax error (line {n}): {msg}" |
| No skills extracted | Empty skill list | Log warning, continue | "⚠️ No skills found in: {title}" |
| Missing spaCy model | Model not found | Auto-download | "⏳ Downloading spaCy model..." |
| Invalid profile | Missing required fields | List fields, exit | "⚠️ Profile incomplete. Missing: {fields}" |
| Disk space low | Check before save | Warn, ask to continue | "⚠️ Low disk space ({mb} MB). Continue? [y/N]" |
| Duplicate JD | URL already exists | Skip, log | "⏭️ Skipping duplicate: {company}" |
| Cookie encryption fail | Fernet error | Delete, re-login | "⚠️ Cookie corrupted. Re-login required" |
| Platform API change | Parse failure | Fallback parser, log | "⚠️ {platform} layout changed. Using fallback parser" |
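The "3 retries, exponential backoff" recovery for network timeouts could look like the sketch below. The base delay of one second, the doubling factor, and the helper name `retry_with_backoff` are assumptions; the table only fixes the retry count.

```python
import time

def retry_with_backoff(operation, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a network error, retry up to `retries` times.

    The wait doubles after each failure: base_delay, 2x, 4x, ...
    """
    attempt = 0
    while True:
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt >= retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in normal use the default `time.sleep` applies.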
| Task | Target | Typical | Notes |
|------|--------|---------|-------|
| LinkedIn collection (50 JDs) | < 3 min | 2.5 min | Network-dependent |
| Wellfound collection (50 JDs) | < 2 min | 1.5 min | No login faster |
| Existing JD parsing (24 files) | < 30 sec | 15 sec | I/O-bound |
| Skill extraction (100 JDs) | < 2 min | 1 min | CPU-bound (spaCy) |
| Profile matching (100 JDs) | < 30 sec | 10 sec | Simple math |
| Report generation | < 10 sec | 5 sec | Jinja2 template |
| Full pipeline | < 10 min | ~6 min | Primary metric |
The skill taxonomy is located at ~/.jd-analyzer/skill_taxonomy.yaml and is user-editable for custom skills.
The user profile is at ~/.jd-analyzer/profile.yaml and is created on first run.
Optional advanced settings at ~/.jd-analyzer/settings.yaml:
```yaml
rate_limits:
  linkedin: 1  # requests per second
  default: 2
timeouts:
  page_load: 30
  request: 10
performance:
  max_jds_per_search: 100
  cache_ttl: 86400  # 24 hours
```
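One way to honor the `rate_limits` values is to enforce a minimum interval between requests. This is a minimal sketch, assuming `default` is also expressed in requests per second; `RateLimiter` and `limiter_for` are hypothetical names, and the settings are hard-coded here rather than read from settings.yaml.

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls for a requests-per-second limit."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Values copied from the settings example above
RATE_LIMITS = {"linkedin": 1, "default": 2}

def limiter_for(platform):
    """Build a limiter for a platform, falling back to the default rate."""
    return RateLimiter(RATE_LIMITS.get(platform, RATE_LIMITS["default"]))
```

A collector would call `wait()` before each page request, so LinkedIn requests are spaced at least one second apart while other platforms get half a second.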
- Data is kept in ~/.jd-analyzer/, never transmitted

```
/jd-analyzer
# Select option 1
# Output: Report in 30-60 sec
```

```
/jd-analyzer
# Select option 2
# Enter LinkedIn credentials (first time only)
# Wait ~6 min for 100 JDs
# View report
```

```
/jd-analyzer
# Select option 3
# Paste URL: https://boards.greenhouse.io/company/jobs/123
# Output: JD parsed and added
```

```
/jd-analyzer
# Select option 4
# Output: Re-analyze all collected JDs
```
- PLATFORM_PATTERNS
- collectors.py
- config/selectors.yaml
- ~/.jd-analyzer/skill_taxonomy.yaml
- templates/report_template.jinja2

Issue: spaCy model missing
Solution: Run python -m spacy download en_core_web_sm

Issue: Playwright browser missing
Solution: Run playwright install chromium

Issue: 2FA enabled or wrong credentials
Solution:

Issue: IP flagged for bot activity
Solution:

Issue: JD uses non-standard terminology
Solution: Add synonyms to skill_taxonomy.yaml