lv416e/performance-profiling

Name: performance-profiling
Author: lv416e

dot_claude/skills/performance-profiling/SKILL.md

npx skillsauth add lv416e/dotfiles performance-profiling

Performance Profiling

Overview

Optimizing without measuring is guessing. Guessing wastes time and often makes things worse.

Core principle: ALWAYS measure before optimizing. Intuition about performance is wrong more often than right.

Violating the letter of this process is violating the spirit of performance engineering.

The Iron Law

MEASURE FIRST, OPTIMIZE SECOND - NO OPTIMIZATION WITHOUT PROFILING DATA

If you haven't profiled it, you cannot optimize it. Hunches are not data.

When to Use

Use for ANY performance concern:

Slow API responses
High memory usage
Slow page loads
Database query timeouts
Build/CI pipeline slowdowns
Memory leaks
Throughput degradation under load

Use this ESPECIALLY when:

Someone says "this feels slow"
Under pressure to "just make it faster"
A "quick optimization" seems obvious
You want to add caching
You want to rewrite in a "faster" language
You're about to refactor for performance

Don't skip when:

The fix seems obvious (obvious fixes are often wrong)
It's "just" adding an index (measure the impact)
You're an expert (experts still measure)

The Four Phases

You MUST complete each phase before proceeding to the next.

Phase 1: Establish Baseline

BEFORE changing ANY code:

Define What "Fast Enough" Means
- Set concrete targets: "API responds in < 200ms at p95"
- No vague goals like "make it faster"
- Agree on metrics with stakeholders
- If you can't define the target, you can't know when you're done

Measure Current Performance

# HTTP endpoints
hey -n 1000 -c 50 https://api.example.com/endpoint

# Application benchmarks
hyperfine --warmup 3 'command-to-benchmark'

# Database queries
EXPLAIN ANALYZE SELECT ...;

Record the Baseline
- Response time: p50, p95, p99
- Throughput: requests/second
- Resource usage: CPU%, memory, I/O
- Error rate under load
- Write these numbers down. You will compare against them.

Reproduce Under Realistic Conditions
- Production-like data volume
- Concurrent users matching real traffic
- Warm caches if production has warm caches
- Cold start if measuring cold start

No baseline? No optimization. Period.
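As an illustration of recording p50/p95/p99 and checking them against a target, here is a minimal sketch that assumes you already have raw latency samples in milliseconds (for example, exported from a load-testing run). The names `percentile`, `summarize`, and `meetsTarget` are hypothetical, and the nearest-rank method used here is only one of several percentile definitions.

```typescript
interface Baseline {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

function summarize(samplesMs: number[]): Baseline {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}

// Hypothetical Phase 1 target check, e.g. "p95 under 200 ms".
function meetsTarget(baseline: Baseline, targetP95Ms: number): boolean {
  return baseline.p95 < targetP95Ms;
}
```

Storing the `Baseline` object alongside the commit hash and test conditions gives later phases something concrete to compare against.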

Phase 2: Profile to Find Bottlenecks

Find where time is actually spent:

CPU Profiling

# Node.js
node --prof app.js
node --prof-process isolate-*.log

# Python
python -m cProfile -o output.prof script.py

# Go
go tool pprof http://localhost:6060/debug/pprof/profile

Generate Flame Graphs
- Flame graphs show exactly where CPU time goes
- Wide bars = time spent there
- Look for unexpected wide bars
- Don't guess - read the graph
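To make "wide bars = time spent there" concrete, the sketch below sums sample counts per function from folded-stack lines (`frame1;frame2;frame3 count`), the text format commonly fed to flame graph generators. The function name `frameWidths` and the sample data are hypothetical. It reports inclusive samples per function name across all call paths, which approximates the combined width of that function's bars rather than reproducing a real flame graph layout.

```typescript
// Each input line is "frame1;frame2;...;frameN count".
function frameWidths(foldedLines: string[]): Map<string, number> {
  const widths = new Map<string, number>();
  for (const line of foldedLines) {
    const lastSpace = line.lastIndexOf(" ");
    if (lastSpace < 0) continue; // skip malformed lines
    const count = Number(line.slice(lastSpace + 1));
    if (!Number.isFinite(count)) continue;
    // Count each function once per stack, even if it appears recursively.
    const frames = Array.from(new Set(line.slice(0, lastSpace).split(";")));
    for (const frame of frames) {
      widths.set(frame, (widths.get(frame) ?? 0) + count);
    }
  }
  return widths;
}

// Hypothetical samples: most time sits under db.query, not render.
const exampleWidths = frameWidths([
  "main;handler;db.query 70",
  "main;handler;render 20",
  "main;gc 10",
]);
```

In this made-up profile, `db.query` accounts for 70 of 100 samples, so tuning `render` could recover at most 20% of the time.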

Memory Profiling

# Node.js
node --inspect app.js
# Chrome DevTools → Memory → Heap Snapshot

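# Python
import tracemalloc
tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:  # top 10 allocation sites
    print(stat)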

I/O and Network Profiling
- Database query logging with timing
- Network request tracing (OpenTelemetry)
- File system I/O monitoring
- External API call latency
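One way to use a timed query log is to rank statements by total time rather than by the slowest single call, since a cheap query issued thousands of times can dominate. The sketch below assumes a log has already been parsed into entries; the `QueryLogEntry` shape and the `rankQueries` name are hypothetical and not tied to any particular database driver.

```typescript
interface QueryLogEntry {
  sql: string;
  durationMs: number;
}

interface QueryStats {
  sql: string;
  calls: number;
  totalMs: number;
}

// Group log entries by statement text and sort by cumulative time, descending.
function rankQueries(log: QueryLogEntry[]): QueryStats[] {
  const bySql = new Map<string, QueryStats>();
  for (const { sql, durationMs } of log) {
    const stats = bySql.get(sql) ?? { sql, calls: 0, totalMs: 0 };
    stats.calls += 1;
    stats.totalMs += durationMs;
    bySql.set(sql, stats);
  }
  return Array.from(bySql.values()).sort((a, b) => b.totalMs - a.totalMs);
}
```

A statement with a high `calls` count and a small per-call duration is a typical signature of the N+1 pattern covered in Phase 3.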

Identify the Actual Bottleneck

| Bottleneck Type | Symptoms | Tools |
|-----------------|----------|-------|
| CPU-bound | High CPU%, slow computation | Profiler, flame graph |
| Memory-bound | High memory, GC pauses, swapping | Heap profiler, memory tracker |
| I/O-bound | Low CPU%, waiting on disk/network | Tracing, query logs |
| Concurrency | Lock contention, thread starvation | Thread profiler, lock analysis |

Key insight: Most "slow" applications are I/O-bound, not CPU-bound. Don't optimize CPU when you're waiting on the database.

Phase 3: Algorithmic Complexity Analysis

Before micro-optimizing, check the algorithm:

Big-O Review
- What is the time complexity of the hot path?
- Is there an O(n^2) hiding in a loop?
- Are you doing O(n) lookups where O(1) is possible (array vs. hash map)?
- Nested database queries (N+1 problem)?

Common Hidden Complexity

<Bad>
```typescript
// O(n*m) - linear search inside loop
for (const user of users) {
  const role = roles.find(r => r.userId === user.id);
}
```
</Bad>
<Good>
```typescript
// O(n+m) - build lookup map first
const roleMap = new Map(roles.map(r => [r.userId, r]));
for (const user of users) {
  const role = roleMap.get(user.id);
}
```
</Good>

Database Query Analysis
```
-- Always EXPLAIN before optimizing
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;
```
- Check for sequential scans on large tables
- Verify indexes are being used
- Look for unnecessary joins
- Detect N+1 queries in ORM-generated SQL

N+1 Query Detection

<Bad>
```typescript
// 1 query for users + N queries for orders
const users = await db.query('SELECT * FROM users');
for (const user of users) {
  user.orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
}
```
</Bad>
<Good>
```typescript
// 2 queries total regardless of user count
const users = await db.query('SELECT * FROM users');
const userIds = users.map(u => u.id); // collect the IDs to fetch in one query
// Placeholder and array-binding syntax depend on the database driver
const orders = await db.query('SELECT * FROM orders WHERE user_id = ANY(?)', [userIds]);
const ordersByUser = groupBy(orders, 'user_id');
```
</Good>
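The batched example relies on a `groupBy` helper that the snippet does not define; utility libraries provide similar functions. A minimal sketch of one possible implementation follows, with hypothetical `SampleOrder` and `SampleUser` types standing in for real rows. It returns a `Map`, which is an assumption; the original call could equally expect a plain object.

```typescript
// Group rows by the value of one property, preserving input order per group.
function groupBy<T, K extends keyof T>(rows: T[], key: K): Map<T[K], T[]> {
  const groups = new Map<T[K], T[]>();
  for (const row of rows) {
    const groupKey = row[key];
    const bucket = groups.get(groupKey);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(groupKey, [row]);
    }
  }
  return groups;
}

// Hypothetical row shapes, for illustration only.
interface SampleOrder {
  id: number;
  user_id: number;
}
interface SampleUser {
  id: number;
  orders?: SampleOrder[];
}

const sampleOrders: SampleOrder[] = [
  { id: 10, user_id: 1 },
  { id: 11, user_id: 2 },
  { id: 12, user_id: 1 },
];
const sampleUsers: SampleUser[] = [{ id: 1 }, { id: 2 }, { id: 3 }];

// Attach orders in memory: O(n + m) work and no further queries.
const sampleOrdersByUser = groupBy(sampleOrders, "user_id");
for (const user of sampleUsers) {
  user.orders = sampleOrdersByUser.get(user.id) ?? [];
}
```

Users with no orders receive an empty array rather than `undefined`, which keeps downstream code free of null checks.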

Phase 4: Targeted Optimization with Benchmarks

Optimize only what profiling identified:

Write a Benchmark First

// Before optimizing, capture current performance
bench('current implementation', () => {
  currentFunction(testData);
});

bench('optimized implementation', () => {
  optimizedFunction(testData);
});
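The snippet above calls `bench` without defining it; presumably a benchmark runner supplies it. If no runner is available, a rough stand-in might look like the sketch below. The parameter names, the defaults, and the injectable `now` clock are all assumptions made for illustration, and `Date.now` has only millisecond resolution, so very fast functions need many iterations.

```typescript
interface BenchResult {
  name: string;
  iterations: number;
  meanMs: number;
}

// Run fn repeatedly and report the mean wall-clock time per call.
function bench(
  name: string,
  fn: () => void,
  iterations: number = 1000,
  warmup: number = 100,
  now: () => number = Date.now,
): BenchResult {
  // Warm-up runs give the JIT a chance to optimize before timing starts.
  for (let i = 0; i < warmup; i++) fn();
  const start = now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = now() - start;
  return { name, iterations, meanMs: elapsedMs / iterations };
}

// Hypothetical usage: compare a linear scan with a Map lookup.
const benchIds = Array.from({ length: 5000 }, (_, i) => i);
const benchLookup = new Map<number, number>();
for (const id of benchIds) benchLookup.set(id, id);

const scanResult = bench("array scan", () => {
  benchIds.find(id => id === 4999);
});
const mapResult = bench("map lookup", () => {
  benchLookup.get(4999);
});
```

Run both variants on the same machine and the same `testData`, and treat a single run as indicative only; repeated runs show how much of the difference is noise.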

Make ONE Change at a Time
- Single optimization per iteration
- Measure after each change
- If no measurable improvement, revert
- Don't stack optimizations without measuring each

Verify Against Baseline
- Does it meet the target defined in Phase 1?
- Yes? Stop optimizing. You're done.
- No? Return to Phase 2, find next bottleneck.
- Getting diminishing returns? Stop. Good enough is good enough.
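The keep/revert/stop rules above can be written down as a small decision function. This is a sketch under stated assumptions: `evaluateChange` is a hypothetical name, the inputs are the same metric (say p95 latency in ms) measured before and after one change, and the 5% noise threshold is an arbitrary placeholder that should come from your own run-to-run variance.

```typescript
type Decision = "done" | "keep" | "revert";

function evaluateChange(
  beforeMs: number,
  afterMs: number,
  targetMs: number,
  noiseFraction: number = 0.05, // assumption: changes under 5% count as noise
): Decision {
  const improvement = (beforeMs - afterMs) / beforeMs;
  // No measurable improvement: revert, as the single-change rule requires.
  if (improvement <= noiseFraction) return "revert";
  // Measurable improvement that meets the Phase 1 target: stop optimizing.
  if (afterMs < targetMs) return "done";
  // Measurable improvement, target not met: keep it and return to Phase 2.
  return "keep";
}
```

For example, going from 400 ms to 390 ms against a 200 ms target is within the assumed noise band and would be reverted, while 400 ms to 300 ms is kept and sends you back to profiling.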

Frontend-Specific Optimization
- Core Web Vitals: LCP, INP (which replaced FID as a Core Web Vital in March 2024), CLS
- Bundle size analysis (webpack-bundle-analyzer, source-map-explorer)
- Lazy loading for below-fold content
- Image optimization (WebP/AVIF, responsive images, lazy load)
- Code splitting at route boundaries

Memory Leak Resolution
- Take heap snapshots at intervals
- Compare snapshots: what grows?
- Common causes: event listeners not removed, closures holding references, growing caches without eviction
- Fix the leak, verify with another snapshot series
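For the "growing caches without eviction" cause, one common remedy is to bound the cache. The sketch below is a minimal least-recently-used cache built on `Map` insertion order; the class name `LruCache` and the `maxEntries` parameter are hypothetical, and a production cache may also need TTLs or size-based limits.

```typescript
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    if (maxEntries < 1) throw new Error("maxEntries must be at least 1");
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

After swapping an unbounded `Map` for a bounded cache like this, the follow-up heap snapshot series should show the cache's retained size levelling off instead of growing.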

Load Testing

Before any production deployment of performance-sensitive changes:

# k6 example
k6 run --vus 50 --duration 5m load-test.js

# Artillery example
artillery run load-test.yml

Test at expected load AND 2-3x expected load
Monitor error rates under load
Check for degradation patterns (gradual vs. cliff)
Verify resource consumption stays bounded
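To distinguish gradual degradation from a cliff, it can help to compare how fast latency grows relative to load between test levels. The following is a rough sketch, not a standard method: `classifyDegradation`, the `LoadPoint` shape, and both thresholds (`cliffFactor`, `stableFactor`) are assumptions chosen for illustration.

```typescript
interface LoadPoint {
  rps: number;   // offered load, requests per second
  p95Ms: number; // p95 latency observed at that load
}

type Degradation = "stable" | "gradual" | "cliff";

function classifyDegradation(
  points: LoadPoint[],
  cliffFactor: number = 3,    // assumption: latency growing 3x faster than load is a cliff
  stableFactor: number = 1.2, // assumption: up to 20% total growth counts as stable
): Degradation {
  if (points.length < 2) throw new Error("need at least two load levels");
  const sorted = [...points].sort((a, b) => a.rps - b.rps);
  for (let i = 1; i < sorted.length; i++) {
    const latencyGrowth = sorted[i].p95Ms / sorted[i - 1].p95Ms;
    const loadGrowth = sorted[i].rps / sorted[i - 1].rps;
    if (latencyGrowth / loadGrowth >= cliffFactor) return "cliff";
  }
  const overall = sorted[sorted.length - 1].p95Ms / sorted[0].p95Ms;
  return overall > stableFactor ? "gradual" : "stable";
}
```

Feeding it results from runs at 1x, 2x, and 3x expected load gives a quick first read; a "cliff" result usually points at a saturated resource such as a connection pool or lock.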

Red Flags - STOP and Follow Process

If you catch yourself thinking:

"This is obviously the slow part"
"Let me just add some caching"
"Rewriting in Rust/Go will fix it"
"Let me optimize this loop"
"Add an index, that'll fix it"
"Just increase the timeout"
"Premature optimization is bad, so I won't measure"
"The profiler is too hard to set up"
"I'll measure after I optimize"

ALL of these mean: STOP. Return to Phase 1.

Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Obviously slow, don't need to measure" | Obvious intuitions are wrong 70% of the time. Measure. |
| "Just add caching" | Caching without understanding causes stale data bugs and hides real issues. |
| "Rewrite in a faster language" | Algorithm matters more than language. O(n^2) in Rust is still O(n^2). |
| "Micro-benchmarks show improvement" | Micro-benchmarks don't reflect real workload. Measure end-to-end. |
| "Profiler is too hard to set up" | 10 minutes to set up profiler vs. hours guessing. Set it up. |
| "It's just one query" | One query called 10,000 times is 10,000 queries. Measure frequency. |
| "Increase the timeout" | Timeouts mask real problems. Find why it's slow. |
| "Premature optimization, skip it" | Measuring is not optimizing. Always measure. Decide after. |
| "Add an index, can't hurt" | Wrong indexes slow writes. EXPLAIN first. |
| "Optimize later when it matters" | Launching without baseline means you can't measure regression. |

Anti-Patterns

| Anti-Pattern | Consequence | Correct Approach |
|--------------|-------------|------------------|
| Optimizing without measuring | Wrong target, wasted effort, new bugs | Profile first, optimize the actual bottleneck |
| Micro-optimizing cold paths | No user-visible improvement | Focus on hot paths identified by profiling |
| Premature caching | Stale data, cache invalidation bugs, complexity | Fix the root cause; cache only after measuring |
| Ignoring algorithmic complexity | Linear "optimization" on exponential problem | Fix the algorithm before tuning constants |
| Optimizing in isolation | Benchmark looks good, production doesn't improve | Test under realistic load and data |
| Stacking multiple optimizations | Can't tell which one helped (or hurt) | One change at a time, measure each |

Quick Reference

| Phase | Key Activities | Success Criteria |
|-------|----------------|------------------|
| 1. Baseline | Define targets, measure current state, record numbers | Concrete metrics documented |
| 2. Profile | CPU/memory/I/O profiling, flame graphs, identify bottleneck | Know WHERE time is spent |
| 3. Complexity | Big-O review, N+1 detection, EXPLAIN queries | Algorithmic issues found or ruled out |
| 4. Optimize | Benchmark, single change, measure, compare to baseline | Meets target or next bottleneck identified |

Verification Checklist

Before marking performance work complete:

[ ] Performance target defined with concrete numbers
[ ] Baseline measured and recorded before any changes
[ ] Profiling data collected (not just intuition)
[ ] Bottleneck identified from profiling data
[ ] Algorithmic complexity reviewed
[ ] Database queries analyzed with EXPLAIN
[ ] Optimization benchmarked against baseline
[ ] Performance target met (or documented why not)
[ ] Load tested under realistic conditions
[ ] No regressions in correctness (all tests pass)
[ ] Memory stable under sustained load (no leaks)

Can't check all boxes? You're not done.

Integration with Other Skills

This skill requires using:

systematic-debugging - REQUIRED when performance issue has an unclear cause (use Phase 1 root cause investigation)
test-driven-development - REQUIRED for writing benchmarks and ensuring optimizations don't break correctness

Complementary skills:

documentation-generation - Document baseline metrics, optimization decisions, and load test results

Final Rule

No profiling data → no optimization
No baseline → no comparison
No target → no "done"

Measure. Profile. Optimize. Verify. In that order. Always.

Use when investigating performance issues, before attempting any optimization - four-phase methodology (baseline measurement, bottleneck profiling, complexity analysis, targeted optimization) that ensures data-driven decisions over gut-feel tuning

4 stars

testing

Updated May 10, 2026

npx skillsauth add lv416e/dotfiles performance-profiling

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner: Clean (95%)
Semgrep - Static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation: Clean (95%)
Snyk (dep) - Open source security scanning: Skipped (50%)
Socket.dev - Supply chain security analysis: Skipped (50%)
VirusTotal - Multi-engine malware detection: Skipped (50%)
CrowdStrike - Advanced threat intelligence: Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: May 10, 2026, 6:16 AM (167.7s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/lv416e/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/dot_claude/skills/performance-profiling ~/.claude/skills/

Repository

lv416e/dotfiles

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT