skills/golang-benchmark/SKILL.md
Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while `golang-performance` provides the optimization patterns.
npx skillsauth add rockcookies/skills golang-benchmarkInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
Performance improvement does not exist without measures — if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See golang-performance skill. For pprof setup on running services, → See golang-troubleshooting skill.
b.Loop() (Go 1.24+) — preferredFor Go 1.24+, prefer b.Loop() for new benchmarks. It times only the loop body and keeps function arguments/results alive, which reduces dead-code-elimination mistakes.
func BenchmarkParse(b *testing.B) {
data := loadFixture("large.json") // setup — excluded from timing
for b.Loop() {
Parse(data) // compiler cannot eliminate this call
}
}
Legacy b.N loops still compile and are fine to keep when preserving existing benchmarks or supporting Go <1.24. They are easier to get wrong: setup may need b.ResetTimer(), and results may need a sink if the compiler can eliminate the work. Go 1.26 fixed an earlier b.Loop() inlining limitation — benchmarks on 1.24–1.25 already benefit from b.Loop() but may miss inlining optimizations that 1.26 delivers.
func BenchmarkAlloc(b *testing.B) {
b.ReportAllocs() // or run with -benchmem flag
var sink []byte
for b.Loop() {
sink = make([]byte, 1024)
}
_ = sink
}
b.ReportMetric() adds custom metrics (e.g., throughput):
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s") // b.Elapsed() is only valid inside b.Loop()
func BenchmarkEncode(b *testing.B) {
for _, size := range []int{64, 256, 4096} {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
data := make([]byte, size)
for b.Loop() {
Encode(data)
}
})
}
}
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
| Flag | Purpose |
| ---------------------- | ----------------------------------------- |
| -bench=. | Run all benchmarks (regexp filter) |
| -benchmem | Report allocations (B/op, allocs/op) |
| -count=10 | Run 10 times for statistical significance |
| -benchtime=3s | Minimum time per benchmark (default 1s) |
| -cpu=1,2,4 | Run with different GOMAXPROCS values |
| -cpuprofile=cpu.prof | Write CPU profile |
| -memprofile=mem.prof | Write memory profile |
| -trace=trace.out | Write execution trace |
Output format: BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op — the -8 suffix is GOMAXPROCS, ns/op is time per operation, B/op is bytes allocated per op, allocs/op is heap allocation count per op.
Paste benchstat output in the commit body when the change has a measurable performance impact. This documents why an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.
Commit format:
perf(parser): reduce Parse allocations 50% with sync.Pool
Replace per-call []byte allocation with a pooled buffer.
goos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X
│ old │ new │
│ sec/op │ sec/op vs base │
Parse-32 4.592µ ± 2% 3.041µ ± 1% -33.78% (p=0.000 n=10)
│ old │ new │
│ B/op │ B/op vs base │
Parse-32 1.024Ki ± 0% 0.512Ki ± 0% -50.00% (p=0.000 n=10)
│ old │ new │
│ allocs/op │ allocs/op vs base │
Parse-32 12.00 ± 0% 6.000 ± 0% -50.00% (p=0.000 n=10)
Rules:
~ (no statistical significance) — the improvement cannot be claimedgoos/goarch/cpu) so results are reproducibleperf(scope): commit type for performance-only changesGenerate profiles directly from benchmark runs — no HTTP server needed:
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof
# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof
# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.
pprof Reference — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs allocobjects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into _where time and memory are being spent in your code.
benchstat Reference — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
Trace Reference — Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough — you need to see the timeline of what happened.
Diagnostic Tools — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (frame graph profiles), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic — don't reach for pprof if a simpler tool already answers your question.
Compiler Analysis — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
CI Regression Detection — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.
Investigation Session — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when production benchmarks look good but real traffic behaves differently.
Prometheus Go Metrics Reference — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.
golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodologygolang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)golang-testing skill for general testing practicessamber/cc-skills@promql-cli skill for querying Prometheus runtime metrics in production to validate benchmark findingsdevelopment
Vue 3 debugging and error handling for runtime errors, warnings, async failures, and SSR/hydration issues. Use when diagnosing or fixing Vue issues.
development
MUST be used for Vue.js tasks. Strongly recommends Composition API with `<script setup>` and TypeScript as the standard approach. Covers Vue 3, SSR, Volar, vue-tsc. Load for any Vue, .vue files, Vue Router, Pinia, or Vite with Vue work. ALWAYS use Composition API unless the project explicitly requires Options API.
development
GORM Gen 类型安全 DAO 代码生成,基于 github.com/rockcookies/go-gen(rockcookies fork)。涵盖代码生成配置、模型生成、查询构建、增删改查、关联关系、动态 SQL 注解、事务处理、datatypes 自定义字段类型(JSON/JSONMap/JSONSlice/JSONType/Date/UUID)、soft_delete 软删除插件(unix 时间戳/flag 模式),以及 fork 专有功能:Tmpl 运行时模板覆写(18 个模板)、Unsafe 底层方法(UnsafeSetDB/Alias/ModelType/TableName)、IGenericsDo[T,E] 泛型接口。使用时机:需要从数据库生成 DAO 代码(GenerateModel/GenerateModelAs)、编写 DAL 查询(DO 链式调用、DaoScope、事务、关联加载)、配置生成器(gen.Config、ModelOpt、FieldGORMTag、FieldModify、FieldType、Tmpl 自定义模板)、使用 datatypes(JSONMap、JSONSlice、JSONQuery、JSONSet)或 soft_delete(DeletedAt、softDelete:milli、deleteOpts)时使用本技能。当用户消息中包含以下任一关键词(go-gen、gorm-gen、GenerateModelAs、ModelOpt、FieldGORMTag、FieldModify、DaoScope、LoadOneToMany、LoadManyToMany、IGenericsDo、UnsafeSetDB、datatypes、JSONMap、JSONSlice、JSONQuery、soft_delete、softDelete、DeletedAt),或用户明确请求 GORM Gen 代码生成/DAO 编写时触发本技能。
development
轻量级 Go HTTP 客户端库,基于 github.com/rockcookies/go-fetch(零外部依赖)。涵盖 Dispatcher 初始化与中间件、Request 链式构建(RequestFunc 与 Middleware 分层)、Response 解码(JSON/XML/流)、请求体编码(JSON/XML/Form/Multipart/BodyGet)、URL 参数(PrepareURLMiddleware/URLOptions)、Header/Cookie 管理(ApplyHeader/ApplyCookie 与 Context)、中间件组合(Dispatcher/Request/Do 三层)、HTTP 交换日志(dump.New/dump.Transport/过滤器/WithRequestRedactor/WithResponseRedactor/SlogWriter)。使用时机:需要发起 HTTP 请求(GET/POST/PUT/PATCH/DELETE,均需 context.Context)、上传文件(Multipart/GetReader)、配置全局认证头(dispatcher.Use)、记录 HTTP 交换日志(dump.New、WithFilter、DefaultRedactor)、构建可复用的请求基础(Request.Clone)时使用本技能。当用户消息中包含以下任一关键词(go-fetch、NewDispatcher、NewDispatcherWithTransport、RequestFunc、PreFuncs、UseFuncs、BodyGet、MultipartField、dump.New、WithFilter、WithRequestRedactor、WithResponseRedactor、DefaultRedactor、DumpOptions、SlogWriter、URLOptions、PrepareURLMiddleware、PathParams、SetURLOptions、WithURLOptions、ApplyHeader、SetHeaderOptions、WithHeaderOptions、ApplyCookie、SetCookieOptions、WithCookieOptions、HandlerFunc、fetch.Handler、fetch.Middleware、dispatcher.Use、resp.Close、resp.JSON、resp.XML),或用户明确请求 go-fetch HTTP 客户端用法时触发本技能。