plugins/dev/skills/backend/golang-performance/SKILL.md
Use when profiling Go applications (pprof), running benchmarks, optimizing memory/CPU usage, or debugging performance bottlenecks in production Go code.
This skill provides comprehensive guidance for profiling, benchmarking, and optimizing Go applications. Use this skill when working on performance-critical code, investigating bottlenecks, or optimizing production systems.
When to Use This Skill:
Core Tools:
- pprof - CPU, memory, and goroutine profiling
- go test -bench - Benchmarking framework
- go build -gcflags - Escape analysis
- GOGC and GOMEMLIMIT - GC tuning
Enable CPU Profiling in Code:
import (
"log"
"os"
"runtime/pprof"
)
func main() {
f, err := os.Create("cpu.prof")
if err != nil {
log.Fatal("could not create CPU profile: ", err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
log.Fatal("could not start CPU profile: ", err)
}
defer pprof.StopCPUProfile()
// Your application code here
runApplication()
}
CLI Profiling:
# Profile a test
go test -cpuprofile=cpu.prof -bench=.
# Profile a binary
go test -c
./myapp.test -test.cpuprofile=cpu.prof -test.bench=.
Analysis Commands:
# Interactive web UI (recommended)
go tool pprof -http=:8080 cpu.prof
# Text output - top functions by CPU time
go tool pprof -top cpu.prof
# Top 20 with cumulative time
go tool pprof -top -cum cpu.prof | head -20
# Call graph visualization
go tool pprof -svg cpu.prof > cpu.svg
# Focus on specific function
go tool pprof -focus=processData cpu.prof
# Exclude standard library
go tool pprof -ignore=runtime cpu.prof
Interpreting CPU Profiles:
Example Output:
Showing nodes accounting for 2.50s, 83.33% of 3.00s total
flat flat% sum% cum cum%
0.80s 26.67% 26.67% 1.20s 40.00% processData
0.60s 20.00% 46.67% 0.90s 30.00% parseJSON
0.50s 16.67% 63.33% 0.50s 16.67% validateInput
Focus optimization on functions with high flat (own time) or cum (total time).
Heap Profiling:
import (
"log"
"os"
"runtime"
"runtime/pprof"
)
func captureHeapProfile() {
f, err := os.Create("mem.prof")
if err != nil {
log.Fatal("could not create memory profile: ", err)
}
defer f.Close()
// Force GC before capturing heap
runtime.GC()
if err := pprof.WriteHeapProfile(f); err != nil {
log.Fatal("could not write memory profile: ", err)
}
}
Memory Profiling via CLI:
# Profile memory allocations during test
go test -memprofile=mem.prof -bench=.
# Run each benchmark for longer to get more stable results
go test -memprofile=mem.prof -bench=. -benchtime=10s
Analysis Commands:
# Web UI showing allocation sites
go tool pprof -http=:8080 mem.prof
# Top allocators
go tool pprof -top mem.prof
# Focus on live memory in bytes (inuse_space)
go tool pprof -sample_index=inuse_space -top mem.prof
# Focus on live object counts (inuse_objects)
go tool pprof -sample_index=inuse_objects -top mem.prof
# Show cumulative allocations (alloc_space)
go tool pprof -sample_index=alloc_space -top mem.prof
# Compare two profiles (before/after)
go tool pprof -base=before.prof after.prof
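The difference between the in-use and cumulative views selected with -sample_index can also be observed from inside a process through runtime.MemStats, where HeapAlloc roughly corresponds to live bytes and TotalAlloc to everything allocated so far. This is a minimal sketch; allocate, snapshot, and keep are hypothetical names, and the buffer sizes are arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
)

var keep [][]byte // package-level sink: retained buffers stay live

// allocate creates n buffers of size bytes and retains every tenth one.
func allocate(n, size int) {
	for i := 0; i < n; i++ {
		buf := make([]byte, size)
		if i%10 == 0 {
			keep = append(keep, buf)
		}
	}
}

// snapshot forces a GC and returns live heap bytes and cumulative allocated bytes.
func snapshot() (inUse, total uint64) {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.TotalAlloc
}

func main() {
	_, totalBefore := snapshot()
	allocate(1000, 64*1024)
	inUse, totalAfter := snapshot()
	fmt.Printf("allocated during run: %d MiB\n", (totalAfter-totalBefore)>>20)
	fmt.Printf("still in use:         %d MiB\n", inUse>>20)
}
```

A function that allocates heavily but retains little shows up prominently under alloc_space and barely at all under inuse_space, which is why both views are worth checking.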
Memory Profile Types:
- inuse_space: Memory currently in use (default)
- inuse_objects: Objects currently in use
- alloc_space: Total allocations since start
- alloc_objects: Total object allocations
Detect Goroutine Leaks:
import (
"log"
"os"
"runtime/pprof"
)
func captureGoroutineProfile() {
f, err := os.Create("goroutine.prof")
if err != nil {
log.Fatal("could not create goroutine profile: ", err)
}
defer f.Close()
if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
log.Fatal("could not write goroutine profile: ", err)
}
}
Analysis:
go tool pprof -http=:8080 goroutine.prof
go tool pprof -top goroutine.prof
Goroutine Leak Indicators:
Enable pprof HTTP Server:
import (
"log"
"net/http"
_ "net/http/pprof" // Registers the /debug/pprof/ handlers on http.DefaultServeMux
)
func main() {
// Start pprof server on separate port (localhost only)
go func() {
log.Println("pprof server listening on localhost:6060")
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
// Your application here
runServer()
}
Access Profiles via HTTP:
# CPU profile (30 seconds)
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.prof
# Heap profile
curl http://localhost:6060/debug/pprof/heap > heap.prof
# Goroutine profile
curl http://localhost:6060/debug/pprof/goroutine > goroutine.prof
# Analyze immediately
go tool pprof http://localhost:6060/debug/pprof/profile
# Web UI
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile
Available Endpoints:
- /debug/pprof/ - Index of all profiles
- /debug/pprof/profile - CPU profile
- /debug/pprof/heap - Heap profile
- /debug/pprof/goroutine - Goroutine stack traces
- /debug/pprof/threadcreate - Thread creation profile
- /debug/pprof/block - Blocking profile
- /debug/pprof/mutex - Mutex contention profile
Production Security:
// Only expose on localhost
http.ListenAndServe("localhost:6060", nil)
// Or use SSH port forwarding
// ssh -L 6060:localhost:6060 user@production-host
// Then access http://localhost:6060/debug/pprof/
Simple Benchmark:
var sink string // Package-level sink keeps the result live
func BenchmarkStringConcat(b *testing.B) {
a, c := "hello", "world" // Variables, so the concatenation is not a constant expression
for i := 0; i < b.N; i++ {
sink = a + " " + c
}
}
Benchmark with Setup:
func BenchmarkProcessData(b *testing.B) {
data := generateTestData(1000)
b.ResetTimer() // Exclude setup time
for i := 0; i < b.N; i++ {
processData(data)
}
}
Running Benchmarks:
# Run all benchmarks
go test -bench=.
# Run specific benchmark
go test -bench=BenchmarkStringConcat
# Benchmark with memory statistics
go test -bench=. -benchmem
# Run multiple iterations for stability
go test -bench=. -count=5
# Longer benchmark time for accurate results
go test -bench=. -benchtime=10s
# CPU profile during benchmark
go test -bench=. -cpuprofile=cpu.prof
Compare Multiple Implementations:
func BenchmarkStringBuilding(b *testing.B) {
items := []string{"hello", "world", "foo", "bar"}
b.Run("Concat", func(b *testing.B) {
for i := 0; i < b.N; i++ {
result := ""
for _, item := range items {
result += item
}
_ = result
}
})
b.Run("StringBuilder", func(b *testing.B) {
for i := 0; i < b.N; i++ {
var sb strings.Builder
for _, item := range items {
sb.WriteString(item)
}
_ = sb.String()
}
})
b.Run("Join", func(b *testing.B) {
for i := 0; i < b.N; i++ {
result := strings.Join(items, "")
_ = result
}
})
}
Output:
BenchmarkStringBuilding/Concat-8 500000 3245 ns/op 96 B/op 5 allocs/op
BenchmarkStringBuilding/StringBuilder-8 2000000 825 ns/op 64 B/op 1 allocs/op
BenchmarkStringBuilding/Join-8 2000000 780 ns/op 48 B/op 1 allocs/op
Track Allocations:
var allocSink []int // Package-level sink forces the slice onto the heap
func BenchmarkWithAllocs(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
allocSink = make([]int, 1000)
}
}
Output Interpretation:
BenchmarkWithAllocs-8   200000   8234 ns/op   8192 B/op   1 allocs/op
                        ------   ----------   ---------   -----------
                        iters    ns/op        bytes/op    allocs/op
Zero Allocation Goal:
// Wasteful: upper-cases leading and trailing whitespace that is then discarded
func process(data string) string {
upper := strings.ToUpper(data) // Allocates a new string when any character changes
return strings.TrimSpace(upper) // Returns a substring, no allocation
}
// Better: trim first, so ToUpper copies only the bytes that are kept
func process(data string) string {
return strings.ToUpper(strings.TrimSpace(data))
}
Compare Before/After:
# Baseline
go test -bench=. -count=10 > old.txt
# After optimization
go test -bench=. -count=10 > new.txt
# Statistical comparison
go install golang.org/x/perf/cmd/benchstat@latest
benchstat old.txt new.txt
Example Output:
name old time/op new time/op delta
StringConcat-8 3.24µs ± 2% 0.82µs ± 1% -74.69% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
StringConcat-8 96.0B ± 0% 64.0B ± 0% -33.33% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
StringConcat-8 5.00 ± 0% 1.00 ± 0% -80.00% (p=0.000 n=10+10)
Interpretation:
- ±2% - Variance across runs
- (p=0.000) - Statistical significance (p < 0.05 = significant)
- n=10+10 - Number of samples used
Problem: Repeated Reallocation:
// Bad: the backing array is reallocated and copied many times on the way to 10,000 items
func inefficient() []int {
var data []int
for i := 0; i < 10000; i++ {
data = append(data, i)
}
return data
}
Solution: Pre-allocate Capacity:
// Good: 1 allocation
func efficient() []int {
data := make([]int, 0, 10000)
for i := 0; i < 10000; i++ {
data = append(data, i)
}
return data
}
Transformation Pattern:
func transformItems(input []string) []Result {
output := make([]Result, 0, len(input))
for _, item := range input {
output = append(output, transform(item))
}
return output
}
Estimated Capacity:
func filterItems(input []string, minLen int) []string {
// Estimate ~50% will pass
output := make([]string, 0, len(input)/2)
for _, item := range input {
if len(item) >= minLen {
output = append(output, item)
}
}
return output
}
Benchmark Impact: about 5x faster for 10,000 items (the exact ratio depends on element size and Go version)
Problem: O(N²) String Concatenation:
// Bad: Creates new string on every iteration
func badConcat(items []string) string {
result := ""
for _, item := range items {
result += item // New allocation each time
}
return result
}
Solution: strings.Builder (O(N)):
// Good: Single allocation with growth
func goodConcat(items []string) string {
var sb strings.Builder
// Pre-allocate if size known
totalLen := 0
for _, item := range items {
totalLen += len(item)
}
sb.Grow(totalLen)
for _, item := range items {
sb.WriteString(item)
}
return sb.String()
}
Benchmark: about 50x faster for 100 concatenations (the exact ratio depends on item size and Go version)
Builder Methods:
var sb strings.Builder
sb.WriteString("hello") // Write string
sb.WriteByte('!') // Write single byte
sb.WriteRune('✓') // Write rune (Unicode)
sb.Grow(100) // Pre-allocate capacity
result := sb.String() // Get final string
sb.Reset() // Reuse builder
View Escape Decisions:
go build -gcflags='-m -m' main.go 2>&1 | grep "escapes to heap"
Stack vs Heap:
// Stack allocated (fast)
func sumArray() int {
data := [100]int{} // Stack
sum := 0
for _, v := range data {
sum += v
}
return sum
}
// Heap allocated (slower, escapes)
func createData() *Data {
data := &Data{} // Escapes: pointer returned
return data
}
Common Escape Scenarios:
// 1. Returning pointer to local variable
func escape1() *int {
x := 42
return &x // Escapes
}
// 2. Interface conversion
func escape2() interface{} {
x := 42
return x // Escapes (interface)
}
// 3. Storing in interface field
func escape3(data interface{}) {
globalVar = data // Escapes
}
// 4. Size too large for stack
func escape4() {
data := make([]byte, 1<<20) // 1MB, escapes
_ = data
}
// 5. Slice append beyond capacity
func escape5() {
data := make([]int, 0, 10)
for i := 0; i < 100; i++ {
data = append(data, i) // May escape
}
}
Reducing Escapes:
// Before: Escapes to heap
for _, item := range items {
result := &Result{Value: item}
process(result)
}
// After: Stack allocated (if process doesn't store it)
var result Result
for _, item := range items {
result.Value = item
process(&result)
}
Reuse Buffers:
// Package-level buffer pool
var bufferPool = sync.Pool{
New: func() interface{} {
return new(bytes.Buffer)
},
}
func processData(data []byte) string {
buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset() // Clear previous content
defer bufferPool.Put(buf)
// Use buffer
buf.Write(data)
return buf.String()
}
Pre-allocate Maps:
// Bad: Multiple rehashes
m := make(map[string]Item)
for _, item := range items {
m[item.ID] = item
}
// Good: Single allocation
m := make(map[string]Item, len(items))
for _, item := range items {
m[item.ID] = item
}
Default Behavior:
# Default: GC when heap grows 100%
GOGC=100 ./myapp
Tuning Options:
# Less frequent GC (uses more memory, higher throughput)
GOGC=200 ./myapp
# More frequent GC (uses less memory, spends more CPU time on collection)
GOGC=50 ./myapp
# Disable GC (debugging only)
GOGC=off ./myapp
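As a rough model, the next collection starts when the heap reaches about live heap × (1 + GOGC/100). The sketch below evaluates that simplified formula; heapTargetMB is a hypothetical helper, the 100MB live heap is an assumed input, and the real pacer also accounts for GC roots and GOMEMLIMIT.

```go
package main

import "fmt"

// heapTargetMB is a simplified model of the GC trigger: the heap size (in MB)
// at which the next collection starts, given the live heap after the last one.
// It ignores GC roots and any memory limit.
func heapTargetMB(liveMB, gogc int) int {
	return liveMB + liveMB*gogc/100
}

func main() {
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d, live heap 100MB: next GC near %dMB\n", gogc, heapTargetMB(100, gogc))
	}
}
```

Doubling GOGC therefore does not double memory use; it doubles only the headroom above the live heap.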
How GOGC Works:
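- GOGC=100: GC triggers when the heap doubles
- GOGC=200: GC triggers when the heap triples
- GOGC=50: GC triggers when the heap grows 50%
Example (the targets below correspond to 100MB of live heap after the previous collection):
- GOGC=100: GC at 200MB
- GOGC=200: GC at 300MB
- GOGC=50: GC at 150MB
Set Memory Limit: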
# Via environment variable
GOMEMLIMIT=10GiB ./myapp
# Programmatically
debug.SetMemoryLimit(10 << 30) // 10 GiB, requires Go 1.19+ and import "runtime/debug"
Units Supported:
- B - Bytes
- KiB - Kibibytes (1024 bytes)
- MiB - Mebibytes (1024² bytes)
- GiB - Gibibytes (1024³ bytes)
- TiB - Tebibytes (1024⁴ bytes)
How it Works:
| Scenario | GOGC | GOMEMLIMIT | Rationale |
|----------|------|------------|-----------|
| High throughput batch | 200-400 | 80% of RAM | Reduce GC overhead, use available memory |
| Memory-constrained (container) | 50-100 | Limit - 10% | Prevent OOM, more frequent GC |
| Latency-sensitive API | 100 | Not set | Default balance between memory and pause |
| Large heap (>4GB) | 100-200 | 80% of RAM | Reduce GC frequency for large heaps |
| Short-lived processes | 400+ | Not set | Maximize speed, process ends soon |
Example: Container with 2GB RAM:
GOGC=75 GOMEMLIMIT=1800MiB ./myapp
Example: Batch Processing:
GOGC=300 GOMEMLIMIT=24GiB ./batch-processor
Monitoring GC:
import (
"fmt"
"runtime/debug"
)
// Get GC stats
var stats debug.GCStats
debug.ReadGCStats(&stats)
fmt.Printf("Last GC: %v\n", stats.LastGC)
fmt.Printf("Num GC: %d\n", stats.NumGC)
Anti-Pattern:
// Bad: O(N²) complexity
func buildString(items []string) string {
result := ""
for _, item := range items {
result += item // New allocation each iteration
}
return result
}
Solution:
// Good: O(N) complexity
func buildString(items []string) string {
var sb strings.Builder
for _, item := range items {
sb.WriteString(item)
}
return sb.String()
}
Anti-Pattern 1: Creating Pointers in Loops:
// Bad: N allocations
for _, item := range items {
ptr := &item
process(ptr)
}
// Good: Reuse pointer
var ptr *Item
for i := range items {
ptr = &items[i]
process(ptr)
}
Anti-Pattern 2: Converting to Interface:
// Bad: Causes allocation
func printAll(items []MyStruct) {
for _, item := range items {
fmt.Println(item) // Interface conversion
}
}
// Better: Pass a pointer to avoid copying the struct (note: the printed form gains a leading &)
func printAll(items []MyStruct) {
for i := range items {
fmt.Println(&items[i])
}
}
Anti-Pattern:
// Bad: the deferred Unlock runs only when the function returns, so the second iteration blocks forever on Lock
func processMany(items []Item) {
for _, item := range items {
mu.Lock()
defer mu.Unlock() // Queued until processMany returns, not run at the end of the iteration
process(item)
}
}
Solution:
// Good: Manual unlock in loop
func processMany(items []Item) {
for _, item := range items {
mu.Lock()
process(item)
mu.Unlock()
}
}
// Or: Extract to function with defer
func processMany(items []Item) {
for _, item := range items {
processOne(item)
}
}
func processOne(item Item) {
mu.Lock()
defer mu.Unlock()
process(item)
}
# CPU profile
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof
# Memory profile
go test -memprofile=mem.prof -bench=.
go tool pprof -http=:8080 mem.prof
# HTTP profiling (production)
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.prof
# Run benchmarks with memory stats
go test -bench=. -benchmem
# Compare before/after
go test -bench=. -count=10 > old.txt
# Apply the optimization, then capture the new results
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
- strings.Builder for string concatenation
- -gcflags='-m'
- sync.Pool
- -benchmem
Related Skills:
- golang - Core Go idioms and patterns
- database-patterns - Database performance optimization
- api-design - API performance best practices