Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

aiskillstore/golang-performance

Name: golang-performance
Author: aiskillstore

skills/89jobrien/golang-performance/SKILL.md

npx skillsauth add aiskillstore/marketplace golang-performance

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Golang Performance

This skill provides guidance on optimizing Go application performance including profiling, memory management, concurrency optimization, and avoiding common performance pitfalls.

When to Use This Skill

When profiling Go applications for CPU or memory issues
When optimizing memory allocations and reducing GC pressure
When implementing efficient concurrency patterns
When analyzing escape analysis results
When optimizing hot paths in production code

Profiling with pprof

Enable Profiling in HTTP Server

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // pprof endpoints available at /debug/pprof/
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // Main application
}

CPU Profiling

# Collect 30-second CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# Interactive commands
(pprof) top10          # Top 10 functions by CPU
(pprof) list FuncName  # Show source with timing
(pprof) web            # Open flame graph in browser

Memory Profiling

# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocs profile (all allocations)
go tool pprof http://localhost:6060/debug/pprof/allocs

# Interactive commands
(pprof) top10 -cum     # Top by cumulative allocations
(pprof) list FuncName  # Show allocation sites

Programmatic Profiling

import (
    "os"
    "runtime/pprof"
)

func profileCPU() {
    f, _ := os.Create("cpu.prof")
    defer f.Close()

    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()

    // Code to profile
}

func profileMemory() {
    f, _ := os.Create("mem.prof")
    defer f.Close()

    runtime.GC() // Get accurate stats
    pprof.WriteHeapProfile(f)
}

Memory Optimization

Reduce Allocations

// BAD: Allocates on every call
func Process(items []string) []string {
    result := []string{}
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}

// GOOD: Pre-allocate with known capacity
func Process(items []string) []string {
    result := make([]string, 0, len(items))
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}

Use sync.Pool for Frequent Allocations

var bufferPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func ProcessRequest(data []byte) []byte {
    buf := bufferPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        bufferPool.Put(buf)
    }()

    // Use buffer
    buf.Write(data)
    return buf.Bytes()
}

Avoid String Concatenation in Loops

// BAD: O(n^2) allocations
func BuildString(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part
    }
    return result
}

// GOOD: Single allocation
func BuildString(parts []string) string {
    var builder strings.Builder
    for _, part := range parts {
        builder.WriteString(part)
    }
    return builder.String()
}

Slice Memory Leaks

// BAD: Keeps entire backing array alive
func GetFirst(data []byte) []byte {
    return data[:10]
}

// GOOD: Copy to release backing array
func GetFirst(data []byte) []byte {
    result := make([]byte, 10)
    copy(result, data[:10])
    return result
}

Escape Analysis

# Show escape analysis decisions
go build -gcflags="-m" ./...

# More verbose
go build -gcflags="-m -m" ./...

Avoiding Heap Escapes

// ESCAPES: Returned pointer
func NewUser() *User {
    return &User{}  // Allocated on heap
}

// STAYS ON STACK: Value return
func NewUser() User {
    return User{}  // May stay on stack
}

// ESCAPES: Interface conversion
func Process(v interface{}) { ... }

func main() {
    x := 42
    Process(x)  // x escapes to heap
}

Concurrency Optimization

Worker Pool Pattern

func ProcessItems(items []Item, workers int) []Result {
    jobs := make(chan Item, len(items))
    results := make(chan Result, len(items))

    // Start workers
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for item := range jobs {
                results <- process(item)
            }
        }()
    }

    // Send jobs
    for _, item := range items {
        jobs <- item
    }
    close(jobs)

    // Wait and collect
    go func() {
        wg.Wait()
        close(results)
    }()

    var output []Result
    for r := range results {
        output = append(output, r)
    }
    return output
}

Buffered Channels for Throughput

// SLOW: Unbuffered causes blocking
ch := make(chan int)

// FAST: Buffer reduces contention
ch := make(chan int, 100)

Avoid Lock Contention

// BAD: Global lock
var mu sync.Mutex
var cache = make(map[string]string)

func Get(key string) string {
    mu.Lock()
    defer mu.Unlock()
    return cache[key]
}

// GOOD: Sharded locks
type ShardedCache struct {
    shards [256]struct {
        mu    sync.RWMutex
        items map[string]string
    }
}

func (c *ShardedCache) getShard(key string) *struct {
    mu    sync.RWMutex
    items map[string]string
} {
    h := fnv.New32a()
    h.Write([]byte(key))
    return &c.shards[h.Sum32()%256]
}

func (c *ShardedCache) Get(key string) string {
    shard := c.getShard(key)
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    return shard.items[key]
}

Use sync.Map for Specific Cases

// Good for: keys written once, read many; disjoint key sets
var cache sync.Map

func Get(key string) (string, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return "", false
    }
    return v.(string), true
}

func Set(key, value string) {
    cache.Store(key, value)
}

Data Structure Optimization

Struct Field Ordering (Memory Alignment)

// BAD: 24 bytes (padding)
type Bad struct {
    a bool   // 1 byte + 7 padding
    b int64  // 8 bytes
    c bool   // 1 byte + 7 padding
}

// GOOD: 16 bytes (no padding)
type Good struct {
    b int64  // 8 bytes
    a bool   // 1 byte
    c bool   // 1 byte + 6 padding
}

Avoid Interface{} When Possible

// SLOW: Type assertions, boxing
func Sum(values []interface{}) float64 {
    var sum float64
    for _, v := range values {
        sum += v.(float64)
    }
    return sum
}

// FAST: Concrete types
func Sum(values []float64) float64 {
    var sum float64
    for _, v := range values {
        sum += v
    }
    return sum
}

Benchmarking Patterns

func BenchmarkProcess(b *testing.B) {
    data := generateTestData()
    b.ResetTimer() // Exclude setup time

    for i := 0; i < b.N; i++ {
        Process(data)
    }
}

// Memory benchmarks
func BenchmarkAllocs(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = make([]byte, 1024)
    }
}

// Compare implementations
func BenchmarkComparison(b *testing.B) {
    b.Run("old", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            OldImplementation()
        }
    })
    b.Run("new", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            NewImplementation()
        }
    })
}

Run with:

go test -bench=. -benchmem ./...
go test -bench=. -benchtime=5s ./...  # Longer runs

Common Pitfalls

Defer in Hot Loops

// BAD: Defer overhead per iteration
for _, item := range items {
    mu.Lock()
    defer mu.Unlock()  // Defers stack up!
    process(item)
}

// GOOD: Explicit unlock
for _, item := range items {
    mu.Lock()
    process(item)
    mu.Unlock()
}

// BETTER: Extract to function
for _, item := range items {
    processWithLock(item)
}

func processWithLock(item Item) {
    mu.Lock()
    defer mu.Unlock()
    process(item)
}

JSON Encoding Performance

// SLOW: Reflection on every call
json.Marshal(v)

// FAST: Reuse encoder
var buf bytes.Buffer
encoder := json.NewEncoder(&buf)
encoder.Encode(v)

// FASTER: Code generation (easyjson, ffjson)

Best Practices

Measure before optimizing - Profile to find actual bottlenecks
Pre-allocate slices - Use make([]T, 0, capacity) when size is known
Pool frequently allocated objects - Use sync.Pool for buffers
Minimize allocations in hot paths - Reuse objects, avoid interfaces
Right-size channels - Buffer to reduce blocking without wasting memory
Avoid premature optimization - Clarity first, optimize measured problems
Use value receivers for small structs - Avoid pointer indirection
Order struct fields by size - Largest to smallest reduces padding

aiskillstore/golang-performance

skills/89jobrien/golang-performance/SKILL.md

Go performance optimization techniques including profiling with pprof, memory optimization, concurrency patterns, and escape analysis.

230 stars

development

Updated Mar 27, 2026

$ install --global

skillsauth

npx skillsauth add aiskillstore/marketplace golang-performance

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Mar 28, 2026, 1:52 PM35.8s2 files scanned

SKILL.md

name:: golang-performance
description:: Go performance optimization techniques including profiling with pprof,
author:: Joseph OBrien
status:: unpublished
updated:: 2025-12-23
version:: 1.0.1
tag:: skill
type:: skill

Golang Performance

This skill provides guidance on optimizing Go application performance including profiling, memory management, concurrency optimization, and avoiding common performance pitfalls.

When to Use This Skill

When profiling Go applications for CPU or memory issues
When optimizing memory allocations and reducing GC pressure
When implementing efficient concurrency patterns
When analyzing escape analysis results
When optimizing hot paths in production code

Profiling with pprof

Enable Profiling in HTTP Server

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // pprof endpoints available at /debug/pprof/
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // Main application
}

CPU Profiling

# Collect 30-second CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# Interactive commands
(pprof) top10          # Top 10 functions by CPU
(pprof) list FuncName  # Show source with timing
(pprof) web            # Open flame graph in browser

Memory Profiling

# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocs profile (all allocations)
go tool pprof http://localhost:6060/debug/pprof/allocs

# Interactive commands
(pprof) top10 -cum     # Top by cumulative allocations
(pprof) list FuncName  # Show allocation sites

Programmatic Profiling

import (
    "os"
    "runtime/pprof"
)

func profileCPU() {
    f, _ := os.Create("cpu.prof")
    defer f.Close()

    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()

    // Code to profile
}

func profileMemory() {
    f, _ := os.Create("mem.prof")
    defer f.Close()

    runtime.GC() // Get accurate stats
    pprof.WriteHeapProfile(f)
}

Memory Optimization

Reduce Allocations

// BAD: Allocates on every call
func Process(items []string) []string {
    result := []string{}
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}

// GOOD: Pre-allocate with known capacity
func Process(items []string) []string {
    result := make([]string, 0, len(items))
    for _, item := range items {
        result = append(result, transform(item))
    }
    return result
}

Use sync.Pool for Frequent Allocations

var bufferPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func ProcessRequest(data []byte) []byte {
    buf := bufferPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        bufferPool.Put(buf)
    }()

    // Use buffer
    buf.Write(data)
    return buf.Bytes()
}

Avoid String Concatenation in Loops

// BAD: O(n^2) allocations
func BuildString(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part
    }
    return result
}

// GOOD: Single allocation
func BuildString(parts []string) string {
    var builder strings.Builder
    for _, part := range parts {
        builder.WriteString(part)
    }
    return builder.String()
}

Slice Memory Leaks

// BAD: Keeps entire backing array alive
func GetFirst(data []byte) []byte {
    return data[:10]
}

// GOOD: Copy to release backing array
func GetFirst(data []byte) []byte {
    result := make([]byte, 10)
    copy(result, data[:10])
    return result
}

Escape Analysis

# Show escape analysis decisions
go build -gcflags="-m" ./...

# More verbose
go build -gcflags="-m -m" ./...

Avoiding Heap Escapes

// ESCAPES: Returned pointer
func NewUser() *User {
    return &User{}  // Allocated on heap
}

// STAYS ON STACK: Value return
func NewUser() User {
    return User{}  // May stay on stack
}

// ESCAPES: Interface conversion
func Process(v interface{}) { ... }

func main() {
    x := 42
    Process(x)  // x escapes to heap
}

Concurrency Optimization

Worker Pool Pattern

func ProcessItems(items []Item, workers int) []Result {
    jobs := make(chan Item, len(items))
    results := make(chan Result, len(items))

    // Start workers
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for item := range jobs {
                results <- process(item)
            }
        }()
    }

    // Send jobs
    for _, item := range items {
        jobs <- item
    }
    close(jobs)

    // Wait and collect
    go func() {
        wg.Wait()
        close(results)
    }()

    var output []Result
    for r := range results {
        output = append(output, r)
    }
    return output
}

Buffered Channels for Throughput

// SLOW: Unbuffered causes blocking
ch := make(chan int)

// FAST: Buffer reduces contention
ch := make(chan int, 100)

Avoid Lock Contention

// BAD: Global lock
var mu sync.Mutex
var cache = make(map[string]string)

func Get(key string) string {
    mu.Lock()
    defer mu.Unlock()
    return cache[key]
}

// GOOD: Sharded locks
type ShardedCache struct {
    shards [256]struct {
        mu    sync.RWMutex
        items map[string]string
    }
}

func (c *ShardedCache) getShard(key string) *struct {
    mu    sync.RWMutex
    items map[string]string
} {
    h := fnv.New32a()
    h.Write([]byte(key))
    return &c.shards[h.Sum32()%256]
}

func (c *ShardedCache) Get(key string) string {
    shard := c.getShard(key)
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    return shard.items[key]
}

Use sync.Map for Specific Cases

// Good for: keys written once, read many; disjoint key sets
var cache sync.Map

func Get(key string) (string, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return "", false
    }
    return v.(string), true
}

func Set(key, value string) {
    cache.Store(key, value)
}

Data Structure Optimization

Struct Field Ordering (Memory Alignment)

// BAD: 24 bytes (padding)
type Bad struct {
    a bool   // 1 byte + 7 padding
    b int64  // 8 bytes
    c bool   // 1 byte + 7 padding
}

// GOOD: 16 bytes (no padding)
type Good struct {
    b int64  // 8 bytes
    a bool   // 1 byte
    c bool   // 1 byte + 6 padding
}

Avoid Interface{} When Possible

// SLOW: Type assertions, boxing
func Sum(values []interface{}) float64 {
    var sum float64
    for _, v := range values {
        sum += v.(float64)
    }
    return sum
}

// FAST: Concrete types
func Sum(values []float64) float64 {
    var sum float64
    for _, v := range values {
        sum += v
    }
    return sum
}

Benchmarking Patterns

func BenchmarkProcess(b *testing.B) {
    data := generateTestData()
    b.ResetTimer() // Exclude setup time

    for i := 0; i < b.N; i++ {
        Process(data)
    }
}

// Memory benchmarks
func BenchmarkAllocs(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = make([]byte, 1024)
    }
}

// Compare implementations
func BenchmarkComparison(b *testing.B) {
    b.Run("old", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            OldImplementation()
        }
    })
    b.Run("new", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            NewImplementation()
        }
    })
}

Run with:

go test -bench=. -benchmem ./...
go test -bench=. -benchtime=5s ./...  # Longer runs

Common Pitfalls

Defer in Hot Loops

// BAD: Defer overhead per iteration
for _, item := range items {
    mu.Lock()
    defer mu.Unlock()  // Defers stack up!
    process(item)
}

// GOOD: Explicit unlock
for _, item := range items {
    mu.Lock()
    process(item)
    mu.Unlock()
}

// BETTER: Extract to function
for _, item := range items {
    processWithLock(item)
}

func processWithLock(item Item) {
    mu.Lock()
    defer mu.Unlock()
    process(item)
}

JSON Encoding Performance

// SLOW: Reflection on every call
json.Marshal(v)

// FAST: Reuse encoder
var buf bytes.Buffer
encoder := json.NewEncoder(&buf)
encoder.Encode(v)

// FASTER: Code generation (easyjson, ffjson)

Best Practices

Measure before optimizing - Profile to find actual bottlenecks
Pre-allocate slices - Use make([]T, 0, capacity) when size is known
Pool frequently allocated objects - Use sync.Pool for buffers
Minimize allocations in hot paths - Reuse objects, avoid interfaces
Right-size channels - Buffer to reduce blocking without wasting memory
Avoid premature optimization - Clarity first, optimize measured problems
Use value receivers for small structs - Avoid pointer indirection
Order struct fields by size - Largest to smallest reduces padding

Related Skills

aiskillstore/hig-components-content

development

VerifiedTrustedCommunity

Apple Human Interface Guidelines for content display components. Use this skill when the user asks about charts component, collection view, image view, web view, color well, image well, activity view, lockup, data visualization, content display, displaying images, rendering web content, color pickers, or presenting collections of items in Apple apps. Also use when the user says how should I display charts, what's the best way to show images, should I use a web view, how do I build a grid of items, what component shows media, or how do I present a share sheet. Cross-references: hig-foundations for color/typography/accessibility, hig-patterns for data visualization patterns, hig-components-layout for structural containers, hig-platforms for platform-specific component behavior.

244SKILL.mdUpdated Apr 10, 2026

aiskillstore/hig-components-content

aiskillstore/helpdesk-automation

tools

VerifiedTrustedCommunity

Automate HelpDesk tasks via Rube MCP (Composio): list tickets, manage views, use canned responses, and configure custom fields. Always search tools first for current schemas.

244SKILL.mdUpdated Apr 10, 2026

aiskillstore/helpdesk-automation

aiskillstore/haskell-pro

testing

VerifiedTrustedCommunity

Expert Haskell engineer specializing in advanced type systems, pure functional design, and high-reliability software. Use PROACTIVELY for type-level programming, concurrency, and architecture guidance.

244SKILL.mdUpdated Apr 10, 2026

aiskillstore/haskell-pro

aiskillstore/graphql

tools

VerifiedTrustedCommunity

GraphQL gives clients exactly the data they need - no more, no less. One endpoint, typed schema, introspection. But the flexibility that makes it powerful also makes it dangerous. Without proper controls, clients can craft queries that bring down your server. This skill covers schema design, resolvers, DataLoader for N+1 prevention, federation for microservices, and client integration with Apollo/urql. Key insight: GraphQL is a contract. The schema is the API documentation. Design it carefully.

244SKILL.mdUpdated Apr 10, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/aiskillstore/marketplace.git

# Copy into Claude Code skills folder (global)
cp -r marketplace/skills/89jobrien/golang-performance ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

aiskillstore/marketplace

230 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT