plugins/autonomous-dev/skills/observability/SKILL.md
Logging, debugging, profiling, and performance monitoring for development. Use when adding logging, debugging issues, profiling performance, or instrumenting code for observability.
npx skillsauth add akaszubski/anyclaude-local observabilityInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Developer-focused observability: logging, debugging, profiling, and performance monitoring.
Traditional (unstructured):
print(f"User {user_id} logged in from {ip_address}")
# Hard to parse, search, filter
Structured (JSON):
logger.info("user_login", extra={
"user_id": user_id,
"ip_address": ip_address,
"timestamp": datetime.utcnow().isoformat()
})
# Easy to parse, search, filter, aggregate
Benefits:
Basic configuration:
import logging
import sys
# Configure root logger
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.StreamHandler(sys.stdout), # Console
logging.FileHandler('app.log') # File
]
)
logger = logging.getLogger(__name__)
# Use it
logger.info("Application started")
logger.warning("Database connection slow")
logger.error("Failed to process request", exc_info=True)
| Level | Value | Use When | Example | | ------------ | ----- | ----------------------------- | --------------------------------- | | DEBUG | 10 | Development, detailed tracing | Variable values, function calls | | INFO | 20 | Normal operations | User logged in, request processed | | WARNING | 30 | Unexpected but handled | Deprecated API used, slow query | | ERROR | 40 | Error occurred | Failed to save file, API error | | CRITICAL | 50 | System failure | Database down, disk full |
Usage:
logger.debug(f"Processing item: {item}") # Development only
logger.info(f"User {user_id} logged in") # Normal event
logger.warning(f"Query took {duration}s (slow!)") # Unexpected
logger.error(f"Failed to save: {error}") # Error
logger.critical(f"Database unreachable!") # System failure
Using python-json-logger:
from pythonjsonlogger import jsonlogger
import logging
# Setup JSON formatter
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter(
'%(asctime)s %(name)s %(levelname)s %(message)s'
)
logHandler.setFormatter(formatter)
logger = logging.getLogger()
logger.addHandler(logHandler)
logger.setLevel(logging.INFO)
# Log with extra fields
logger.info("user_login", extra={
"user_id": 123,
"ip_address": "192.168.1.1",
"user_agent": "Mozilla/5.0..."
})
Output:
{
"asctime": "2025-10-24 12:00:00,000",
"name": "root",
"levelname": "INFO",
"message": "user_login",
"user_id": 123,
"ip_address": "192.168.1.1",
"user_agent": "Mozilla/5.0..."
}
Add context to all logs in a function:
import logging
from contextvars import ContextVar
# Context variable
request_id_var: ContextVar[str] = ContextVar('request_id', default=None)
class ContextFilter(logging.Filter):
def filter(self, record):
record.request_id = request_id_var.get()
return True
# Add filter
logger.addFilter(ContextFilter())
# Use in request handler
def handle_request(request):
request_id = generate_request_id()
request_id_var.set(request_id)
logger.info("Processing request") # Includes request_id automatically
process_data()
logger.info("Request complete") # Includes request_id automatically
✅ DO:
# 1. Use appropriate log levels
logger.debug("Variable value: {value}") # Debug details
logger.info("User action completed") # Normal operations
logger.error("Failed to process", exc_info=True) # Errors
# 2. Include context
logger.info("Processing payment", extra={
"user_id": user_id,
"amount": amount,
"currency": currency
})
# 3. Log exceptions with stack traces
try:
process_data()
except Exception as e:
logger.error("Processing failed", exc_info=True) # Includes stack trace
# 4. Use lazy formatting
logger.debug("Processing %s items", len(items)) # Only formats if DEBUG enabled
❌ DON'T:
# 1. Don't log sensitive data
logger.info(f"User password: {password}") # NEVER!
logger.info(f"Credit card: {card_number}") # NEVER!
# 2. Don't log in loops (too verbose)
for item in items:
logger.info(f"Processing {item}") # 1M items = 1M log lines!
# 3. Don't use print() in production
print("Debug info") # Use logger.debug() instead
# 4. Don't catch and ignore
try:
process()
except:
pass # Silent failure! At least log it!
Strategic print statements:
def process_data(data):
print(f"[DEBUG] Input: {data}") # See input
result = transform(data)
print(f"[DEBUG] After transform: {result}") # See intermediate
validated = validate(result)
print(f"[DEBUG] After validate: {validated}") # See output
return validated
Use temporary, remove before commit:
# ✅ GOOD: Temporary debug
print(f"[DEBUG] value={value}") # Remove before commit
# ❌ BAD: Permanent debug prints
print("Processing...") # Clutters output forever
Built-in Python debugger:
import pdb
def buggy_function(x, y):
result = x + y
pdb.set_trace() # Debugger starts here
return result * 2
# When debugger starts:
# (Pdb) p x # Print variable x
# (Pdb) p y # Print variable y
# (Pdb) p result # Print result
# (Pdb) n # Next line
# (Pdb) s # Step into function
# (Pdb) c # Continue execution
# (Pdb) q # Quit debugger
Breakpoint (Python 3.7+):
def buggy_function(x, y):
result = x + y
breakpoint() # Better than pdb.set_trace()
return result * 2
Enhanced debugger with syntax highlighting:
pip install ipdb
import ipdb
def buggy_function(x, y):
result = x + y
ipdb.set_trace() # Colorized, tab completion
return result * 2
Commands:
(ipdb) p x # Print x
(ipdb) pp x # Pretty-print x
(ipdb) l # List code around current line
(ipdb) w # Where (stack trace)
(ipdb) u # Up stack frame
(ipdb) d # Down stack frame
(ipdb) h # Help
Debug after exception:
import pdb
import sys
def buggy_function():
x = 1
y = 0
return x / y # ZeroDivisionError
try:
buggy_function()
except:
pdb.post_mortem(sys.exc_info()[2])
# Debugger starts at exception point
Automatic on uncaught exception:
import sys
import pdb
def exception_handler(exc_type, exc_value, exc_traceback):
if exc_type is KeyboardInterrupt:
sys.__excepthook__(exc_type, exc_value, exc_traceback)
return
print("Uncaught exception, starting debugger:")
pdb.post_mortem(exc_traceback)
sys.excepthook = exception_handler
# Now any uncaught exception starts debugger
Find slow functions:
import cProfile
import pstats
def slow_function():
# Your code here
pass
# Profile
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()
# Print stats
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10) # Top 10 slowest functions
Output:
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.500 0.050 0.800 0.080 mymodule.py:10(slow_function)
100 0.300 0.003 0.300 0.003 mymodule.py:20(helper)
Sort options:
cumulative - Total time (includes subfunctions)time - Internal time (excludes subfunctions)calls - Number of callsProfile specific functions:
import cProfile
import pstats
from functools import wraps
def profile(func):
@wraps(func)
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
return result
return wrapper
@profile
def slow_function():
# Your code
pass
Profile line-by-line:
pip install line_profiler
from line_profiler import LineProfiler
def slow_function():
total = 0
for i in range(1000000):
total += i
return total
profiler = LineProfiler()
profiler.add_function(slow_function)
profiler.enable()
slow_function()
profiler.disable()
profiler.print_stats()
Output:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def slow_function():
3 1 2.0 2.0 0.0 total = 0
4 1000001 150000.0 0.1 30.0 for i in range(1000000):
5 1000000 350000.0 0.4 70.0 total += i
6 1 1.0 1.0 0.0 return total
Find memory leaks:
pip install memory_profiler
from memory_profiler import profile
@profile
def memory_hog():
big_list = [0] * (10 ** 7) # 10M items
return sum(big_list)
memory_hog()
Output:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
2 40.0 MiB 40.0 MiB 1 @profile
3 def memory_hog():
4 116.4 MiB 76.4 MiB 1 big_list = [0] * (10 ** 7)
5 116.4 MiB 0.0 MiB 1 return sum(big_list)
Profile running processes (no code changes needed):
pip install py-spy
# Profile running Python process
py-spy top --pid 12345
# Generate flamegraph
py-spy record -o profile.svg --pid 12345
Advantages:
import traceback
try:
risky_operation()
except Exception as e:
traceback.print_exc() # Print stack trace to stderr
Capture to string:
import traceback
try:
risky_operation()
except Exception as e:
trace = traceback.format_exc()
logger.error(f"Error occurred: {trace}")
Beautiful, informative tracebacks:
pip install rich
from rich.traceback import install
install(show_locals=True)
def buggy_function():
x = 42
y = None
return x + y # TypeError with beautiful traceback
buggy_function()
Shows:
Measure function execution time:
import time
from functools import wraps
def timer(func):
@wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter()
result = func(*args, **kwargs)
end = time.perf_counter()
duration = end - start
print(f"{func.__name__} took {duration:.4f}s")
return result
return wrapper
@timer
def slow_function():
time.sleep(1)
return "done"
slow_function()
# Output: slow_function took 1.0001s
import time
from contextlib import contextmanager
@contextmanager
def timer(name):
start = time.perf_counter()
yield
end = time.perf_counter()
print(f"{name} took {end - start:.4f}s")
# Usage
with timer("Database query"):
result = db.query(...)
with timer("File processing"):
process_file(path)
Fail if too slow:
import time
def performance_test():
start = time.perf_counter()
# Your code
process_data()
duration = time.perf_counter() - start
assert duration < 0.1, f"Too slow: {duration:.4f}s"
class Metrics:
def __init__(self):
self.counters = {}
def increment(self, name, value=1):
self.counters[name] = self.counters.get(name, 0) + value
def get(self, name):
return self.counters.get(name, 0)
def reset(self):
self.counters = {}
# Usage
metrics = Metrics()
def process_item(item):
metrics.increment("items_processed")
if item.is_valid():
metrics.increment("items_valid")
else:
metrics.increment("items_invalid")
# Later
print(f"Processed: {metrics.get('items_processed')}")
print(f"Valid: {metrics.get('items_valid')}")
print(f"Invalid: {metrics.get('items_invalid')}")
Track value distributions:
from collections import defaultdict
class Histogram:
def __init__(self):
self.buckets = defaultdict(int)
def record(self, value):
# Bucket by value range
if value < 1:
bucket = "< 1s"
elif value < 5:
bucket = "1-5s"
elif value < 10:
bucket = "5-10s"
else:
bucket = "> 10s"
self.buckets[bucket] += 1
def report(self):
for bucket, count in sorted(self.buckets.items()):
print(f"{bucket}: {count}")
# Usage
histogram = Histogram()
for duration in query_durations:
histogram.record(duration)
histogram.report()
# Output:
# < 1s: 850
# 1-5s: 120
# 5-10s: 25
# > 10s: 5
Narrow down the problem:
# Problem: Something breaks between start and end
def debug_binary_search():
# Step 1: Check middle
middle_result = process_half()
print(f"Middle result: {middle_result}")
# Step 2: Narrow to first or second half
if middle_result == expected:
# Problem in second half
debug_second_half()
else:
# Problem in first half
debug_first_half()
Explain code out loud (or to a rubber duck):
Catch bugs early:
def divide(a, b):
assert b != 0, "Divisor cannot be zero"
assert isinstance(a, (int, float)), "a must be numeric"
assert isinstance(b, (int, float)), "b must be numeric"
result = a / b
assert isinstance(result, (int, float)), "Result should be numeric"
return result
Reduce problem to minimal case:
# ❌ BAD: Complex function with bug
def complex_function(data, config, options, flags):
# 100 lines of code
pass
# ✅ GOOD: Extract minimal failing case
def minimal_failing_case():
data = [1, 2, 3]
result = transform(data) # Fails here
return result
❌ DON'T:
# 1. Log sensitive data
logger.info(f"Password: {password}")
# 2. Log in tight loops
for item in million_items:
logger.info(f"Processing {item}")
# 3. Catch and ignore silently
try:
critical_operation()
except:
pass # Silent failure!
# 4. Over-log (log everything)
logger.debug("Starting function")
logger.debug("Variable x = 1")
logger.debug("Variable y = 2")
logger.debug("Calling helper")
logger.debug("Helper returned")
# 5. Use wrong log level
logger.error("User logged in") # Should be INFO
logger.info("Database crashed") # Should be CRITICAL
logger.debug("Detailed trace")
logger.info("Normal event")
logger.warning("Unexpected")
logger.error("Error occurred", exc_info=True)
logger.critical("System failure")
breakpoint() # Start debugger
pdb.set_trace() # Start debugger (old way)
traceback.print_exc() # Print stack trace
# CPU
python -m cProfile -s cumulative script.py
# Memory
@profile # from memory_profiler
def func(): pass
# Live process
py-spy top --pid 12345
@timer
def func(): pass
with timer("operation"):
do_work()
Version: 1.0.0 Type: Knowledge skill (no scripts) See Also: python-standards (best practices), testing-guide (test debugging), security-patterns (safe logging)
testing
Complete testing methodology - TDD, progression tracking, regression prevention, and test patterns
documentation
GenAI-powered semantic validation - detects outdated docs, version mismatches, and architectural drift
development
Security best practices, API key management, input validation. Use when handling secrets, user input, or security-sensitive code.
research
Research methodology and best practices for finding existing patterns