.claude/skills/environment-troubleshooting/SKILL.md
Use when the DevOp is diagnosing OS issues, network problems, permission errors, disk space, memory constraints, DNS resolution, port conflicts, or service crashes. Activates when debugging infrastructure, environment, or system-level problems.
Apply this guidance when:
# System overview
uname -a # OS and kernel
df -h # Disk space
free -h # Memory usage
top -bn1 | head -20 # CPU and process overview
# Network
ip addr # Network interfaces
ss -tlnp # Listening ports
curl -v <url> # Test connectivity
dig <hostname> # DNS resolution
# Services
systemctl status <service> # Service status
journalctl -u <service> -n 50 # Recent logs
docker ps -a # Container status
| Symptom | Category | Common Causes |
|---------|----------|---------------|
| Service won't start | Process | Port conflict, missing config, permission |
| Connection refused | Network | Firewall, wrong port, service not running |
| Permission denied | Access | File permissions, user/group, SELinux |
| Out of memory | Resource | Memory leak, insufficient allocation |
| Disk full | Resource | Logs, temp files, unrotated data |
| Slow response | Performance | CPU saturation, disk I/O, network latency |
| DNS failure | Network | DNS config, resolv.conf, network partition |
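The symptom table can be turned into a small triage helper. The sketch below is illustrative only: `classify_symptom` is a hypothetical name, and the matching rules are a simplified reading of the table rather than a complete classifier.

```shell
#!/bin/sh
# Hypothetical helper: map a symptom description to a category from the table.
classify_symptom() {
    # Lowercase the input so matching is case-insensitive.
    symptom=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$symptom" in
        *"won't start"*)                  echo "Process" ;;
        *"connection refused"*|*dns*)     echo "Network" ;;
        *"permission denied"*)            echo "Access" ;;
        *"out of memory"*|*"disk full"*)  echo "Resource" ;;
        *slow*)                           echo "Performance" ;;
        *)                                echo "Unknown" ;;
    esac
}
```

For example, `classify_symptom "Connection refused"` prints `Network`, which points at the firewall, port, and service checks listed for that category.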
Apply the fix, then verify the original issue is resolved.
# Find what's using a port
ss -tlnp | grep :3000
# or
lsof -i :3000
Solution: Stop the conflicting process or change the port.
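A plain `grep :3000` also matches ports such as 30001, so a stricter check can help in scripts. The following is a minimal sketch, assuming the column layout of `ss -tln` (local address in the fourth field); `port_in_use` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given port appears as a listening
# local port in `ss -tln` output read from stdin.
port_in_use() {
    # $1 = port number. The first line of ss output is a header, so skip it.
    # The fourth field is "Local Address:Port"; match the port at its end.
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

Usage: `ss -tln | port_in_use 3000 && echo "port 3000 is already taken"`.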
# Check file ownership and permissions
ls -la /path/to/file
# Check the running user
whoami
id
Solution: Use chmod or chown to grant only the access the process needs. Never use 777, which makes the file readable, writable, and executable by every user.
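To spot files that were already opened up too far (for instance by an earlier 777), a check for the world-writable bit can be sketched as follows. `is_world_writable` is a hypothetical helper; it relies on the POSIX `find -perm` test and, because `find` does not follow symlinks by default, it reports on a symlink itself rather than its target.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is writable by "other".
is_world_writable() {
    # -prune stops find from descending into directories;
    # -perm -0002 matches only when the other-write bit is set.
    [ -n "$(find "$1" -prune -perm -0002 -print 2>/dev/null)" ]
}
```

Usage: `is_world_writable /path/to/file && echo "too permissive"`.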
# Find largest directories
du -sh /* 2>/dev/null | sort -rh | head -10
# Find large files
find / -type f -size +100M 2>/dev/null
Solution: Clean logs, remove unused images/containers, rotate old data.
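Before hunting for large files, it can help to see which filesystems are actually close to full. This sketch assumes the POSIX `df -P` output format (capacity in the fifth field, mount point in the sixth) and mount points without spaces; `disks_over_threshold` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage is at or above a
# threshold, reading `df -P` output from stdin.
disks_over_threshold() {
    # $1 = threshold in percent, e.g. 90
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}
```

Usage: `df -P | disks_over_threshold 90`.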
# Check container logs
docker logs <container> --tail 100
# Check container resource usage
docker stats --no-stream
# Inspect container
docker inspect <container>
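The full `docker inspect` output is long, and for a crashed container the two most telling fields are usually `.State.OOMKilled` and `.State.ExitCode`. The sketch below interprets those two values; `explain_container_state` is a hypothetical helper, and reading exit codes above 128 as "128 plus signal number" is a shell convention, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: describe why a container stopped.
# $1 = OOMKilled flag ("true" or "false"), $2 = numeric exit code
explain_container_state() {
    if [ "$1" = "true" ]; then
        echo "killed by the kernel OOM killer"
    elif [ "$2" -eq 0 ]; then
        echo "exited cleanly"
    elif [ "$2" -gt 128 ]; then
        # By convention, 128 + N means the process died from signal N.
        echo "terminated by signal $(( $2 - 128 ))"
    else
        echo "exited with error code $2"
    fi
}
```

Usage: `explain_container_state $(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container>)`.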
# Check per-process memory
ps aux --sort=-%mem | head -10
# Check for OOM kills
dmesg | grep -i "out of memory"
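For monitoring scripts, the same search can be reduced to a count. A minimal sketch, assuming the kernel log text is piped in on stdin; `count_oom_events` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count lines mentioning "out of memory" in kernel
# log text read from stdin.
count_oom_events() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are none, so `|| true` keeps the helper safe under `set -e`.
    grep -ci "out of memory" || true
}
```

Usage: `dmesg | count_oom_events`. A non-zero count suggests the kernel has killed at least one process to reclaim memory.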
| Service Type | Log Location |
|-------------|-------------|
| System | /var/log/syslog or journalctl |
| Application | Application-specific log dir or stdout |
| Docker | docker logs <container> |
| Web server | /var/log/nginx/ or /var/log/apache2/ |
| Database | DB-specific log directory |
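Several rows in the table list alternative locations, so a script may need to probe which one exists on the current host. This is a small sketch with a hypothetical helper name, `first_existing`.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that exists on disk.
# Returns non-zero if none of the candidates exist.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Usage: `first_existing /var/log/nginx /var/log/apache2` prints whichever web server log directory is present.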
When the issue is outside your scope: