skills/diagnose-logs/SKILL.md
Diagnose disclaude service logs: analyze errors, warnings, WebSocket health, and agent behavior from launchd log files. Use when the user says keywords like 'diagnose logs', 'check logs', 'debug service', 'what went wrong', 'service health', 'log analysis', '查看日志' (view logs), '诊断日志' (diagnose logs).
Diagnose the disclaude launchd service by analyzing pino JSON logs.
Log files:

- `/tmp/disclaude-stdout.log` (all structured JSON logs)
- `/tmp/disclaude-stderr.log` (typically empty)

Log format is one JSON object per line with the fields `level`, `time`, `context` and `msg`, plus arbitrary data. Note: some non-JSON lines (e.g., `✓ Scheduler started`) from `console.log` may be mixed in; all commands below handle this gracefully.
Run these steps in order. Use the Bash tool for every command. After each step, briefly interpret the output before moving on.
Important: all `jq` commands use `grep '^{' | jq` to skip the non-JSON lines that `console.log` mixes in.
## Step 1: Overview

```bash
# Total lines and file size
wc -l /tmp/disclaude-stdout.log
ls -lh /tmp/disclaude-stdout.log

# Time range covered
echo "=== First entry ===" && grep '^{' /tmp/disclaude-stdout.log | head -1 | jq -r '.time'
echo "=== Last entry ===" && grep '^{' /tmp/disclaude-stdout.log | tail -1 | jq -r '.time'

# Error and warning counts (fast: grep -c is ~10x faster than jq for counting)
echo "=== Level distribution ==="
grep -c '"level":"error"' /tmp/disclaude-stdout.log | xargs -I{} echo " error: {}"
grep -c '"level":"warn"' /tmp/disclaude-stdout.log | xargs -I{} echo " warn: {}"
grep -c '"level":"info"' /tmp/disclaude-stdout.log | xargs -I{} echo " info: {}"
grep -c '"level":"debug"' /tmp/disclaude-stdout.log | xargs -I{} echo " debug: {}"

# Active contexts (modules)
echo "=== Top contexts ===" && grep '^{' /tmp/disclaude-stdout.log | jq -r '.context' | sort | uniq -c | sort -rn | head -15
```
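The four `grep -c` lines can also be folded into a loop. This is a minimal sketch that assumes levels are serialized as strings (as the patterns above imply); the function name `level_counts` is hypothetical and not part of disclaude:

```bash
# level_counts [LOGFILE]: count lines per pino level string.
# Hypothetical helper; defaults to the disclaude stdout log.
level_counts() {
  local log="${1:-/tmp/disclaude-stdout.log}"
  local level n
  for level in error warn info debug; do
    # grep -c prints 0 but exits 1 when nothing matches;
    # "|| true" keeps the loop alive in "set -e" scripts.
    n=$(grep -c "\"level\":\"$level\"" "$log" || true)
    printf ' %s: %s\n' "$level" "$n"
  done
}
```

Call it as `level_counts` for the default log, or pass another path as the first argument.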
## Step 2: Apply filters

Check `$ARGUMENTS` for filters:
| Argument | Action |
|----------|--------|
| (empty) | Full diagnostic (all steps) |
| --last 30m | Only analyze last 30 minutes of logs |
| --errors | Jump to Step 3 (errors only) |
| --ws | Jump to Step 5 (WebSocket health) |
| --agent | Jump to Step 6 (agent health) |
| --context Name | Filter to a specific context/module |
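One way to map the argument string onto the table is a small POSIX `case` loop. This is a sketch under stated assumptions: the whole argument string arrives as one word (e.g. `parse_args "$ARGUMENTS"`), only minute values such as `30m` are handled because that is the only unit the table shows, and `parse_args` and `MODE` are hypothetical names; `MINUTES` and `CONTEXT_NAME` match the variables used in the commands below.

```bash
# parse_args ARGSTRING: set MODE, MINUTES and CONTEXT_NAME from the skill arguments.
# Hypothetical helper; MODE values mirror the rows of the table above.
parse_args() {
  MODE=full MINUTES="" CONTEXT_NAME=""
  set -f        # disable globbing while word-splitting
  set -- $1     # split the argument string (intentionally unquoted)
  set +f
  while [ $# -gt 0 ]; do
    case "$1" in
      --last)
        MINUTES="${2:-}"; MINUTES="${MINUTES%m}"   # "30m" -> "30"
        [ $# -gt 1 ] && shift ;;
      --errors) MODE=errors ;;   # Step 3 only
      --ws)     MODE=ws ;;       # Step 5 only
      --agent)  MODE=agent ;;    # Step 6 only
      --context)
        CONTEXT_NAME="${2:-}"
        [ $# -gt 1 ] && shift ;;
    esac
    shift
  done
}
```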
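For `--last`, compute the cutoff timestamp:

```bash
# MINUTES comes from --last (e.g. MINUTES=30 for "--last 30m").
# BSD/macOS date first (launchd hosts); python3 fallback elsewhere.
cutoff=$(date -u -v-"${MINUTES}"M +%Y-%m-%dT%H:%M:%S.000Z 2>/dev/null || python3 -c "import datetime; print((datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(minutes=${MINUTES})).strftime('%Y-%m-%dT%H:%M:%S.000Z'))")
```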
Then use this stream in place of `grep '^{' /tmp/disclaude-stdout.log` in the commands that follow:

```bash
grep '^{' /tmp/disclaude-stdout.log | jq -c --arg cutoff "$cutoff" 'select(.time >= $cutoff)'
```
For `--context`, filter with:

```bash
grep '^{' /tmp/disclaude-stdout.log | jq -c --arg ctx "$CONTEXT_NAME" 'select(.context == $ctx)'
```
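The time and context filters can be combined into one stream function. This is a sketch, assuming `.time` is an ISO-8601 UTC string (the string comparison above already relies on that) and adding GNU `date -d` as an extra fallback for non-macOS hosts; `cutoff_for`, `filtered_log` and the `LOG` override are hypothetical names.

```bash
# cutoff_for MINUTES: ISO-8601 UTC timestamp MINUTES ago, milliseconds zeroed.
cutoff_for() {
  local m="$1" fmt='%Y-%m-%dT%H:%M:%S.000Z'
  # Try BSD/macOS date, then GNU date, then python3.
  date -u -v-"${m}"M +"$fmt" 2>/dev/null ||
    date -u -d "${m} minutes ago" +"$fmt" 2>/dev/null ||
    python3 -c "import datetime as d; print((d.datetime.now(d.timezone.utc) - d.timedelta(minutes=$m)).strftime('$fmt'))"
}

# filtered_log [MINUTES] [CONTEXT]: JSON lines, optionally limited by age and context.
# Hypothetical helper; an empty argument disables that filter.
filtered_log() {
  local minutes="${1:-}" ctx="${2:-}" log="${LOG:-/tmp/disclaude-stdout.log}" cutoff=""
  if [ -n "$minutes" ]; then
    cutoff=$(cutoff_for "$minutes")
  fi
  # --arg passes values as jq strings, so no shell quoting inside the filter.
  grep '^{' "$log" | jq -c --arg cutoff "$cutoff" --arg ctx "$ctx" '
    select($cutoff == "" or .time >= $cutoff)
    | select($ctx == "" or .context == $ctx)'
}
```

For example, `filtered_log 30 ChatAgent | jq -r .msg` would list the last half hour of ChatAgent messages.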
## Step 3: Errors

```bash
# All errors with context and message
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select(.level == "error") | {time, context, msg, err: .err.message, chatId}'

# Group errors by type (msg)
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "error") | .msg' | sort | uniq -c | sort -rn

# Group errors by context
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "error") | .context' | sort | uniq -c | sort -rn

# Extract unique error messages
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "error") | "\(.context): \(.err.message // .msg)"' | sort -u
```
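Once the grouped counts point at a specific failure, the stack of the most recent occurrences is usually the next thing to read. This sketch follows the `tail -N | jq` pattern and assumes errors are logged through pino's standard `err` serializer (which records `type`, `message` and `stack`); `recent_errors`, its default of 3 entries and the 3-frame cut-off are illustrative choices.

```bash
# recent_errors [N]: last N error entries with message and first stack frames.
# Hypothetical helper; reads $LOG or the default disclaude log.
recent_errors() {
  local n="${1:-3}" log="${LOG:-/tmp/disclaude-stdout.log}"
  # grep pre-filters so jq only parses error lines; tail keeps the newest N.
  grep '^{.*"level":"error"' "$log" | tail -n "$n" |
    jq -r '"\(.time) [\(.context)] \(.err.message // .msg)",
           ((.err.stack // "") | split("\n") | .[1:4][] | "    " + .)'
}
```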
## Step 4: Warnings

```bash
# Warning frequency over time (10-minute buckets: first 15 chars of the ISO timestamp)
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "warn") | .time[:15]' | sort | uniq -c

# Top warning messages
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "warn") | .msg' | sort | uniq -c | sort -rn | head -10
```
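The bucket size is set by how many characters of the ISO timestamp are kept: `[:13]` groups by hour, `[:15]` by ten minutes, `[:16]` by minute. A small parameterized sketch; the function name `warn_buckets` and the `LOG` override are hypothetical:

```bash
# warn_buckets [WIDTH] [LEVEL]: count entries per time bucket.
# WIDTH 13 = hour, 15 = 10 minutes, 16 = minute (prefix of "YYYY-MM-DDTHH:MM").
warn_buckets() {
  local width="${1:-15}" level="${2:-warn}" log="${LOG:-/tmp/disclaude-stdout.log}"
  grep '^{' "$log" |
    jq -r --argjson w "$width" --arg lvl "$level" \
      'select(.level == $lvl) | .time[:$w]' |
    sort | uniq -c
}
```

For example, `warn_buckets 13 error` would give hourly error counts.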
## Step 5: WebSocket health

```bash
# Dead connection detection pattern (common issue)
# (.msg // "") keeps test() from raising an error on entries without a msg field
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("dead.*connection"; "i")) | {time, context, elapsedMs, timeoutMs}'

# Connection state transitions
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select(.context == "WsConnectionManager" or .context == "FeishuChannel") | select((.msg // "") | test("state changed|reconnect|established|closed|ready")) | {time, context, msg, oldState, newState, attempt}'

# Reconnect attempts and outcomes
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("reconnect"; "i")) | {time, context, msg, attempt, reconnectAttempt}'

# Reconnect success rate (successful reconnects / scheduled attempts)
echo "=== Successful reconnects ===" && grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("Reconnected successfully"))' | wc -l
echo "=== Reconnect attempts ===" && grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("Scheduling reconnect attempt"))' | wc -l
```
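The two counts give the success rate used in the report. A sketch that turns them into a percentage, assuming the message texts `Reconnected successfully` and `Scheduling reconnect attempt` stay as shown; attempts scheduled near the end of the log may still be pending, so treat the figure as approximate. `reconnect_rate` is a hypothetical name.

```bash
# reconnect_rate: successful reconnects as a percentage of scheduled attempts.
# Hypothetical helper; reads $LOG or the default disclaude log.
reconnect_rate() {
  local log="${LOG:-/tmp/disclaude-stdout.log}" ok tries
  ok=$(grep '^{' "$log" | jq -c 'select((.msg // "") | test("Reconnected successfully"))' | wc -l)
  tries=$(grep '^{' "$log" | jq -c 'select((.msg // "") | test("Scheduling reconnect attempt"))' | wc -l)
  # awk handles the floating-point division and the zero-attempt case.
  awk -v ok="$ok" -v tries="$tries" 'BEGIN {
    if (tries == 0) print "no reconnect attempts"
    else printf "%d/%d reconnects succeeded (%.1f%%)\n", ok, tries, 100 * ok / tries
  }'
}
```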
```bash
# Time between reconnects (detect loops)
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select((.msg // "") | test("Reconnected successfully")) | .time' | head -20
```
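Printing raw timestamps leaves the spacing to be judged by eye. This sketch prints the gap in seconds between consecutive successful reconnects instead. It assumes ISO-8601 `time` strings with a `Z` suffix and strips the milliseconds first, because jq's `fromdateiso8601` expects whole seconds; `reconnect_gaps` is a hypothetical name. A steady, repeating gap is the kind of pattern the report's WebSocket section asks you to describe.

```bash
# reconnect_gaps: seconds between consecutive "Reconnected successfully" entries.
# Hypothetical helper; reads $LOG or the default disclaude log.
reconnect_gaps() {
  local log="${LOG:-/tmp/disclaude-stdout.log}"
  grep '^{' "$log" |
    jq -r 'select((.msg // "") | test("Reconnected successfully"))
           | .time | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601' |
    awk 'NR > 1 { printf "%ds\n", $1 - prev } { prev = $1 }'
}
```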
## Step 6: Agent health

```bash
# ChatAgent errors
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select(.context == "ChatAgent" and .level == "error") | {time, msg, chatId, err: .err.message, messageCount}'

# SDK subprocess spawn events
# (.msg // "") keeps test() from raising an error on entries without a msg field
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("subprocess spawning")) | {time, context, command, ANTHROPIC_BASE_URL}'

# Timeout patterns
grep '^{' /tmp/disclaude-stdout.log | jq -c 'select((.msg // "") | test("timeout"; "i")) | {time, context, msg, reason}'

# ChatAgent log entries per chatId (rough load distribution)
grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.context == "ChatAgent") | .chatId // "cli"' | sort | uniq -c | sort -rn | head -10
```
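The report asks for the chatId prefixes of affected chats. This sketch lists them with ChatAgent error counts; the default 8-character prefix length is an arbitrary choice, entries without a `chatId` are labelled `cli` as in the load query above, and `affected_chats` is a hypothetical name.

```bash
# affected_chats [LEN]: ChatAgent error counts per chatId prefix, most affected first.
# Hypothetical helper; reads $LOG or the default disclaude log.
affected_chats() {
  local len="${1:-8}" log="${LOG:-/tmp/disclaude-stdout.log}"
  grep '^{' "$log" |
    jq -r --argjson n "$len" \
      'select(.context == "ChatAgent" and .level == "error") | (.chatId // "cli")[:$n]' |
    sort | uniq -c | sort -rn
}
```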
## Step 7: Report

After collecting data, produce a structured report:

```markdown
## Log Diagnosis Report

### Service Health: [HEALTHY | DEGRADED | UNHEALTHY]

**Time range**: {first} to {last}
**Total entries**: {count}
**Errors**: {count} | **Warnings**: {count}

### Key Findings

1. [Most impactful issue]
2. [Second issue]
3. [Third issue]

### [If WebSocket issues found]

**WebSocket**: {reconnect count} reconnects in {timespan}, {success rate}% success rate.
Pattern: [describe, e.g., "Dead connection every ~3 minutes due to 130s idle timeout"]

### [If Agent issues found]

**ChatAgent**: {count} errors, {count} timeouts.
Affected chats: {list of chatId prefixes}
Root cause hint: [e.g., "GLM proxy not responding within timeout"]

### Recommendations

1. [Actionable fix]
2. [Actionable fix]
```
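The header fields of the template are mechanical and can be filled in before the judgment calls. A sketch assuming the same log path; it counts JSON entries (not raw lines) for **Total entries** and leaves the Service Health line to the analyst, since the skill defines no thresholds for it. `report_header` is a hypothetical name.

```bash
# report_header: print the mechanical header lines of the diagnosis report.
# Hypothetical helper; reads $LOG or the default disclaude log.
report_header() {
  local log="${LOG:-/tmp/disclaude-stdout.log}" first last total errors warns
  first=$(grep '^{' "$log" | head -n 1 | jq -r '.time')
  last=$(grep '^{' "$log" | tail -n 1 | jq -r '.time')
  # grep -c exits 1 on zero matches; "|| true" keeps set -e scripts running.
  total=$(grep -c '^{' "$log" || true)
  errors=$(grep -c '"level":"error"' "$log" || true)
  warns=$(grep -c '"level":"warn"' "$log" || true)
  printf '**Time range**: %s to %s\n' "$first" "$last"
  printf '**Total entries**: %s\n' "$total"
  printf '**Errors**: %s | **Warnings**: %s\n' "$errors" "$warns"
}
```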
## Tips

- Prefer `jq` with `select()` filters over piping through `grep` when matching on fields; it handles JSON properly.
- Use `tail -N | jq` instead of `jq ... file` when you only need recent lines from large files.
- For plain counts, `grep -c '"level":"error"'` is faster than `jq`.
- Use `jq -r` to extract raw strings when you only need one field.
- Pipe through `head` or `tail` to avoid flooding context.
- Group and rank repeated values with `uniq -c | sort -rn`.
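The `sort | uniq -c | sort -rn | head` chain recurs in almost every step, so a tiny shell function can keep the commands short. A sketch; the name `tally` is hypothetical and chosen to avoid shadowing the `top` process viewer.

```bash
# tally [N]: frequency table of stdin lines, most common first, top N only (default 10).
# Hypothetical helper, not part of disclaude.
tally() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Example: the five most frequent error messages.
# grep '^{' /tmp/disclaude-stdout.log | jq -r 'select(.level == "error") | .msg' | tally 5
```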