
duruii/seeing-as-agent

Name: seeing-as-agent
Author: duruii

File: skills/seeing-as-agent/SKILL.md

Install: npx skillsauth add duruii/scientific-skills seeing-as-agent


Seeing as an Agent

Every tool has two users: the human who triggers the agent, and the model that decides how to call it. Design for both.

The model has no memory across turns beyond what the harness puts back into its context, no access to the outside world except through tools, non-deterministic output, and a training cutoff. Designing from the model's side means asking:

What tools fit its abilities, not just the problem's complexity?
Are descriptions internally consistent? Is the output format natural given how it was trained?
What reduces its uncertainty about when and how to use a tool?
What helps at one capability level may constrain the next — what scaffolding becomes a cage?
Where are its edges, and how can the harness serve rather than just constrain it?
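One way to act on "Are descriptions internally consistent?" is to lint each tool definition before the model ever sees it. The sketch below is a heuristic of our own, not part of this skill: it assumes JSON-Schema-style tool definitions with parameters under `input_schema` (Anthropic style) or `parameters` (OpenAI style), and `lint_tool_schema` is a hypothetical helper name.

```python
import re
from typing import Any


def lint_tool_schema(tool: dict[str, Any]) -> list[str]:
    """Return consistency problems found in one tool definition.

    Heuristic sketch: assumes a JSON-Schema-style definition whose parameter
    schema lives under "input_schema" or "parameters".
    """
    problems: list[str] = []
    name = tool.get("name", "<unnamed>")
    description = tool.get("description", "")
    schema = tool.get("input_schema") or tool.get("parameters") or {}
    properties: dict[str, Any] = schema.get("properties", {})
    required: list[str] = schema.get("required", [])

    if not description.strip():
        problems.append(f"{name}: tool has no description")

    # A required parameter that is not declared can never be satisfied by the model.
    for param in required:
        if param not in properties:
            problems.append(f"{name}: required parameter '{param}' is not defined")

    # Undocumented parameters force the model to guess from the name alone.
    for param, spec in properties.items():
        if not str(spec.get("description", "")).strip():
            problems.append(f"{name}: parameter '{param}' has no description")

    # Heuristic: `backticked` identifiers in the description that are not
    # declared parameters usually mean the prose drifted from the schema.
    for ref in re.findall(r"`([a-z_][a-z0-9_]*)`", description):
        if ref not in properties and ref != name:
            problems.append(f"{name}: description mentions `{ref}` but no such parameter exists")

    return problems
```

For example, a `read_file` tool whose description says "Pass `path`" while the schema declares `file_path` is flagged, which is exactly the kind of mismatch that makes the model guess.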

Debugging: trace from the model's side first

When a tool call goes wrong, resist the urge to fix the code immediately. Instead:

What did the model see? Extract the tool schema from llm_request — name, description, parameters. This is the model's entire action space.
What did the model decide? Look at the tool_use block: the exact arguments it passed. Read what it actually sent, not what you expected it to send.
What happened downstream? Trace the server-side path from input to result. Where did the routing branch?
What did the model get back? The tool result is what the model uses for its next decision. Is it helpful or misleading?
Only now: was it the model's fault? Most "model errors" are actually bad tool descriptions, broken routing, or poor error messages. The model did the best it could with what you gave it.
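Steps 1, 2 and 4 can be read straight from recorded payloads before any server code is opened. The sketch below assumes the stored request and response follow the Anthropic Messages API shape (`tools` with `input_schema`, `tool_use` blocks in the response, `tool_result` blocks in the next request); whether `llm_request` is stored in that shape is an assumption, and all function names are hypothetical. Step 3, the server-side path, is not covered.

```python
from typing import Any


def tools_seen(llm_request: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """What did the model see? Tool name -> declared parameters, from the outbound request."""
    return {
        tool["name"]: tool.get("input_schema", {}).get("properties", {})
        for tool in llm_request.get("tools", [])
    }


def tool_calls_made(llm_response: dict[str, Any]) -> list[dict[str, Any]]:
    """What did the model decide? The exact tool_use blocks it emitted."""
    return [block for block in llm_response.get("content", []) if block.get("type") == "tool_use"]


def tool_results_returned(next_request: dict[str, Any]) -> dict[str, Any]:
    """What did the model get back? tool_result content keyed by tool_use_id."""
    results: dict[str, Any] = {}
    for message in next_request.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool results
        for block in content:
            if block.get("type") == "tool_result":
                results[block["tool_use_id"]] = block.get("content")
    return results


def audit_turn(
    llm_request: dict[str, Any],
    llm_response: dict[str, Any],
    next_request: dict[str, Any],
) -> list[str]:
    """Report what is visible from the model's side, call by call."""
    seen = tools_seen(llm_request)
    results = tool_results_returned(next_request)
    notes: list[str] = []
    for call in tool_calls_made(llm_response):
        call_id, name, args = call["id"], call["name"], call.get("input", {})
        if name not in seen:
            notes.append(f"{call_id}: called '{name}', which was not in the tool list")
            continue
        undeclared = sorted(set(args) - set(seen[name]))
        if undeclared:
            notes.append(f"{call_id}: sent undeclared arguments {undeclared}")
        if call_id not in results:
            notes.append(f"{call_id}: no tool_result was sent back to the model")
        else:
            notes.append(f"{call_id}: model received {results[call_id]!r}")
    return notes
```

A note such as "sent undeclared arguments ['q']" next to "model received 'error: missing query'" points at the description or error message long before it points at the model.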

Debugging: runtime evidence beats code inference

For LLM request-shape bugs, tool-call regressions, reasoning_content / thinking mismatches, or any issue involving "was field X really sent?", follow this order:

Query runtime evidence first. Inspect messages, turn_metrics.llm_request, turn_metrics.llm_response, and the structured trace/log buffer before reading implementation code.
Prove the exact boundary. Show where a field is present, where it disappears, and which concrete function sits between those two points.
Do not claim a field was preserved just because the code path looks correct. If the logs or DB do not prove it, treat it as unproven.
If observability is missing, add it before attributing root cause. Add begin/end spans or structured logs at the persistence boundary, context-building boundary, and outbound provider-request boundary.
When reporting a root cause, include the concrete runtime artifact. Quote the conversation ID / turn ID / trace span / DB row / request payload shape that proves the conclusion.
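To answer "was field X really sent?" from the stored payload rather than from the code path, a recursive search over the recorded JSON is often enough. This sketch assumes `turn_metrics.llm_request` is stored as a raw JSON string; `field_paths` and `prove_sent` are hypothetical helper names. The returned path is the kind of concrete artifact the last step asks you to quote.

```python
import json
from typing import Any, Iterator


def field_paths(node: Any, field: str, path: str = "$") -> Iterator[str]:
    """Yield the JSON path of every occurrence of `field` in a decoded payload."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == field:
                yield child
            yield from field_paths(value, field, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from field_paths(item, field, f"{path}[{i}]")


def prove_sent(turn_id: str, raw_llm_request: str, field: str) -> str:
    """Answer 'was field X really sent?' from the stored outbound payload only."""
    paths = list(field_paths(json.loads(raw_llm_request), field))
    if not paths:
        return f"turn {turn_id}: '{field}' absent from outbound llm_request"
    return f"turn {turn_id}: '{field}' present at {', '.join(paths)}"
```

An "absent" result is evidence; a code path that "should" add the field is not.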

Debugging: reasoning_content bugs must follow the live request chain

For any reasoning_content / thinking-model regression:

Simulate the real frontend flow with POST /api/v1/conversations/:id/messages/stream.
Inspect begin/end spans across StreamMessage -> buildSmartContext -> provider request -> stream parser -> persistence/backfill.
Prove where the field is present, where it disappears, and which function sits between those two points.
Only patch code after the runtime trace proves the boundary that dropped the field.
Re-run the same simulated request after the patch and verify the same trace chain now carries the field end-to-end.
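Once each stage logs a begin and an end span, locating the dropping boundary is mechanical. This sketch assumes each span record carries `stage`, `event` ("begin" or "end") and a boolean `has_field` saying whether reasoning_content was present at that moment, and that the stages run sequentially rather than nested; the record shape and function names are hypothetical. The field may legitimately first appear mid-chain (for example in the stream parser), so only a disappearance after it has been seen counts as a drop.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Span:
    stage: str
    field_at_begin: bool
    field_at_end: bool


def spans_from_log(records: list[dict[str, Any]]) -> list[Span]:
    """Pair begin/end records per stage, in the order the stages finished."""
    open_spans: dict[str, bool] = {}
    spans: list[Span] = []
    for rec in records:
        if rec["event"] == "begin":
            open_spans[rec["stage"]] = rec["has_field"]
        elif rec["event"] == "end":
            # A missing begin record is treated as "field absent on entry".
            spans.append(Span(rec["stage"], open_spans.pop(rec["stage"], False), rec["has_field"]))
    return spans


def locate_drop(spans: list[Span]) -> Optional[str]:
    """Return the first boundary where the field disappears, or None if it never does."""
    prev_stage: Optional[str] = None
    prev_had_field = False
    for span in spans:
        # Present when the previous stage ended, missing when this one began.
        if prev_had_field and not span.field_at_begin:
            return f"handoff {prev_stage} -> {span.stage}"
        # Present on entry, missing on exit: lost inside this stage.
        if span.field_at_begin and not span.field_at_end:
            return f"inside {span.stage}"
        prev_stage, prev_had_field = span.stage, span.field_at_end
    return None


def carried_end_to_end(spans: list[Span]) -> bool:
    """Post-patch check: nothing dropped the field and the last stage still has it."""
    return bool(spans) and locate_drop(spans) is None and spans[-1].field_at_end
```

Run `locate_drop` on the failing request before patching, and `carried_end_to_end` on the same simulated request afterwards.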

Description: Debugging methodology for LLM tool calls: trace from the model's side first, use runtime evidence over code inference, and follow live request chains for reasoning/thinking bugs.

Category: tools

Updated May 29, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the other 6 were skipped.

Trivy, container and dependency vulnerability scanner: Clean
Semgrep, static code analysis for vulnerabilities: Clean
mcp-scan (Snyk), Model Context Protocol security validation: Clean
Snyk (dep), open source security scanning: Skipped
Socket.dev, supply chain security analysis: Skipped
VirusTotal, multi-engine malware detection: Skipped
CrowdStrike, advanced threat intelligence: Skipped
OSV-Scanner, Open Source Vulnerability database check: Skipped
OWASP Dep-Check: Skipped

Last scanned: May 29, 2026, 3:07 AM (9.7 s, 1 file scanned)


Related Skills

duruii/paper-summary (testing, updated May 29, 2026)
Research-grade single-paper analysis with evidence-grounded structured extraction and internal self-evaluation. Use when users ask to summarize or screen one academic paper from an arXiv link/ID or local PDF and need verifiable claims with citations, especially for Chinese-language output to students.

duruii/ieee-search-mcp (tools, updated May 29, 2026)
Use browser MCP to access IEEE Xplore through university library proxy, preserve institutional session, run keyword/advanced/journal search, and optionally post-filter by CCF rank (for example CCF-A) with structured output.

duruii/dlai-transcript-fetcher (testing, updated May 29, 2026)
Fetch and organize course transcripts from DeepLearning.AI. Use this skill whenever the user mentions DeepLearning.AI courses, wants to download course transcripts, subtitles, or VTT files from a course, or asks to organize lesson transcripts from learn.deeplearning.ai. It does NOT trigger for general video subtitle downloading, only for DeepLearning.AI courses specifically.

duruii/ccf-rank (development, updated May 29, 2026)
Query CCF (China Computer Federation) venue rankings for conferences and journals using year-partitioned reference data. Use when users ask for CCF level (A/B/C), category/domain, or rank verification for a venue abbreviation/full name (for example ICML, CVPR, TOCS), or request batch lookup/comparison across venues or years.

Install manually

# Clone the repo
git clone https://github.com/duruii/scientific-skills.git

# Copy into Claude Code skills folder (global)
cp -r scientific-skills/skills/seeing-as-agent ~/.claude/skills/

See the official Claude Code Skills documentation for the skills folder path.

Repository: duruii/scientific-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT