tbroadley/hawk-view-results

Name: hawk-view-results
Author: tbroadley

claude/skills/hawk-view-results/SKILL.md

npx skillsauth add tbroadley/dotfiles hawk-view-results

View Hawk Eval Results

When the user wants to analyze evaluation results, use these hawk CLI commands:

1. List Eval Sets

If the user does not know the eval set ID, list all eval sets:

hawk list eval-sets

Shows: eval set ID, creation date, creator.

To return more results, raise the limit with --limit N:

hawk list eval-sets --limit 50

To search for a specific eval set, use --search QUERY:

hawk list eval-sets --search pico

2. List Evaluations

With an eval set ID, you can list all evaluations in that eval set:

hawk list evals [EVAL_SET_ID]

Shows: task name, model, status (success/error/cancelled), and sample counts.

3. List Samples

Or you can list individual samples and their scores:

hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N]

4. Download Transcript

To get the full conversation for a specific sample:

hawk transcript <UUID>

The transcript includes the full conversation with tool calls, scores, and metadata.

For even more detail, fetch the raw data with --raw:

hawk transcript <UUID> --raw

Batch Transcript Download

You can also download all transcripts for an entire eval set:

# Fetch all samples in an eval set
hawk transcripts <EVAL_SET_ID>

# Write to individual files in a directory
hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts

# Limit number of samples
hawk transcripts <EVAL_SET_ID> --limit 10

# Raw JSON output (one JSON per line to stdout, or .json files with --output-dir)
hawk transcripts <EVAL_SET_ID> --raw
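The raw output above is described as one JSON document per line. A minimal Python sketch for reading such a stream, assuming each non-empty line is a complete JSON object; the field names in the sample data are hypothetical and are not taken from hawk's actual output:

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical sample data; real transcripts will have different fields.
sample_output = '{"uuid": "a1", "n_messages": 3}\n\n{"uuid": "b2", "n_messages": 7}\n'
records = list(iter_json_lines(sample_output.splitlines()))
```

In practice the same function could be fed sys.stdin when the output of hawk transcripts <EVAL_SET_ID> --raw is piped into a script.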

Known Limitations

hawk list samples has a max --limit of 500 (API returns 422 for higher values)
hawk transcript and hawk transcripts time out on large eval files (100MB+), common with side-task evals that have thousands of samples × multiple epochs
hawk list samples does not index score values — the score_value field is often None
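Because the API rejects limits above 500, it can help to check the value before invoking the CLI. The following is a sketch only: the helper name is hypothetical, the lower bound of 1 is an assumption, and the command uses only the flags shown above.

```python
MAX_SAMPLES_LIMIT = 500  # documented maximum; the API returns 422 above this


def list_samples_command(eval_set_id: str, limit: int) -> list[str]:
    """Build the argv for `hawk list samples`, rejecting limits the API would refuse."""
    # The lower bound of 1 is an assumption, not something the source states.
    if not 1 <= limit <= MAX_SAMPLES_LIMIT:
        raise ValueError(
            f"limit must be between 1 and {MAX_SAMPLES_LIMIT}, got {limit}"
        )
    return ["hawk", "list", "samples", eval_set_id, "--limit", str(limit)]
```

The returned list can be passed to subprocess.run(..., check=True), which surfaces a bad limit locally instead of as a 422 response.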

Prefer the Data Warehouse for Bulk Analysis

For querying sample-level data across eval sets (scores, limits, errors, token counts), use the warehouse-query skill instead of downloading eval files from S3 via inspect_ai.log.read_eval_log(). The warehouse has eval, sample, score, and message tables. A SQL query takes seconds, whereas reading large eval files can take minutes or hours.

Example — find all samples that hit the working limit:

SELECT s.id AS sample_id, s.epoch, e.eval_set_id, e.model, e.task_args
FROM sample s
JOIN eval e ON s.eval_pk = e.pk
WHERE e.eval_set_id = 'eval-set-xxx'
  AND s."limit" = 'working';

Use hawk transcript <uuid> only when you need the full conversation transcript for a specific sample.
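To see how the example query behaves, it can be run against a toy schema. This sketch uses an in-memory SQLite database purely as a stand-in: the real warehouse engine, column types, and any columns beyond those named in the query are assumptions, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables containing only the columns the example query touches.
conn.executescript(
    """
    CREATE TABLE eval (pk INTEGER PRIMARY KEY, eval_set_id TEXT, model TEXT, task_args TEXT);
    CREATE TABLE sample (id TEXT, epoch INTEGER, eval_pk INTEGER, "limit" TEXT);
    INSERT INTO eval VALUES (1, 'eval-set-xxx', 'model-a', '{}');
    INSERT INTO sample VALUES ('s1', 1, 1, 'working');
    INSERT INTO sample VALUES ('s2', 1, 1, NULL);
    """
)

# Same query as above, with the eval set ID passed as a bound parameter.
QUERY = """
SELECT s.id AS sample_id, s.epoch, e.eval_set_id, e.model, e.task_args
FROM sample s
JOIN eval e ON s.eval_pk = e.pk
WHERE e.eval_set_id = ?
  AND s."limit" = 'working'
"""

rows = conn.execute(QUERY, ("eval-set-xxx",)).fetchall()
```

Only the sample whose limit column equals 'working' is returned; the sample with no limit recorded is filtered out. Note that limit is a reserved word, which is why it is double-quoted in both the schema and the query.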

Workflow

1. Run hawk list eval-sets to see available eval sets
2a. Run hawk list evals <EVAL_SET_ID> to see available evaluations
2b. or run hawk list samples <EVAL_SET_ID> to find samples of interest (max 500 per request)
3. For bulk sample-level analysis (scores, limits, errors), use the warehouse-query skill with SQL
4. Run hawk transcript <uuid> only for full conversation details on individual samples

API Environments

Production (https://api.inspect-ai.internal.metr.org) is used by default. Set HAWK_API_URL only when targeting non-production environments:

| Environment | URL |
|-------------|-----|
| Staging | https://api.inspect-ai.staging.metr-dev.org |
| Dev1 | https://api.inspect-ai.dev1.staging.metr-dev.org |
| Dev2 | https://api.inspect-ai.dev2.staging.metr-dev.org |
| Dev3 | https://api.inspect-ai.dev3.staging.metr-dev.org |
| Dev4 | https://api.inspect-ai.dev4.staging.metr-dev.org |

Example:

HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval-sets
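When hawk is driven from a script, the same selection can be made by building the environment for the child process. A minimal sketch, assuming hawk reads HAWK_API_URL exactly as described above; the helper name and the lowercase environment keys are hypothetical:

```python
import os

# Non-production URLs copied from the environments listed above.
HAWK_API_URLS = {
    "staging": "https://api.inspect-ai.staging.metr-dev.org",
    "dev1": "https://api.inspect-ai.dev1.staging.metr-dev.org",
    "dev2": "https://api.inspect-ai.dev2.staging.metr-dev.org",
    "dev3": "https://api.inspect-ai.dev3.staging.metr-dev.org",
    "dev4": "https://api.inspect-ai.dev4.staging.metr-dev.org",
}


def hawk_env(environment: str = "production") -> dict[str, str]:
    """Return a copy of the process environment targeting the given Hawk environment."""
    env = dict(os.environ)
    if environment == "production":
        # Production is the default when the variable is unset.
        env.pop("HAWK_API_URL", None)
    else:
        # Raises KeyError for an environment name that is not listed.
        env["HAWK_API_URL"] = HAWK_API_URLS[environment]
    return env
```

The result can be passed as the env argument of subprocess.run, for example subprocess.run(["hawk", "list", "eval-sets"], env=hawk_env("staging")).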

View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.

testing

Updated Apr 15, 2026

npx skillsauth add tbroadley/dotfiles hawk-view-results

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 15, 2026, 2:15 AM (80.6s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/tbroadley/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/claude/skills/hawk-view-results ~/.claude/skills/

Repository

tbroadley/dotfiles

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT