Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

nicsuzor/ground-truth

Name: ground-truth
Author: nicsuzor

archived/skills/ground-truth/SKILL.md

npx skillsauth add nicsuzor/academicops ground-truth

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Ground Truth Labeling

Purpose

Establish rigorous, defensible ground truth labels for evaluation datasets. Ensures labels derive from authoritative sources rather than intuition, and documents reasoning for reproducibility and auditability.

When to Use

Invoke when:

Creating ground truth labels for a new dataset
Reviewing records with high scorer/judge disagreement
Refining existing labels based on new understanding
Auditing label consistency across a dataset

Core Principle: Guidelines Are Authoritative

Ground truth derives from explicit guidelines, not intuition.

When labeling, the answer must come from the established criteria themselves, not from general judgment about what "should" be the case.

❌ "This seems like good journalism, so it shouldn't be flagged"
✅ "Guideline X permits quoting harmful language when [condition]. This article meets that condition."

Workflow

1. Load Relevant Guidelines

Before labeling any record, load and review the authoritative criteria:

What rules apply?
What are the explicit conditions for violation/non-violation?
What edge cases does the guideline address?

2. Analyze the Record

For each record:

Identify potential issues (terminology, framing, sources, etc.)
For each issue, find the specific guideline provision that applies
Determine if the guideline's conditions for violation are met

3. Construct the Label

Label structure:

ground_truth:
  violating: true/false
  reasons:
  - Primary reason with guideline reference
  - 'OPTIONAL: Secondary observation that scorers need not require'

Reason categories:

Primary: Scorers should expect judges to identify this
OPTIONAL: Valid observation that reasonable judges might not mention

4. Document Ambiguity

High disagreement signals:

Ambiguity in the guidelines themselves
Cases where guidelines conflict or don't clearly apply
Need to consult authoritative sources

When encountering genuine ambiguity, document it - don't force a label.

OPTIONAL Reasons

Prefix with "OPTIONAL:" for secondary observations:

Scorers should not require judges to mention these
If a judge does comment, scorers should expect correctness
Captures edge cases or nuanced guideline applications

Example:

reasons:
- Article provides critical framing and therefore DOES NOT VIOLATE quote attribution rules.
- 'OPTIONAL: Uses "activists" - guidelines discourage this when implying negative connotations, but usage here is neutral.'

Common Labeling Pitfalls

| Pitfall | Correction | | -------------------------- | ---------------------------------------- | | Labeling by intuition | Find explicit guideline provision | | Assuming guidelines agree | Check each criterion separately | | Over-strict interpretation | Guidelines often permit with conditions | | Ignoring context | Most guidelines consider framing/purpose | | Binary thinking | Use OPTIONAL for nuanced observations |

Consistency Checks

When refining labels:

Same reasoning → same label: If two records have the same characteristic, they should have the same label
Document changes: Log all label changes with rationale
Test edge cases: Does this label imply changes to similar records?

Output

For each labeling decision, provide:

The label (violating: true/false)
Primary reason(s) with guideline references
Any OPTIONAL observations
Rationale connecting guideline text to record content

nicsuzor/ground-truth

archived/skills/ground-truth/SKILL.md

Establish and refine ground truth labels for evaluation datasets. Use when creating, reviewing, or updating labels for any judgment/reasoning task.

data-ai

Updated Apr 9, 2026

$ install --global

skillsauth

npx skillsauth add nicsuzor/academicops ground-truth

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 9, 2026, 2:26 AM18.7s1 file scanned

SKILL.md

name:: ground-truth
category:: instruction
description:: Establish and refine ground truth labels for evaluation datasets. Use when creating, reviewing, or updating labels for any judgment/reasoning task.
allowed-tools:: Read,Grep,Glob,Edit,Write
version:: 1.0.0
permalink:: skills-ground-truth

Ground Truth Labeling

Purpose

When to Use

Invoke when:

Creating ground truth labels for a new dataset
Reviewing records with high scorer/judge disagreement
Refining existing labels based on new understanding
Auditing label consistency across a dataset

Core Principle: Guidelines Are Authoritative

Ground truth derives from explicit guidelines, not intuition.

When labeling, the answer must come from the established criteria themselves, not from general judgment about what "should" be the case.

❌ "This seems like good journalism, so it shouldn't be flagged"
✅ "Guideline X permits quoting harmful language when [condition]. This article meets that condition."

Workflow

1. Load Relevant Guidelines

Before labeling any record, load and review the authoritative criteria:

What rules apply?
What are the explicit conditions for violation/non-violation?
What edge cases does the guideline address?

2. Analyze the Record

For each record:

Identify potential issues (terminology, framing, sources, etc.)
For each issue, find the specific guideline provision that applies
Determine if the guideline's conditions for violation are met

3. Construct the Label

Label structure:

ground_truth:
  violating: true/false
  reasons:
  - Primary reason with guideline reference
  - 'OPTIONAL: Secondary observation that scorers need not require'

Reason categories:

Primary: Scorers should expect judges to identify this
OPTIONAL: Valid observation that reasonable judges might not mention

4. Document Ambiguity

High disagreement signals:

Ambiguity in the guidelines themselves
Cases where guidelines conflict or don't clearly apply
Need to consult authoritative sources

When encountering genuine ambiguity, document it - don't force a label.

OPTIONAL Reasons

Prefix with "OPTIONAL:" for secondary observations:

Scorers should not require judges to mention these
If a judge does comment, scorers should expect correctness
Captures edge cases or nuanced guideline applications

Example:

reasons:
- Article provides critical framing and therefore DOES NOT VIOLATE quote attribution rules.
- 'OPTIONAL: Uses "activists" - guidelines discourage this when implying negative connotations, but usage here is neutral.'

Common Labeling Pitfalls

Consistency Checks

When refining labels:

Same reasoning → same label: If two records have the same characteristic, they should have the same label
Document changes: Log all label changes with rationale
Test edge cases: Does this label imply changes to similar records?

Output

For each labeling decision, provide:

The label (violating: true/false)
Primary reason(s) with guideline references
Any OPTIONAL observations
Rationale connecting guideline text to record content

Related Skills

nicsuzor/end_session

data-ai

VerifiedTrustedCommunity

Canonical session close — commit, push, PR, release_task, reflection blocks, handover. Use /dump for emergency bail (no commit/PR/reflection).

2SKILL.mdUpdated Jul 7, 2026

nicsuzor/dump

data-ai

VerifiedTrustedCommunity

Emergency session bail — fast resume task + short handover, no commit/PR/reflection. For when you (or the user) need a clean context now. Use /end-session for canonical close.

2SKILL.mdUpdated Jul 7, 2026

nicsuzor/daily

data-ai

VerifiedTrustedCommunity

Daily note lifecycle — compose and maintain a factual daily note. Reports the state of the day; does not prioritise or recommend. SSoT for daily note structure.

2SKILL.mdUpdated Jul 7, 2026

nicsuzor/narrative-digest

testing

VerifiedTrustedCommunity

Launder supervisor/worker task-log output into a Nic-facing narrative — what happened, where things are headed, and what (if anything) is genuinely his to decide. Never relays raw process detail (worker IDs, thread pointers, log paths) or verbatim task-log stream-of-consciousness.

2SKILL.mdUpdated Jul 7, 2026

nicsuzor/narrative-digest

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/nicsuzor/academicops.git

# Copy into Claude Code skills folder (global)
cp -r academicops/archived/skills/ground-truth ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

nicsuzor/academicops

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT