packages/skills/skills/huggingface-evaluation/SKILL.md
# Hugging Face Evaluation

Add structured evaluation results to model cards with support for README extraction, Artificial Analysis API, and custom evaluations using vLLM/lighteval.

## Prerequisites

- uv (Python package manager)
- HF_TOKEN environment variable
- For Artificial Analysis: AA_API_KEY environment variable

## Instructions

### Workflow: Extract from README
# 1. Check for existing PRs first
uv run scripts/evaluation_manager.py get-prs --repo-id "username/model-name"
# 2. Inspect tables to find evaluation data
uv run scripts/evaluation_manager.py inspect-tables --repo-id "username/model"
# 3. Extract specific table (prints YAML)
uv run scripts/evaluation_manager.py extract-readme \
--repo-id "username/model" \
--table 1
# 4. Apply changes
uv run scripts/evaluation_manager.py extract-readme \
--repo-id "username/model" \
--table 1 \
--create-pr # or --apply for direct push
Import from Artificial Analysis:
AA_API_KEY="your-key" uv run scripts/evaluation_manager.py import-aa \
--creator-slug "anthropic" \
--model-name "claude-sonnet-4" \
--repo-id "username/model-name" \
--create-pr
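The commands above read their credentials from the environment: `HF_TOKEN` for Hub access and `AA_API_KEY` for the Artificial Analysis import. A minimal pre-flight sketch follows; `require_env` is a hypothetical helper written for this illustration and is not part of `evaluation_manager.py`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill's scripts): report every
# named environment variable that is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use before an import-aa run:
#   require_env HF_TOKEN AA_API_KEY || exit 1
```

Checking all variables before returning means a single run lists everything that is missing, rather than stopping at the first gap.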
lighteval with vLLM:
uv run scripts/lighteval_vllm_uv.py \
--model meta-llama/Llama-3.2-1B \
--tasks "leaderboard|mmlu|5"
inspect-ai with vLLM:
uv run scripts/inspect_vllm_uv.py \
--model meta-llama/Llama-3.2-1B \
--task mmlu
Via HF Jobs:
hf jobs uv run scripts/lighteval_vllm_uv.py \
--flavor a10g-small \
--secrets HF_TOKEN=$HF_TOKEN \
-- --model meta-llama/Llama-3.2-1B \
--tasks "leaderboard|mmlu|5"
| Model Size | Hardware |
|------------|----------|
| < 3B params | t4-small |
| 3B - 13B | a10g-small |
| 13B - 34B | a10g-large |
| 34B+ | a100-large |
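The sizing table can be turned into a small lookup for scripting the `--flavor` argument. This is a sketch under stated assumptions: `pick_flavor` is a hypothetical helper, and because the table's ranges share their endpoints (13B appears in two rows), it treats each lower bound as inclusive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a model size in billions of parameters to a
# hardware flavor from the table. Assumption: lower bounds are inclusive,
# so exactly 13 maps to a10g-large and exactly 34 maps to a100-large.
pick_flavor() {
  local params_b="$1"
  # awk handles fractional sizes such as 1.5
  awk -v p="$params_b" 'BEGIN {
    if (p < 3)       print "t4-small"
    else if (p < 13) print "a10g-small"
    else if (p < 34) print "a10g-large"
    else             print "a100-large"
  }'
}
```

For the 1B model used in the examples, `--flavor "$(pick_flavor 1)"` expands to `--flavor t4-small`.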
Source: huggingface/skills