packages/skills/skills/huggingface-datasets/SKILL.md
# Hugging Face Datasets Create and manage datasets on Hugging Face Hub with SQL-based querying, transformation, and streaming row updates. ## Prerequisites - uv (Python package manager) - HF_TOKEN environment variable with Write-access token ## Instructions ### Core Capabilities 1. **SQL-Based Querying** - Query any HF dataset using DuckDB SQL via `hf://` protocol 2. **Dataset Creation** - Initialize repos, configure metadata, stream row updates 3. **Multi-Format Support** - Chat, classifi
npx skillsauth add mediar-ai/skillhubz packages/skills/skills/huggingface-datasetsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Create and manage datasets on Hugging Face Hub with SQL-based querying, transformation, and streaming row updates.
hf:// protocol# Query a dataset
uv run scripts/sql_manager.py query \
--dataset "cais/mmlu" \
--sql "SELECT * FROM data WHERE subject='nutrition' LIMIT 10"
# Get dataset schema
uv run scripts/sql_manager.py describe --dataset "cais/mmlu"
# Sample random rows
uv run scripts/sql_manager.py sample --dataset "cais/mmlu" --n 5
# Count with filter
uv run scripts/sql_manager.py count --dataset "cais/mmlu" --where "subject='nutrition'"
# Query and push to new dataset
uv run scripts/sql_manager.py query \
--dataset "cais/mmlu" \
--sql "SELECT * FROM data WHERE subject='nutrition'" \
--push-to "username/mmlu-nutrition-subset" \
--private
# Export to local file
uv run scripts/sql_manager.py export \
--dataset "cais/mmlu" \
--sql "SELECT * FROM data LIMIT 100" \
--output "sample.parquet" \
--format parquet
# Initialize repository
uv run scripts/dataset_manager.py init --repo_id "your-username/dataset-name" --private
# Quick setup with template
uv run scripts/dataset_manager.py quick_setup \
--repo_id "your-username/dataset-name" \
--template classification
# Add rows with template validation
uv run scripts/dataset_manager.py add_rows \
--repo_id "your-username/dataset-name" \
--template qa \
--rows_json "$(cat your_qa_data.json)"
chat - Multi-turn dialogues with tool usageclassification - Text classification with labelsqa - Question-answering pairscompletion - Text completion promptstabular - Structured datahf:// protocol provides direct dataset accessSource: huggingface/skills
tools
# X Twitter Scraper Use Xquik for X/Twitter tweet search, user lookup, profile tweets, follower export, media download, monitors, webhooks, posting workflows, and MCP-backed API exploration. ## Prerequisites - A Xquik API key in `XQUIK_API_KEY`. - Internet access to `https://xquik.com/api/v1`, `https://xquik.com/mcp`, and `https://docs.xquik.com`. - A clear user request that identifies the target tweets, users, accounts, keywords, media, monitor, webhook, or write action. ## Source Truth -
tools
Use when the user says "mk0r", "appmaker CLI", "open a VM", "run something in the sandbox", "talk to the VM agent", "spin up an E2B sandbox", or "chat with appmaker from CLI." Wraps the `mk0r` CLI to list projects, exec commands inside their E2B sandboxes, stream chat with the VM agent (same `/api/chat` the web UI uses), toggle SOAX residential IP, manage schedules, and copy files. Supports a sticky default project via `mk0r projects use`.
testing
Use when the user mentions "influencer candidates", "social media operator", "check proposals on Upwork/Fiverr", "review influencer applications", "qualify candidates", or "reach out to operators". Manages the IG/TikTok account operator hiring pipeline — review applicants, check replies, qualify, and do proactive outreach.
tools
End-to-end newsletter pipeline: investigate recent features, draft, send via API endpoint, and track delivery/open/click metrics.