skills/astronomer/profiling-tables/SKILL.md
Deep-dive data profiling for a specific table. Use when the user asks to profile a table, wants statistics about a dataset, asks about data quality, or needs to understand a table's structure and content. Requires a table name.
npx skillsauth add rory-data/copilot profiling-tablesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generate a comprehensive profile of a table that a new team member could use to understand the data.
Query column metadata:
SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION
If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
Run via run_sql:
SELECT
COUNT(*) as total_rows,
COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>
For each column, gather appropriate statistics based on data type:
SELECT
MIN(column_name) as min_val,
MAX(column_name) as max_val,
AVG(column_name) as avg_val,
STDDEV(column_name) as std_dev,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
SELECT
MIN(LEN(column_name)) as min_length,
MAX(LEN(column_name)) as max_length,
AVG(LEN(column_name)) as avg_length,
SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
SELECT
MIN(column_name) as earliest,
MAX(column_name) as latest,
DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
For columns that look like categorical/dimension keys:
SELECT
column_name,
COUNT(*) as frequency,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20
This reveals:
Get representative rows:
SELECT *
FROM <table>
LIMIT 10
If the table is large and you want variety, sample from different time periods or categories.
Summarize quality across dimensions:
Provide a structured profile:
2-3 sentences describing what this table contains, who uses it, and how fresh it is.
| Column | Type | Nulls% | Distinct | Description | |--------|------|--------|----------|-------------| | ... | ... | ... | ... | ... |
List any data quality concerns discovered.
3-5 useful queries for common questions about this data.
tools
Queries, manages, and troubleshoots Apache Airflow using the af CLI. Covers listing DAGs, triggering runs, reading task logs, diagnosing failures, debugging DAG import errors, checking connections, variables, pools, and monitoring health. Also routes to sub-skills for writing DAGs, debugging, deploying, and migrating Airflow 2 to 3. Use when user mentions "Airflow", "DAG", "DAG run", "task log", "import error", "parse error", "broken DAG", or asks to "trigger a pipeline", "debug import errors", "check Airflow health", "list connections", "retry a run", or any Airflow operation. Do NOT use for warehouse/SQL analytics on Airflow metadata tables — use analyzing-data instead.
tools
Build Airflow 3.1+ plugins that embed FastAPI apps, custom UI pages, React components, middleware, macros, and operator links directly into the Airflow UI. Use this skill whenever the user wants to create an Airflow plugin, add a custom UI page or nav entry to Airflow, build FastAPI-backed endpoints inside Airflow, serve static assets from a plugin, embed a React app in the Airflow UI, add middleware to the Airflow API server, create custom operator extra links, or call the Airflow REST API from inside a plugin. Also trigger when the user mentions AirflowPlugin, fastapi_apps, external_views, react_apps, plugin registration, or embedding a web app in Airflow 3.1+. If someone is building anything custom inside Airflow 3.1+ that involves Python and a browser-facing interface, this skill almost certainly applies.
data-ai
Use when the user needs human-in-the-loop workflows in Airflow (approval/reject, form input, or human-driven branching). Covers ApprovalOperator, HITLOperator, HITLBranchOperator, HITLEntryOperator, HITLTrigger. Requires Airflow 3.1+. Does not cover AI/LLM calls (see airflow-ai).
development
Detects and fixes common code smells during review or refactoring. Invoke whenever reviewing code for quality issues, before merging a PR, when refactoring legacy code, or when the user asks about code quality, anti-patterns, or technical debt. Detects: over-abstraction, complex inheritance, large functions, tight coupling, hidden dependencies, magic numbers, boolean traps, swallowed exceptions, global state, and duplicate code. Provides specific fixes with before/after examples. Also invoke when someone says "review this code", "is this clean?", "can I improve this?", "this feels messy", or "find problems in my code".