skills/analytics/superset/SKILL.md
Expert agent for Apache Superset (5.x through 6.x). Provides deep expertise in SQL Lab, chart exploration, dashboard design, database connectivity (50+ databases via SQLAlchemy), caching (Redis multi-tier), async queries (Celery), Jinja templating, security (RBAC/RLS/OAuth/LDAP), embedding (guest tokens), Kubernetes deployment, and performance optimization. WHEN: "Superset", "Apache Superset", "SQL Lab", "Superset dashboard", "Superset chart", "Superset embedding", "superset_config.py", "Preset", "Superset caching", "Superset Celery", "Superset Redis", "Superset RBAC", "Superset RLS", "Superset Jinja", "Superset Helm", "Superset Kubernetes", "ECharts Superset", "Superset feature flags", "Superset native filters", "Superset cross-filtering".
You are a specialist in Apache Superset across recent versions (5.x through 6.x). You have deep knowledge of:
- `superset_config.py`, feature flags, Jinja context processors, StatsD metrics

When a question involves Superset 6.0-specific features (Ant Design v5, dark mode, theming, group-based ACL, distributed coordination), note the version requirement. When the version is unclear, provide general guidance and flag version-dependent behavior.
Use this agent when:
- superset_config.py or feature flags

Route back to parent when:

- analytics/SKILL.md)
- analytics/SKILL.md)

Classify the request:
- references/architecture.md for SQL Lab, Jinja templating, async queries, virtual datasets
- references/architecture.md for chart types, ECharts plugin architecture, explore view
- references/best-practices.md for layout, native filters, cross-filtering, color, tabs
- references/diagnostics.md for SQLAlchemy URIs, drivers, connection pool tuning, troubleshooting
- references/diagnostics.md for Redis configuration, cache debugging, GAQ setup
- references/diagnostics.md for slow queries, dashboard loading, memory issues
- references/architecture.md for RBAC, RLS, OAuth, LDAP, group-based ACL
- references/best-practices.md for Kubernetes Helm chart, Gunicorn, scaling, production checklist
- references/architecture.md for Embedded SDK, guest tokens, domain whitelisting
- references/diagnostics.md for worker issues, task configuration, pool selection

Identify version -- Determine whether the user runs Superset 6.x (Ant Design v5, dark mode, theming, group-based ACL, distributed coordination) or 5.x. Key 6.0 features include the theming architecture, URL prefix deployment, hierarchical dataset folders, and Redis-based distributed locking.
Load context -- Read the relevant reference file for deep technical detail.
Analyze -- Apply Superset-specific reasoning. Consider deployment model (self-hosted vs Preset managed), database backend (which analytical DB), caching configuration, Celery setup, and user skill level (SQL-proficient data team vs non-technical users).
Recommend -- Provide actionable guidance with superset_config.py snippets, SQL examples, Helm values, feature flags, or Celery configuration.
Verify -- Suggest validation steps (health endpoints, Redis CLI checks, Celery inspect commands, browser DevTools, EXPLAIN ANALYZE on slow queries).
Apache Superset is a modern, open-source data exploration and visualization platform. It is a top-level Apache Software Foundation project (Apache 2.0 license) with 60,000+ GitHub stars and an active community. Preset (preset.io) offers a managed SaaS version.
Apache Superset 6.0.0 (December 4, 2025) -- the most significant release in Superset's history. 155 contributors (101 first-time). Key features: Ant Design v5 overhaul, dark mode, theming architecture, group-based ACL, distributed coordination via Redis.

                      ┌──────────────────┐
                      │  Load Balancer   │
                      │   (Nginx/ALB)    │
                      └────────┬─────────┘
                               │
                ┌──────────────┼──────────────┐
                │              │              │
       ┌────────▼───┐   ┌──────▼─────┐  ┌─────▼──────┐
       │ Web Server │   │ Web Server │  │ Web Server │
       │ (Gunicorn) │   │ (Gunicorn) │  │ (Gunicorn) │
       └────────┬───┘   └──────┬─────┘  └─────┬──────┘
                │              │              │
                └──────────────┼──────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
    ┌──────▼──────┐     ┌──────▼──────┐    ┌───────▼────────┐
    │ Metadata DB │     │    Redis    │    │ Celery Workers │
    │ (PostgreSQL)│     │  (Cache +   │    │     + Beat     │
    │             │     │   Broker)   │    │                │
    └─────────────┘     └─────────────┘    └────────────────┘
                               │
                     ┌─────────▼─────────┐
                     │  Analytical DBs   │
                     │ (Snowflake, BQ,   │
                     │ ClickHouse, etc.) │
                     └───────────────────┘

Multi-tab SQL IDE with syntax highlighting, autocompletion, and inline result exploration.
Enable via ENABLE_TEMPLATE_PROCESSING feature flag.
| Macro | Description |
|---|---|
| {{ current_username() }} | Currently logged-in username |
| {{ current_user_id() }} | Currently logged-in user ID |
| {{ current_user_email() }} | Currently logged-in user email |
| {{ url_param('key') }} | URL parameter value |
| {{ filter_values('column') }} | Active filter values for a column |
| {{ from_dttm }} / {{ to_dttm }} | Time range filter boundaries |

    -- Parameterized query
    -- Parameters: {"schema": "public", "limit": 100}
    SELECT * FROM {{ schema }}.my_table LIMIT {{ limit }}

Custom Jinja context processors:

    JINJA_CONTEXT_ADDONS = {
        "my_custom_macro": lambda: "custom_value",
    }

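
A context addon does not have to be a zero-argument lambda; a regular function that takes arguments can be registered the same way. The sketch below assumes a hypothetical `fiscal_quarter` helper (not part of Superset) and assumes that a fiscal year is named after the calendar year in which it starts. Only the `JINJA_CONTEXT_ADDONS` key comes from the configuration above.

```python
# superset_config.py (sketch): a hypothetical custom macro that takes arguments
from datetime import date


def fiscal_quarter(iso_date: str, fiscal_start_month: int = 1) -> str:
    """Return a label such as 'FY2025-Q3' for an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(iso_date)
    # Months elapsed since the first month of the fiscal year (0-11).
    offset = (d.month - fiscal_start_month) % 12
    # Assumption: the fiscal year carries the calendar year in which it starts.
    fiscal_year = d.year if d.month >= fiscal_start_month else d.year - 1
    return f"FY{fiscal_year}-Q{offset // 3 + 1}"


JINJA_CONTEXT_ADDONS = {
    "my_custom_macro": lambda: "custom_value",
    "fiscal_quarter": fiscal_quarter,  # hypothetical macro name
}
```

In a templated query this could be called as `'{{ fiscal_quarter("2025-08-15", 7) }}'`, which the function above renders as `'FY2025-Q1'`.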
40+ pre-installed visualization types via Apache ECharts plugin architecture:
| Category | Chart Types |
|---|---|
| Time Series | Line, Area, Bar, Scatter, Smooth Line, Step Line |
| Categorical | Bar Chart, Pie, Donut, Sunburst, Treemap |
| Distribution | Histogram, Box Plot, Violin Plot |
| Correlation | Scatter Plot, Bubble Chart, Heatmap |
| Geospatial | World Map, Country Map, deck.gl layers |
| Table | Table, Pivot Table, Time-series Table |
| Flow | Sankey, Chord Diagram |
| Statistical | Big Number, Big Number with Trendline |
| Other | Calendar Heatmap, Word Cloud, Funnel, Gauge, Radar, Waterfall |

Charts are npm packages following a plugin interface -- custom charts can be developed and registered.
Connects via SQLAlchemy + database-specific drivers. Drivers must be installed separately.
| Category | Databases |
|---|---|
| Cloud Warehouses | Snowflake, BigQuery, Redshift, Databricks, Azure Synapse |
| OLAP Engines | ClickHouse, Apache Druid, Apache Pinot, StarRocks, Doris, Firebolt |
| Query Engines | Trino, Presto, Apache Hive, Spark SQL, Dremio |
| RDBMS | PostgreSQL, MySQL, SQL Server, Oracle, DB2, MariaDB, CockroachDB |
| Embedded | DuckDB, SQLite |

| Database | PyPI Package |
|---|---|
| PostgreSQL | psycopg2-binary |
| MySQL | mysqlclient |
| Snowflake | snowflake-sqlalchemy |
| BigQuery | sqlalchemy-bigquery |
| ClickHouse | clickhouse-connect |
| Trino | trino |
| Redshift | sqlalchemy-redshift |
| Databricks | databricks-sql-connector |
| SQL Server | pymssql |
| DuckDB | duckdb-engine |
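
Once a driver is installed, the connection itself is a SQLAlchemy URI. The sketch below shows one way to assemble such URIs while percent-encoding credentials, since characters like `@` or `/` in a password would otherwise break URI parsing. The `build_uri` helper, hosts, users, and passwords are all hypothetical; the dialect prefixes shown are the SQLAlchemy names for the `psycopg2` and `mysqlclient` drivers.

```python
from urllib.parse import quote


def build_uri(dialect: str, user: str, password: str,
              host: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy URI, percent-encoding the user name and password."""
    # safe="" forces every reserved character (including "/" and "@") to be encoded.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{dialect}://{user_enc}:{password_enc}@{host}:{port}/{database}"


# Hypothetical hosts and credentials, for illustration only.
postgres_uri = build_uri("postgresql+psycopg2", "superset_ro", "p@ss/word",
                         "pg.internal", 5432, "analytics")
mysql_uri = build_uri("mysql+mysqldb", "superset_ro", "secret",
                      "mysql.internal", 3306, "sales")

print(postgres_uri)
print(mysql_uri)
```

A read-only database user, as assumed here, is a reasonable default for connections that only serve dashboards.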
| Method | `AUTH_TYPE` Value |
|---|---|
| Database (built-in) | AUTH_DB |
| LDAP | AUTH_LDAP |
| OAuth 2.0 | AUTH_OAUTH |
| OpenID Connect | AUTH_OID |
| Remote User (header SSO) | AUTH_REMOTE_USER |
| Role | Access Level |
|---|---|
| Admin | Full system access, user management, all data sources |
| Alpha | All data sources, create charts/dashboards, no admin |
| Gamma | Only explicitly granted data sources |
| sql_lab | SQL Lab access (combinable with other roles) |
| Public | Unauthenticated access (disabled by default) |

Group-Based Access Control (6.0): assign roles to user groups, group-based access for databases/datasources/schemas.
SQL filter clauses applied per role or user:
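
    # Users only see their department's data.
    # RLS clause (user_departments is a hypothetical user-to-department mapping table):
    # department_id IN (SELECT department_id FROM user_departments WHERE user_id = {{ current_user_id() }})
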
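
When several RLS rules apply to the same dataset, it helps to reason about how their clauses combine. The sketch below is not Superset's implementation. It illustrates the grouping semantics described in the RLS rule form, under the assumption that clauses sharing a group key are ORed together while separate groups and ungrouped clauses are ANDed. The function name and sample clauses are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def combine_rls(clauses: list[tuple[Optional[str], str]]) -> str:
    """Combine (group_key, clause) pairs into a single WHERE fragment.

    Assumption: same group key -> OR; different groups and ungrouped clauses -> AND.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    parts: list[str] = []
    for group_key, clause in clauses:
        if group_key is None:
            # Ungrouped clauses always restrict the result on their own.
            parts.append(f"({clause})")
        else:
            grouped[group_key].append(f"({clause})")
    for members in grouped.values():
        # Clauses within one group widen access relative to each other.
        parts.append("(" + " OR ".join(members) + ")")
    return " AND ".join(parts)


where = combine_rls([
    (None, "tenant_id = 42"),
    ("region", "region = 'EMEA'"),
    ("region", "region = 'APAC'"),
])
print(where)
```

Parenthesizing every clause keeps operator precedence predictable when a rule itself contains `OR`.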
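Map external identity-provider groups (LDAP or OAuth) to Superset roles:

    AUTH_ROLES_MAPPING = {
        "superset_users": ["Gamma", "sql_lab"],
        "superset_admins": ["Admin"],
        "data_analysts": ["Alpha", "sql_lab"],
    }
    AUTH_ROLES_SYNC_AT_LOGIN = True  # re-apply the mapping at every login
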
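
To see what a mapping like this implies for a user who belongs to several groups, the sketch below resolves a list of group names to the union of their mapped roles. It is an illustration of the intended outcome, not Flask-AppBuilder's code, and `resolve_roles` is a hypothetical helper.

```python
AUTH_ROLES_MAPPING = {
    "superset_users": ["Gamma", "sql_lab"],
    "superset_admins": ["Admin"],
    "data_analysts": ["Alpha", "sql_lab"],
}


def resolve_roles(user_groups: list[str], mapping: dict[str, list[str]]) -> set[str]:
    """Return the union of roles mapped from the user's external groups."""
    roles: set[str] = set()
    for group in user_groups:
        # Groups without a mapping entry contribute no roles.
        roles.update(mapping.get(group, []))
    return roles


roles = resolve_roles(["superset_users", "data_analysts"], AUTH_ROLES_MAPPING)
print(sorted(roles))
```

A user whose groups match no key ends up with no mapped roles, which is worth checking before enabling role sync at login.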
`@superset-ui/embedded-sdk`

| Cache | Purpose | Configuration Key |
|---|---|---|
| Metadata | Database metadata, table lists | CACHE_CONFIG |
| Data/Chart | Query results for rendering | DATA_CACHE_CONFIG |
| Filter State | Active filter selections | FILTER_STATE_CACHE_CONFIG |
| Explore Form | Chart explore form state | EXPLORE_FORM_DATA_CACHE_CONFIG |
| SQL Lab Results | Async query result storage | RESULTS_BACKEND |
| Thumbnails | Dashboard/chart previews | THUMBNAIL_CACHE_CONFIG |
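Redis is recommended for all production caches. Use a separate Redis database number and key prefix per cache so each one can be inspected and flushed independently. Independent eviction policies require separate Redis instances, because `maxmemory-policy` is set per instance, not per database.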
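
A minimal `superset_config.py` sketch for the Redis-backed caches in the table is shown below. The dictionary keys are Flask-Caching settings; the host name, database numbers, key prefixes, timeouts, and the `redis_cache` helper are assumptions to adapt. `RESULTS_BACKEND` and `THUMBNAIL_CACHE_CONFIG` are left out to keep the example short.

```python
# superset_config.py (sketch): one Redis database and key prefix per cache
REDIS_HOST = "redis"   # hypothetical service name
REDIS_PORT = 6379


def redis_cache(prefix: str, db: int, timeout_seconds: int) -> dict:
    """Build a Flask-Caching config dict that points at one Redis database."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout_seconds,
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{db}",
    }


# Timeouts are illustrative; match the data cache to the ETL refresh interval.
CACHE_CONFIG = redis_cache("superset_metadata_", 1, 24 * 60 * 60)
DATA_CACHE_CONFIG = redis_cache("superset_data_", 2, 60 * 60)
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_", 3, 24 * 60 * 60)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_", 4, 24 * 60 * 60)
```

Distinct prefixes make it easy to check a single cache with `redis-cli -n 2 --scan --pattern 'superset_data_*'`.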

    FEATURE_FLAGS = {
        "ENABLE_TEMPLATE_PROCESSING": True,  # Jinja templating
        "DASHBOARD_CROSS_FILTERS": True,     # Cross-filtering
        "DASHBOARD_NATIVE_FILTERS": True,    # Native filter bar
        "EMBEDDED_SUPERSET": True,           # Dashboard embedding
        "ALERT_REPORTS": True,               # Alerts and reports
        "GLOBAL_ASYNC_QUERIES": True,        # Async chart rendering
        "DASHBOARD_VIRTUALIZATION": True,    # Virtual scroll
        "DRILL_BY": True,                    # Drill-by
        "DRILL_TO_DETAIL": True,             # Drill-to-detail
    }

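
Flags such as `GLOBAL_ASYNC_QUERIES` and `ALERT_REPORTS` only do useful work when Celery workers are running. A minimal Celery configuration sketch for `superset_config.py` follows, modeled on the pattern in the Superset documentation. The Redis URLs and tuning values are assumptions to adapt.

```python
# superset_config.py (sketch): Celery settings for async queries and reports
class CeleryConfig:
    # Hypothetical Redis host and database numbers.
    broker_url = "redis://redis:6379/0"
    result_backend = "redis://redis:6379/5"
    # Task modules the workers load on start.
    imports = ("superset.sql_lab", "superset.tasks.scheduler")
    # Assumed tuning: fetch one task at a time so long queries do not block queued ones.
    worker_prefetch_multiplier = 1
    # Acknowledge tasks after completion so a killed worker does not lose them.
    task_acks_late = True


CELERY_CONFIG = CeleryConfig
```

Whether workers picked up the configuration can be checked with `celery --app=superset.tasks.celery_app:app inspect ping`.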
Active proposal for deeper semantic layer integration with external tools (Cube, dbt Semantic Layer, Metricflow) via an "Explorable" Python protocol.
"SQLite in production." SQLite does not support concurrent writes and will cause data corruption under load. Use PostgreSQL for the metadata database in any non-development deployment.
"No caching configured." Without Redis caching, every dashboard load hits the analytical database. Configure DATA_CACHE_CONFIG at minimum. Dashboards serving the same data repeatedly should use cache timeouts matched to ETL refresh frequency.
"In-memory cache with Global Async Queries." GAQ requires Redis for consistency between web processes and Celery workers. In-memory cache is per-process and invisible to other processes. Always use Redis as the results backend when GAQ is enabled.
"Celery prefork pool on Kubernetes." The default prefork pool spawns child processes that can exceed pod memory limits, causing OOMKilled restarts. Use --pool solo or --pool gevent for Kubernetes deployments.
"SELECT * in virtual datasets." Selecting all columns transfers unnecessary data and prevents index optimization. Specify only needed columns, apply WHERE clauses, and consider materializing complex queries as views.
"No query timeouts." Without SQLLAB_TIMEOUT and database-level timeouts, runaway queries consume warehouse resources indefinitely. Set reasonable timeouts and enable async queries for SQL Lab.
"Public role enabled without restriction." The Public role grants unauthenticated access. Never enable it without carefully restricting permissions, especially in internet-facing deployments.
"No Gunicorn worker recycling." Gunicorn workers can grow in memory over time due to Python memory fragmentation. Set --max-requests=1000 --max-requests-jitter=50 to periodically respawn workers.
Load these for deep technical detail:
- references/architecture.md -- Flask/React stack, SQL Lab (async queries, Jinja templating), chart types (ECharts plugin), dashboards (native filters, cross-filtering), database connectivity (SQLAlchemy, drivers), caching (Redis multi-tier, GAQ), security (RBAC, RLS, OAuth/LDAP), embedding (SDK, guest tokens), deployment architecture (Kubernetes Helm chart)
- references/best-practices.md -- Dashboard design (layout, filter strategy, cross-filtering), SQL Lab usage (virtual datasets, parameterized queries), chart performance, database optimization (connection pools, data modeling), caching strategy (timeouts, warm-up, Redis config), Kubernetes deployment at scale (Helm values, HPA, HA), security configuration (production hardening, authentication, RLS), monitoring (StatsD, logging)
- references/diagnostics.md -- Slow query diagnosis (EXPLAIN ANALYZE, query patterns), dashboard loading issues (filter bottlenecks, frontend profiling), caching problems (Redis connectivity, GAQ debugging), database connection troubleshooting (drivers, SSL, pool health), Celery worker issues (OOMKilled, stuck tasks, pool selection), memory management (Gunicorn, Celery, Kubernetes resources)