skills/troubleshooting-astro-deployments/SKILL.md
Troubleshoot Astronomer production deployments with Astro CLI. Use when investigating deployment issues, viewing production logs, analyzing failures, or managing deployment environment variables.
This skill helps you diagnose and troubleshoot production Astronomer deployments using the Astro CLI.
For deployment management, see the managing-astro-deployments skill. For local development, see the managing-astro-local-env skill.
Start with these commands to get an overview:
# 1. List deployments to find target
astro deployment list
# 2. Get deployment overview
astro deployment inspect <DEPLOYMENT_ID>
# 3. Check for errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50
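Use -c to set the number of log lines returned (default: 500). Component flags (--scheduler, --workers, --webserver, --triggerer) and level flags (--error, --warn, --info) cannot be combined with each other; use only one of them per command. --keyword can be added alongside a component or level flag.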
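Because component and level flags cannot be combined, covering every component means issuing one command per flag. A minimal sketch of that loop follows; the function name `astro_logs_per_component` and the default of 30 lines are illustrative assumptions, not part of the Astro CLI.

```shell
# Hypothetical helper: fetch recent logs from each Airflow component in turn.
# Assumes the astro CLI is installed and already authenticated.
astro_logs_per_component() {
  local deployment_id="$1"
  local count="${2:-30}"   # lines per component; 30 is an arbitrary default
  local component
  for component in scheduler workers webserver triggerer; do
    echo "=== ${component} ==="
    # One component flag per command, as required above
    astro deployment logs "$deployment_id" "--${component}" -c "$count"
  done
}
```

Calling `astro_logs_per_component <DEPLOYMENT_ID> 50` would print four labelled sections, one per component.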
View logs from specific Airflow components:
# Scheduler logs (DAG processing, task scheduling)
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 50
# Worker logs (task execution)
astro deployment logs <DEPLOYMENT_ID> --workers -c 30
# Webserver logs (UI access, health checks)
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30
# Triggerer logs (deferrable operators)
astro deployment logs <DEPLOYMENT_ID> --triggerer -c 30
Filter by severity:
# Error logs only (most useful for troubleshooting)
astro deployment logs <DEPLOYMENT_ID> --error -c 30
# Warning logs
astro deployment logs <DEPLOYMENT_ID> --warn -c 50
# Info-level logs
astro deployment logs <DEPLOYMENT_ID> --info -c 50
Search for specific keywords:
# Search for specific error
astro deployment logs <DEPLOYMENT_ID> --keyword "ConnectionError"
# Search for specific DAG
astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100
# Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"
# Find task failures
astro deployment logs <DEPLOYMENT_ID> --error --keyword "Task failed"
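When several error patterns are plausible, the keyword searches above can be run back to back. The sketch below is one way to do that; `astro_search_errors` is a hypothetical name, and the keywords are whatever you pass in.

```shell
# Hypothetical helper: run one --error --keyword search per pattern.
# Assumes the astro CLI is installed and already authenticated.
astro_search_errors() {
  local deployment_id="$1"
  shift   # remaining arguments are the keywords to search for
  local keyword
  for keyword in "$@"; do
    echo "=== ${keyword} ==="
    astro deployment logs "$deployment_id" --error --keyword "$keyword"
  done
}
```

For example, `astro_search_errors <DEPLOYMENT_ID> "ImportError" "ConnectionError" "Task failed"` runs the three searches shown above in sequence.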
# List deployments with status
astro deployment list
# Get deployment details
astro deployment inspect <DEPLOYMENT_ID>
Look for:
# Start with errors
astro deployment logs <DEPLOYMENT_ID> --error -c 50
Look for:
# Check DAG processing
astro deployment logs <DEPLOYMENT_ID> --scheduler -c 30
Look for:
# Check task execution
astro deployment logs <DEPLOYMENT_ID> --workers -c 30
Look for:
# Check environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>
# Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>
Look for:
Follow the complete investigation workflow above, then narrow to the specific DAG:
astro deployment logs <DEPLOYMENT_ID> --keyword "my_dag_name" -c 100
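The investigation workflow and the DAG-specific search can be captured in one pass so the output is available for later review. This is a sketch under assumptions: the function name, the output directory, and the file names are all hypothetical, and only commands already shown in this document are used.

```shell
# Hypothetical helper: save each investigation step to its own file.
# Assumes the astro CLI is installed and already authenticated.
# Caution: the variable listing may contain sensitive values; store it accordingly.
astro_investigate() {
  local deployment_id="$1"
  local dag_name="$2"
  local out_dir="${3:-astro-investigation}"   # assumed output directory name
  mkdir -p "$out_dir"
  astro deployment list > "$out_dir/01-list.txt"
  astro deployment inspect "$deployment_id" > "$out_dir/02-inspect.txt"
  astro deployment logs "$deployment_id" --error -c 50 > "$out_dir/03-errors.txt"
  astro deployment logs "$deployment_id" --scheduler -c 30 > "$out_dir/04-scheduler.txt"
  astro deployment logs "$deployment_id" --workers -c 30 > "$out_dir/05-workers.txt"
  astro deployment variable list --deployment-id "$deployment_id" > "$out_dir/06-variables.txt"
  # Narrow to the specific DAG last
  astro deployment logs "$deployment_id" --keyword "$dag_name" -c 100 > "$out_dir/07-dag.txt"
  echo "Saved investigation output to $out_dir"
}
```

Writing each step to a separate file keeps the component logs apart, which matters because the CLI returns them through separate commands anyway.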
# 1. Check deployment resource allocation
astro deployment inspect <DEPLOYMENT_ID>
# Look for: resource_quota_cpu, resource_quota_memory
# Worker queue: max_worker_count, worker_type
# 2. Check for worker scaling issues
astro deployment logs <DEPLOYMENT_ID> --workers -c 50
# 3. Look for out-of-memory errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "memory"
# 1. Review environment variables
astro deployment variable list --deployment-id <DEPLOYMENT_ID>
# 2. Check for secrets backend configuration
# Look for: AIRFLOW__SECRETS__BACKEND, AIRFLOW__SECRETS__BACKEND_KWARGS
# 3. Verify deployment settings
astro deployment inspect <DEPLOYMENT_ID>
# 4. Check webserver logs for auth issues
astro deployment logs <DEPLOYMENT_ID> --webserver -c 30
# 1. Find import errors
astro deployment logs <DEPLOYMENT_ID> --error --keyword "ImportError"
# 2. Check scheduler for parse failures
astro deployment logs <DEPLOYMENT_ID> --scheduler --keyword "Failed to import" -c 50
# 3. Verify dependencies were deployed
astro deployment inspect <DEPLOYMENT_ID>
# Check: current_tag, last deployment timestamp
# List all variables for deployment
astro deployment variable list --deployment-id <DEPLOYMENT_ID>
# Find specific variable
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --key AWS_REGION
# Export variables to file
astro deployment variable list --deployment-id <DEPLOYMENT_ID> --save --env .env.backup
# Create regular variable
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
--key API_ENDPOINT \
--value https://api.example.com
# Create secret (masked in UI and logs)
astro deployment variable create --deployment-id <DEPLOYMENT_ID> \
--key API_KEY \
--value secret123 \
--secret
# Update existing variable
astro deployment variable update --deployment-id <DEPLOYMENT_ID> \
--key API_KEY \
--value newsecret
# Delete variable
astro deployment variable delete --deployment-id <DEPLOYMENT_ID> --key OLD_KEY
Note: Variables are available to DAGs as environment variables. Changes require no redeployment.
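Since variable changes take effect without a redeployment, it can be worth exporting the current values before updating one. The following is a minimal sketch of that habit; `astro_update_var_with_backup` and the timestamped file name are assumptions, and you should check what the exported file contains (for example, how secret values are handled) before relying on it as a backup.

```shell
# Hypothetical helper: export current variables, then update a single key.
# Assumes the astro CLI is installed and already authenticated.
astro_update_var_with_backup() {
  local deployment_id="$1" key="$2" value="$3"
  # Assumed naming scheme: .env.backup.<date>-<time>
  local backup_file=".env.backup.$(date +%Y%m%d-%H%M%S)"
  # Stop if the export fails so the update never runs without a backup
  astro deployment variable list --deployment-id "$deployment_id" \
    --save --env "$backup_file" || return 1
  astro deployment variable update --deployment-id "$deployment_id" \
    --key "$key" --value "$value"
}
```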
Focus on these fields in the deployment inspect output when troubleshooting:

- min_worker_count, max_worker_count
- worker_concurrency
- worker_type (resource class)

Other tips:

- Use --keyword for targeted searches: more efficient than reading all logs
- The inspect command is your health dashboard: check it first
- Read the inspect output: it may reveal configuration issues
- Adjust -c based on needs

| Symptom | Command |
|---------|---------|
| Deployment shows UNHEALTHY | astro deployment inspect <ID> + --error logs |
| DAG not appearing | --error logs for import errors, check --scheduler logs |
| Tasks failing | --workers logs + search for DAG with --keyword |
| Slow scheduling | --scheduler logs + check inspect for scheduler resources |
| UI not responding | --webserver logs |
| Connection issues | Check variables, search logs for connection name |
| Import errors | --error --keyword "ImportError" + --scheduler logs |
| Out of memory | inspect for resources + --workers --keyword "memory" |
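
The symptom table can also be expressed as a small lookup that prints a first command to try. This is only a sketch: the function name and the symptom labels (`unhealthy`, `ui-down`, and so on) are hypothetical shorthand for the table rows, and each printed command is taken from the examples in this document.

```shell
# Hypothetical lookup: print a first command to try for a given symptom.
# It only prints the command; it does not call the astro CLI.
astro_first_step() {
  local symptom="$1"
  local id="${2:-<DEPLOYMENT_ID>}"
  case "$symptom" in
    unhealthy)                echo "astro deployment inspect $id" ;;
    dag-missing|import-error) echo "astro deployment logs $id --error --keyword \"ImportError\"" ;;
    task-failing)             echo "astro deployment logs $id --workers -c 30" ;;
    slow-scheduling)          echo "astro deployment logs $id --scheduler -c 50" ;;
    ui-down)                  echo "astro deployment logs $id --webserver -c 30" ;;
    oom)                      echo "astro deployment logs $id --workers --keyword \"memory\"" ;;
    *) echo "unknown symptom: $symptom" >&2; return 1 ;;
  esac
}
```

Printing rather than executing keeps the helper safe to run anywhere and leaves the decision to run the command with you.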