skills/datagen/domo-data-generator/SKILL.md
**Generating sample data for Domo** -- invoke when a user needs to create realistic sample datasets and upload them to a Domo instance. Primary signals: requests for sample data, demo data, test data, fake data for Domo; mentions of Salesforce, Google Analytics, QuickBooks, NetSuite, Google Ads, Facebook Ads, HubSpot, Marketo, or Health Portal sample data; questions about the datagen CLI or domo_data_generator. Covers: generating datasets, uploading to Domo, creating datasets in Domo, rolling dates, entity pools, connector icons, catalog management, and adding new dataset definitions. Skip for: real connector setup, production data pipelines, data transformations (Magic ETL), or Domo App Platform.
Generate realistic, cross-referenced sample data for Domo using the datagen CLI.
Repository: https://github.com/brrink/domo_data_generator
The generator creates sample data that mirrors major business platforms, keeps entities consistent across sources, and uploads the result to Domo.
# Install globally with pipx (recommended)
pipx install git+https://github.com/brrink/domo_data_generator.git
# Initialize a working directory
mkdir my-domo-data && cd my-domo-data
datagen init
# Edit .env with your Domo credentials
If .env.example is missing or you want a clean start, create .env in the working directory with:
cat > .env <<'EOF'
DOMO_CLIENT_ID=your_client_id_here
DOMO_CLIENT_SECRET=your_client_secret_here
DOMO_API_HOST=api.domo.com
DOMO_INSTANCE=your_instance_name
DOMO_SET_CONNECTOR_TYPE=false
EOF
| Variable | Purpose |
|----------|---------|
| DOMO_CLIENT_ID | OAuth client identifier |
| DOMO_CLIENT_SECRET | OAuth client secret |
| DOMO_API_HOST | API endpoint hostname |
| DOMO_INSTANCE | Domo instance name |
| DOMO_SET_CONNECTOR_TYPE | Enable connector icon customization (optional, default: false) |
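Before running any command that talks to Domo, it can help to confirm that .env no longer contains the template placeholders. The following is a minimal sketch, not part of datagen: the helper names are hypothetical, and treating all four non-optional variables as required is an assumption drawn from the table above.

```python
# Hypothetical pre-flight check for the .env file described above.
# Assumption: every variable except DOMO_SET_CONNECTOR_TYPE must be set.
REQUIRED_KEYS = (
    "DOMO_CLIENT_ID",
    "DOMO_CLIENT_SECRET",
    "DOMO_API_HOST",
    "DOMO_INSTANCE",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: dict) -> list:
    """Return required keys that are absent, empty, or still 'your_...' placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not env.get(key) or env[key].startswith("your_")
    ]
```

Calling `missing_keys(parse_env(open(".env").read()))` returns an empty list once every required value has been filled in.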
Auth boundary note: domo_data_generator uses its own public-API/OAuth credential flow and does not run through community-domo-cli or ryuu session auth.
Current tooling boundary: most Product API automation should use community-domo-cli, but datagen dataset create/upload in this skill currently depends on python -m datagen with .env OAuth credentials (DOMO_CLIENT_ID / DOMO_CLIENT_SECRET).
Entry point: datagen [OPTIONS] COMMAND [ARGS]
| Option | Description |
|--------|-------------|
| --verbose / -v | Enable verbose logging |
| --output / -o TEXT | Output format: json (default), table, yaml |
| --yes / -y | Skip confirmation prompts |
All commands emit structured JSON by default for easy machine parsing.
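Because output is JSON by default, the CLI can be driven from a script. The sketch below is illustrative only: `build_argv` and `run_datagen` are hypothetical helpers, it assumes stdout carries nothing but the JSON document, and it makes no assumption about the shape of that JSON. Global options are placed before the command, following the `datagen [OPTIONS] COMMAND [ARGS]` entry point.

```python
import json
import subprocess


def build_argv(command: list, output: str = "json", yes: bool = False) -> list:
    """Build an argument list with global options placed before the command."""
    argv = ["datagen", "--output", output]
    if yes:
        argv.append("--yes")  # skip confirmation prompts
    return argv + list(command)


def run_datagen(command: list):
    """Run a datagen command and decode whatever JSON it prints.

    Assumption: stdout contains only the JSON document.
    """
    result = subprocess.run(
        build_argv(command), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `run_datagen(["list"])` would return the decoded catalog listing, and `check=True` raises if the command exits with a non-zero status.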
init -- Initialize a working directory
datagen init # Initialize current directory
datagen init /path/to/dir # Initialize a specific directory
Copies bundled catalog YAML files to ./catalog/, creates .env template, and creates ./data/ directory. Run this once before using the CLI in a new directory.
generate -- Generate sample data
datagen generate --all # Generate all datasets
datagen generate salesforce_opportunities # Generate one dataset
datagen generate --all --seed 42 # Reproducible generation
datagen generate --all --dry-run # Preview without writing
Requires entity pool initialization first. Run python -m datagen pool regenerate before generate even if your schema has no explicit entity_ref columns.
| Option | Description |
|--------|-------------|
| name | Dataset name (YAML filename stem), optional |
| --all | Generate all datasets |
| --seed INTEGER | Random seed for reproducibility |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
| --dry-run | Preview without writing files |
upload -- Upload data to Domo (full replace)
datagen upload --all
datagen upload salesforce_opportunities
Requires DOMO_CLIENT_ID and DOMO_CLIENT_SECRET.
| Option | Description |
|--------|-------------|
| name | Dataset name, optional |
| --all | Upload all datasets |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
create-dataset -- Create dataset(s) in Domo from catalog
datagen create-dataset --all --skip-existing
datagen create-dataset salesforce_opportunities
Requires DOMO_CLIENT_ID and DOMO_CLIENT_SECRET. The domo_id is persisted locally (in the catalog YAML if writable, otherwise in data/domo_ids.json).
| Option | Description |
|--------|-------------|
| name | Dataset name, optional |
| --all | Create all datasets |
| --skip-existing | Skip datasets that already have a domo_id |
| --catalog-dir PATH | Catalog directory override |
roll-dates -- Shift rolling date columns to stay current
datagen roll-dates
datagen roll-dates --anchor-date 2026-04-01
| Option | Description |
|--------|-------------|
| --anchor-date TEXT | Target date (YYYY-MM-DD), defaults to today |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
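To make the idea of date rolling concrete, here is a minimal sketch of one way it could work: every date in a rolling column moves by the same offset, so relative spacing (including future-dated rows) is preserved. This is an assumption for illustration. The source does not describe how datagen computes the shift, and `previous_anchor` is a hypothetical parameter.

```python
from datetime import date


def roll_dates(values: list, previous_anchor: date, new_anchor: date) -> list:
    """Shift each date by the distance between the old and new anchor dates.

    Assumption: the data was generated relative to previous_anchor.
    """
    offset = new_anchor - previous_anchor
    return [d + offset for d in values]


# Data generated relative to 2026-01-01, rolled forward to 2026-04-01.
rolled = roll_dates(
    [date(2025, 12, 25), date(2026, 1, 5)],
    previous_anchor=date(2026, 1, 1),
    new_anchor=date(2026, 4, 1),
)
```

A row dated seven days before the old anchor ends up seven days before the new one.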
list -- List catalog dataset definitions
datagen list # JSON output (default)
datagen --output table list # Rich table for humans
datagen list --verbose # Include column/schema details
status -- Display generation status for all datasets
datagen status
The connector-type commands that follow require DOMO_DEVELOPER_TOKEN and DOMO_INSTANCE.
discover-types -- Search Domo connector/provider types
datagen discover-types salesforce
set-type -- Set connector icon on a Domo dataset
datagen set-type salesforce_opportunities
datagen set-type salesforce_opportunities --provider-key custom_key
set-type-all -- Set connector icon on all datasets with a domo_id
datagen set-type-all
pool regenerate -- Regenerate the shared entity pool
datagen pool regenerate
datagen pool regenerate --seed 99
datagen pool regenerate --company-count 500 --person-count 1000
| Option | Default |
|--------|---------|
| --seed INTEGER | 42 |
| --company-count INTEGER | 200 |
| --person-count INTEGER | 500 |
| --product-count INTEGER | 50 |
| --sales-rep-count INTEGER | 20 |
| --campaign-count INTEGER | 30 |
pool show -- Display entity pool summary
datagen pool show
datagen init
# Edit .env with credentials
datagen pool regenerate
datagen generate --all
datagen create-dataset --all
datagen upload --all
datagen set-type-all
# Crontab entry: roll dates and re-upload daily at 6 AM
0 6 * * * cd /path/to/project && datagen roll-dates && datagen upload --all
datagen generate salesforce_opportunities
datagen create-dataset salesforce_opportunities
datagen upload salesforce_opportunities
datagen set-type salesforce_opportunities
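The command sequences above can also be assembled programmatically. The sketch below uses a hypothetical `pipeline_commands` helper that only builds argument lists from the commands shown in this document. It includes `pool regenerate` only for the full run, since regenerating the pool invalidates previously generated data.

```python
import subprocess
from typing import Optional


def pipeline_commands(dataset: Optional[str] = None, set_icon: bool = True) -> list:
    """Return the ordered datagen commands for one dataset, or for all of them."""
    target = [dataset] if dataset else ["--all"]
    steps = []
    if dataset is None:
        # Full setup: build the shared entity pool before generating anything.
        steps.append(["datagen", "pool", "regenerate"])
    steps += [
        ["datagen", "generate", *target],
        ["datagen", "create-dataset", *target],
        ["datagen", "upload", *target],
    ]
    if set_icon:
        steps.append(
            ["datagen", "set-type", dataset] if dataset else ["datagen", "set-type-all"]
        )
    return steps


def run_pipeline(steps: list) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

`run_pipeline(pipeline_commands("salesforce_opportunities"))` would run the four single-dataset commands in order.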
| Category | Dataset Name | Key | Rows |
|----------|-------------|-----|------|
| Salesforce | Salesforce - Accounts | salesforce_accounts | 500 |
| Salesforce | Salesforce - Contacts | salesforce_contacts | 1,500 |
| Salesforce | Salesforce - Opportunities | salesforce_opportunities | 2,500 |
| Google Analytics | Google Analytics - Sessions | ga_sessions | 5,000 |
| Google Analytics | Google Analytics - Page Views | ga_pageviews | 10,000 |
| Financial | QuickBooks - Invoices | financial_invoices | 3,000 |
| Financial | NetSuite - General Ledger | financial_gl_entries | 5,000 |
| Marketing | Google Ads - Campaign Performance | marketing_google_ads | 3,000 |
| Marketing | Facebook Ads - Campaign Performance | marketing_facebook_ads | 2,500 |
| Marketing | HubSpot - Contacts | marketing_hubspot_contacts | 2,000 |
| Marketing | Marketing - Market Leads | marketing_market_leads | 2,500 |
| Marketing | Marketo - Leads | marketing_marketo_leads | 3,000 |
| Health | Health Portal - Demographics | health_demographics | 15 |
| Health | Health Portal - Lab Results | health_lab_results | 1,470 |
| Health | Health Portal - Vitals | health_vitals | 5,250 |
| AdPoint | AdPoint - Orders | adpoint_orders | 150 |
| AdPoint | AdPoint - Line Items | adpoint_line_items | 500 |
| AdPoint | AdPoint - Flights | adpoint_flights | 2,000 |
The shared entity pool provides consistent cross-dataset references. Entities are generated once and reused across all datasets.
| Entity Type | Default Count | Key Fields |
|-------------|---------------|------------|
| company | 200 | id, account_id, name, domain, industry, size, city, state, annual_revenue, employee_count |
| person | 500 | id, contact_id, first_name, last_name, full_name, email, company_id, company_name, title, phone |
| product | 50 | id, name, category, unit_price, sku |
| sales_rep | 20 | id, rep_id, first_name, last_name, full_name, email, region |
| campaign | 30 | id, name, channel, budget, status |
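The value of a shared pool is easiest to see in a toy example. The sketch below is not datagen's implementation. The function names and the simplified company fields are hypothetical. It shows two datasets drawing from one seeded pool, so a given company id always carries the same name in both.

```python
import random


def build_company_pool(count: int, seed: int) -> list:
    """Create a small, reproducible pool of company entities."""
    rng = random.Random(seed)
    return [
        {
            "id": f"company-{i:04d}",
            "name": f"Company {i}",
            "employee_count": rng.randint(10, 5000),
        }
        for i in range(1, count + 1)
    ]


def generate_rows(pool: list, row_count: int, seed: int, prefix: str) -> list:
    """Generate rows that reference entities from the shared pool."""
    rng = random.Random(seed)
    rows = []
    for n in range(row_count):
        company = rng.choice(pool)
        rows.append(
            {
                "row_id": f"{prefix}-{n}",
                "company_id": company["id"],
                "company_name": company["name"],
            }
        )
    return rows


pool = build_company_pool(5, seed=42)
accounts = generate_rows(pool, 20, seed=1, prefix="acct")
invoices = generate_rows(pool, 20, seed=2, prefix="inv")
```

Because both datasets reference the same pool, joining `accounts` and `invoices` on `company_id` never produces conflicting company names.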
Dataset definitions live in the catalog/ directory as YAML files. Each YAML file defines metadata, columns, and generator configurations.
dataset:
  name: My Custom Dataset
  domo_id: null
  source_type: custom
  description: "Description of the dataset"
  row_count: 1000
  tags:
    - custom
    - demo
schema:
  - name: id
    type: STRING
    generator: uuid4
  - name: company_name
    type: STRING
    generator: entity_ref
    entity: company
    field: name
  - name: amount
    type: DOUBLE
    generator: random_decimal
    min: 100.0
    max: 10000.0
    precision: 2
  - name: created_date
    type: DATE
    generator: date_range
    start_days_ago: 365
    end_days_ahead: 0
    rolling: true
Supported column types: STRING, LONG, DOUBLE, DECIMAL, DATETIME, DATE
Generic: uuid4, random_choice, weighted_choice, random_int, random_decimal, date_range, entity_ref, compound, sequence, constant, derived_from_date, stage_derived, faker
Salesforce: sf_id, sf_opportunity_name, sf_case_subject, sf_lead_rating
Google Analytics: ga_session_id, ga_page_path, ga_source, ga_medium, ga_campaign, ga_browser, ga_device_category, ga_country, ga_bounce_rate, ga_session_duration, ga_pageviews, ga_landing_page
Financial: gl_account_code, gl_account_name, invoice_number, payment_terms, payment_method, invoice_status, journal_type, department, fiscal_period, debit_credit
Marketing/Ads: ad_platform, campaign_objective, ad_format, ad_headline, ad_keyword, targeting_type, impressions, clicks_from_impressions, ctr, cost_per_click, ad_spend, conversions_from_clicks, hubspot_lifecycle, hubspot_lead_status, ad_group_id
Health: health_lab_init, health_lab_field, health_vital_init, health_vital_field, health_demographics
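A new definition can be sanity-checked before it goes into the catalog. The following is a minimal sketch, assuming the schema has already been loaded into a list of dicts (for example with a YAML parser). `validate_columns` is a hypothetical helper, not part of datagen, and it checks only the column types and the entity_ref options listed in this document.

```python
ALLOWED_TYPES = {"STRING", "LONG", "DOUBLE", "DECIMAL", "DATETIME", "DATE"}


def validate_columns(columns: list) -> list:
    """Return a list of human-readable problems found in a schema column list."""
    problems = []
    for index, col in enumerate(columns):
        label = col.get("name", f"column {index}")
        if col.get("type") not in ALLOWED_TYPES:
            problems.append(f"{label}: unsupported type {col.get('type')!r}")
        if not col.get("generator"):
            problems.append(f"{label}: missing generator")
        if col.get("generator") == "entity_ref":
            # entity_ref needs both the pool type and the field to pull from it.
            for key in ("entity", "field"):
                if key not in col:
                    problems.append(f"{label}: entity_ref needs '{key}'")
    return problems
```

An empty result means the columns pass these basic checks. datagen's own validation may be stricter.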
| Option | Used With | Description |
|--------|-----------|-------------|
| entity | entity_ref | Entity pool type to reference |
| field | entity_ref | Field to pull from the entity |
| choices | random_choice, weighted_choice | List of possible values |
| min / max | random_int, random_decimal | Value range |
| precision | random_decimal | Decimal places |
| template | compound | String template with {field} placeholders |
| refs | compound | Column references for template substitution — must be a YAML list of column name strings (e.g. ["sku", "line_id"]), not a dict/object. A dict triggers ValidationError: schema.N.refs — Input should be a valid list. |
| start_days_ago / end_days_ahead | date_range | Date range relative to today |
| rolling | date_range | Enable date rolling for freshness |
| mapping | stage_derived | Map source values to derived values |
| source_column | stage_derived, derived_from_date | Column to derive from |
| format | derived_from_date | Date format string |
| faker_method | faker | Faker library method name |
| faker_args | faker | Arguments for the Faker method |
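The relationship between template and refs can be illustrated with ordinary string formatting. This is a sketch under the assumption that each name in refs fills the matching {field} placeholder from the current row. `render_compound` is a hypothetical stand-in, not datagen's code.

```python
def render_compound(template: str, refs: list, row: dict) -> str:
    """Fill {field} placeholders in template using the columns named in refs."""
    if not isinstance(refs, list):
        # Mirrors the rule above: refs must be a list of column names, not a dict.
        raise TypeError("refs must be a list of column name strings")
    return template.format(**{name: row[name] for name in refs})


row = {"sku": "AB12", "line_id": 7, "quantity": 3}
label = render_compound("{sku}-{line_id}", ["sku", "line_id"], row)
```

Only the columns named in refs are passed to the template, so unrelated columns such as `quantity` are ignored.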
weighted_choice YAML format:
generator: weighted_choice
choices:
  "Tier 1": 0.40
  "Tier 2": 0.35
  "Tier 3": 0.25
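A value-to-weight mapping like the one above maps naturally onto Python's `random.choices`. The sketch below illustrates the sampling idea only. It is not taken from datagen, and the `weighted_choice` function name is hypothetical.

```python
import random


def weighted_choice(choices: dict, rng: random.Random) -> str:
    """Pick one key from a value-to-weight mapping, proportionally to its weight."""
    labels = list(choices)
    weights = list(choices.values())
    return rng.choices(labels, weights=weights, k=1)[0]


tiers = {"Tier 1": 0.40, "Tier 2": 0.35, "Tier 3": 0.25}
rng = random.Random(0)  # seeded so repeated runs produce the same sequence
sample = [weighted_choice(tiers, rng) for _ in range(1000)]
```

Over many draws, "Tier 1" should appear more often than "Tier 3", roughly in proportion to the weights.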
compound refs vs formatted random strings: For values like PO-12345, prefer faker with bothify instead of abusing compound/refs:
- name: purchase_order_ref
  type: STRING
  generator: faker
  faker_method: bothify
  faker_args:
    text: "PO-#####"
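For readers unfamiliar with bothify, the sketch below imitates its pattern rules with the standard library only: each `#` becomes a random digit and each `?` a random ASCII letter. `bothify_like` is a hypothetical stand-in written for illustration. The real generator would call Faker's `bothify` method.

```python
import random
import string


def bothify_like(text: str, rng: random.Random) -> str:
    """Replace '#' with a random digit and '?' with a random ASCII letter."""
    out = []
    for ch in text:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "?":
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)  # literal characters pass through unchanged
    return "".join(out)


po_number = bothify_like("PO-#####", random.Random(1))
```

The result always has the form `PO-` followed by five digits, which is why no compound template or refs list is needed for this kind of value.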
- datagen init first -- Initialize a working directory before using any other commands. This copies the catalog and creates .env.
- Run generate (or generate --all) before upload to ensure CSV data files exist.
- Run create-dataset before upload for new datasets. The domo_id is persisted locally.
- --skip-existing -- When running create-dataset --all, use --skip-existing to avoid duplicating datasets that already have a domo_id.
- Regenerating the entity pool (pool regenerate) invalidates all previously generated data. Re-generate all datasets afterward.
- Run roll-dates before upload to keep date columns current. Only columns with rolling: true are affected.
- DOMO_CLIENT_ID and DOMO_CLIENT_SECRET are required for upload and create-dataset. DOMO_DEVELOPER_TOKEN is required for set-type and discover-types. Offline commands need no credentials.
- Use --seed for reproducible data generation across runs.
- Use --output table for human-readable Rich tables.

- datagen installed (pipx install git+https://github.com/brrink/domo_data_generator.git)
- Working directory initialized (datagen init)
- .env configured with Domo credentials
- Entity pool generated (datagen pool regenerate)
- Data generated (datagen generate --all)
- Datasets created in Domo (datagen create-dataset --all --skip-existing)
- Data uploaded (datagen upload --all)
- Connector icons set (datagen set-type-all) if desired