cli-tool/components/skills/scientific/vaex/SKILL.md
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that don't fit in memory.
npx skillsauth add davila7/claude-code-templates vaexInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Vaex is a high-performance Python library designed for lazy, out-of-core DataFrames to process and visualize tabular datasets that are too large to fit into RAM. Vaex can process over a billion rows per second, enabling interactive data exploration and analysis on datasets with billions of rows.
Use Vaex when:
Vaex provides six primary capability areas, each documented in detail in the references directory:
Load and create Vaex DataFrames from various sources including files (HDF5, CSV, Arrow, Parquet), pandas DataFrames, NumPy arrays, and dictionaries. Reference references/core_dataframes.md for:
Perform filtering, create virtual columns, use expressions, and aggregate data without loading everything into memory. Reference references/data_processing.md for:
Leverage Vaex's lazy evaluation, caching strategies, and memory-efficient operations. Reference references/performance.md for:
delay=True for batching operationsCreate interactive visualizations of large datasets including heatmaps, histograms, and scatter plots. Reference references/visualization.md for:
Build ML pipelines with transformers, encoders, and integration with scikit-learn, XGBoost, and other frameworks. Reference references/machine_learning.md for:
Efficiently read and write data in various formats with optimal performance. Reference references/io_operations.md for:
For most Vaex tasks, follow this pattern:
import vaex
# 1. Open or create DataFrame
df = vaex.open('large_file.hdf5') # or .csv, .arrow, .parquet
# OR
df = vaex.from_pandas(pandas_df)
# 2. Explore the data
print(df) # Shows first/last rows and column info
df.describe() # Statistical summary
# 3. Create virtual columns (no memory overhead)
df['new_column'] = df.x ** 2 + df.y
# 4. Filter with selections
df_filtered = df[df.age > 25]
# 5. Compute statistics (fast, lazy evaluation)
mean_val = df.x.mean()
stats = df.groupby('category').agg({'value': 'sum'})
# 6. Visualize
df.plot1d(df.x, limits=[0, 100])
df.plot(df.x, df.y, limits='99.7%')
# 7. Export if needed
df.export_hdf5('output.hdf5')
The reference files contain detailed information about each capability area. Load references into context based on the specific task:
references/core_dataframes.md and references/data_processing.mdreferences/performance.mdreferences/visualization.mdreferences/machine_learning.mdreferences/io_operations.mddelay=True when performing multiple calculationsdf.stat() to understand memory usage and optimize operationsimport vaex
# Open large CSV (processes in chunks automatically)
df = vaex.from_csv('large_file.csv')
# Export to HDF5 for faster future access
df.export_hdf5('large_file.hdf5')
# Future loads are instant
df = vaex.open('large_file.hdf5')
# Use delay=True to batch multiple operations
mean_x = df.x.mean(delay=True)
std_y = df.y.std(delay=True)
sum_z = df.z.sum(delay=True)
# Execute all at once
results = vaex.execute([mean_x, std_y, sum_z])
# No memory overhead - computed on the fly
df['age_squared'] = df.age ** 2
df['full_name'] = df.first_name + ' ' + df.last_name
df['is_adult'] = df.age >= 18
This skill includes reference documentation in the references/ directory:
core_dataframes.md - DataFrame creation, loading, and basic structuredata_processing.md - Filtering, expressions, aggregations, and transformationsperformance.md - Optimization strategies and lazy evaluationvisualization.md - Plotting and interactive visualizationsmachine_learning.md - ML pipelines and model integrationio_operations.md - File formats and data import/exporttools
No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity - these platforms have their own patterns, pitfalls, and breaking points. This skill covers when to use which platform, how to build reliable automations, and when to graduate to code-based solutions. Key insight: Zapier optimizes for simplicity and integrations (7000+ apps), Make optimizes for power
tools
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
tools
Workflow automation is the infrastructure that makes AI agents reliable. Without durable execution, a network hiccup during a 10-step payment flow means lost money and angry customers. With it, workflows resume exactly where they left off. This skill covers the platforms (n8n, Temporal, Inngest) and patterns (sequential, parallel, orchestrator-worker) that turn brittle scripts into production-grade automation. Key insight: The platforms make different tradeoffs. n8n optimizes for accessibility
development
Trigger.dev expert for background jobs, AI workflows, and reliable async execution with excellent developer experience and TypeScript-first design. Use when: trigger.dev, trigger dev, background task, ai background job, long running task.