.config/opencode/skills/duckdb-data-explorer/SKILL.md
This skill should be used when performing local data exploration, profiling, quality analysis, or transformation tasks using DuckDB. It handles CSV, Parquet, and JSON files, provides automated data quality reports, supports complex JSON transformations, and generates interactive HTML reports for data analysis.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add alexismanuel/dotfiles duckdb-data-explorer
This skill enables comprehensive local data exploration using DuckDB, supporting automated data profiling, quality analysis, and transformation workflows for CSV, Parquet, and JSON files. It provides reusable scripts for common data tasks, reference patterns for complex queries, and HTML report generation for interactive data visualization.
To quickly analyze a data file and generate a quality report:
Use scripts/data_profiler.py to profile the file:
python scripts/data_profiler.py data.csv --output profile.json
Generate an HTML report:
python scripts/html_report_generator.py profile.json report.html
Open the HTML report to view data quality metrics, null analysis, and sample data.
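To make "null analysis" concrete, here is a minimal sketch of the kind of per-column metric such a report might summarize. It is not the implementation of data_profiler.py; the function name `null_profile`, the list of null tokens, and the output layout are assumptions made for illustration, and it uses only the Python standard library rather than DuckDB.

```python
import csv
import io


def null_profile(csv_text, null_tokens=("", "NULL", "null", "NA")):
    """Return per-column null counts and null percentages for CSV text.

    Hypothetical helper: the token list and result layout are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {name: 0 for name in columns}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in columns:
            value = row.get(name)
            # DictReader yields None for fields missing at the end of a short row.
            if value is None or value.strip() in null_tokens:
                counts[name] += 1
    return {
        "row_count": total_rows,
        "columns": {
            name: {
                "null_count": counts[name],
                "null_pct": round(100.0 * counts[name] / total_rows, 2) if total_rows else 0.0,
            }
            for name in columns
        },
    }


SAMPLE = "id,name,email\n1,Ada,ada@example.com\n2,,\n3,Grace,NULL\n"
profile = null_profile(SAMPLE)
```

In this sample, `email` is null in two of three rows and `name` in one, which is the sort of signal that tells you which columns need cleaning first.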
For complex JSON handling and transformation:
Analyze JSON structure:
python scripts/json_transformer.py structure data.json
Transform JSON data:
python scripts/json_transformer.py transform "*.json" "SELECT json_extract(data, '$.user.name') as name FROM json_data" --output transformed.parquet
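The path argument in `json_extract(data, '$.user.name')` walks from the document root (`$`) through nested keys. The following plain Python sketch mimics that lookup for simple dotted paths only. The helper name `extract_path` is hypothetical, and array indexes and wildcards are deliberately not handled.

```python
import json


def extract_path(document, path):
    """Follow a simple '$.a.b' style path; return None if any step is missing.

    Hypothetical helper covering object keys only (no array indexes).
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    # Drop the leading '$' and skip empty segments produced by the split.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


record = json.loads('{"user": {"name": "Ada", "id": 7}, "event": {"type": "login"}}')
name = extract_path(record, "$.user.name")
missing = extract_path(record, "$.user.email")
```

One difference worth keeping in mind: in DuckDB, `json_extract` returns a JSON value, so a string result keeps its surrounding quotes, whereas `json_extract_string` returns plain text. For output columns such as `name`, the string variant is often what you want.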
Use scripts/data_profiler.py for automated data quality assessment:
When to use: Initial data exploration, data quality assessment, before data cleaning or transformation.
Use scripts/json_transformer.py for advanced JSON operations:
Input may be a single file or a glob pattern such as *.json.
When to use: Working with nested JSON data, API responses, log files, or document databases.
Use scripts/html_report_generator.py to create visual data exploration reports:
When to use: Sharing data insights, creating documentation, or interactive data exploration.
Reference references/duckdb_patterns.md for common query patterns:
When to use: Writing custom DuckDB queries, optimizing performance, learning DuckDB syntax.
Reference references/json_functions.md for comprehensive JSON function documentation:
json_extract, json_extract_string, etc.
json_array_length, json_contains, etc.
json_structure, json_type, etc.
When to use: Complex JSON transformations, API data processing, nested data extraction.
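As a rough illustration of what structure and type inspection yields, the sketch below summarizes the shape of a parsed JSON document in plain Python. It is only an analogy to functions such as `json_structure` and `json_type`. The helper `describe` and the labels it returns ("number", "string", and so on) are assumptions and do not match DuckDB's own type names.

```python
import json


def describe(value):
    """Recursively summarize the shape of a parsed JSON value.

    Hypothetical helper; the type labels are illustrative, not DuckDB's.
    """
    if isinstance(value, dict):
        return {key: describe(item) for key, item in value.items()}
    if isinstance(value, list):
        # Summarize an array by the shape of its first element, if any.
        return [describe(value[0])] if value else []
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


doc = json.loads('{"user": {"id": 7, "tags": ["a", "b"]}, "active": true, "note": null}')
shape = describe(doc)
array_length = len(doc["user"]["tags"])  # analogous in spirit to json_array_length
```

Inspecting the shape first makes it easier to decide which paths to extract and which arrays need unnesting before writing the transformation query.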
Reference references/data_quality_checks.md for comprehensive quality assessment:
When to use: Data quality audits, data cleaning workflows, data validation pipelines.
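The contents of the reference file are not reproduced here. As an illustration of the kind of check a validation pipeline might run, the sketch below flags duplicate keys and out-of-range values in plain Python. The helper names, the field names (`customer_id`, `age`), and the accepted range are all assumptions.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return key values that appear more than once, with their counts."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def out_of_range(rows, field, low, high):
    """Return rows whose numeric field falls outside the closed range [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]


# Hypothetical sample records for illustration only.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": -5},
    {"customer_id": 2, "age": 41},
]
dupes = duplicate_keys(customers, "customer_id")
bad_ages = out_of_range(customers, "age", 0, 120)
```

The same two checks translate naturally to SQL as a `GROUP BY ... HAVING COUNT(*) > 1` query and a `WHERE` filter on the range.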
# Profile a new dataset
python scripts/data_profiler.py sales_data.csv --output sales_profile.json
# Generate interactive report
python scripts/html_report_generator.py sales_profile.json sales_report.html
# Open report for exploration
open sales_report.html
# Analyze JSON structure
python scripts/json_transformer.py structure api_responses.json
# Transform and flatten JSON data
python scripts/json_transformer.py transform "logs/*.json" \
  "SELECT
     json_extract(data, '$.timestamp') AS timestamp,
     json_extract(data, '$.user.id') AS user_id,
     json_extract(data, '$.event.type') AS event_type
   FROM json_data" \
  --output cleaned_logs.parquet
# Profile data for quality issues
python scripts/data_profiler.py customer_data.parquet --output quality_profile.json
# Generate detailed quality report
python scripts/html_report_generator.py quality_profile.json quality_report.html
# Use reference queries for deeper analysis
duckdb :memory: "SELECT * FROM read_parquet('customer_data.parquet') LIMIT 10"
When exporting to PostgreSQL for analysis:
For processing multiple files:
# Process all CSV files in directory
mkdir -p profiles reports  # make sure the output directories exist
for file in data/*.csv; do
  name="$(basename "$file" .csv)"
  python scripts/data_profiler.py "$file" --output "profiles/$name.json"
  python scripts/html_report_generator.py "profiles/$name.json" "reports/$name.html"
done
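If the batch needs to be driven from Python rather than the shell, the loop above can be sketched as a function that builds the two commands per file. The function name `plan_batch` and the default directory names are assumptions; the script paths and arguments are copied from the shell loop.

```python
from pathlib import Path


def plan_batch(csv_files, profile_dir="profiles", report_dir="reports"):
    """Build (profile_cmd, report_cmd) pairs mirroring the shell loop.

    Hypothetical helper: it only constructs the commands, it does not run them.
    """
    plan = []
    for csv_file in csv_files:
        stem = Path(csv_file).stem  # same role as `basename "$file" .csv`
        profile_path = str(Path(profile_dir) / f"{stem}.json")
        report_path = str(Path(report_dir) / f"{stem}.html")
        plan.append((
            ["python", "scripts/data_profiler.py", str(csv_file), "--output", profile_path],
            ["python", "scripts/html_report_generator.py", profile_path, report_path],
        ))
    return plan


commands = plan_batch(["data/sales.csv", "data/customers.csv"])
```

Each command list can then be passed to `subprocess.run(cmd, check=True)`, which raises on a non-zero exit so a failed profile does not silently produce an empty report.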
Executable Python scripts for data operations:
data_profiler.py: Automated data quality analysis and profiling
json_transformer.py: Complex JSON handling and transformation utilities
html_report_generator.py: Interactive HTML report generation
Comprehensive documentation for DuckDB operations:
duckdb_patterns.md: Common query patterns and best practices
json_functions.md: Complete JSON function reference with examples
data_quality_checks.md: Data quality assessment frameworks and queries
Templates and resources for output generation:
report_template.html: Interactive HTML template for data exploration reports