skills/data-quality-checker/SKILL.md
Validate data quality in market analysis documents and blog articles before publication. Use when checking for price scale inconsistencies (ETF vs futures), instrument notation errors, date/day-of-week mismatches, allocation total errors, and unit mismatches. Supports English and Japanese content. Advisory mode -- flags issues as warnings for human review, not as blockers.
Detect common data quality issues in market analysis documents before publication. The checker validates five categories: price scale consistency, instrument notation, date/weekday accuracy, allocation totals, and unit usage. All findings are advisory -- they flag potential issues for human review rather than blocking publication.
Accept the target markdown file path and optional parameters:
- `--file`: Path to the markdown document to validate (required)
- `--checks`: Comma-separated list of checks to run (optional; default: all)
- `--as-of`: Reference date for year inference in YYYY-MM-DD format (optional)
- `--output-dir`: Directory for report output (optional; default: reports/)

Run the data quality checker script:

    python3 skills/data-quality-checker/scripts/check_data_quality.py \
      --file path/to/document.md \
      --output-dir reports/
To run specific checks only:

    python3 skills/data-quality-checker/scripts/check_data_quality.py \
      --file path/to/document.md \
      --checks price_scale,dates,allocations
To provide a reference date for year inference (useful when dates in the document omit the year):

    python3 skills/data-quality-checker/scripts/check_data_quality.py \
      --file path/to/document.md \
      --as-of 2026-02-28
Read the relevant reference documents to contextualize findings:
- `references/instrument_notation_standard.md` -- Standard ticker notation, digit-count hints, and naming conventions for each instrument class
- `references/common_data_errors.md` -- Catalog of frequently observed errors, including FRED data delays, ETF/futures scale confusion, holiday oversights, allocation total pitfalls, and unit confusion patterns

Use these references to explain findings and suggest corrections.
Examine each finding in the output:
The script produces two output files:
- JSON report (`data_quality_YYYY-MM-DD_HHMMSS.json`): Machine-readable list of findings with severity, category, message, line number, and context.
- Markdown report (`data_quality_YYYY-MM-DD_HHMMSS.md`): Human-readable report grouped by severity level.

Present the findings to the user with explanations referencing the knowledge base. Suggest specific corrections for each issue.
Example finding in the JSON report:

    {
      "severity": "WARNING",
      "category": "price_scale",
      "message": "GLD: $2,800 has 4 digits (expected 2-3 digits)",
      "line_number": 5,
      "context": "GLD: $2,800"
    }
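To triage a report programmatically, the findings can be grouped by severity. The sketch below is a minimal illustration, assuming the JSON report is a flat array of finding objects shaped like the sample above; the function name `group_by_severity` is hypothetical and not part of the skill.

```python
import json
from collections import defaultdict


def group_by_severity(report_json: str) -> dict:
    """Group findings by their "severity" field.

    Assumption: the report is a JSON array of finding objects.
    """
    grouped = defaultdict(list)
    for finding in json.loads(report_json):
        grouped[finding["severity"]].append(finding)
    return dict(grouped)


# The sample finding shown above, wrapped in a list.
sample_report = json.dumps([
    {
        "severity": "WARNING",
        "category": "price_scale",
        "message": "GLD: $2,800 has 4 digits (expected 2-3 digits)",
        "line_number": 5,
        "context": "GLD: $2,800",
    }
])

for severity, findings in group_by_severity(sample_report).items():
    for finding in findings:
        print(f"{severity} [{finding['category']}] line {finding['line_number']}: {finding['message']}")
```

If the real report wraps the findings in an outer object, only the `json.loads` line would need to change.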
Example Markdown report:

    # Data Quality Report
    **Source:** path/to/document.md
    **Generated:** 2026-02-28 14:30:00
    **Total findings:** 3

    ## ERROR (1)
    - **[dates]** (line 12): Date-weekday mismatch: January 1, 2026 (Monday) -- actual weekday is Thursday

    ## WARNING (2)
    - **[price_scale]** (line 5): GLD: $2,800 has 4 digits (expected 2-3 digits)
      > `GLD: $2,800`
    - **[allocations]**: Allocation total: 110.0% (expected ~100%)
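The date-weekday finding in the sample report can be reproduced with the standard library. This is a minimal sketch of such a check, not the script's actual logic; the helper names and the weekday spellings accepted (English full or three-letter names, Japanese single-character names) are assumptions.

```python
from datetime import date
from typing import Optional

# Both sequences are Monday-first, matching date.weekday() (Monday == 0).
EN_WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
JA_WEEKDAYS = "月火水木金土日"


def claimed_index(claimed: str) -> Optional[int]:
    """Map a weekday name such as "Mon", "Monday", "木" or "木曜日" to 0..6."""
    text = claimed.strip()
    for index, name in enumerate(EN_WEEKDAYS):
        if text.lower() in (name.lower(), name[:3].lower()):
            return index
    if text and text[0] in JA_WEEKDAYS:
        return JA_WEEKDAYS.index(text[0])
    return None  # unrecognized weekday name


def weekday_mismatch(day: date, claimed: str) -> Optional[str]:
    """Return the actual weekday name if the claim is wrong, else None."""
    index = claimed_index(claimed)
    if index is None or index == day.weekday():
        return None
    return EN_WEEKDAYS[day.weekday()]


print(weekday_mismatch(date(2026, 1, 1), "Monday"))  # Thursday
```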
Resources:
- `scripts/check_data_quality.py` -- Main validation script
- `references/instrument_notation_standard.md` -- Notation and price scale reference
- `references/common_data_errors.md` -- Common error patterns and prevention

Advisory mode: All findings are warnings for human review. The script always exits with code 0 on successful execution, even when findings are present. Exit code 1 is reserved for script failures (file not found, parse errors).
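The exit-code contract described above could look roughly like the following. This is a sketch under stated assumptions, not the script's real entry point: `run_checks` is a hypothetical stand-in that flags lines containing "TBD", and `main` takes the path directly rather than parsing command-line options.

```python
import sys
from pathlib import Path


def run_checks(text: str) -> list:
    """Hypothetical stand-in for the real checks: flag lines containing 'TBD'."""
    return [
        {"severity": "WARNING", "line_number": number, "message": "placeholder value"}
        for number, line in enumerate(text.splitlines(), start=1)
        if "TBD" in line
    ]


def main(path: str) -> int:
    """Return 0 whenever the run completes, 1 only when the script itself fails."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1  # script failure, e.g. file not found or unreadable
    for finding in run_checks(text):
        print(f"{finding['severity']} line {finding['line_number']}: {finding['message']}")
    return 0  # advisory mode: findings never change the exit code
```

A caller that wants to gate publication on findings would therefore need to read the report rather than rely on the exit status.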
Section-aware allocation checking: Only percentages within allocation sections (identified by headings like "配分", "Allocation", or table columns like "ウェイト", "目安比率") are checked. Random percentages in body text (probability, RSI, YoY growth) are ignored.
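One way to picture section-aware checking is a pass that sums percentages only under allocation headings. The sketch below is an illustration under simplifying assumptions: it recognizes only the heading keywords "配分" and "Allocation", it does not handle the table-column case, and the function name `allocation_totals` is hypothetical.

```python
import re

ALLOCATION_HEADING = re.compile(r"配分|allocation", re.IGNORECASE)
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def allocation_totals(markdown: str) -> dict:
    """Sum percentages per allocation section, ignoring all other sections."""
    totals = {}
    current = None  # heading of the allocation section being read, if any
    for line in markdown.splitlines():
        if line.lstrip().startswith("#"):
            heading = line.lstrip("# ").strip()
            current = heading if ALLOCATION_HEADING.search(heading) else None
            if current is not None:
                totals[current] = 0.0
        elif current is not None:
            totals[current] += sum(float(value) for value in PERCENT.findall(line))
    return totals


document = """# Weekly note
RSI is at 70% and recession odds are 35%.
## Allocation
- Equities: 60%
- Bonds: 30%
- Gold: 20%
## Outlook
YoY growth of 5%.
"""

for section, total in allocation_totals(document).items():
    if abs(total - 100.0) > 0.5:  # tolerance is an assumed value
        print(f"Allocation total: {total:.1f}% (expected ~100%)")
```

The RSI, probability, and YoY figures never reach the sum because they sit outside the allocation section.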
Bilingual support: Handles both English and Japanese date formats, weekday names, and section headings. Full-width characters (such as % and 〜) and en dashes are normalized before processing.
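The normalization step can be sketched as a small translation table. The ASCII targets chosen here (`%`, `~`, `-`) are assumptions about what the checker normalizes to, and `normalize` is a hypothetical name. Escapes are used so the full-width code points stay unambiguous.

```python
# Assumed mapping from full-width or typographic characters to ASCII.
NORMALIZATION_TABLE = str.maketrans({
    "\uff05": "%",  # full-width percent sign
    "\u301c": "~",  # wave dash
    "\uff5e": "~",  # full-width tilde
    "\u2013": "-",  # en dash
})


def normalize(text: str) -> str:
    """Replace full-width and typographic characters with ASCII equivalents."""
    return text.translate(NORMALIZATION_TABLE)


print(normalize("60\uff05\u301c70\uff05"))  # 60%~70%
```

An explicit table is used rather than `unicodedata.normalize("NFKC", ...)` because NFKC leaves the wave dash and the en dash unchanged.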
Year inference: For dates without an explicit year, the checker infers
the year using (in priority order): the --as-of option, a YYYY pattern
found in the document title/metadata, or the current year with a 6-month
cross-year heuristic.
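The priority order can be sketched as follows. The source does not spell out the 6-month heuristic, so this example assumes one plausible reading: if the date placed in the current year lands more than about 183 days from today, the adjacent year is used instead. The function name, its parameters, and the year regex are all hypothetical.

```python
import re
from datetime import date
from typing import Optional


def infer_year(month: int, day: int, as_of: Optional[date] = None,
               doc_text: str = "", today: Optional[date] = None) -> int:
    """Infer the year for a month/day pair that has no explicit year."""
    # 1. An explicit reference date (--as-of) wins.
    if as_of is not None:
        return as_of.year
    # 2. Otherwise use a four-digit year found in the title or metadata.
    match = re.search(r"(?<!\d)(20\d{2})(?!\d)", doc_text)
    if match:
        return int(match.group(1))
    # 3. Otherwise use the current year, shifted across the year boundary
    #    when the date would be more than ~6 months away (assumed rule).
    today = today or date.today()
    try:
        candidate = date(today.year, month, day)
    except ValueError:  # e.g. Feb 29 in a non-leap year
        return today.year
    delta_days = (candidate - today).days
    if delta_days > 183:
        return today.year - 1
    if delta_days < -183:
        return today.year + 1
    return today.year


# A "12/28" date read in early January most likely refers to the previous year.
print(infer_year(12, 28, today=date(2026, 1, 5)))  # 2025
```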
Digit-count heuristic: Price scale validation uses digit counts (number of digits before the decimal point) rather than absolute price ranges. This approach is resilient to price changes over time while still catching ETF/futures confusion errors.
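A digit-count check can be sketched in a few lines. Only the GLD range (2-3 digits) comes from the sample finding above; the table layout and the function names are assumptions, and the real ranges live in `references/instrument_notation_standard.md`.

```python
from typing import Optional

# Expected digits before the decimal point, per ticker.
# Only GLD is taken from the sample report; further entries would be assumptions.
EXPECTED_DIGITS = {"GLD": (2, 3)}


def integer_digits(price_text: str) -> int:
    """Count digits before the decimal point, ignoring '$' and thousands separators."""
    whole_part = price_text.replace(",", "").split(".")[0]
    return sum(character.isdigit() for character in whole_part)


def check_price_scale(ticker: str, price_text: str) -> Optional[str]:
    """Return a warning message if the digit count is outside the expected range."""
    if ticker not in EXPECTED_DIGITS:
        return None  # no expectation recorded for this instrument
    low, high = EXPECTED_DIGITS[ticker]
    digits = integer_digits(price_text)
    if low <= digits <= high:
        return None
    return f"{ticker}: {price_text} has {digits} digits (expected {low}-{high} digits)"


print(check_price_scale("GLD", "$2,800"))
# GLD: $2,800 has 4 digits (expected 2-3 digits)
```

A four-digit value quoted for GLD suggests a gold futures price was used in place of the ETF price, which is the confusion this heuristic targets.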