.cursor/skills/research/SKILL.md
Analyze data, investigate datasets, work with CSV/parquet/pandas/dataframes. Use when analyzing data, exploring datasets, running experiments, or when user mentions data, analysis, parquet, csv, pandas, dataframe, statistics, investigation.
npx skillsauth add dmitryprg-ai/cursor-develop-autorules research
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Principle: DATA FIRST, CODE SECOND.
Choose analysis method based on project stack:
```sql
-- Schema inspection
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'target';

-- Data profiling
SELECT count(*),
       count(DISTINCT column_name),
       count(*) FILTER (WHERE column_name IS NULL) AS nulls
FROM target;

-- Distribution
SELECT column_name, count(*)
FROM target
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;
```
```typescript
// Shape and types
console.log(`Records: ${data.length}`);
console.log(`Keys: ${Object.keys(data[0] || {})}`);

// Profiling
const nullCount = data.filter(item => item.field == null).length;
const uniqueCount = new Set(data.map(item => item.field)).size;
const duplicates = data.length - uniqueCount;
```
```python
# Assumes df is a pandas DataFrame that has already been loaded.
print(f"Shape: {df.shape}")
print(f"dtypes:\n{df.dtypes}")
print(f"head:\n{df.head()}")
print(f"nunique:\n{df.nunique()}")
print(f"nulls:\n{df.isnull().sum()}")
```
| Risk | SQL Check | TypeScript Check | Python Check |
|------|-----------|------------------|--------------|
| Missing data | `count(*) FILTER (WHERE col IS NULL)` | `data.filter(x => x.col == null).length` | `df.isnull().sum()` |
| Duplicates | `count(*) - count(DISTINCT col)` | `data.length - new Set(data.map(x => x.col)).size` | `df['col'].duplicated().sum()` |
| Wrong types | `SELECT pg_typeof(col)` | `typeof x.col` | `df.dtypes` |
| Outliers | `percentile_cont(0.99) WITHIN GROUP (ORDER BY col)` | Sort + inspect extremes | `df.describe()` |
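The four checks in the table can be combined into a single pass over one column. The following is a minimal sketch in plain Python (standard library only), assuming the records are already loaded as a list of dicts. `profile_column` and the sample rows are hypothetical names used for illustration and are not part of the skill.

```python
from collections import Counter
from statistics import quantiles


def profile_column(rows, col):
    """Run the four risk checks from the table on one column of `rows`."""
    values = [row.get(col) for row in rows]
    present = [v for v in values if v is not None]

    # Missing data: rows where the column is absent or None.
    missing = len(values) - len(present)

    # Duplicates: non-null values beyond the first occurrence of each.
    duplicates = len(present) - len(set(present))

    # Wrong types: how many values of each Python type appear in the column.
    types = Counter(type(v).__name__ for v in present)

    # Outliers: 99th percentile of the numeric values (needs at least 2 points).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    p99 = quantiles(numeric, n=100)[98] if len(numeric) >= 2 else None

    return {
        "missing": missing,
        "duplicates": duplicates,
        "types": dict(types),
        "p99": p99,
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.5},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 12.5},
    {"id": 5, "amount": "n/a"},
]
report = profile_column(rows, "amount")
print(report)
```

A column whose `types` entry holds more than one type name (here both `float` and `str`) is a sign that the data needs cleaning before any numeric analysis.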
```
# EXPERIMENT: [Description]
# HYPOTHESIS: [What we expect]
# METHOD: [SQL query / TypeScript code / Python code]
# RESULT: [actual output]
# EXPECTED: [what we expected]
# STATUS: PASS / FAIL
```
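An experiment written in this template can also be executed, so that STATUS is computed from the result instead of being typed by hand. The following is a minimal Python sketch. `Experiment`, `run`, `count_duplicate_ids`, and the sample data are hypothetical names chosen for illustration and are not part of the skill.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Experiment:
    description: str
    hypothesis: str
    method: Callable[[], Any]  # the query or code under test, wrapped in a function
    expected: Any

    def run(self) -> str:
        """Execute the method and render the filled-in template."""
        result = self.method()
        status = "PASS" if result == self.expected else "FAIL"
        return "\n".join([
            f"# EXPERIMENT: {self.description}",
            f"# HYPOTHESIS: {self.hypothesis}",
            f"# METHOD: {self.method.__name__}",
            f"# RESULT: {result!r}",
            f"# EXPECTED: {self.expected!r}",
            f"# STATUS: {status}",
        ])


# Hypothetical data and check, for illustration only.
ids = [1, 2, 2, 3]


def count_duplicate_ids() -> int:
    return len(ids) - len(set(ids))


exp = Experiment(
    description="ids are unique",
    hypothesis="no duplicate ids",
    method=count_duplicate_ids,
    expected=0,
)
output = exp.run()
print(output)
```

Because the sample list contains the id `2` twice, the rendered record ends with `# STATUS: FAIL`, which is the kind of mismatch the template is meant to surface before any further code is written.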
Rules: