plugins/lineage/skills/lineage/SKILL.md
Use when user invokes /lineage with a column name (optionally qualified with table/schema). Also triggers on "trace this column", "where does X come from", "what reads from Y table". Traces column-level data lineage through SQL, Kafka, Spark, JDBC, and ORM codebases. Produces a structured lineage path with confidence ratings and an SVG or ASCII diagram. Do NOT use for general code explanation — use /explain instead.
Trace column-level data lineage through heterogeneous data stacks — SQL views, Kafka topics, JDBC writes, Spark jobs, ORM mappings. Designed for environments where there's no single tool or convention that maps the full path from source to destination.
Core principles:
User types /lineage followed by:
- `total_amount` (searches all tables for this column)
- `schema.table.column` or `table.column`
- `customer_id in reporting.daily_summary`

    /lineage orders.total_amount
    /lineage schema.table.column
    /lineage customer_id in reporting.daily_summary
    /lineage revenue
You MUST follow this order. No skipping steps.
Extract the target from the user's input:
| Input Format | Parsed As |
|--------------|-----------|
| column | column_name = column, table = unknown |
| table.column | column_name = column, table = table |
| schema.table.column | column_name = column, table = schema.table |
| column in table | column_name = column, table = table |
| column in schema.table | column_name = column, table = schema.table |
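The parsing rules in the table can be sketched in a few lines of Python. This is only an illustration, not part of the skill itself: the names `Target` and `parse_target` are hypothetical, and `table = unknown` is represented here as `None`.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    column_name: str
    table: Optional[str]  # None stands for "table = unknown"


def parse_target(raw: str) -> Target:
    """Parse the text that follows /lineage into (column_name, table)."""
    text = raw.strip()
    parts = text.split()

    # "column in table" or "column in schema.table"
    if len(parts) == 3 and parts[1].lower() == "in":
        return Target(parts[0], parts[2])

    # "table.column" or "schema.table.column": the last segment is the column
    if "." in text:
        table, column = text.rsplit(".", 1)
        return Target(column, table)

    # Bare column name: the table is unknown and must be searched for
    return Target(text, None)
```

For instance, `parse_target("customer_id in reporting.daily_summary")` yields the column `customer_id` and the table `reporting.daily_summary`.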
If the column name is ambiguous (exists in multiple tables and no table was specified), ask the user which table to trace before continuing.
Search for SQL definitions (`*.sql`, `*.ddl`) that reference the column:

- Search SQL files (`*.sql`, `*.ddl`, `*.hql`)
- Search common directories (`migrations/`, `db/`, `sql/`, `flyway/`, `liquibase/`, `alembic/`) for SQL files referencing the column

For each match, classify it:
Record all discovered tables, views, and their files.
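As a rough sketch of the SQL search above, the following Python walks a directory tree, keeps files with the SQL extensions listed earlier, and labels each file that mentions the column. The labels (`view`, `table`, `alter`, `reference`) and the function names are assumptions made for illustration; the skill does not prescribe them.

```python
import re
from pathlib import Path

SQL_SUFFIXES = {".sql", ".ddl", ".hql"}

VIEW_RE = re.compile(
    r"\bCREATE\s+(OR\s+REPLACE\s+)?(MATERIALIZED\s+)?VIEW\b", re.IGNORECASE
)
TABLE_RE = re.compile(r"\bCREATE\s+TABLE\b", re.IGNORECASE)
ALTER_RE = re.compile(r"\bALTER\s+TABLE\b", re.IGNORECASE)


def classify_sql(text: str, column: str) -> list[str]:
    """Return labels describing how a SQL text references `column`."""
    # Whole-word match so `amount` does not match `total_amount`
    if not re.search(rf"\b{re.escape(column)}\b", text, re.IGNORECASE):
        return []
    labels = []
    if VIEW_RE.search(text):
        labels.append("view")
    if TABLE_RE.search(text):
        labels.append("table")
    if ALTER_RE.search(text):
        labels.append("alter")
    # The column is mentioned but no DDL statement was recognised
    return labels or ["reference"]


def scan_sql_files(root: str, column: str) -> dict[str, list[str]]:
    """Map each matching SQL file under `root` to its labels."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SQL_SUFFIXES:
            labels = classify_sql(path.read_text(errors="ignore"), column)
            if labels:
                hits[str(path)] = labels
    return hits
```

A file-level label is coarse: a file that defines several objects gets every matching label, so each hit still needs to be read before it is recorded.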
For each view that references the target column:
- `SELECT source_col AS target_col` → trace back to `source_col`
- `SELECT SUM(amount) AS total_amount` → trace back to `amount` and note the transformation
- `SELECT *`: note this as a medium-confidence mapping (the column passes through but isn't explicitly named)

Search for code that writes to the target table. The search patterns depend on the technology stack:
SQL / JDBC:

- `INSERT INTO`, `MERGE INTO`, `UPDATE` with the target table name in Java/Scala/Kotlin/Python files
- `PreparedStatement`, `JdbcTemplate`, `Statement`, or `execute` in Java/Scala files

Spark:

- `write`, `save`, `insertInto`, `saveAsTable`, `format`, or `jdbc` in Scala/Java/Python files
- `INSERT INTO` or `INSERT OVERWRITE` with the target table name in Scala/Java/Python files

ORM:

- `@Table` with the target table name or `@Column` with the column name in Java/Kotlin/Scala files
- `@Entity` or `@Table` annotations

Kafka producer:

- `ProducerRecord` or `.send(` near the target topic/table name in Java/Scala/Kotlin/Python files
- Config files (`*.yml`, `*.yaml`, `*.properties`, `*.conf`)

For each write path found:
For each upstream source found in Steps 3-4, search for code that reads from it:
- `SELECT`, `ResultSet`, `query`, `JdbcTemplate`, or `FROM` in Java/Scala files
- `read`, `load`, `table`, `jdbc`, or `format` in Scala/Java/Python files
- `@KafkaListener`, `subscribe`, `consumer`, `poll`, or `deserializ` in Java/Scala/Kotlin/Python files

Connect each read path to the write path that produces the target data. This is where you identify the transformation logic.
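The write-path and read-path searches can be approximated with a small regex scan. The sketch below uses a subset of the patterns listed above and makes one simplifying assumption: a hit counts only when the table or topic name appears on the same line as the pattern, whereas "near" in practice may span several lines. `WRITE_PATTERNS`, `READ_PATTERNS`, and `find_hits` are hypothetical names.

```python
import re

# Subset of the write patterns listed above, keyed by technology
WRITE_PATTERNS = {
    "jdbc": re.compile(r"\b(INSERT\s+INTO|MERGE\s+INTO|UPDATE)\b", re.IGNORECASE),
    "spark": re.compile(r"\.(insertInto|saveAsTable|save)\s*\("),
    "kafka": re.compile(r"ProducerRecord|\.send\s*\("),
}

# Subset of the read patterns listed above
READ_PATTERNS = {
    "jdbc": re.compile(r"\b(SELECT|FROM)\b|ResultSet", re.IGNORECASE),
    "spark": re.compile(r"\.(read|load|table|jdbc)\b"),
    "kafka": re.compile(r"@KafkaListener|\.subscribe\s*\(|\.poll\s*\("),
}


def find_hits(source: str, name: str, patterns: dict) -> list[tuple[int, str]]:
    """Return (line_number, technology) for lines that mention `name`
    and match one of the given patterns."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if name.lower() not in line.lower():
            continue  # same-line simplification: the name must be on this line
        for tech, pattern in patterns.items():
            if pattern.search(line):
                hits.append((lineno, tech))
    return hits
```

Each hit is only a candidate. The surrounding method still has to be read to confirm which columns are written or read and how they are transformed.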
For each source table or topic found, repeat Steps 2-5 to trace further upstream.
Limits: stop at 5 hops upstream to avoid a runaway trace, and note "further upstream not traced".
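The recursion with a hop cap can be sketched as a depth-first walk. Here `find_sources` is a hypothetical callback standing in for the search steps above (given a table or topic, it returns its direct upstream sources), and the cap of 5 hops is taken from the mistakes table later in this document.

```python
MAX_HOPS = 5  # cap from the "Traced more than 5 hops upstream" entry


def trace_upstream(target, find_sources, max_hops=MAX_HOPS):
    """Return every upstream path from `target`, each as a list of nodes.

    `find_sources(node)` is an assumed callback returning the direct
    upstream sources of a table or topic.
    """
    paths = []

    def walk(node, path):
        # Skip nodes already on this path so cycles cannot loop forever
        sources = [s for s in find_sources(node) if s not in path]
        if not sources:
            paths.append(path)  # reached an origin
            return
        if len(path) - 1 >= max_hops:
            # More sources exist, but the hop budget is spent
            paths.append(path + ["further upstream not traced"])
            return
        for source in sources:
            walk(source, path + [source])

    walk(target, [target])
    return paths
```

With a graph such as `{"report.v": ["staging.t"], "staging.t": ["topic.orders"]}`, calling `trace_upstream("report.v", lambda n: graph.get(n, []))` returns a single three-node path.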
Search for anything that reads from the target table/view/topic:
- `FROM` or `JOIN` with the target table name in SQL/Java/Scala/Python files
- `CREATE VIEW` or `CREATE MATERIALIZED` in SQL files
- `@KafkaListener`, `subscribe`, or `consumer` in Java/Scala files
- `select`, `from`, `read`, `load`, or `query` in Java/Scala/Python files

Limits:
Rate each hop in the lineage:
| Rating | Criteria |
|--------|----------|
| High | Direct SQL column reference, explicit column mapping in code, or test that exercises this path |
| Medium | Table-level match but column mapping is through SELECT *, dynamic field mapping, or generic serialization |
| Low | Inferred from naming convention, proximity in code, or config-only reference (no code path found) |
Overall confidence = minimum of all hops.
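The minimum rule is easy to express if the ratings are ordered. A minimal sketch, assuming a hypothetical `Confidence` enum whose numeric values encode the order low < medium < high:

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def overall_confidence(hops: list[Confidence]) -> Confidence:
    """Overall confidence is the weakest rating among all hops."""
    # min() raises ValueError on an empty list: a lineage with no hops has no rating
    return min(hops)


# Example: one SELECT * hop pulls an otherwise high-confidence path down to medium
ratings = [Confidence.HIGH, Confidence.MEDIUM, Confidence.HIGH]
overall = overall_confidence(ratings)
```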
Also note specific concerns:
- `SELECT *` pass-throughs where the column isn't explicitly named

Present the lineage using this template:
    LINEAGE: [schema.]table.column

    UPSTREAM (where does this data come from?):
      [Source System] source_name
        -> field: source_column (format)
        -> consumed by: package.Class.method()
           file: path/to/File.java:line
        -> transformation: description of how data is transformed
        -> writes to: [Target System] target_table.target_column
           file: path/to/Writer.java:line

      [Next Hop] ...
        -> ...

    DOWNSTREAM (what depends on this?):
      [Consumer System] consumer_name
        -> file: path/to/Consumer.java:line
        -> uses: how the column is used (aggregation, filter, display, etc.)

    CONFIDENCE: high|medium|low
      [checkmark] N hops traced with file references
      [warning] specific concerns about uncertain hops
Use actual checkmarks and warning symbols: ✓ and ⚠.
Primary: SVG diagram
Generate a self-contained SVG file showing the lineage graph:
Write to a file: lineage-[column_name].svg
The SVG must be self-contained (inline styles, no external dependencies) and openable in any browser.
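To make "self-contained" concrete, here is a minimal sketch that renders a linear chain of nodes as boxes joined by lines. It declares the SVG namespace, uses only inline `style` attributes, and relies on the generic `sans-serif` family so no external font is fetched. The function name, sizes, and colours are illustrative assumptions, and a real lineage graph with branches would need a proper layout.

```python
from xml.sax.saxutils import escape


def render_svg(nodes: list[str]) -> str:
    """Render `nodes` left to right as labelled boxes; return the SVG text."""
    box_w, box_h, gap, pad = 160, 40, 40, 20
    width = pad * 2 + len(nodes) * box_w + max(len(nodes) - 1, 0) * gap
    height = pad * 2 + box_h
    mid = pad + box_h // 2  # vertical centre of every box

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
    ]
    for i, name in enumerate(nodes):
        x = pad + i * (box_w + gap)
        if i > 0:
            # Connector from the previous box's right edge to this box
            parts.append(
                f'<line x1="{x - gap}" y1="{mid}" x2="{x}" y2="{mid}" '
                'style="stroke:#333;stroke-width:2"/>'
            )
        parts.append(
            f'<rect x="{x}" y="{pad}" width="{box_w}" height="{box_h}" rx="6" '
            'style="fill:#eef;stroke:#333;stroke-width:1"/>'
        )
        # escape() keeps names containing <, > or & from breaking the XML
        parts.append(
            f'<text x="{x + box_w // 2}" y="{mid}" text-anchor="middle" '
            'dominant-baseline="middle" '
            f'style="font-family:sans-serif;font-size:13px;fill:#111">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

The returned string can be written to `lineage-[column_name].svg` as UTF-8 text and opened directly in a browser.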
Layout guidelines:
Fallback: ASCII diagram
If the user requests terminal-only output, or the lineage is simple (3 or fewer nodes), use ASCII:
    source_topic ──> ProcessorJob ──> staging.table
                                           │
                                   target_view (VIEW)
                                      │          │
                               downstream_1  downstream_2
End with:
| Technology | Search Patterns |
|------------|----------------|
| SQL views/tables | CREATE VIEW, CREATE TABLE, column name in SELECT, ALTER TABLE |
| JDBC | PreparedStatement, ResultSet, JdbcTemplate, table name strings, INSERT INTO, SELECT.*FROM |
| Spark SQL | .sql(", spark.read, .write, .insertInto, .saveAsTable |
| Spark DataFrame | .select(, .withColumn(, .groupBy(, .agg( near column name |
| Kafka producer | ProducerRecord, .send(, KafkaTemplate, topic name strings |
| Kafka consumer | @KafkaListener, subscribe(, poll(, ConsumerRecord, topic name strings |
| ORM/annotations | @Table, @Column, @Entity, entity class field names |
| Config files | application.yml, application.properties, *.conf for topic/table mappings |
| dbt | ref('model_name'), source('schema', 'table'), *.sql in models/ |
| Airflow/orchestration | DAG definitions referencing tables or topics |
| Mistake | Fix |
|---------|-----|
| Only checked SQL files | Also search Java/Scala/Python for JDBC writes, Spark jobs, Kafka producers |
| Stopped at the first hop | Trace recursively — most lineage paths have 3+ hops |
| Didn't check views | Views are a major lineage hop. Always check for CREATE VIEW referencing the table |
| Reported SELECT * as high confidence | SELECT * is medium confidence — the column passes through but isn't explicitly mapped |
| Generated SVG with external dependencies | SVG must be self-contained — inline all styles, no external fonts or stylesheets |
| Traced more than 5 hops upstream | Stop at 5 hops to avoid runaway. Note "further upstream not traced" |
| Didn't check config files | Topic names and table names are often mapped in config, not hardcoded |
| Guessed transformations | Only report transformations you can see in the code. "Unknown transformation" is fine. |
| Didn't ask about ambiguous columns | If a column exists in multiple tables and no table was specified, ask before tracing |
| Mixed up upstream and downstream | Upstream = where data comes FROM. Downstream = what DEPENDS on this data. |
| No confidence assessment | Every hop needs a rating. Overall confidence = minimum of all hops. |
| SVG not openable in browser | Test that the SVG is valid XML with proper namespace declaration |