# Data Lineage Architecture Lens

- **Cognitive Mode:** Data-Centric
- **Primary Question:** "Where is the data?"
- **Focus:** Information Flow, Transformations, Storage Locations, Format Conversions

## When to Use

- Need to understand how data flows through the system
- Documenting data transformations and conversions
- Identifying storage destinations and access patterns
- User invokes `/autoskillit:arch-lens-data-lineage` or `/autoskillit:make-arch-diag data`

## Critical Constraints

**NEVER:**

- Modify any source code files
- Focus on runtime behavior (that belongs to the process-flow lens)
- Show static structure without data context
- Run subagents in the background (`run_in_background: true` is prohibited)

**ALWAYS:**

- Trace data from INPUT to STORAGE
- Show transformation stages and format changes
- Identify the single source of truth
- Distinguish read vs. write operations
- BEFORE creating any diagram, LOAD the `/autoskillit:mermaid` skill using the Skill tool. This is MANDATORY.
- If the Skill tool cannot be used (`disable-model-invocation`) or refuses this invocation, do NOT proceed with diagram creation. Abort this step and omit the diagram from the output.
- After writing the diagram file, emit the absolute path as a structured output token; this must be your final output. Resolve the relative `temp/arch-lens-data-lineage/...` save path to an absolute path by prepending the full CWD:

```
diagram_path = /absolute/cwd/temp/arch-lens-data-lineage/{filename}.md
```

This token is MANDATORY: the pipeline cannot proceed without it.

## Arguments

`/autoskillit:arch-lens-data-lineage [context_path]`

`context_path` (optional): absolute path to a PR context file listing new files (★-prefixed) and modified files (●-prefixed) from the PR diff. When provided, read this file before beginning analysis and focus the diagram on the architectural areas those files affect. When absent, explore the full CWD.

## Analysis Workflow

### Step 0: Read PR context (when provided)

If a `context_path` positional argument is present:

- Read the file at `context_path`
- Extract the new-files list (★-prefixed) and the modified-files list (●-prefixed)
- Focus Step 1 exploration on the modules/components these files belong to
- Apply the ★ prefix to diagram nodes representing new files/components
- Apply the ● prefix to diagram nodes representing modified files/components

If no `context_path` is provided, skip this step and explore the full CWD in Step 1.
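A minimal sketch of the Step 0 extraction, assuming the context file lists one path per line with the ★ or ● marker first (optionally after a markdown bullet). The exact file layout is not specified here, so `parse_pr_context`, `load_pr_context`, and `node_prefix` are hypothetical helpers:

```python
from pathlib import Path

NEW_MARK = "★"
MODIFIED_MARK = "●"


def parse_pr_context(text: str) -> tuple[list[str], list[str]]:
    """Split PR context text into (new_files, modified_files).

    Assumes one entry per line, such as "★ src/adapters/order_adapter.py",
    optionally preceded by a markdown bullet. Unmarked lines are ignored.
    """
    new_files: list[str] = []
    modified_files: list[str] = []
    for raw in text.splitlines():
        # Drop surrounding whitespace and any leading "-" or "*" bullet.
        line = raw.strip().lstrip("-* ")
        if line.startswith(NEW_MARK):
            new_files.append(line[len(NEW_MARK):].strip())
        elif line.startswith(MODIFIED_MARK):
            modified_files.append(line[len(MODIFIED_MARK):].strip())
    return new_files, modified_files


def load_pr_context(context_path: str) -> tuple[list[str], list[str]]:
    """Read the (absolute) context_path and parse it."""
    return parse_pr_context(Path(context_path).read_text(encoding="utf-8"))


def node_prefix(path: str, new_files: list[str], modified_files: list[str]) -> str:
    """Return the ★/● prefix for a diagram node backed by this file, if any."""
    if path in new_files:
        return NEW_MARK + " "
    if path in modified_files:
        return MODIFIED_MARK + " "
    return ""
```

Keeping the two lists separate makes it straightforward to restrict Step 1 to the touched modules and to prefix node labels consistently.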

### Step 1: Launch Parallel Exploration Subagents

Spawn Explore subagents in parallel (in the foreground, per the constraints above) to investigate the following areas:

**Data Origins (Inputs)**

- Find user input handling
- Identify external data sources
- Look for: CLI args, API requests, file reads, imports, user input, data ingestion

**Transformation Stages**

- Find data conversion/transformation code
- Identify adapters and converters
- Look for: `Adapter`, `Converter`, `transform`, `parse`, `serialize`, `from_`, `to_`, mapping, conversion

**Format Changes**

- Find schema definitions and conversions
- Identify format boundaries (JSON, XML, protobuf, etc.)
- Look for: schema models, type definitions, serialization, deserialization, format conversion

**Storage Destinations**

- Find database operations
- Identify file outputs
- Look for: database operations, persistence, `.save()`, `.create()`, `.write()`, storage

**Access Patterns**

- Find data retrieval code
- Identify query patterns
- Look for: `.get()`, `.query()`, `.find()`, `.load()`, read operations, data access layer
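One way to seed each Explore subagent is a keyword catalog per investigation area. The sketch below is a hypothetical, purely lexical pre-scan built from the "Look for" lists above, plus a few common Python/SQL spellings (`argparse`, `sys.argv`, `INSERT`, `SELECT`) added as assumptions. `SEARCH_HINTS` and `scan_source` are illustrative names; the scan only suggests where to start reading and does not replace the subagents' analysis:

```python
import re

# Loose keyword hints per exploration area, taken from the "Look for" lists.
# Purely lexical: expect false positives and missed indirect call paths.
SEARCH_HINTS = {
    "origins": re.compile(r"argparse|sys\.argv|request\.|open\(|input\("),
    "transforms": re.compile(r"Adapter|Converter|transform|parse|serialize|\bfrom_\w+|\bto_\w+"),
    "formats": re.compile(r"json|xml|protobuf|schema|BaseModel|dataclass"),
    "storage": re.compile(r"\.save\(|\.create\(|\.write\(|INSERT|UPDATE"),
    "access": re.compile(r"\.get\(|\.query\(|\.find\(|\.load\(|SELECT"),
}


def scan_source(text: str) -> dict[str, list[int]]:
    """Return, per area, the 1-based line numbers whose text matches its hints."""
    hits: dict[str, list[int]] = {area: [] for area in SEARCH_HINTS}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for area, pattern in SEARCH_HINTS.items():
            if pattern.search(line):
                hits[area].append(lineno)
    return hits
```

Note that a line can land in several areas (for example, `json.load(` matches both `formats` and `access`), which is acceptable for a starting map.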

### Step 2: Map Data Flow

Document the journey of key data entities:

- **Origin:** Where does it come from?
- **Transformations:** What changes happen?
- **Storage:** Where is it persisted?
- **Retrieval:** How is it accessed later?
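Recording each entity's journey in a small structure keeps the four questions explicit and exposes gaps before any drawing starts. A sketch, with `EntityLineage` as a hypothetical structure and the sample values invented for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class EntityLineage:
    """One key data entity's journey through the system."""
    name: str
    origin: str                                                # Where does it come from?
    transformations: list[str] = field(default_factory=list)  # What changes happen?
    storage: list[str] = field(default_factory=list)          # Where is it persisted?
    retrieval: list[str] = field(default_factory=list)        # How is it accessed later?

    def missing(self) -> list[str]:
        """Name the questions that are still unanswered for this entity."""
        gaps = []
        if not self.origin:
            gaps.append("origin")
        if not self.transformations:
            gaps.append("transformations")
        if not self.storage:
            gaps.append("storage")
        if not self.retrieval:
            gaps.append("retrieval")
        return gaps


order = EntityLineage(
    "Order",
    origin="POST /orders (JSON)",
    transformations=["OrderAdapter.parse"],
    storage=["orders table"],
)
print(order.missing())  # ['retrieval']
```

An empty `retrieval` list is not necessarily an error: it may indicate a write-only artifact, which the read/write analysis should confirm.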

**CRITICAL: Analyze read/write direction.** For EVERY storage location and data flow, identify:

- **Read sources (inputs):** components that READ from this location
- **Write destinations (outputs):** components that WRITE to this location
- **Read-write (primary storage):** both read and written by the system
- **Write-only (artifacts):** written but NEVER read back by the system

Clearly distinguish:

- Primary storage (source of truth): the system reads AND writes
- Write-only artifacts (debugging, logging): the system writes but never reads back
- External inputs: the system only reads

Use different arrow styles:

- Solid arrows for read/write primary storage
- Dashed arrows for write-only artifacts
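These classification rules reduce to a lookup on two observations per location: does the system read it, and does it write it? The minimal sketch below uses hypothetical names (`Role`, `classify`, `arrow_for`); the `UNUSED` case completes the truth table and is an assumption, not a category defined above. The arrow strings are standard Mermaid flowchart syntax:

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary storage (source of truth)"
    ARTIFACT = "write-only artifact"
    EXTERNAL_INPUT = "external input"
    UNUSED = "neither read nor written"  # likely out of scope for the diagram


def classify(reads: bool, writes: bool) -> Role:
    """Map the observed read/write direction of a location to its lineage role."""
    if reads and writes:
        return Role.PRIMARY
    if writes:
        return Role.ARTIFACT
    if reads:
        return Role.EXTERNAL_INPUT
    return Role.UNUSED


def arrow_for(role: Role) -> str:
    # Dashed for write-only artifacts; solid for primary storage and input flows.
    return "-.->" if role is Role.ARTIFACT else "-->"
```

Classifying by observed reads and writes, rather than by what a location is named, helps catch files that look like storage but are never read back.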

### Step 3: Identify Conversion Boundaries

Find format changes:

- External format -> Internal format
- Internal format -> Database format
- Database format -> API response
- Naming-convention changes at each boundary
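Naming-convention changes often sit exactly on a conversion boundary, which makes them useful markers. As an illustration only (the field names are invented), an adapter that turns an external camelCase JSON payload into internal snake_case keys looks like this. In the diagram it would appear as a single External format -> Internal format edge:

```python
import json
import re


def camel_to_snake(name: str) -> str:
    """Convert a simple camelCase key to snake_case, e.g. "orderId" -> "order_id"."""
    # Insert "_" before every uppercase letter except one at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def from_external(payload: str) -> dict[str, object]:
    """External format (JSON text, camelCase) -> internal format (dict, snake_case)."""
    raw = json.loads(payload)
    return {camel_to_snake(key): value for key, value in raw.items()}


print(from_external('{"orderId": 7, "createdAt": "2024-01-01"}'))
# {'order_id': 7, 'created_at': '2024-01-01'}
```

Acronyms such as `HTTPCode` are split letter by letter by this simple rule. Real adapters usually handle such cases, but that does not affect where the boundary sits.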

### Step 4: Create the Diagram

Use a flowchart with:

**Direction:** `LR` (left-to-right) for data flow, or `TB` (top-to-bottom) for hierarchical layouts

**Subgraphs for stages:**

- Input/Origins
- Transformation/Processing
- Storage (primary)
- Artifacts (secondary/write-only)
- External Sync (if applicable)

**Node styling:**

- `cli` class: data origins, user input
- `handler` class: transformations, adapters
- `stateNode` class: database tables, primary storage
- `output` class: write-only artifacts, files
- `integration` class: external sync, APIs

**Connection types:**

- Solid arrows (`-->`) for primary data flow
- Dashed arrows (`-.->`) for write-only/secondary flows
- Label each edge with its operation name (e.g., `save()`)

**Database nodes:**

- Use the cylinder shape: `[(Label)]`
- Show table relationships
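If the lineage has already been captured as data, the diagram body can be generated rather than hand-written. This keeps class assignments, arrow styles, and database shapes consistent with the rules above. A minimal sketch, assuming hypothetical `Node`/`Edge` types: it emits only node, edge, and `class` lines (subgraphs are omitted for brevity), so the `classDef` lines must still come from the loaded `/autoskillit:mermaid` skill:

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    label: str
    css_class: str          # one of: cli, handler, stateNode, output, integration
    database: bool = False  # render with the cylinder shape [(...)]


@dataclass
class Edge:
    src: str
    dst: str
    label: str              # operation name, e.g. "save()"
    write_only: bool = False


def render(nodes: list[Node], edges: list[Edge], direction: str = "LR") -> str:
    """Emit a Mermaid flowchart body: nodes, labeled edges, then class assignments."""
    lines = [f"flowchart {direction}"]
    for n in nodes:
        shape = f'[("{n.label}")]' if n.database else f'["{n.label}"]'
        lines.append(f"    {n.node_id}{shape}")
    for e in edges:
        arrow = "-.->" if e.write_only else "-->"  # dashed = write-only/secondary
        lines.append(f'    {e.src} {arrow}|"{e.label}"| {e.dst}')
    for n in nodes:
        lines.append(f"    class {n.node_id} {n.css_class};")
    return "\n".join(lines)
```

Because the write-only flag lives on the edge, the Step 2 read/write analysis feeds directly into the arrow style.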

### Step 5: Write Output

Write the diagram to `{{AUTOSKILLIT_TEMP}}/arch-lens-data-lineage/arch_diag_data_lineage_{YYYY-MM-DD_HHMMSS}.md` (relative to the current working directory).
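The save path can be built and resolved to an absolute path in one step. This sketch assumes `{{AUTOSKILLIT_TEMP}}` has already been substituted with a relative directory name (shown as `temp`, matching the example in Critical Constraints). `diagram_output_path` and `write_diagram` are hypothetical helpers:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def diagram_output_path(temp_dir: str = "temp", now: Optional[datetime] = None) -> Path:
    """Build the absolute save path for a new data-lineage diagram.

    temp_dir stands in for the already-substituted {{AUTOSKILLIT_TEMP}} value
    (an assumption); it is resolved against the current working directory.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    filename = f"arch_diag_data_lineage_{stamp}.md"
    return (Path.cwd() / temp_dir / "arch-lens-data-lineage" / filename).resolve()


def write_diagram(content: str, temp_dir: str = "temp") -> Path:
    """Write the diagram markdown, creating parent directories as needed."""
    path = diagram_output_path(temp_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path
```

Returning the resolved path from the write step means the same value can be emitted as the final token without recomputing it.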

After writing the diagram file, emit this structured output line:

diagram_path = {absolute_path_to_diagram_file}

IMPORTANT: Emit structured output tokens as literal plain text, with no markdown formatting on the token names. Do not wrap token names in `**bold**`, `*italic*`, or any other markdown. The adjudicator performs a regex match on the exact token name, so decorators cause the match to fail.
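The adjudicator's actual pattern is not documented here. The sketch below assumes a line-anchored regex purely to illustrate why markdown decoration breaks matching. `TOKEN_RE`, `emit_token`, and `extract_path` are hypothetical:

```python
import re
from typing import Optional

# Hypothetical adjudicator pattern: the token name must open the line, undecorated.
TOKEN_RE = re.compile(r"^diagram_path\s*=\s*(\S+)\s*$", re.MULTILINE)


def emit_token(path: str) -> str:
    """Format the final structured output line as literal plain text."""
    return f"diagram_path = {path}"


def extract_path(transcript: str) -> Optional[str]:
    """Return the path from the last matching token line, or None."""
    matches = TOKEN_RE.findall(transcript)
    return matches[-1] if matches else None


print(extract_path("**diagram_path** = /tmp/x.md"))  # None: the asterisks defeat the anchor
```

Under this assumed pattern, a path containing spaces would also fail to match because of `\S+`. This is one more reason to keep the emitted line minimal.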

## Output Template

````markdown
# Data Lineage Diagram: {System Name}

**Lens:** Data Lineage (Data-Centric)
**Question:** Where is the data?
**Date:** {YYYY-MM-DD}
**Scope:** {What was analyzed}

## Data Flow Overview

| Stage | Format | Key Transformation |
|-------|--------|-------------------|
| Input | {format} | {description} |
| Processing | {format} | {description} |
| Storage | {format} | {description} |

## Lineage Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Input ["Data Origins"]
        USER["User Input<br/>━━━━━━━━━━<br/>Source type<br/>Format"]
    end

    subgraph Transform ["Transformation"]
        direction TB
        ADAPTER["Adapter<br/>━━━━━━━━━━<br/>Conversion type"]
    end

    subgraph Storage ["Primary Storage (Source of Truth)"]
        direction TB
        DB[("Database Table<br/>━━━━━━━━━━<br/>Key fields")]
    end

    subgraph Artifacts ["Write-Only Artifacts"]
        direction TB
        FILE["output.json<br/>━━━━━━━━━━<br/>For debugging"]
    end

    %% FLOWS %%
    USER -->|"input"| ADAPTER
    ADAPTER -->|"save()"| DB
    DB -.->|"write-only"| FILE

    %% CLASS ASSIGNMENTS %%
    class USER cli;
    class ADAPTER handler;
    class DB stateNode;
    class FILE output;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Input | Data origins (user, external) |
| Orange | Transform | Format conversion and adapters |
| Dark Teal | Storage | Primary storage (source of truth) |
| Teal | Artifacts | Write-only outputs |
| Red | Sync | External sync services |

## Data Transformation Summary

| Stage | Format | Key Conversion |
|-------|--------|----------------|
| {stage} | {format} | {conversion} |

## Storage Destinations

| Entity | Primary Storage | Secondary | Access Pattern |
|--------|-----------------|-----------|----------------|
| {entity} | {location} | {artifact} | {how accessed} |

## Critical Design Principle

**Source of Truth:** {e.g., "Database is the single source of truth. File outputs are write-only."}
````


---

## Pre-Diagram Checklist

Before creating the diagram, verify:

- [ ] LOADED `/autoskillit:mermaid` skill using the Skill tool
- [ ] Using ONLY classDef styles from the mermaid skill (no invented colors)
- [ ] Diagram will include a color legend table
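Parts of this checklist can be re-checked mechanically on the finished file. The sketch below assumes `ALLOWED_CLASSES` equals the `classDef` names in the template above; whether that matches the mermaid skill's full palette is an assumption. `check_diagram` is a hypothetical helper and cannot verify that the skill was actually loaded:

```python
import re

# Assumed palette: the classDef names used in this lens's template.
ALLOWED_CLASSES = {"cli", "stateNode", "handler", "phase", "output", "integration"}


def check_diagram(markdown: str) -> list[str]:
    """Return checklist problems found in a finished diagram file (empty if none)."""
    problems = []
    defined = set(re.findall(r"^\s*classDef\s+(\w+)", markdown, re.MULTILINE))
    # "class A,B name;" assignments; "classDef" lines do not match "class\s+".
    used = set(re.findall(r"^\s*class\s+[\w,]+\s+(\w+)", markdown, re.MULTILINE))
    invented = defined - ALLOWED_CLASSES
    if invented:
        problems.append(f"classDef not from the mermaid skill: {sorted(invented)}")
    undefined = used - defined
    if undefined:
        problems.append(f"class used but never defined: {sorted(undefined)}")
    if "Color Legend" not in markdown:
        problems.append("missing color legend table")
    return problems
```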

---

## Related Skills

- `/autoskillit:make-arch-diag` - Parent skill for lens selection
- `/autoskillit:mermaid` - MUST BE LOADED before creating diagram
- `/autoskillit:arch-lens-c4-container` - For container-level storage view
