Add Step (p2cs_project)

When to Use

Use this skill when:

A new pipeline step needs to be added under code/step_{N}_{name}/.
A step should be inserted between existing steps, requiring renumbering of later steps.
You need a checklist for creating the step class, substeps, README, explore.ipynb, and wiring dependencies/config.

This skill is project-specific to p2cs_project and assumes the architecture described in the root README.md.

Overview: Step Pattern

Each numbered step follows the same high-level pattern:

Directory: code/step_{N}_{snake_name}/
- step.py: main Step{N}{CamelName} class inheriting from base.Step.
- data_classes.py: data classes inheriting from DataFile subclasses (when the step has new artifacts).
- __init__.py: re-exports the main step class (and sometimes helper symbols).
- README.md: step-specific documentation (inputs, outputs, substeps, external tools).
- explore.ipynb: exploration notebook following the standard notebook structure from the root README.md.
- Numbered substeps: 1_*.py, 2_*.py, etc., each implementing focused logic.
Outputs in data/step_{N}_{snake_name}/ and figures in figures/step_{N}_{snake_name}/ are routed via base.paths.
The step is registered in:
- pipeline_config.yaml under steps: step_{N}_{snake_name}: ...
- Tests in code/tests/test_step_{N}.py.

Good templates to copy from:

Simple single-substep data step: step_4_prepare_pairs/
Multi-substep data/tooling step: step_2_organism_distance/
Modeling/evaluation step: step_6_train_model/, step_7_crosstalk_estimation/

Workflow A: Append a New Step at the End

Determine the new step index and name
- Inspect existing numbered steps under code/step_*.
- Let N_max be the largest index (currently 8 for step_8_generate_paper).
- Choose:
  - New index: N_new = N_max + 1
  - Snake name: step_{N_new}_{snake_name}
  - Class name: Step{N_new}{CamelName}
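The index and naming rules above can be sketched in a few lines of Python. This is a minimal illustration, not project code: the helper names next_step_index, camel_name, and step_ids are hypothetical, and the only assumption about the repository is that step directories are named step_{N}_{snake_name}.

```python
import re
from pathlib import Path
from typing import Tuple

# Matches directory names such as "step_8_generate_paper".
STEP_DIR_PATTERN = re.compile(r"^step_(\d+)_(\w+)$")


def next_step_index(code_dir: Path) -> int:
    """Return N_max + 1 over the step_{N}_{name} directories in code_dir."""
    indices = []
    for entry in code_dir.iterdir():
        m = STEP_DIR_PATTERN.match(entry.name)
        if entry.is_dir() and m:
            indices.append(int(m.group(1)))
    if not indices:
        raise ValueError(f"no step_* directories found under {code_dir}")
    return max(indices) + 1


def camel_name(snake_name: str) -> str:
    """Convert "export_results" to "ExportResults"."""
    return "".join(part.capitalize() for part in snake_name.split("_"))


def step_ids(n_new: int, snake_name: str) -> Tuple[str, str]:
    """Return (directory name, class name) for the new step."""
    return f"step_{n_new}_{snake_name}", f"Step{n_new}{camel_name(snake_name)}"
```

For example, with step_8_generate_paper as the highest existing step, next_step_index(Path("code")) gives 9, and step_ids(9, "export_results") gives the directory and class names for a hypothetical export step.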
Create the step directory
- Create code/step_{N_new}_{snake_name}/ with at least:
  - __init__.py (re-export the main step class).
  - step.py (main step implementation).
  - README.md (step documentation).
  - explore.ipynb (exploration notebook).
  - One or more numbered substeps 1_*.py, 2_*.py, etc.
  - data_classes.py and/or config.json if this step defines new data types or config.
- Recommended pattern: Copy the closest existing step directory (e.g., step_4_prepare_pairs/) and rename/trim to match the new step’s responsibilities.
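A short sketch of the copy-then-check pattern, using only the standard library. The function names are hypothetical; the required file list mirrors the bullets above, and data_classes.py and config.json are left out because they are optional.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Files every step directory is expected to contain (from the list above).
REQUIRED_FILES = ("__init__.py", "step.py", "README.md", "explore.ipynb")
# Numbered substeps such as "1_prepare_pairs.py".
SUBSTEP_PATTERN = re.compile(r"^\d+_\w+\.py$")


def copy_step_template(template_dir: Path, new_dir: Path) -> Path:
    """Copy an existing step directory; fails if new_dir already exists."""
    shutil.copytree(template_dir, new_dir)
    return new_dir


def missing_step_files(step_dir: Path) -> List[str]:
    """List the required pieces that step_dir does not contain yet."""
    missing = [name for name in REQUIRED_FILES if not (step_dir / name).is_file()]
    has_substep = any(
        p.is_file() and SUBSTEP_PATTERN.match(p.name) for p in step_dir.iterdir()
    )
    if not has_substep:
        missing.append("numbered substep (1_*.py)")
    return missing
```

Running missing_step_files on the new directory after trimming the copied template gives a quick checklist of what still has to be created.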
Implement the main step class
- In step.py:
  - Import Step and path helpers from code/base/step.py and code/base/paths.py.
  - Define a class like:
    - class Step{N_new}{CamelName}(Step):
  - Implement:
    - name and description properties (or class attributes).
    - dependencies property returning a List[str] of upstream steps, using canonical IDs like "step_1_get_p2cs_data".
    - get_input_paths() and get_output_paths() using data classes and get_step_input_path / get_step_output_path.
    - run() orchestrating any substeps via self.run_substeps(...).
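The shape of such a class might look like the sketch below. It does not import the project's code: Step and get_step_output_path here are stand-ins written only so the example runs on its own, and their signatures, the return types, the file names, and the behaviour of on_failure are all assumptions. Check code/base/step.py and code/base/paths.py for the real interfaces before copying anything.

```python
from pathlib import Path
from typing import Callable, Dict, List


class Step:
    """Stand-in for base.Step (hypothetical; the real class lives in code/base/step.py)."""

    def run_substeps(self, substeps: List[Callable[[], None]], step_numbers: List[int],
                     descriptions: List[str], on_failure: str = "strict") -> List[int]:
        # Assumed behaviour: "strict" re-raises the first error,
        # "warning" reports it and moves on to the next substep.
        completed = []
        for substep, number, description in zip(substeps, step_numbers, descriptions):
            try:
                substep()
                completed.append(number)
            except Exception as error:
                if on_failure == "strict":
                    raise
                print(f"warning: substep {number} ({description}) failed: {error}")
        return completed


def get_step_output_path(step_id: str, filename: str) -> Path:
    """Stand-in for the base.paths helper of the same name (signature assumed)."""
    return Path("data") / step_id / filename


class Step9ExportResults(Step):
    """Hypothetical new step appended after step_8_generate_paper."""

    name = "step_9_export_results"
    description = "Export a summary table (illustrative only)."

    @property
    def dependencies(self) -> List[str]:
        # Canonical upstream step IDs.
        return ["step_1_get_p2cs_data"]

    def get_input_paths(self) -> Dict[str, Path]:
        # "p2cs.pkl" is a made-up file name for illustration.
        return {"p2cs_data": get_step_output_path("step_1_get_p2cs_data", "p2cs.pkl")}

    def get_output_paths(self) -> Dict[str, Path]:
        return {"summary": get_step_output_path(self.name, "summary.csv")}

    def run(self) -> List[int]:
        return self.run_substeps(
            [self._write_summary],
            step_numbers=[1],
            descriptions=["write summary"],
            on_failure="strict",
        )

    def _write_summary(self) -> None:
        # Placeholder for the logic that would live in 1_*.py.
        pass
```

In the real project the paths would normally be built from the step's data classes rather than from literal file names.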
Define data classes (if needed)
- In data_classes.py:
  - Inherit from appropriate DataFile subclasses (e.g., PickleDataFile, CSVDataFile, NumpyDataFile).
  - Define schemas, descriptions, and default loaders/savers as in existing steps.
- Use these data classes in get_input_paths() / get_output_paths() and in substeps.
Create numbered substeps
- Add scripts 1_*.py, 2_*.py, etc. inside the new step directory.
- Follow existing substep patterns:
  - Each substep is a small class/function using the step’s data classes and paths helpers.
  - The main step’s run() calls self.run_substeps(...) with:
    - Substep objects
    - step_numbers=[1, 2, ...]
    - descriptions=[...]
    - Appropriate on_failure mode ("strict" or "warning").
Create the step README
- In README.md, mirror the structure used in other steps:
  - Short description.
  - Inputs (data classes, upstream steps).
  - Outputs and their data classes.
  - Substeps and what they do.
  - Any external tools / configs required.
Create the explore.ipynb notebook
- Follow the standard structure from the root README.md:
  - # Imports (path setup + step/data class imports).
  - # Load Data
    - ## Load Inputs (using step.get_input_paths() + data classes).
    - ## Load Outputs.
  - # Plot
    - Display saved figures from visualization substeps first.
    - Put any extra exploratory plots after those.
  - # Notes (short list of exploration ideas).
- Respect the collapsible headings rule (heading-only markdown cells).
Wire the step into pipeline_config.yaml
- Under steps:, add a new entry:
  - Key: step_{N_new}_{snake_name}:
  - Fields: enabled, description, overwrite_outputs, optional fast_plots, and substeps:.
- Add a substeps: section keyed by the filenames (without .py), matching patterns in other steps.
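As a rough illustration of the entry's shape, the sketch below builds the fields named above as a Python dictionary. The helper build_step_config is hypothetical, and the default values and the per-substep {"enabled": True} payload are assumptions; copy the actual layout from a neighboring entry in pipeline_config.yaml.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Optional


def build_step_config(description: str, substep_files: Iterable[str],
                      fast_plots: Optional[bool] = None) -> Dict[str, Any]:
    """Build the mapping that would sit under steps: step_{N_new}_{snake_name}:."""
    entry: Dict[str, Any] = {
        "enabled": True,             # assumed default
        "description": description,
        "overwrite_outputs": False,  # assumed default
    }
    if fast_plots is not None:
        # fast_plots is optional, so it is only written when given.
        entry["fast_plots"] = fast_plots
    # Substeps are keyed by filename without the .py suffix.
    entry["substeps"] = {Path(name).stem: {"enabled": True} for name in substep_files}
    return entry
```

For instance, build_step_config("Export results", ["1_load_inputs.py", "2_write_summary.py"]) yields substep keys 1_load_inputs and 2_write_summary.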
Add tests
- Create code/tests/test_step_{N_new}.py by copying a nearby test (e.g., test_step_4.py) and adjusting:
  - Imports to the new step and data classes.
  - Test names and assertions to cover the new step’s behavior.
Run tests / pipeline checks
- Run pytest code/tests/test_step_{N_new}.py.
- Optionally run the step via:
  - cd code && python run_pipeline.py --step step_{N_new}_{snake_name}

Workflow B: Insert a Step in the Middle (with Renumbering)

Use this when inserting a new step between existing steps (e.g., between step_3_embed_proteins and step_4_prepare_pairs).

B1. Plan the new ordering

Identify current step order
- List existing code/step_* directories and their indices (including step_0_draw_theoretical).
Choose insertion point
- Let:
  - N_insert_after = index of the step before the new one.
  - N_new = N_insert_after + 1.
- All steps with index > N_insert_after must be shifted up by 1:
  - Old k → new k + 1 for all k > N_insert_after.
Decide the new step’s ID
- Choose:
  - New directory name: step_{N_new}_{snake_name}.
  - New class name: Step{N_new}{CamelName}.
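The shift rule (old k becomes k + 1 for every k > N_insert_after) can be written down as a small planning function. The name plan_renumbering is hypothetical; the function only computes the old-to-new ID pairs and touches no files.

```python
import re
from typing import Iterable, List, Tuple

STEP_ID_PATTERN = re.compile(r"^step_(\d+)_(\w+)$")


def plan_renumbering(step_ids: Iterable[str], insert_after: int) -> List[Tuple[str, str]]:
    """Return (old_id, new_id) pairs for every shifted step, highest index first."""
    shifted = []
    for step_id in step_ids:
        m = STEP_ID_PATTERN.match(step_id)
        if m is None:
            continue  # not a numbered step directory
        k, name = int(m.group(1)), m.group(2)
        if k > insert_after:
            shifted.append((k, f"step_{k}_{name}", f"step_{k + 1}_{name}"))
    # Descending order, so the plan can be applied top-down without collisions.
    shifted.sort(reverse=True)
    return [(old_id, new_id) for _, old_id, new_id in shifted]
```

Inserting after step_3_embed_proteins means insert_after=3: step_4_prepare_pairs maps to step_5_prepare_pairs, while steps 0 through 3 are left out of the plan.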

B2. Renumber existing steps (highest → lowest)

Perform renaming from highest index down to N_insert_after + 1 to avoid collisions.

For each step index k in descending order where k > N_insert_after:

Compute new index
- k_new = k + 1.
Rename step directories
- Code: code/step_{k}_{name}/ → code/step_{k_new}_{name}/.
- Data: data/step_{k}_{name}/ → data/step_{k_new}_{name}/ (if exists).
- Figures: figures/step_{k}_{name}/ → figures/step_{k_new}_{name}/ (if exists).
Rename tests
- code/tests/test_step_{k}.py → code/tests/test_step_{k_new}.py.
Update configuration keys
- In pipeline_config.yaml, change:
  - step_{k}_{name}: → step_{k_new}_{name}:.
Update string references and imports
- Use text search for step_{k}_{name} and test_step_{k} across the repo and update to the new IDs:
  - Imports like from step_{k}_{name}....
  - Dependency lists in dependencies properties (e.g., return ["step_{k}_{name}", ...]).
  - Any key strings that reference step_{k}_{name}.
Update doc references
- In README.md files and notebooks, update any textual references to the old step name or number, if present.
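The directory and test renames above could be scripted roughly as follows. This is a sketch with hypothetical function names that assumes the code/, data/, figures/, and code/tests/ layout described in this document; it covers only the file renames, not the config keys, imports, or doc references.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def shift_step(root: Path, k: int, name: str) -> List[Tuple[Path, Path]]:
    """Rename everything belonging to step k so that it becomes step k + 1."""
    old_id, new_id = f"step_{k}_{name}", f"step_{k + 1}_{name}"
    moves = [(root / parent / old_id, root / parent / new_id)
             for parent in ("code", "data", "figures")]
    tests = root / "code" / "tests"
    moves.append((tests / f"test_step_{k}.py", tests / f"test_step_{k + 1}.py"))
    done = []
    for source, target in moves:
        if not source.exists():
            continue  # e.g. data/ or figures/ not generated yet
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        source.rename(target)
        done.append((source, target))
    return done


def shift_steps(root: Path, steps: Dict[int, str], insert_after: int) -> List[Tuple[Path, Path]]:
    """Shift every step with index > insert_after, highest index first."""
    done = []
    for k in sorted(steps, reverse=True):
        if k > insert_after:
            done.extend(shift_step(root, k, steps[k]))
    return done
```

The descending order matters most for the test files: test_step_4.py can only become test_step_5.py after the old test_step_5.py has already moved to test_step_6.py.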

B3. Add the new step

After all affected steps k > N_insert_after have been shifted to k + 1:

Create code/step_{N_new}_{snake_name}/
- Follow Workflow A, steps 2–7 (from "Create the step directory" through "Create the explore.ipynb notebook") to:
  - Implement step.py and data_classes.py.
  - Add numbered substeps.
  - Add README.md.
  - Add explore.ipynb.
Wire into pipeline_config.yaml
- Under steps: add:
  - step_{N_new}_{snake_name}: with its configuration and substeps.
Update dependencies
- For the new step:
  - Set dependencies to the upstream steps, using the renumbered IDs.
- For downstream steps:
  - Review their dependencies properties:
    - Replace any old IDs that were shifted, and add the new step as a dependency where appropriate.
Add test file
- Create code/tests/test_step_{N_new}.py following neighboring step tests.
Sanity check references
- Run a repo-wide search for any old step IDs (step_{k}_{name} where k was renumbered) and ensure:
  - All references are either removed or updated to the new IDs.
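The repo-wide search can be approximated with a small scanner like the one below. The function name and the list of file suffixes are assumptions chosen for illustration; a plain text search in an editor or with grep serves the same purpose.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# File types assumed to hold step references (code, config, docs, notebooks).
SEARCHED_SUFFIXES = {".py", ".yaml", ".yml", ".md", ".ipynb", ".json"}


def find_stale_references(root: Path, old_ids: Iterable[str]) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, old step ID) for every leftover reference."""
    old_ids = list(old_ids)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SEARCHED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for old_id in old_ids:
                if old_id in line:
                    hits.append((path, line_number, old_id))
    return hits
```

An empty result means no full old IDs remain. References that use only the number (for example test_step_{k}) cannot be told apart from valid new names this way and still need a manual look.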

B4. Validate after renumbering

Run targeted tests
- Run:
  - cd code && pytest tests/test_step_{N_new}.py
  - Plus tests for all renumbered steps: test_step_{k_new}.py.
Run a dry pipeline
- Optionally run:
  - cd code && python run_pipeline.py --list-steps to confirm updated IDs and ordering.
  - cd code && python run_pipeline.py --step step_{N_new}_{snake_name} to test the new step in context.

Notebook Guidelines (Quick Reference)

When creating or editing explore.ipynb for a step:

Follow the standard sections:
- # Imports
- # Load Data
  - ## Load Inputs
  - ## Load Outputs
- # Plot
- # Notes
Ensure each heading is in its own markdown cell to enable collapsible sections.
Use the data classes for loading inputs/outputs, not raw paths.
Display visualization substep figures first under # Plot; additional exploratory plots come after.
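A notebook skeleton with heading-only markdown cells can be generated with the standard library alone, since an .ipynb file is JSON. The sketch below is an assumption-laden starting point: the function names are hypothetical, the cells are empty placeholders, and it targets notebook format 4.4 so that no cell IDs are needed.

```python
import json
from pathlib import Path
from typing import Any, Dict

HEADINGS = ["# Imports", "# Load Data", "## Load Inputs", "## Load Outputs", "# Plot", "# Notes"]


def markdown_cell(source: str) -> Dict[str, Any]:
    return {"cell_type": "markdown", "metadata": {}, "source": source}


def code_cell(source: str = "") -> Dict[str, Any]:
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": source}


def build_explore_notebook() -> Dict[str, Any]:
    """Build the standard section layout, one heading per markdown cell."""
    cells = []
    for heading in HEADINGS:
        cells.append(markdown_cell(heading))  # heading-only cell, so it can collapse
        if heading == "# Load Data":
            continue  # parent heading; its content lives in the two subsections
        if heading == "# Notes":
            cells.append(markdown_cell("- (exploration idea)"))
        else:
            cells.append(code_cell())
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_explore_notebook(path: Path) -> None:
    path.write_text(json.dumps(build_explore_notebook(), indent=1) + "\n", encoding="utf-8")
```

Calling write_explore_notebook(Path("explore.ipynb")) inside the new step directory gives an empty notebook with the expected sections, ready to be filled in with data class loading code and figure display cells.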

Usage Summary

When asked to add a new step:

Decide whether it is an append (Workflow A) or insert with renumbering (Workflow B).
Follow the appropriate workflow carefully, especially:
- Directory and file naming: step_{N}_{snake_name}, test_step_{N}.py.
- Dependency updates and imports.
- pipeline_config.yaml step and substep entries.
- explore.ipynb structure and data class usage.
Always finish by running the relevant tests and, if feasible, a pipeline run of the new step.
