GPT Prompt Maintenance

Goal

Maintain prompt artifacts so GPT-series models receive clear, outcome-first instructions with the constraints, evidence guidance, validation, and stop rules needed to complete the user's requested work without unnecessary process noise.

Success Criteria

A good prompt-maintenance result:

Preserves the target artifact's intent, audience, scope, and authorization boundary.
Defines the outcome, success criteria, constraints, evidence or retrieval budget, output shape, validation expectations, and stop rules when they affect behavior.
Removes or compresses legacy process-heavy instructions unless order is required for correctness.
Reserves absolute words such as always, never, must, and only for true invariants: safety, exact output contracts, irreversible actions, required fields, or tool syntax.
Keeps personality and collaboration guidance short enough to shape behavior without replacing the task goal.
Makes model-specific claims only from user-provided guidance, official OpenAI documentation, or explicitly labeled inference.
Reports what changed, what was validated, and what was intentionally left outside scope.

Boundaries

Use this skill when an existing prompt artifact needs maintenance: a SKILL.md, rules file, system/developer prompt, product prompt, agent workflow prompt, eval prompt, grader prompt, or comparable reusable instruction block.

Use skill-creator instead when the main task is to create a brand-new skill, design skill evals, benchmark a skill, or optimize skill triggering from scratch.

Use context-engineering when the main question is where a rule belongs: global rules, project rules, a skill, tooling, task context, or an external system — unless the prompt artifact is already selected or the work is specifically GPT prompt quality.

Use writing or editing skills for ordinary prose polishing. Do not turn blog drafts, documentation prose, customer copy, or creative writing into prompt maintenance unless the text is itself an instruction to a model.

Do not broaden the artifact's behavior, target model family, tool access, external side effects, sync state, commits, deployment, or persistent configuration unless the user explicitly asks.

Evidence And Retrieval Budget

Read the target prompt artifact first. If the task mentions a current GPT model, OpenAI prompt guidance, migration, or model-specific behavior, read the user-provided guidance or current official OpenAI documentation before making model-specific edits.

Continue retrieval only when:

a required source artifact, prompt version, model target, or product surface is missing
the prompt's current behavior depends on tool contracts, surrounding rules, code, or eval results
the user asks for comprehensive coverage, batch migration, or comparison
a model-specific recommendation would otherwise be unsupported

Stop retrieval once the core rewrite can be justified; do not search again for phrasing, decorative examples, or noncritical background.
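The continue and stop conditions above can be read as a single decision rule. The following is a minimal sketch, not part of the skill itself: the RetrievalState class, its field names, and should_continue_retrieval are hypothetical, and letting "the rewrite is already justified" override every other flag is one reading of the stop rule.

```python
from dataclasses import dataclass


@dataclass
class RetrievalState:
    """Hypothetical snapshot taken before deciding on another retrieval call."""
    required_input_missing: bool = False     # source artifact, prompt version, model target, or product surface
    external_context_unread: bool = False    # tool contracts, surrounding rules, code, or eval results
    coverage_request_unmet: bool = False     # user asked for comprehensive coverage, batch migration, or comparison
    model_claim_unsupported: bool = False    # a model-specific recommendation has no source yet
    rewrite_justified: bool = False          # the core rewrite can already be defended


def should_continue_retrieval(state: RetrievalState) -> bool:
    """Return True only while one of the listed continue conditions still holds."""
    # Assumption: once the core rewrite is justified, the stop rule wins.
    if state.rewrite_justified:
        return False
    return any(
        [
            state.required_input_missing,
            state.external_context_unread,
            state.coverage_request_unmet,
            state.model_claim_unsupported,
        ]
    )
```

Searching again for phrasing or decorative examples sets none of these flags, so the function returns False for that case by construction.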

When the source is a local file supplied by the user, prefer that file over web search. Use official OpenAI sources for external refreshes unless the user requests another source.

Prompt Types

Agent Or System Prompts

Optimize for role clarity, collaboration style, tool behavior, evidence discipline, validation, and stop conditions. Keep personality short. Separate how the assistant sounds from how it works.

For long-running or tool-heavy workflows, include a short user-visible preamble rule when the host supports intermediate messages. For Responses API workflows that replay assistant items manually, preserve phase values exactly when the artifact controls replay behavior.

Product Assistant Prompts

Define the user's visible outcome, what completion means, what actions are allowed, and what the final answer should contain. Include fallback behavior for missing evidence, unavailable tools, or unsupported requests.

For customer-facing text, define tone and length, but do not let tone instructions obscure policy, evidence, or action boundaries.

Agent Workflow Prompts

Prefer decision rules over fixed sequences. Keep required order only for fragile operations, safety checks, exact tool syntax, validation integrity, or irreversible side effects.

For coding agents, require concrete validation commands when available and an explicit explanation when validation cannot run.

Skill Prompts

For SKILL.md, keep the frontmatter description: field focused on trigger conditions, near-miss exclusions, and routing. Put reusable workflow guidance in the body.

The body should start with outcome and constraints before process. Keep references, large examples, scripts, and domain-specific details outside SKILL.md when progressive disclosure would reduce context load.

For skills with user-stated acceptance, safety, parity, quality, or evidence invariants, promote each invariant into a hard gate. A hard gate defines when it activates, what fields or artifacts are required, which weaker substitutes are insufficient, what self-review should catch, and what evidence is required before completion.
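One way to keep the five parts of a hard gate from drifting apart is to record them together. This is an illustrative sketch only: the HardGate class, its field names, and the parity example are hypothetical and do not come from the skill.

```python
from dataclasses import dataclass, field


@dataclass
class HardGate:
    """Hypothetical record of one user-stated invariant promoted to a hard gate."""
    name: str
    activates_when: str                  # when the gate applies
    required_artifacts: list[str]        # fields or artifacts that must exist
    insufficient_substitutes: list[str] = field(default_factory=list)  # weaker stand-ins that do not count
    self_review_checks: list[str] = field(default_factory=list)        # what self-review should catch
    completion_evidence: list[str] = field(default_factory=list)       # evidence needed before completion

    def missing_evidence(self, provided: set[str]) -> list[str]:
        """List required artifacts and completion evidence not yet provided."""
        needed = self.required_artifacts + self.completion_evidence
        return [item for item in needed if item not in provided]


# Hypothetical example: a parity invariant stated by the user.
parity_gate = HardGate(
    name="parity",
    activates_when="the user states that output must match the legacy report",
    required_artifacts=["side-by-side diff"],
    insufficient_substitutes=["visual spot check"],
    self_review_checks=["diff covers every field"],
    completion_evidence=["diff result recorded in final answer"],
)
```

Calling parity_gate.missing_evidence({"visual spot check"}) still returns both required items, because a listed weaker substitute never satisfies the gate.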

When modifying existing skills, do not rename directories, update indexes, or sync installed skills unless the user asked for those operations.

Eval And Grader Prompts

Make the task, inputs, scoring criteria, and output schema explicit. Avoid leaking expected answers into the prompt under test unless that is the purpose of the eval. Keep grading rubrics atomic enough that pass/fail evidence is inspectable.
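As a rough illustration of an atomic rubric, each criterion below checks one thing and reports its own pass/fail result. The criteria, the refund wording, and the grade function are invented for the example; a real grader prompt would define its own.

```python
from typing import Callable

# Each criterion checks exactly one thing, so a failure points at one piece of evidence.
Criterion = tuple[str, Callable[[str], bool]]

# Hypothetical rubric for a hypothetical refund-policy answer.
RUBRIC: list[Criterion] = [
    ("mentions refund window", lambda answer: "30 days" in answer),
    ("does not promise approval", lambda answer: "guaranteed" not in answer.lower()),
]


def grade(answer: str) -> dict[str, bool]:
    """Return a per-criterion result instead of a single opaque score."""
    return {name: check(answer) for name, check in RUBRIC}


should_pass = grade("Refunds are accepted within 30 days of purchase.")
should_fail = grade("Your refund is guaranteed.")
```

Keeping one should-pass and one should-fail input next to the rubric makes it easy to see which criterion a rubric change affects.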

Audit Checklist

Before rewriting, identify which defects actually matter for the user's goal:

missing or vague target outcome
success criteria buried in process steps
over-specified sequence where judgment would be better
unbounded search, tool use, retries, or iteration
unsupported claims about models, users, policy, product facts, or data
broad always / never rules used for judgment calls
personality text that is longer than the task contract
formatting instructions that make simple answers too heavy
no validation path for code, data, visual, or document outputs
no stop condition after enough evidence is collected
hard acceptance requirements present only in examples, middle prose, or "should" language
output contracts that omit evidence required by a stated acceptance, parity, safety, or quality gate
stop rules that allow completion without required artifacts, measurements, approvals, or explicit waivers
prompt content that changes the artifact's product behavior without authorization

Do not force every prompt into the same template. Add sections only when they change behavior or make maintenance safer.
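Some checklist items can be pre-screened mechanically before a human decides whether they matter. The sketch below covers only the absolute-word item; find_absolute_words is a hypothetical helper, and it surfaces candidates rather than judging whether a given rule is a true invariant.

```python
import re

ABSOLUTE_WORDS = ("always", "never", "must", "only")
_PATTERN = re.compile(r"\b(" + "|".join(ABSOLUTE_WORDS) + r")\b", re.IGNORECASE)


def find_absolute_words(prompt_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, word, stripped_line) for each absolute word found."""
    hits = []
    for line_number, line in enumerate(prompt_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(1).lower(), line.strip()))
    return hits


sample = "Always be friendly.\nReturn JSON only.\nSummarize the findings."
hits = find_absolute_words(sample)
```

In this sample, "Return JSON only." may be a legitimate output contract while "Always be friendly." is a judgment call; the scan cannot tell them apart, so the reviewer still decides.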

Rewrite Rules

Preserve the artifact's requested behavior first. Improve clarity, ordering, and enforceability without adding new product requirements, facts, capabilities, tools, or obligations.

Use the shortest structure that covers the real risk:

Role:
Goal:
Success Criteria:
Constraints:
Evidence:
Output:
Stop Rules:

For small prompts, this can be compressed into a few paragraphs. For durable agent prompts, explicit headings are usually worth the space.
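A small sketch of the "shortest structure" idea: emit only the sections that have content, in the order listed above. The render_prompt function and the example text are hypothetical; the section names come from the list above.

```python
SECTION_ORDER = (
    "Role",
    "Goal",
    "Success Criteria",
    "Constraints",
    "Evidence",
    "Output",
    "Stop Rules",
)


def render_prompt(sections: dict[str, str]) -> str:
    """Render non-empty sections in a fixed order and skip the rest."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name, "").strip()
        if body:  # empty sections add length without changing behavior
            parts.append(f"{name}:\n{body}")
    return "\n\n".join(parts)


rendered = render_prompt(
    {
        "Goal": "Summarize the ticket.",
        "Stop Rules": "Stop after one summary.",
        "Role": "",
    }
)
```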

Convert brittle sequences into decision rules:

Bad: first do A, then B, then C, then explain every step.
Better: collect the minimum evidence needed to answer; continue only if a required fact is missing, sources conflict, or the user requested exhaustive coverage.

For retrieval, include a budget:

Read the directly relevant source first. Make another retrieval call only when a
required fact, source, ID, date, parameter, or comparison target is missing.
Stop once the core answer can be supported.

For rewriting, summary, and customer-facing outputs, state what to preserve:

Preserve the requested artifact, length, structure, genre, and factual claims.
Improve clarity and flow without adding new claims, extra sections, or a more
promotional tone unless explicitly requested.

For creative drafting, separate source-backed facts from allowed creative wording. Use placeholders or labeled assumptions instead of inventing metrics, customer names, roadmap status, capabilities, or dates.

For reasoning guidance, ask for concise rationale, checks, or evidence in the final answer when useful. Do not ask the model to reveal hidden chain of thought.

Batch Maintenance

When maintaining multiple prompt artifacts, process each artifact on its own terms. Collapse them into a shared rewrite only when the user asks for a common template.

For each artifact:

read the current file
identify its task, audience, tools, risks, and near-miss exclusions
decide which GPT prompt-guidance rules apply
edit only the needed parts
validate that local naming, schemas, or index rules still hold

If the user asks for independent review per artifact, create one worker per artifact when available and aggregate their findings. If worker capacity is limited, batch workers and record the batching constraint. Do not pretend a single local pass was independent worker review.
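When worker capacity is limited, the batching and the note that records it can be planned together. This is a sketch under assumptions: plan_review_batches and the wording of the note are hypothetical, and each artifact is still assumed to get its own worker within a batch.

```python
from typing import Optional


def plan_review_batches(
    artifacts: list[str], worker_capacity: int
) -> tuple[list[list[str]], Optional[str]]:
    """Split artifacts into batches of at most worker_capacity, one worker per artifact.

    Returns the batches and a note describing the batching constraint,
    or None when every artifact fits in a single batch.
    """
    if worker_capacity < 1:
        raise ValueError("worker_capacity must be at least 1")
    batches = [
        artifacts[i : i + worker_capacity]
        for i in range(0, len(artifacts), worker_capacity)
    ]
    note = None
    if len(batches) > 1:
        note = (
            f"{len(artifacts)} artifacts reviewed in {len(batches)} batches "
            f"of up to {worker_capacity} workers"
        )
    return batches, note
```

The note is meant to go into the final report so the reader can see that review was batched rather than fully parallel.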

Validation

Choose validation that matches the artifact:

For edited files, inspect the diff for accidental scope expansion.
For SKILL.md, verify that the frontmatter name: field matches the directory name and that trigger descriptions still focus on use and near-miss exclusions.
For added, removed, or renamed skills, update the repo's public index if the repo has one, then run the repo's skill inventory and stale-name checks.
For coding-agent prompts, include or run relevant test, typecheck, lint, build, or smoke-test commands when the prompt change is tied to code behavior.
For visual or document prompts, render or inspect the artifact when feasible.
For eval or grader prompts, test at least one should-pass and one should-fail case when practical.
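The SKILL.md name check in the list above could be scripted roughly as follows. This sketch assumes the frontmatter is a block delimited by --- lines with a simple name: value entry; it does not use a YAML parser, so quoted or multi-line values beyond the simple case are out of scope. The function names and the example skill name are hypothetical.

```python
from typing import Optional


def frontmatter_name(skill_md_text: str) -> Optional[str]:
    """Return the value of the frontmatter name field, or None if absent."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip().strip("'\"")
    return None


def name_matches_directory(skill_md_text: str, directory_name: str) -> bool:
    """Compare the frontmatter name with the skill's directory name."""
    return frontmatter_name(skill_md_text) == directory_name


example_text = "---\nname: example-skill\ndescription: Example only.\n---\nBody text.\n"
```

In a repository, the directory name would typically come from something like pathlib.Path(path_to_skill_md).parent.name; that wiring is left out so the check stays a pure function.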

After rewriting a prompt that carries user acceptance, safety, parity, or quality invariants, run an adversarial check: could an agent follow the rewritten prompt and skip the invariant while still sounding compliant? If so, strengthen the gate, required output field, or stop rule before finishing.

If validation cannot run, say why and name the next best check.

Output

For rewrite tasks, provide:

changed files or rewritten prompt text
concise summary of behavioral changes
validation performed
any intentionally excluded work
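The four items above map naturally onto a small report structure. The RewriteReport class and its rendering format are hypothetical; the refusal to render without a validation entry is one way to enforce the rule that validation status is always stated, including when validation could not run.

```python
from dataclasses import dataclass, field


@dataclass
class RewriteReport:
    """Hypothetical container for the four items a rewrite task reports."""
    changed: list[str]
    behavioral_changes: list[str]
    validation: list[str]
    excluded: list[str] = field(default_factory=list)

    def render(self) -> str:
        # An empty validation list hides whether anything was checked.
        if not self.validation:
            raise ValueError("state what was validated, or why validation could not run")
        sections = [
            ("Changed", self.changed),
            ("Behavioral changes", self.behavioral_changes),
            ("Validation", self.validation),
            ("Intentionally excluded", self.excluded or ["none"]),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


report = RewriteReport(
    changed=["SKILL.md"],
    behavioral_changes=["Replaced fixed step order with a decision rule"],
    validation=["Inspected diff for scope expansion"],
)
```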

For review-only tasks, lead with findings ordered by impact, then give concrete rewrite recommendations. Say that no files changed.

For batch work, summarize per artifact. Avoid hiding individual decisions behind a generic theme.

Stop Rules

Stop when the requested prompt artifacts are rewritten or reviewed, evidence and validation status are stated, and out-of-scope improvements are separated.

Ask one narrow question only when missing information would change the target artifact, model family, product behavior, authorization boundary, or validation claim.

Stop optimizing wording once the prompt is clear, scoped, supported, and testable.

Adoption

plimeor/meta-gpt-prompt-maintenance

$ install --global

Related Skills

plimeor/workbench

plimeor/subagent-delegation

plimeor/reconsider

plimeor/context-engineering
