skills/llm/assistant-prefill-response-steering/SKILL.md
Steers LLM output format by injecting a partial assistant response into the chat template before generation, forcing structured output without fine-tuning.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
`npx skillsauth add wenmin-wu/ds-skills llm-assistant-prefill-response-steering`
LLMs often ignore output-format instructions, producing preambles, explanations, or inconsistent delimiters. Prefilling the assistant turn with the start of the desired format (e.g., "Here are the keywords comma separated: ") makes the model continue in that format. The approach works with most chat-templated models: inject the partial response as the final assistant message, cut off the end-of-turn or EOS tokens the template appends after it so generation continues the same turn, then keep only the text generated after the prefill.
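The trimming step can be illustrated without loading a model. Rendered chat templates close every turn, including a final assistant turn, so whatever follows the prefill has to be cut before generation. The sketch below does this at the string level; `render_chat` is a hypothetical, hand-written stand-in loosely modeled on Gemma's turn markers (not any real template), and `open_final_turn` is likewise a hypothetical helper.

```python
def render_chat(messages):
    """Hypothetical stand-in for a chat template, loosely modeled on Gemma's
    turn markers. Like real templates, it trims each message and closes every
    turn, including the final assistant turn."""
    text = "<bos>"
    for m in messages:
        role = "model" if m["role"] == "assistant" else m["role"]
        text += f"<start_of_turn>{role}\n{m['content'].strip()}<end_of_turn>\n"
    return text


def open_final_turn(rendered, prefill):
    """Drop everything after the prefill so the assistant turn stays open."""
    anchor = prefill.strip()  # the template may have trimmed trailing spaces
    end = rendered.rindex(anchor) + len(anchor)  # last occurrence = final turn
    return rendered[:end]


messages = [
    {"role": "user", "content": "Extract patent keywords from: <abstract text>"},
    {"role": "assistant", "content": "Keywords: "},
]
closed = render_chat(messages)  # ends with "Keywords:<end_of_turn>\n"
prompt = open_final_turn(closed, "Keywords: ")
print(prompt)
```

Cutting at the last occurrence of the stripped prefill avoids having to know which tokens a given template appends (EOS, an end-of-turn marker, a trailing newline), at the cost of assuming the prefill text appears verbatim in the rendered chat.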
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")


def generate_with_prefill(prompt, prefill="Here are the keywords: "):
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": prefill},
    ]
    # Render the chat as text. The template closes the assistant turn after the
    # prefill (Gemma appends "<end_of_turn>\n"; other templates append EOS).
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    # Trim everything after the prefill so generation continues the same turn.
    # Templates may trim message content, so match the stripped prefill.
    anchor = prefill.strip()
    text = text[: text.rindex(anchor) + len(anchor)]
    # Gemma's template already inserts BOS, so don't add it a second time.
    inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, which excludes the prompt and prefill.
    new_tokens = output_ids[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


keywords = generate_with_prefill(
    "Extract patent keywords from: <abstract text>",
    prefill="Keywords: ",
)
```
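The string returned by `generate_with_prefill` is still free text. Assuming the prefill asked for a comma-separated list (as in the `Keywords: ` example), a small post-processor can turn it into a clean list. `parse_keywords` and its rules (first line only, trailing periods dropped, case-insensitive de-duplication) are illustrative choices, not part of the original skill.

```python
def parse_keywords(continuation: str) -> list[str]:
    """Hypothetical post-processor for a comma-separated keyword continuation."""
    # Keep only the first line: text after a newline is usually extra commentary.
    first_line = continuation.strip().split("\n", 1)[0]
    seen = set()
    keywords = []
    for raw in first_line.split(","):
        kw = raw.strip().rstrip(".").strip()
        key = kw.lower()
        if kw and key not in seen:  # drop empties and case-insensitive duplicates
            seen.add(key)
            keywords.append(kw)
    return keywords


print(parse_keywords(" solar cell, perovskite, Solar Cell, thin film.\nThese terms..."))
# ['solar cell', 'perovskite', 'thin film']
```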
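- `do_sample=False` for deterministic output
- `begin_suppress_tokens` to stop the continuation from opening with a newline or special character (it only affects the first generated token; `suppress_tokens` bans tokens at every step)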
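`begin_suppress_tokens` only constrains the first generated token, while `suppress_tokens` applies the same mask at every step. The difference can be seen with a toy greedy decoder over a hand-made logits table; the values are made up, no model is involved, and `greedy_decode` is an illustrative sketch of the masking idea, not the transformers implementation.

```python
import math


def greedy_decode(step_logits, begin_suppress=(), suppress=()):
    """Toy greedy decoder over a fixed table of per-step logits.

    begin_suppress: tokens masked only at the first generated position
                    (the idea behind begin_suppress_tokens).
    suppress:       tokens masked at every step (like suppress_tokens).
    The table is fixed, so unlike a real model the logits do not depend
    on earlier choices.
    """
    chosen = []
    for step, logits in enumerate(step_logits):
        logits = dict(logits)  # copy so the caller's table is left untouched
        banned = set(suppress)
        if step == 0:
            banned |= set(begin_suppress)
        for tok in banned.intersection(logits):
            logits[tok] = -math.inf  # masked tokens can never win the argmax
        chosen.append(max(logits, key=logits.get))
    return chosen


# Made-up logits for three decoding steps.
table = [
    {"\n": 3.0, "solar": 2.0},
    {"\n": 2.5, "cell": 2.0},
    {",": 1.0, "\n": 0.5},
]
print(greedy_decode(table))                         # ['\n', '\n', ',']
print(greedy_decode(table, begin_suppress=["\n"]))  # ['solar', '\n', ',']
print(greedy_decode(table, suppress=["\n"]))        # ['solar', 'cell', ',']
```

With a real model, both options take lists of token ids and can be passed to `model.generate` as keyword arguments.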