skills/llm/structured-output-sanitization/SKILL.md
Parses LLM-generated markup (SVG, HTML, XML) with lxml, strips disallowed elements and attributes via an allowlist, and validates structural constraints like path data.
npx skillsauth add wenmin-wu/ds-skills llm-structured-output-sanitization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Even with constrained prompts, LLMs produce invalid or disallowed markup — extra attributes, unsupported elements, malformed path data. Post-generation sanitization parses the output as XML, walks the tree, removes anything not in an allowlist, validates structural properties (e.g., SVG path d attribute syntax), and serializes the cleaned result. This is the defense-in-depth complement to prompt-level constraints: the prompt reduces violations, sanitization eliminates them.
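from lxml import etree
import re

# Per-element attribute allowlist; any element not listed here is removed.
ALLOWED = {
    'svg': {'viewBox', 'width', 'height', 'xmlns'},
    'circle': {'cx', 'cy', 'r', 'fill', 'stroke'},
    'rect': {'x', 'y', 'width', 'height', 'fill', 'stroke'},
    'path': {'d', 'fill', 'stroke', 'stroke-width'},
}
# Attributes permitted on every allowed element. Kept outside ALLOWED so that
# a literal <common> element is not mistaken for an allowed tag.
COMMON_ATTRS = {'transform', 'opacity'}

# Character-level check only: rejects foreign characters, not bad grammar.
PATH_RE = re.compile(r'^[MmLlHhVvCcSsQqTtAaZz0-9\s,.\-eE]+$')

def sanitize_svg(svg_string):
    # Drop comments, leave entities unresolved, and block network access so
    # the input cannot pull in external content while being parsed.
    parser = etree.XMLParser(remove_blank_text=True, remove_comments=True,
                             resolve_entities=False, no_network=True)
    root = etree.fromstring(svg_string.encode(), parser=parser)
    to_remove = []
    for el in root.iter():
        # Processing instructions and entity nodes have a non-string tag.
        if not isinstance(el.tag, str):
            to_remove.append(el)
            continue
        tag = etree.QName(el.tag).localname
        if tag not in ALLOWED:
            to_remove.append(el)
            continue
        allowed_attrs = ALLOWED[tag] | COMMON_ATTRS
        for attr in list(el.attrib):
            if etree.QName(attr).localname not in allowed_attrs:
                del el.attrib[attr]
        if tag == 'path' and not PATH_RE.match(el.get('d', '')):
            to_remove.append(el)
    for el in to_remove:
        # Removing an element drops its whole subtree. The root has no parent
        # and is left in place, so check the root tag separately if needed.
        if el.getparent() is not None:
            el.getparent().remove(el)
    return etree.tostring(root, encoding='unicode')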
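A raw model response may wrap the markup in explanatory prose or a Markdown code fence, in which case `etree.fromstring` raises before any sanitization happens. The following is a minimal pre-processing sketch, not part of the original skill: `extract_svg_fragment` and `SVG_SPAN_RE` are hypothetical names, and the approach assumes the response contains at most one `<svg>...</svg>` document with an explicit closing tag.

```python
import re

# Matches from the first "<svg" opening tag through the last "</svg>" closing
# tag. Assumption: one SVG document per response, not self-closing.
SVG_SPAN_RE = re.compile(r'<svg\b.*</svg\s*>', re.DOTALL | re.IGNORECASE)


def extract_svg_fragment(response_text):
    """Return the <svg>...</svg> span of a raw model response, or None."""
    match = SVG_SPAN_RE.search(response_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    raw = 'Sure! Here is the icon:\n<svg viewBox="0 0 10 10"><circle r="1"/></svg>\nAnything else?'
    print(extract_svg_fragment(raw))
```

The returned string can then be passed to `sanitize_svg`; a `None` result is a signal to retry generation or fall back to a default image rather than to attempt parsing.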
Malformed path d data can crash renderers; validate it or remove the element.
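The `PATH_RE` check in `sanitize_svg` only restricts which characters may appear in `d`, so a string such as `M 0 0 C 1 1` (a curve with too few coordinates) still passes. A stricter check can tokenize the data and count arguments per command. The sketch below is one possible approach under stated assumptions: `is_valid_path_data`, `ARG_COUNTS`, and `TOKEN_RE` are hypothetical names, separators are treated leniently, and arc flags are assumed to be separated from their neighbors, so it is a plausibility check rather than a full implementation of the SVG path grammar.

```python
import re

# Numeric arguments consumed per repetition of each command (case-insensitive).
ARG_COUNTS = {'M': 2, 'L': 2, 'H': 1, 'V': 1, 'C': 6,
              'S': 4, 'Q': 4, 'T': 2, 'A': 7, 'Z': 0}

# One token: either a command letter or a number, with optional separators.
TOKEN_RE = re.compile(
    r'\s*(?:([MmLlHhVvCcSsQqTtAaZz])'
    r'|([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?))[\s,]*'
)


def is_valid_path_data(d):
    """Return True if d starts with a moveto and every command is followed
    by a whole multiple of the argument count it requires."""
    tokens, pos = [], 0
    while pos < len(d):
        m = TOKEN_RE.match(d, pos)
        if m is None:
            return False          # unexpected character or dangling sign
        tokens.append(m.group(1) if m.group(1) else float(m.group(2)))
        pos = m.end()
    if not tokens or tokens[0] not in ('M', 'm'):
        return False              # empty, or does not begin with a moveto
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if not isinstance(cmd, str):
            return False          # a number where a command was expected
        need = ARG_COUNTS[cmd.upper()]
        i += 1
        count = 0
        while i < len(tokens) and not isinstance(tokens[i], str):
            count += 1
            i += 1
        if need == 0:
            if count != 0:
                return False      # closepath takes no arguments
        elif count == 0 or count % need != 0:
            return False          # missing or incomplete argument group
    return True
```

Used in place of `PATH_RE.match(...)`, a `False` result would mark the `path` element for removal in the same way the character-level check does.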