skills/skillxiv-v0.0.2-claude-opus-4.6/duetsvg-multimodal-svg/SKILL.md
Generate SVGs by decoding image tokens and SVG tokens together, using the model's own visual predictions as internal guidance. DuetSVG avoids the limits of text-only generation by letting those predictions improve SVG coherence. Use it when visual quality and geometric correctness matter.
npx skillsauth add ADu2021/skillXiv duetsvg-multimodal-svg
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
DuetSVG addresses the limitations of SVG generators that output only text by generating image tokens and SVG tokens jointly in a single integrated process. At test time, it scales inference by using the model's own visual predictions as guidance to improve generation quality.
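The guidance idea can be illustrated with a deliberately tiny toy: a "visual prediction" (here just an RGB colour) is used to choose among candidate SVG attribute values. This is a sketch under strong assumptions, not DuetSVG's actual mechanism; `hex_to_rgb`, `color_distance`, and `pick_consistent_fill` are hypothetical helper names introduced only for this example.

```python
# Toy illustration only: a single RGB colour stands in for the model's visual
# prediction, and hex fill values stand in for candidate SVG tokens.
# None of these names come from DuetSVG itself.

def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_consistent_fill(candidate_fills, predicted_rgb):
    """Return the candidate fill whose colour is closest to the prediction."""
    return min(
        candidate_fills,
        key=lambda fill: color_distance(hex_to_rgb(fill), predicted_rgb),
    )


predicted_rgb = (250, 10, 20)                 # assumed visual prediction
candidates = ["#00ff00", "#ff0000", "#0000ff"]  # assumed SVG alternatives
best = pick_consistent_fill(candidates, predicted_rgb)
print(f'<circle cx="50" cy="50" r="40" fill="{best}"/>')
```

A real system would compare far richer features than one colour, but the shape is the same: propose alternatives, score each against the visual prediction, keep the most consistent one.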
Unified multimodal generation with test-time scaling:
# DuetSVG: Unified multimodal SVG generation
# Illustrative sketch. ImageEncoder, SVGTokenizer, MultimodalDecoder and the
# self.* helpers that are called but not defined here are placeholders that
# the surrounding codebase must supply.
import torch


class DuetSVGGenerator:
    def __init__(self, max_steps=1024):
        self.image_encoder = ImageEncoder()
        self.svg_tokenizer = SVGTokenizer()
        self.multimodal_decoder = MultimodalDecoder()
        # Upper bound on decoding steps. The original listing used max_steps
        # without defining it; 1024 is an assumed default, not a source value.
        self.max_steps = max_steps

    def generate_svg_with_visual_guidance(self, text_prompt):
        """
        Generate SVG jointly with image tokens.
        Use visual predictions to guide SVG generation.
        """
        # Initialize generation
        generated_image_tokens = []
        generated_svg_tokens = []

        for step in range(self.max_steps):
            # Joint decoding: image and SVG tokens together
            next_image_token = self.multimodal_decoder.predict_image_token(
                generated_image_tokens,
                generated_svg_tokens,
                text_prompt
            )
            next_svg_token = self.multimodal_decoder.predict_svg_token(
                generated_image_tokens,
                generated_svg_tokens,
                text_prompt
            )

            # Internal visual guidance: use predicted image to guide SVG
            if self.should_apply_visual_guidance(step):
                # Decode partial image, including the newly predicted token
                partial_image = self.decode_image_tokens(
                    generated_image_tokens + [next_image_token]
                )
                # Analyze visual properties
                visual_features = self.extract_visual_features(
                    partial_image
                )
                # Re-score SVG token based on visual consistency
                next_svg_token = self.rescore_svg_token(
                    next_svg_token,
                    visual_features,
                    partial_image
                )

            generated_image_tokens.append(next_image_token)
            generated_svg_tokens.append(next_svg_token)

            # Check for completion
            if self.is_complete(generated_svg_tokens):
                break

        # Decode final SVG and its companion image
        final_svg = self.svg_tokenizer.decode(generated_svg_tokens)
        final_image = self.decode_image_tokens(generated_image_tokens)
        return final_svg, final_image

    def extract_visual_features(self, image):
        """Extract relevant visual properties from partial rendering."""
        features = {
            'color_palette': self.extract_colors(image),
            'spatial_layout': self.extract_layout(image),
            'object_positions': self.detect_objects(image),
            'visual_style': self.analyze_style(image)
        }
        return features

    def rescore_svg_token(self, token, visual_features, partial_image):
        """Re-evaluate SVG token based on visual consistency."""
        # Generate alternative SVG tokens
        alternatives = self.generate_alternatives(token)
        scores = []
        for alt in alternatives:
            # Simulate adding this SVG element
            test_svg = self.simulate_svg_addition(alt, partial_image)
            # Score consistency with visual features
            consistency = self.compute_visual_consistency(
                test_svg,
                visual_features,
                partial_image
            )
            scores.append(consistency)

        # Select highest-scoring alternative (.item() yields a plain int index)
        best_idx = torch.argmax(torch.tensor(scores)).item()
        return alternatives[best_idx]

    def test_time_scaling(self, text_prompt, budget=10):
        """
        Test-time scaling: spend the sampling budget on multiple SVGs
        and return the best-scoring one.
        """
        candidates = []
        for _ in range(budget):
            # Generate SVG (stochastic sampling)
            svg, image = self.generate_svg_with_visual_guidance(
                text_prompt
            )
            candidates.append((svg, image))

        # Score candidates
        best_svg = self.select_best_candidate(
            candidates,
            text_prompt
        )
        return best_svg

    def select_best_candidate(self, candidates, text_prompt):
        """Score and select best generated SVG."""
        scores = []
        for svg, image in candidates:
            # Score: visual quality, semantic correctness, coherence
            visual_quality = self.score_visual_quality(image)
            semantic_match = self.score_semantic_match(svg, text_prompt)
            internal_consistency = self.score_consistency(svg, image)
            total_score = (
                0.4 * visual_quality +
                0.4 * semantic_match +
                0.2 * internal_consistency
            )
            scores.append(total_score)

        best_idx = torch.argmax(torch.tensor(scores)).item()
        return candidates[best_idx][0]
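
The candidate-selection step above can be tried in isolation. The following is a minimal sketch that reuses the 0.4 / 0.4 / 0.2 weights from `select_best_candidate` but replaces the model-based scorers with precomputed numbers; `ScoredCandidate`, `total_score`, `select_best`, and all score values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Weights copied from select_best_candidate above.
W_VISUAL, W_SEMANTIC, W_CONSISTENCY = 0.4, 0.4, 0.2


@dataclass
class ScoredCandidate:
    svg: str                     # SVG markup of the candidate
    visual_quality: float        # hypothetical score in [0, 1]
    semantic_match: float        # hypothetical score in [0, 1]
    internal_consistency: float  # hypothetical score in [0, 1]


def total_score(c: ScoredCandidate) -> float:
    """Weighted sum of the three per-candidate scores."""
    return (W_VISUAL * c.visual_quality
            + W_SEMANTIC * c.semantic_match
            + W_CONSISTENCY * c.internal_consistency)


def select_best(candidates: list[ScoredCandidate]) -> ScoredCandidate:
    """Return the candidate with the highest weighted score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=total_score)


# Made-up scores standing in for the outputs of the real scorers.
candidates = [
    ScoredCandidate("<svg><!-- candidate A --></svg>", 0.9, 0.5, 0.5),
    ScoredCandidate("<svg><!-- candidate B --></svg>", 0.7, 0.8, 0.9),
    ScoredCandidate("<svg><!-- candidate C --></svg>", 0.6, 0.9, 0.4),
]
best = select_best(candidates)
print(best.svg)
```

Candidate A has the best visual quality and candidate C the best semantic match, yet candidate B wins because the weighted sum rewards being reasonably strong on all three criteria. Guarding against an empty candidate list is a design choice of this sketch; the listing above would fail inside `torch.argmax` if `budget` were 0.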