skills/skillxiv-v0.0.2-claude-opus-4.6/codev-verilog-reasoning/SKILL.md
Generate Verilog hardware code from natural language using reasoning-enhanced LLMs, combining rule-based testbench generation with round-trip data synthesis and adaptive DAPO reinforcement learning for reliable hardware design.
npx skillsauth add ADu2021/skillXiv codev-verilog-reasoning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
CodeV-R1 addresses the challenge of automatically generating Verilog hardware code from natural language specifications. The framework tackles three key obstacles: lack of automated verification tools for hardware, insufficient NL-to-code training pairs, and high computational costs of hardware-focused RL training.
The solution combines three innovations: a rule-based testbench generator for automated equivalence checking, round-trip data synthesis that validates consistency between code and natural language descriptions, and adaptive DAPO, a variant of the DAPO RL algorithm that reduces training costs through dynamic sampling. CodeV-R1-7B achieves pass rates of 68.6% and 72.9% on its two evaluation benchmarks, matching or exceeding larger models.
The following steps outline how to implement hardware code generation with reasoning and verification:
from typing import List, Dict, Tuple
import torch
import torch.nn as nn
class VerilogTestbenchGenerator:
    """Builds a skeleton testbench and compares designs (simplified)."""

    def __init__(self, module_spec: Dict):
        self.module_spec = module_spec

    def generate_testbench(self, ports: Dict, module_name: str) -> str:
        """Generate Verilog testbench for verification."""
        # ports maps each port name to its declaration type, e.g. {"a": "reg [7:0]"}
        testbench = f"""
module {module_name}_tb();
    // Clock and reset
    reg clk, rst;
    initial begin
        clk = 0;
        forever #5 clk = ~clk;
    end
    initial begin
        rst = 1;
        #10 rst = 0;
    end
    // Port declarations
"""
        for port_name, port_type in ports.items():
            testbench += f"    {port_type} {port_name};\n"
        testbench += f"""
    // Instantiate module (positional port connections)
    {module_name} uut({', '.join(ports.keys())});
    // Test procedures
    initial begin
        $monitor("Time=%0t ", $time);
        #100 $finish;
    end
endmodule
"""
        return testbench

    def verify_equivalence(self, generated_code: str, reference_code: str) -> bool:
        """Check whether two designs match after whitespace normalization."""
        # Simplified for demonstration: this only detects textually identical
        # code. In practice, use simulation or a formal tool such as Yosys.
        # Case is preserved because Verilog identifiers are case-sensitive.
        generated = "".join(generated_code.split())
        reference = "".join(reference_code.split())
        return generated == reference
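The text comparison in `verify_equivalence` cannot tell that two differently written designs behave the same. A minimal sketch of a simulation-style check follows, assuming each design can be modelled as a Python callable from input values to output values. That is a stand-in for running the reference and generated modules in a Verilog simulator; the names `random_stimulus_equivalent`, `adder_ref`, `adder_ok`, and `adder_bad` are hypothetical and do not come from the paper.

```python
import random
from typing import Callable, Dict, Optional

Inputs = Dict[str, int]
# Hypothetical stand-in for a simulated module: maps input values to output values.
Design = Callable[[Inputs], Dict[str, int]]


def random_stimulus_equivalent(
    reference: Design,
    candidate: Design,
    port_widths: Dict[str, int],
    num_vectors: int = 1000,
    seed: int = 0,
) -> Optional[Inputs]:
    """Return None if both designs agree on every random vector,
    otherwise return the first input vector on which they differ."""
    rng = random.Random(seed)
    for _ in range(num_vectors):
        # Draw one random value per input port, limited to the port's bit width.
        vector = {name: rng.getrandbits(width) for name, width in port_widths.items()}
        if reference(vector) != candidate(vector):
            return vector
    return None


def adder_ref(v: Inputs) -> Dict[str, int]:
    return {"sum": (v["a"] + v["b"]) & 0xFF}


def adder_ok(v: Inputs) -> Dict[str, int]:
    # Written differently, but functionally identical to adder_ref.
    return {"sum": (v["b"] + v["a"]) % 256}


def adder_bad(v: Inputs) -> Dict[str, int]:
    # Wrong whenever a and b share a set bit.
    return {"sum": (v["a"] | v["b"]) & 0xFF}


if __name__ == "__main__":
    widths = {"a": 8, "b": 8}
    print(random_stimulus_equivalent(adder_ref, adder_ok, widths))
    print(random_stimulus_equivalent(adder_ref, adder_bad, widths))
```

Random stimulus can only show a mismatch, never prove equivalence, so a pass here is weaker evidence than a formal equivalence check.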
class RoundTripValidator:
    def __init__(self, description_model, code_model):
        self.description_model = description_model
        self.code_model = code_model

    def validate_consistency(self, verilog_code: str) -> Tuple[bool, str]:
        """Validate Verilog code by round-trip description generation."""
        # Step 1: Generate description from Verilog
        description = self.description_model.generate(
            f"Describe this Verilog code:\n{verilog_code}\n\nDescription:",
            max_tokens=200
        )
        # Step 2: Regenerate code from description
        regenerated = self.code_model.generate(
            f"Write Verilog code for:\n{description}\n\nVerilog code:",
            max_tokens=500
        )
        # Step 3: Check consistency (simplified)
        consistency_score = self._compute_similarity(verilog_code, regenerated)
        is_consistent = consistency_score > 0.7
        return is_consistent, description

    def _compute_similarity(self, code1: str, code2: str) -> float:
        """Compute similarity between two code snippets."""
        # Simplified token-level (Jaccard) similarity over whitespace-split tokens
        tokens1 = set(code1.split())
        tokens2 = set(code2.split())
        intersection = len(tokens1 & tokens2)
        union = len(tokens1 | tokens2)
        return intersection / union if union > 0 else 0.0
class AdaptiveDAPO:
    """Simplified stand-in for adaptive DAPO.

    DAPO stands for Decoupled Clip and Dynamic sAmpling Policy Optimization.
    Only a sampling-rate schedule is modelled here; no policy update is performed.
    """

    def __init__(self, model: nn.Module, sampling_rate_init: float = 1.0):
        self.model = model
        self.sampling_rate = sampling_rate_init
        self.performance_history = []

    def step(self, batch_size: int, learning_rate: float = 0.001) -> Dict:
        """Execute one DAPO step with dynamic sampling."""
        # Dynamically adjust sampling based on the last five results.
        # With no history yet, assume neutral performance (avoids division by zero).
        recent = self.performance_history[-5:]
        recent_performance = sum(recent) / len(recent) if recent else 0.5
        # Reduce sampling rate if performance is poor
        if recent_performance < 0.5:
            self.sampling_rate *= 0.9
        else:
            self.sampling_rate *= 1.05
        # Clamp sampling rate
        self.sampling_rate = max(0.1, min(1.0, self.sampling_rate))
        # Sample subset of trajectories based on sampling rate
        sample_size = int(batch_size * self.sampling_rate)
        return {
            "sample_size": sample_size,
            "sampling_rate": self.sampling_rate,
            "learning_rate": learning_rate
        }
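In the published DAPO algorithm, dynamic sampling means discarding prompts whose sampled responses all receive the same reward, since their group-normalized advantages are zero and contribute no gradient. The sketch below shows that filtering step together with a group-relative advantage computation. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the population standard deviation is used for simplicity, and the way CodeV-R1's adaptive variant schedules sampling is not reproduced here.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption made for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


def filter_informative(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only prompts whose responses received differing rewards.

    Groups that are all-correct or all-wrong have zero advantage for every
    response, so they are dropped before the policy update.
    """
    return {prompt: r for prompt, r in groups.items() if len(set(r)) > 1}


# Hypothetical binary rewards (1.0 = passed the testbench) for four samples per prompt.
rewards_by_prompt = {
    "adder": [1.0, 1.0, 1.0, 1.0],  # always solved: no learning signal
    "fifo": [0.0, 0.0, 0.0, 0.0],   # never solved: no learning signal
    "alu": [1.0, 0.0, 0.0, 1.0],    # mixed: informative
}

kept = filter_informative(rewards_by_prompt)
alu_advantages = group_advantages(kept["alu"])
print(list(kept), alu_advantages)
```

Filtering shrinks the effective batch, which is why implementations typically keep sampling prompts until enough informative groups are collected.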
class CodeVR1Model:
    def __init__(self, base_model, teacher_model=None):
        self.model = base_model
        self.teacher = teacher_model

    def distill_knowledge(self, train_data: List[Dict]) -> float:
        """Initialize model via knowledge distillation from teacher."""
        if not self.teacher:
            return 0.0
        total_loss = 0.0
        for sample in train_data:
            verilog = sample["verilog"]
            # Teacher generates description
            teacher_output = self.teacher.generate(f"Describe: {verilog}", max_tokens=100)
            # Student learns to match teacher
            student_output = self.model.generate(f"Describe: {verilog}", max_tokens=100)
            # Placeholder loss: compares output lengths, not a real KL divergence
            loss = abs(len(teacher_output) - len(student_output)) / 100.0
            total_loss += loss
        return total_loss / len(train_data) if train_data else 0.0

    def train_with_rl(self, train_data: List[Dict], max_episodes: int = 100) -> float:
        """Run the simplified adaptive DAPO loop and return the average reward.

        Each sample needs "specification" and "verilog" (reference design) keys.
        """
        if not train_data or max_episodes <= 0:
            return 0.0
        optimizer = AdaptiveDAPO(self.model)
        testbench_gen = VerilogTestbenchGenerator({})
        total_reward = 0.0
        for episode in range(max_episodes):
            for sample in train_data:
                nl_spec = sample["specification"]
                # Generate Verilog
                generated_verilog = self.model.generate(
                    f"Generate Verilog for: {nl_spec}\n\nVerilog code:",
                    max_tokens=500
                )
                # Verify against the reference design (reward signal)
                is_valid = testbench_gen.verify_equivalence(
                    generated_verilog, sample["verilog"]
                )
                # Compute reward
                reward = 1.0 if is_valid else 0.0
                total_reward += reward
                optimizer.performance_history.append(reward)
                # Update the sampling schedule; the policy update itself is omitted
                optimizer.step(batch_size=32)
        return total_reward / (max_episodes * len(train_data))
Training data requirements:
Verification setup:
When to use:
When NOT to use:
Common pitfalls:
CodeV-R1-7B achieves 68.6% pass@1 on VerilogEval v2 and 72.9% pass@1 on RTLLM v1.1, surpassing the prior state of the art by 12-20% while matching or exceeding much larger models such as DeepSeek-R1. The approach is practical and scalable, producing Verilog from natural language specifications without requiring extensive hardware design expertise.
Original paper: "CodeV-R1: Reasoning-Enhanced Verilog Generation" (arxiv.org/abs/2505.24183)