skills/llm/code-execution-weighted-vote/SKILL.md
Execute Python code blocks from LLM math responses in a sandbox, then double-weight code-derived answers in majority voting
npx skillsauth add wenmin-wu/ds-skills llm-code-execution-weighted-vote

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Math-reasoning LLMs often generate Python code alongside their natural-language solutions. Executing that code yields an answer whose arithmetic has actually been computed rather than narrated, so it is usually more reliable than text-only reasoning, although the code can still encode a wrong approach to the problem. Extract the Python code blocks, run them in a sandboxed REPL, and add each code-derived answer to the majority vote with double weight, so that computationally checked results have more influence on the final answer.
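To make the weighting concrete, here is a toy tally. The answers are made up for illustration and are not results from any model; the point is only to show how double-weighting code-derived answers can change the winner of a vote.

```python
from collections import Counter

# Hypothetical parsed answers from five sampled responses to one problem.
text_answers = [17, 17, 17, 42, 42]  # one per response, weight 1 each
code_answers = [42, 42]              # from executed code blocks, weight 2 each

plain = Counter(text_answers)        # text-only majority vote
weighted = Counter(text_answers)     # same votes plus weighted code answers
for ans in code_answers:
    weighted[ans] += 2

print(plain.most_common(1)[0][0])     # 17 (3 votes vs 2)
print(weighted.most_common(1)[0][0])  # 42 (2 + 2*2 = 6 votes vs 3)
```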
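```python
import builtins
import re
from collections import Counter

# Builtins exposed to executed code (an empty dict breaks even range()).
# exec() is not a security boundary; isolate untrusted code at the OS level.
ALLOWED_BUILTINS = {name: getattr(builtins, name) for name in (
    "abs", "all", "any", "divmod", "enumerate", "float", "int", "len", "list",
    "max", "min", "pow", "range", "round", "set", "sorted", "sum", "tuple")}


def extract_python_blocks(text):
    """Return the body of every fenced python block in a response."""
    pattern = r'```python\s*\n(.*?)```'
    return re.findall(pattern, text, re.DOTALL)


def extract_boxed_answer(text):
    # Assumed helper, not defined in the original: last \boxed{<integer>}.
    matches = re.findall(r'\\boxed\{\s*(-?\d+)\s*\}', text)
    return int(matches[-1]) if matches else None


def safe_exec(code, timeout=10):
    # Returns the integer the code stored in `result`, else None.
    # `timeout` is not enforced: in-process exec() cannot be interrupted
    # portably. `import` statements fail (no `__import__` in the whitelist).
    try:
        # A single namespace lets functions in the code see its top-level names.
        namespace = {"__builtins__": ALLOWED_BUILTINS}
        exec(code, namespace)
        value = namespace.get('result')
        if isinstance(value, float) and value.is_integer():
            value = int(value)  # 42.0 -> 42; a regex on "42.0" would yield 0
        output = str(value) if isinstance(value, (int, str)) else ''
        nums = re.findall(r'-?\d+', output)
        return int(nums[-1]) if nums else None
    except Exception:
        return None


def weighted_vote(responses, code_weight=2):
    """Majority vote in which code-derived answers count code_weight times."""
    counter = Counter()
    for resp in responses:
        # Text-derived answer (weight=1)
        text_ans = extract_boxed_answer(resp)
        if text_ans is not None:
            counter[text_ans] += 1
        # Code-derived answers (weight=code_weight)
        for code in extract_python_blocks(resp):
            code_ans = safe_exec(code)
            if code_ans is not None:
                counter[code_ans] += code_weight
    return max(counter, key=counter.get) if counter else None
```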
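The `timeout` argument of `safe_exec` is not acted on by an in-process `exec()`, and generated code often needs `import` and `print`. One possible alternative, sketched below, runs each block in a separate Python process. The name `run_in_subprocess` and the convention of reading the last printed line are assumptions for illustration, not part of the original skill. A child process gives a real timeout and contains crashes, but it does not restrict filesystem or network access; that still needs a container or similar OS-level isolation.

```python
import re
import subprocess
import sys


def run_in_subprocess(code, timeout=10):
    """Run code in a child Python process; return the integer on its last
    stdout line, or None.

    Hypothetical helper. Returns None on timeout, on a non-zero exit status,
    or when the last printed line is not a bare integer.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None  # the code raised or exited with an error
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return None
    match = re.fullmatch(r'-?\d+', lines[-1].strip())
    return int(match.group()) if match else None


if __name__ == "__main__":
    print(run_in_subprocess("import math\nprint(math.factorial(5))"))
    print(run_in_subprocess("while True:\n    pass", timeout=1))
```

If the sampled responses print their answers instead of assigning `result`, a helper like this could be called in place of `safe_exec` inside `weighted_vote`. Starting a process per block is slower than `exec()`, which may matter when voting over many samples.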