synalinks-rewards/SKILL.md
Use when configuring or writing Synalinks reward functions and metrics — ExactMatch, CosineSimilarity, LMAsJudge, ProgramAsJudge, RewardFunctionWrapper, custom reward functions (async, register_synalinks_serializable), in_mask / out_mask filtering, F1Score / FBetaScore / BinaryF1Score / ListF1Score metrics, MeanMetricWrapper, or whenever you're shaping the signal that drives optimization.
npx skillsauth add synalinks/synalinks-skills synalinks-rewardsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Rewards drive optimization (the optimizer maximizes them). Metrics are passive — monitored but not used to update variables.
Both return a float, typically in [0.0, 1.0]. Both support field masking via in_mask and out_mask so you can score only the fields you care about.
program.compile(
reward=synalinks.rewards.ExactMatch(in_mask=["answer"]),
optimizer=synalinks.optimizers.RandomFewShot(),
metrics=[
synalinks.metrics.F1Score(in_mask=["answer"]),
],
)
compile() also accepts string identifiers, Keras-style (case-insensitive lookup against cls.__name__.lower()):
program.compile(
reward="exactmatch", # or "ExactMatch", "EXACTMATCH"
optimizer=synalinks.optimizers.RandomFewShot(),
metrics=["mean"], # wraps as MeanMetricWrapper around the reward
)
String equality on the JSON of selected fields. Simple, deterministic, free.
synalinks.rewards.ExactMatch(
in_mask=["answer"], # Compare only these fields
out_mask=None, # Or: exclude these fields
)
Use when: Discrete answers (numbers, multiple choice, named entities).
Semantic similarity via embeddings. Returns (cosine + 1) / 2, mapping [-1, 1] -> [0, 1].
synalinks.rewards.CosineSimilarity(
embedding_model=em, # Optional: falls back to synalinks.default_embedding_model() at call time
in_mask=["answer"],
)
Use when: Free-form text where exact wording varies but meaning matters (summaries, paraphrases).
Cost: Two embedding calls per (y_true, y_pred) pair per evaluation.
Use an LLM as judge. Internally builds a SelfCritique module that returns a reward field in [0, 1].
synalinks.rewards.LMAsJudge(
language_model=lm, # Optional: falls back to synalinks.default_language_model() at call time
instructions=None, # Optional. Default instructions string for the judge generator.
prompt_template=None, # Optional. Jinja2 prompt template (see Generator).
examples=None, # Optional. Few-shot examples for the judge.
in_mask=None,
out_mask=None,
)
Use when: Subjective criteria — helpfulness, safety, formatting quality. Steer the judge via instructions=... (there is no criteria= parameter).
Cost: One LM call per evaluation. Use a small/cheap model — judging is easier than generating.
Use another Synalinks program as judge. Lets you build complex multi-criterion graders.
The judge program is called with [y_true, y_pred] and its output schema must include a field named reward (a float in [0, 1]).
judge = synalinks.Program(...) # outputs include a "reward" field
synalinks.rewards.ProgramAsJudge(program=judge)
Use when: You need a custom judge with structured intermediate steps (rubric scoring, multi-aspect evaluation, retrieval-grounded judging).
A custom reward is an async function (y_true, y_pred) -> float, decorated with register_synalinks_serializable and wrapped in RewardFunctionWrapper (or passed as a bare function/string to compile, which auto-wraps it):
@synalinks.saving.register_synalinks_serializable()
async def word_overlap(y_true, y_pred):
"""Fraction of ground-truth words that appear in the prediction."""
true_val = y_true.get("answer") if y_true else None
pred_val = y_pred.get("answer") if y_pred else None
if true_val is None or pred_val is None:
return 0.0
true_words = set(true_val.lower().split())
pred_words = set(pred_val.lower().split())
overlap = len(true_words & pred_words)
return overlap / max(len(true_words), 1)
program.compile(
reward=synalinks.rewards.RewardFunctionWrapper(fn=word_overlap),
optimizer=...,
)
Always handle None — y_true or y_pred may be None (especially in branched programs where some outputs aren't activated).
in_mask and out_mask on rewards and metrics accept a list[str] of field names. They are passed through to the underlying JsonDataModel.in_mask(mask=..., pattern=...) / out_mask(mask=..., pattern=...) methods.
# Compare only "answer"
synalinks.rewards.ExactMatch(in_mask=["answer"])
# Compare everything except "thinking"
synalinks.rewards.ExactMatch(out_mask=["thinking"])
The pattern= regex argument lives on the data-model methods, not on Reward/Metric constructors. To filter by regex inside a custom reward, call the method directly:
@synalinks.saving.register_synalinks_serializable()
async def answer_only(y_true, y_pred):
if not y_true or not y_pred:
return 0.0
yt = y_true.in_mask(pattern="^answer")
yp = y_pred.in_mask(pattern="^answer")
return float(yt.get_json() == yp.get_json())
Why mask: ChainOfThought outputs include thinking which is rarely worth scoring directly. Masking focuses the optimizer on the field that drives task success.
synalinks.metrics.F1Score(in_mask=["answer"]) # token-set F1 over flattened fields
synalinks.metrics.FBetaScore(beta=0.5, in_mask=["answer"])
synalinks.metrics.BinaryF1Score(in_mask=["label"]) # for boolean / Score classification
synalinks.metrics.ListF1Score(in_mask=["sources"]) # for list/retrieval fields
There are no Precision or Recall classes — F1Score is implemented at the token level over flattened fields. For precision-only / recall-only signals, write a custom metric or use FBetaScore with a high/low beta.
Metrics are tracked alongside reward during fit() and evaluate() and appear in the History object.
@synalinks.saving.register_synalinks_serializable()
async def length_match(y_true, y_pred):
if not y_true or not y_pred:
return 0.0
return float(len(y_true.get("answer")) == len(y_pred.get("answer")))
program.compile(
metrics=[synalinks.metrics.MeanMetricWrapper(fn=length_match)],
...
)
| Task signal | Recommended reward |
|-------------|--------------------|
| Discrete / categorical answer | ExactMatch |
| Numeric answer | ExactMatch (cast to string) or custom (within tolerance) |
| Free-form text | CosineSimilarity |
| Subjective quality | LMAsJudge |
| Multi-criterion / structured | ProgramAsJudge or custom |
| Trajectory / agent | Custom reward inspecting trajectory field |
register_synalinks_serializable on a custom reward — the program won't save/load correctly.bool instead of float — works but loses gradient signal in optimizers like OMEGA that use softmax.None — leads to AttributeError: 'NoneType' object has no attribute 'get' on branched outputs.thinking fields — usually noise; mask them out.reward does.ProgramAsJudge output without a reward field — the judge program must expose a float field literally named reward; otherwise ProgramAsJudge defaults to 0.0.SelfCritique outputs a reward field that can feed a ProgramAsJudgedevelopment
Use when training Synalinks programs — program.compile() / fit() / evaluate() / predict(), validation_split, validation_data, batch_size, epochs, callbacks (ProgramCheckpoint, custom Callback subclasses), History, in-context reinforcement learning workflow. For reward functions see synalinks-rewards; for optimizer internals see synalinks-optimizers.
tools
Use when integrating Synalinks with LM providers — picking the right model prefix (openai/, anthropic/, ollama/, groq/, cohere/, openrouter/, bedrock/, deepseek/, together_ai/, doubleword/, hosted_vllm/ (alias vllm/), gemini/, xai/, mistral/, azure/), env vars per provider, structured-output dispatch (constrained json_schema vs tool-call), local OpenAI-compatible servers (LMStudio, vLLM) requiring litellm.register_model and a dummy OPENAI_API_KEY, and OpenRouter embeddings (LiteLLM doesn't support them — use OpenRouterEmbeddingModel).
development
Use when building or composing a Synalinks Program — the four building APIs (Functional, Sequential, Subclassing, Mixed), Input nodes, multi-input/multi-output graphs, the call/build lifecycle, training=True/False semantics, summary, get_module, plot_program, save/load, get_state_tree/set_state_tree, get_config/from_config and custom serialization. For DataModel/Field, JSON operators (+ & | ^ ~), and LanguageModel/EmbeddingModel basics see synalinks-core. For inner modules see synalinks-modules; for compile/fit/evaluate/predict see synalinks-training.
development
Use when picking, configuring, or tuning a Synalinks optimizer — RandomFewShot (nb_min_examples, nb_max_examples, sampling, sampling_temperature), OMEGA (Dominated Novelty Search, mutation/crossover, k_nearest_fitter, population_size, mutation_temperature, crossover_temperature, selection_temperature, merging_rate, algorithm "dns" vs "ga", selection "softmax"/"best"/"random", reasoning_effort), or evolutionary / quality-diversity prompt optimization in general.