skills/discovering-high-level-patterns/SKILL.md
Extract high-level semantic patterns from fine-grained simulation or event logs using LM-guided program synthesis. Transforms raw numerical traces into annotated pattern timelines, then composes reward/query programs from natural language goals. Use when: 'analyze simulation logs for patterns', 'find high-level events in trace data', 'generate reward functions from goals', 'summarize physics simulation output', 'build pattern detectors for time-series logs', 'annotate event traces with semantic labels'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ndpvt-web/arxiv-claude-skills discovering-high-level-patterns
This skill enables Claude to transform verbose, fine-grained simulation logs or event traces into compact, semantically meaningful pattern annotations, then reason over those annotations using natural language. The core technique, from Memery & Subr (2026), synthesizes executable pattern detector programs that scan raw traces and output activation timelines (e.g., "collision at t=3.2", "stable support from t=5 to t=8"). These annotated traces replace raw numerical data as context for downstream tasks such as question answering, reward program generation, and planning. This improves both scalability and LM reasoning accuracy in physics and event-driven domains.
The scalability problem. Feeding raw simulation traces directly to an LM fails because traces contain thousands of timesteps of numerical state (positions, velocities, contacts per object). This exceeds context limits and overwhelms the LM's ability to reason about physics. The paper's insight is to insert an intermediate representation: a pattern annotation matrix where each row is a timestep and each column is a named high-level pattern (e.g., "sliding_contact", "lever_launch", "global_rest"), with boolean or continuous activation values.
Program synthesis for pattern detection. Rather than hand-coding detectors, the method uses FunSearch-style evolutionary program synthesis. An LM proposes candidate Python functions that read trace data and return activation signals. These candidates are scored on a fitness function combining: (1) correlation between state-space distances and annotation distances across trajectories, (2) novelty relative to existing detectors, and (3) penalties for excessive length or computation time. Candidates are iteratively mutated, evaluated, and refined. Natural language descriptions from domain experts ("a lever launches a ball") seed the initial candidates, and additional patterns are self-discovered by analyzing LM reasoning traces from Q&A tasks.
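The propose-evaluate-mutate cycle can be sketched as a small loop. This is a heavily simplified illustration, not FunSearch itself: there is no program database and no island model. `propose` is a hypothetical stand-in for an LM call that takes parent code plus feedback and returns new code. `fitness` is any scoring function over a compiled detector, and `evolve` and `compile_detector` are illustrative names.

```python
from typing import Callable, List, Tuple

Detector = Callable[[list], List[float]]


def compile_detector(source: str, name: str = "detect_pattern") -> Detector:
    """Execute candidate source in a fresh namespace and return the named function.
    LM-written code is untrusted: run it in a sandbox or subprocess in practice."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def evolve(
    seed_source: str,
    propose: Callable[[str, str], str],   # hypothetical LM call: (parent code, feedback) -> new code
    fitness: Callable[[Detector], float],  # e.g. correlation + novelty - penalties
    generations: int = 5,
) -> Tuple[str, float]:
    """Keep the best-scoring candidate; repair failures, mutate successes."""
    best_source, best_score = seed_source, float("-inf")
    candidate = seed_source
    for _ in range(generations):
        try:
            score = fitness(compile_detector(candidate))
            failed, feedback = False, f"fitness={score:.3f}"
        except Exception as exc:  # compile or runtime error becomes repair feedback
            failed, feedback = True, f"error: {exc!r}"
        if not failed and score > best_score:
            best_source, best_score = candidate, score
        # Ask for a repair of the failing candidate, or a mutation of the best so far.
        candidate = propose(candidate if failed else best_source, feedback)
    return best_source, best_score
```

Passing the error message back with the failing code mirrors the repair step described in the workflow. Passing the fitness with the current best mirrors mutation of low-scoring detectors.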
Reward program composition via DSL. Once a pattern library exists, natural language goals like "Launch the green ball into the second bucket" are translated into structured reward programs using three predicate types: EVENT (checks pattern occurrence with parameters), temporal predicates (BEFORE, AFTER, DURING for ordering), and logical predicates (AND, OR, NOT for composition). Dense reward is the fraction of subclauses that are satisfied. This gives a graded optimization signal instead of a sparse binary reward.
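The predicate semantics can be sketched as small Python classes evaluated over an annotation dictionary. The following choices are assumptions for illustration; Section 5 of the paper defines the actual syntax.

- The class names and the 0.5 activation threshold are illustrative.
- BEFORE, AFTER and DURING are read as "some pair of active timesteps in that order, or overlapping".
- Object-specific parameters are assumed to be baked into pattern names (e.g. "contact(red_block,blue_block)").

```python
from typing import Dict, List

Annotations = Dict[str, List[float]]  # pattern name -> per-timestep activation


class Event:
    """EVENT: the named pattern is active (>= threshold) at some timestep."""
    def __init__(self, pattern: str, threshold: float = 0.5):
        self.pattern, self.threshold = pattern, threshold

    def times(self, ann: Annotations) -> List[int]:
        return [t for t, v in enumerate(ann.get(self.pattern, [])) if v >= self.threshold]

    def holds(self, ann: Annotations) -> bool:
        return bool(self.times(ann))


class Before:
    """BEFORE(a, b): a is active at some timestep earlier than one where b is active."""
    def __init__(self, a: Event, b: Event):
        self.a, self.b = a, b

    def holds(self, ann: Annotations) -> bool:
        ta, tb = self.a.times(ann), self.b.times(ann)
        return bool(ta and tb) and min(ta) < max(tb)


class After(Before):
    """AFTER(a, b) is evaluated as BEFORE(b, a)."""
    def __init__(self, a: Event, b: Event):
        super().__init__(b, a)


class During:
    """DURING(a, b): a and b are active at a common timestep."""
    def __init__(self, a: Event, b: Event):
        self.a, self.b = a, b

    def holds(self, ann: Annotations) -> bool:
        return bool(set(self.a.times(ann)) & set(self.b.times(ann)))


class And:
    def __init__(self, *clauses):
        self.clauses = clauses

    def holds(self, ann: Annotations) -> bool:
        return all(c.holds(ann) for c in self.clauses)


class Or:
    def __init__(self, *clauses):
        self.clauses = clauses

    def holds(self, ann: Annotations) -> bool:
        return any(c.holds(ann) for c in self.clauses)


class Not:
    def __init__(self, clause):
        self.clause = clause

    def holds(self, ann: Annotations) -> bool:
        return not self.clause.holds(ann)


def dense_reward(program, ann: Annotations) -> float:
    """Fraction of top-level subclauses satisfied (1.0 only when all hold)."""
    clauses = program.clauses if isinstance(program, And) else (program,)
    return sum(c.holds(ann) for c in clauses) / len(clauses)
```

Under these assumptions, a goal such as "launch the ball into bucket 2 without any collision" could become `And(Event("launch"), Before(Event("launch"), Event("in_bucket_2")), Not(Event("collision")))`. That program scores 2/3 when the ball is launched but never lands in the bucket.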
Structure the raw trace data. Parse simulation output into a sequence of state snapshots tau = [x_1, x_2, ..., x_T] where each x_t is a dictionary of object IDs mapped to their properties (position, velocity, rotation, contact list, semantic labels). Normalize coordinates and timestamps.
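A minimal normalization sketch follows, assuming snapshots shaped like `{"t": float, "objects": {obj_id: {"pos": [...], "vel": [...], ...}}}`. The per-axis min-max scaling and the choice to scale velocities by the same span are illustrative, not prescribed by the paper. `normalize_trace` is a hypothetical helper name.

```python
from typing import Dict, List


def normalize_trace(trace: List[Dict]) -> List[Dict]:
    """Sort snapshots by time, shift timestamps to start at 0, and rescale
    positions to [0, 1] per axis (velocities are divided by the same span).
    Only the "t" and "objects" keys of each snapshot are kept."""
    trace = sorted(trace, key=lambda snap: snap["t"])
    all_pos = [o["pos"] for snap in trace for o in snap["objects"].values()]
    dims = len(all_pos[0])
    lo = [min(p[d] for p in all_pos) for d in range(dims)]
    # A flat axis (zero span) falls back to 1.0 to avoid division by zero.
    span = [max(p[d] for p in all_pos) - lo[d] or 1.0 for d in range(dims)]
    t0 = trace[0]["t"]
    out = []
    for snap in trace:
        objects = {}
        for oid, props in snap["objects"].items():
            obj = dict(props)  # contacts, labels, rotation, etc. are copied unchanged
            obj["pos"] = [(p - l) / s for p, l, s in zip(props["pos"], lo, span)]
            if "vel" in props:
                obj["vel"] = [v / s for v, s in zip(props["vel"], span)]
            objects[oid] = obj
        out.append({"t": snap["t"] - t0, "objects": objects})
    return out
```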
Define the pattern vocabulary. Collect natural language descriptions of expected high-level patterns from the user or domain knowledge. Examples: "two objects collide", "object rests stably on surface", "object moves then stops", "lever-like launch". Each description becomes a candidate pattern label.
Synthesize pattern detector programs. For each pattern label, generate a Python function signature def detect_pattern(trace: List[Dict]) -> List[float] that returns per-timestep activation values in [0, 1]. Use the LM to propose an initial implementation based on the natural language description and trace schema. Include the trace data format in the prompt so the detector accesses correct fields.
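A sketch of the synthesis prompt, plus a check that a candidate honors the output contract (one activation per timestep, each in [0, 1]). The prompt wording and both function names are hypothetical; embedding a real snapshot as JSON is one way to give the LM the trace schema.

```python
import json
from typing import Dict, List


def build_detector_prompt(label: str, description: str, example_snapshot: Dict) -> str:
    """Assemble an LM prompt that states the signature, the pattern, and the trace format."""
    return (
        f"Write a Python function `detect_{label}(trace: List[Dict]) -> List[float]`.\n"
        f"Pattern to detect: {description}\n"
        "Return one activation in [0, 1] per timestep (same length as trace).\n"
        "Each element of trace has this format:\n"
        f"{json.dumps(example_snapshot, indent=2)}\n"
        "Return only the code."
    )


def check_activations(acts: List[float], trace_len: int) -> None:
    """Raise ValueError if a detector's output breaks the per-timestep [0, 1] contract."""
    if len(acts) != trace_len:
        raise ValueError(f"expected {trace_len} activations, got {len(acts)}")
    bad = [a for a in acts if not 0.0 <= a <= 1.0]  # also catches NaN
    if bad:
        raise ValueError(f"activations outside [0, 1]: {bad[:3]}")
```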
Evaluate and refine detectors. Score each detector using: (a) correlation -- do traces that are similar in state-space also produce similar annotation vectors? (b) novelty -- does this detector capture information not already covered by existing detectors? (c) penalties for runtime and code length. Iteratively prompt the LM to mutate low-scoring detectors, supplying the previous code and its fitness breakdown. If execution fails, supply the error message and request repair.
Build the pattern annotation matrix. Run all accepted detectors over each trace to produce a matrix of shape (T, num_patterns). Each cell is the activation level of pattern j at timestep t. This matrix replaces the raw trace as the primary representation.
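Building the matrix amounts to running each detector once and transposing the per-pattern columns into per-timestep rows. `annotate` is a hypothetical helper; detectors are assumed to take a whole trace and return one activation per snapshot.

```python
from typing import Callable, Dict, List, Tuple

DetectorFn = Callable[[List[Dict]], List[float]]


def annotate(trace: List[Dict], detectors: Dict[str, DetectorFn]) -> Tuple[List[str], List[List[float]]]:
    """Run every accepted detector and stack the outputs into a (T, num_patterns) matrix."""
    names = list(detectors)
    columns = [detectors[name](trace) for name in names]
    for name, col in zip(names, columns):
        if len(col) != len(trace):
            raise ValueError(f"{name}: expected {len(trace)} activations, got {len(col)}")
    # zip(*columns) turns per-pattern columns into per-timestep rows.
    matrix = [list(row) for row in zip(*columns)]
    return names, matrix
```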
Compose reward or query programs from goals. When the user provides a natural language goal, translate it into a DSL expression combining EVENT, temporal (BEFORE, AFTER), and logical (AND, OR, NOT) predicates over the pattern library. Include object-specific parameters where needed. Compute dense reward as the fraction of satisfied subclauses.
Validate reward programs against known outcomes. Test the composed reward program on traces with known success/failure labels. Check that successful traces score higher than failed ones. Adjust predicate thresholds or rewrite subclauses if ranking is incorrect.
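One way to make "successful traces score higher than failed ones" measurable is pairwise ranking accuracy, which is equivalent to ROC AUC. Using this metric is an assumption, not something the paper specifies. Running the same check with a single subclause as `reward_fn` can help isolate a faulty predicate. `ranking_accuracy` is a hypothetical name.

```python
from typing import Any, Callable, Sequence


def ranking_accuracy(
    reward_fn: Callable[[Any], float],
    successes: Sequence[Any],
    failures: Sequence[Any],
) -> float:
    """Fraction of (success, failure) pairs where the success scores higher.
    Ties count as half; 1.0 means perfect ranking and 0.5 is chance level."""
    if not successes or not failures:
        raise ValueError("need at least one success and one failure trace")
    s_scores = [reward_fn(tr) for tr in successes]
    f_scores = [reward_fn(tr) for tr in failures]
    wins = sum(1.0 if s > f else 0.5 if s == f else 0.0
               for s in s_scores for f in f_scores)
    return wins / (len(s_scores) * len(f_scores))
```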
Generate natural language summaries. Convert the annotation matrix into a textual timeline: "t=0.0-1.2: ball slides along ramp (sliding_contact active). t=1.3: ball collides with block (rigid_collision active). t=1.4-3.0: block at rest on platform (stable_support active)." Use this for Q&A, planning context, or human review.
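A template-based sketch of the timeline step: each pattern's active intervals become `t=start-end: name active` lines, sorted by start time. The 0.5 threshold and the wording are assumptions. Richer phrasing like "ball slides along ramp" would need object information or an LM pass over this raw output.

```python
from typing import List, Sequence, Tuple


def summarize(
    times: Sequence[float],
    names: Sequence[str],
    matrix: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> str:
    """Turn a (T, num_patterns) activation matrix into a time-ordered text timeline."""
    entries: List[Tuple[float, str]] = []
    last = len(matrix) - 1
    for j, name in enumerate(names):
        start = None
        for t, row in enumerate(matrix):
            active = row[j] >= threshold
            if active and start is None:
                start = t  # a new interval of this pattern begins
            if start is not None and (not active or t == last):
                end = t if active else t - 1  # close the interval
                span = (f"t={times[start]:.1f}" if start == end
                        else f"t={times[start]:.1f}-{times[end]:.1f}")
                entries.append((times[start], f"{span}: {name} active"))
                start = None
    return "\n".join(text for _, text in sorted(entries))
```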
Iterate the pattern library. Analyze cases where reward programs or summaries fail. Identify missing patterns by examining raw trace segments that lack any annotation. Synthesize new detectors for these gaps and re-annotate.
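Finding "raw trace segments that lack any annotation" can be sketched as scanning the annotation matrix for runs of timesteps where every pattern is below threshold. The threshold and `min_len` filter are illustrative assumptions. The returned step ranges index back into the raw trace for inspection.

```python
from typing import List, Sequence, Tuple


def uncovered_segments(
    matrix: Sequence[Sequence[float]],
    threshold: float = 0.5,
    min_len: int = 1,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) step ranges where no pattern is active."""
    segments: List[Tuple[int, int]] = []
    start = None
    last = len(matrix) - 1
    for t, row in enumerate(matrix):
        idle = all(v < threshold for v in row)
        if idle and start is None:
            start = t
        if start is not None and (not idle or t == last):
            end = t if idle else t - 1
            if end - start + 1 >= min_len:  # ignore very short gaps if requested
                segments.append((start, end))
            start = None
    return segments
```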
Example 1: Detecting events in a physics simulation log
User: I have a log from a 2D physics engine with timestamped positions and
velocities for 5 objects. Find the key physical events.
Approach:
1. Parse the log into structured snapshots:
trace = [
{"t": 0.0, "objects": {"ball": {"pos": [10,50], "vel": [5,-2]}, "platform": {"pos": [30,10], "vel": [0,0]}, ...}},
{"t": 0.1, "objects": {"ball": {"pos": [10.5,49.8], "vel": [5,-2.1]}, ...}},
...
]
2. Define candidate patterns from domain knowledge:
- "collision": two objects' positions converge to within contact distance
and relative velocity reverses sign
- "rest_state": an object's velocity magnitude stays below threshold
for N consecutive steps
- "free_fall": object has downward-increasing velocity with no contacts
3. Synthesize detector for "collision":
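import math

def detect_collision(trace, obj_a, obj_b, contact_dist=2.0):
    # First candidate: flags objects approaching within contact distance; a refined
    # version could also require the relative velocity to reverse sign.
    activations = [0.0]  # one value per snapshot; t=0 has no previous step
    for t in range(1, len(trace)):
        now, prev = trace[t]["objects"], trace[t-1]["objects"]
        dist = math.dist(now[obj_a]["pos"], now[obj_b]["pos"])
        prev_dist = math.dist(prev[obj_a]["pos"], prev[obj_b]["pos"])
        closing = prev_dist > dist      # objects were approaching each other
        contact = dist < contact_dist   # within contact distance
        activations.append(1.0 if (contact and closing) else 0.0)
    return activations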
4. Run all detectors, produce annotation matrix, then summarize:
Output:
t=0.0-0.8: ball in free_fall toward platform
t=0.9: ball collides with platform (collision detected)
t=1.0-2.5: ball at rest on platform (rest_state active)
t=2.6: ball collides with wedge (collision detected)
t=2.7-3.5: ball in free_fall off edge
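The "rest_state" pattern from step 2 can be sketched in the same style. This is a hypothetical detector, not taken from the paper. `speed_eps` and `n_steps` are illustrative defaults that would need tuning to the engine's units and timestep, and the trace format matches step 1.

```python
import math
from typing import Dict, List


def detect_rest_state(trace: List[Dict], obj: str,
                      speed_eps: float = 0.05, n_steps: int = 5) -> List[float]:
    """Activate every step of any run where obj's speed stays below speed_eps
    for at least n_steps consecutive snapshots."""
    slow = [math.hypot(*snap["objects"][obj]["vel"]) < speed_eps for snap in trace]
    acts = [0.0] * len(trace)
    run = 0
    for t, is_slow in enumerate(slow):
        run = run + 1 if is_slow else 0
        if run >= n_steps:
            for k in range(t - n_steps + 1, t + 1):  # mark the whole qualifying window
                acts[k] = 1.0
    return acts
```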
Example 2: Generating a reward function from a natural language goal
User: Write a reward function for the goal "Stack the red block on top of
the blue block" given my simulation trace format.
Approach:
1. Identify required patterns from goal semantics:
- "contact": red block touches blue block
- "above": red block center-y > blue block center-y
- "stable_support": both blocks have near-zero velocity for sustained period
2. Compose DSL reward program:
reward = AND(
EVENT("contact", {"obj_a": "red_block", "obj_b": "blue_block"}),
EVENT("above", {"top": "red_block", "bottom": "blue_block"}),
AFTER("contact", "stable_support",
{"objects": ["red_block", "blue_block"], "min_duration": 0.5})
)
3. Implement dense scoring:
def compute_reward(trace, reward_program):
    # evaluate_clause(trace, clause) -> 1.0 if the clause holds on the trace, else 0.0
    subclauses = reward_program.subclauses
    scores = [evaluate_clause(trace, c) for c in subclauses]
    return sum(scores) / len(scores)  # dense: fraction of satisfied subclauses
Output:
def reward_fn(trace):
    # detect_contact, detect_above, detect_stable_support and contact_before_stable
    # are detectors/helpers assumed to come from the pattern library.
    contact = any(detect_contact(trace, "red_block", "blue_block"))
    above = any(detect_above(trace, "red_block", "blue_block"))
    stable = detect_stable_support(trace, ["red_block", "blue_block"],
                                   min_duration=0.5)
    stable_after_contact = stable and contact_before_stable(trace)
    return (float(contact) + float(above) + float(stable_after_contact)) / 3.0
Example 3: Summarizing a robotics telemetry trace for Q&A
User: I have a 10,000-step trace from a robotic arm simulation. Summarize
it so I can ask questions about what happened.
Approach:
1. Define patterns relevant to manipulation:
- "grasp": gripper width decreasing while object within gripper envelope
- "lift": grasped object's z-position increasing
- "place": grasped object's z-position decreasing then gripper opens
- "transport": object moving laterally while grasped
- "idle": all joint velocities below threshold
2. Run detectors across 10,000 steps, producing annotation matrix (10000 x 5).
3. Compress into phase summary by merging consecutive identical activations:
Output:
Phase 1 (steps 0-1200): idle -- arm stationary, no object contact
Phase 2 (steps 1201-1450): grasp -- gripper closes on red cube
Phase 3 (steps 1451-2800): lift + transport -- red cube lifted and moved
Phase 4 (steps 2801-3100): place -- red cube lowered onto table B
Phase 5 (steps 3101-3400): idle -- arm returns to home position
Phase 6 (steps 3401-3650): grasp -- gripper closes on blue cylinder
...
Now answerable: "When was the red cube placed?" -> Phase 4, steps 2801-3100
"How many objects were grasped?" -> 2 (red cube, blue cylinder)
| Problem | Solution |
|---|---|
| Synthesized detector throws runtime errors | Feed the traceback and detector code back to the LM with a repair prompt. Iterate up to 3 times before discarding the candidate. |
| Detector always returns 0 or always returns 1 | Check activation variance across traces. Discard degenerate detectors and re-synthesize with a more specific natural language description or adjusted thresholds. |
| Reward program ranks failed traces higher than successful ones | Validate against labeled trace pairs. Decompose the reward into subclauses and test each independently to isolate the faulty predicate. |
| Pattern library misses important events | Examine raw trace segments where no pattern is active. Ask the user to describe what is happening in those segments, then synthesize a new detector for the gap. |
| Annotation matrix too large for context | Compress by merging consecutive timesteps with identical activations into phase summaries with start/end timestamps. |
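The last remedy in the table can be sketched by grouping runs of timesteps whose set of active patterns is identical. This is the same compression used for the robotic-arm phase summary in Example 3. `compress_phases` and its 0.5 threshold are illustrative assumptions. The output is step ranges with active pattern names, which a template or LM can then verbalize.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def compress_phases(
    names: Sequence[str],
    matrix: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, Tuple[str, ...]]]:
    """Merge consecutive rows with the same active-pattern set into (start, end, active)."""
    labels = [tuple(n for n, v in zip(names, row) if v >= threshold) for row in matrix]
    phases: List[Tuple[int, int, Tuple[str, ...]]] = []
    t = 0
    for active, group in groupby(labels):  # groupby merges adjacent equal labels
        length = len(list(group))
        phases.append((t, t + length - 1, active))
        t += length
    return phases
```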
Memery, S. & Subr, K. (2026). Discovering High Level Patterns from Simulation Traces. arXiv:2602.10009v1. https://arxiv.org/abs/2602.10009v1
Key sections to consult: Algorithm 1 (DiscoverPatternDetectors) for the synthesis loop, Algorithm 2 (Evaluate) for the fitness function components (correlation, novelty, penalties), and Section 5 for reward program DSL syntax and dense scoring.