
Use when tasks involve reading, creating, or reviewing PDF files where rendering and layout matter; prefer visual checks by rendering pages (Poppler) and use Python tools such as `reportlab`, `pdfplumber`, and `pypdf` for generation and extraction.
Iteratively optimize cluster job throughput and resource efficiency for minimum total wall-clock completion time by running staged experiment rounds, analyzing Slurm statuses/logs/outputs, and tuning job shape/resources across pipeline stages (data fetching, preprocessing, caching, training, eval). Use when planning, submitting, monitoring, or refining cluster runs while avoiding crashes and OOM failures.
Review recent changes and/or broader code to simplify or reduce technical debt without breaking behavior or performance. Use when asked to tidy up, lightly refactor, or address tech debt.
Alias for a checkpoint cycle: run `organise-docs`, then `git-commit`, then `git-push-safe` when repo policy allows push.
Wait for long-running external tasks from waiting or queued state through running to validated completion before continuing work. Use when Codex must block on cluster jobs, batch pipelines, CI runs, or other asynchronous operations by polling status every N seconds (default 120), checking the best available logs/outputs/results while the work is active, and proceeding only after completion is confirmed, with a hard timeout cap of 8 hours.
Help address review/issue comments on the open GitHub PR for the current branch using gh CLI; verify gh auth first and address actionable comments autonomously, asking the user only for true blockers or ambiguous high-risk choices.
Create, edit, render, verify, and export PowerPoint slide decks. Use when Codex needs to build or modify a deck, presentation deck, slide deck, slides, PowerPoint, PPT, or visually ambitious editable .pptx file.
Use this skill when a user asks to create, modify, analyze, visualize, or otherwise work with spreadsheet files (`.xlsx`, `.xls`, `.csv`, `.tsv`) with formulas, formatting, charts, tables, and recalculation.
Use when the user explicitly asks for a desktop or system screenshot (full screen, specific app or window, or a pixel region), or when tool-specific capture capabilities are unavailable and an OS-level capture is needed.
Persistent browser and Electron interaction through `js_repl` for fast iterative UI debugging.
Use when the user asks how to build with OpenAI products or APIs and needs up-to-date official documentation with citations (for example: Codex, Responses API, Chat Completions, Apps SDK, Agents SDK, Realtime, model capabilities or limits); prioritize OpenAI docs MCP tools and restrict any fallback browsing to official OpenAI domains.
Run and monitor a Loopy recipe for the current project after checkpointing and pushing the current branch, selecting the right recipe from the local loopy recipes tree, and intervening only when run health or delivery correctness clearly requires it. Stay attached from launch until the terminal launch state and retained outputs/logs/results have been checked and still make sense.
Sync local git state to the latest remote branch state (`main`, current branch, or explicit target branch) with safe fast-forward behavior and clear verification.
Commit all current uncommitted changes into small, logical commits with clear messages. Do not push. Use when asked to commit everything in the working tree.
Use when a user asks to debug or fix failing GitHub PR checks that run in GitHub Actions; use `gh` to inspect checks and logs, summarize failure context, draft a fix plan, and implement autonomously with verification unless a true blocker requires user input. Treat external providers (for example Buildkite) as out of scope and report only the details URL.
Use automatically for Sentinel repo sessions, trading research questions, market/company/ticker/source questions, or any request that should use Sentinel's read-only data sources and reference context. Enforces Sentinel's high-confidence, read-only, no-local-query-trace research posture.
Drive work end-to-end only when the required decisions are high confidence or very high confidence. Investigate deeply first, act autonomously on high-confidence changes or no-change conclusions, and defer medium/low-confidence decisions to the user.
Prime any conversation at session start by familiarizing yourself with available project docs and repo state, then establishing cross-project operating principles, autonomy defaults, verification posture, and recurring execution loops. Use when the user asks to prime the session or wants proactive/autonomous behavior with minimal unnecessary questions, frequent docs updates, frequent checkpoint commits, and regular cleanup.
Run and monitor DeepReview against the current repo/project or an explicit local git repo using the local DeepReview Go tool. Use when the user asks to run deepreview, start a monitored multi-round repo review, watch a deepreview run while it executes, or gather preflight/artifact details before or after a deepreview pass. Stay attached from launch until the terminal outcome and saved artifacts have been checked and still make sense.
Proactively battle-test recent code changes across many configurations and perspectives. Use when asked to validate changes, run broad test coverage, or stress the codebase beyond the obvious checks.
Audit active or recent Slurm queue state to find likely job-shape misconfigurations that strand shared cluster capacity (CPU, memory, GPU) and block scheduling for others. Use when users ask why resources appear idle, who may be blocking allocation, which jobs/users look misconfigured, or when preparing evidence for neutral outreach. Keep the workflow strictly read-only: inspect and report only, never cancel, edit, reprioritize, or otherwise mutate jobs or cluster state.
Primary Slurm cluster skill for this workspace. Monitor current conversation jobs and current project jobs over long horizons with low-noise polling, microscope-level log/output/result inspection, and high-bar interventions only when not intervening would likely produce invalid results or force costly reruns. Stay attached from queue through running to terminal completion, keep checking scheduler state plus accessible logs/outputs/results/files throughout, and do not call the work done until the finished outputs still make sense. When intervention is warranted, cancel scoped jobs, clean up artifacts/cache/logs, implement and verify fixes, resubmit, and continue monitoring until validated completion.
Verify end-to-end ability to submit to competition platforms (for example AMMChallenge/Highload) using browser automation, with clear evidence and blocker diagnostics.
Consider new evidence or reviews about recent work, investigate and reason deeply, and decide what (if anything) to integrate or change. Use when asked to evaluate new information and determine next actions.
Deep, thorough decision support. Use when the conversation presents decisions to be made and requires background research, options analysis, and a consolidated recommendation report.
Execute the current plan end-to-end, verifying completion; use when asked to run or carry out an existing plan and report results.
Meticulously familiarize yourself with a codebase to understand its structure, purpose, and workflows; use when asked to get the lay of the land, orient in a repo, summarize architecture, or assess current branch changes vs main.
Prepare the current branch to merge cleanly into main by ensuring a clean working tree, syncing with remote, understanding diffs and intent, planning safe changes, resolving conflicts, and verifying with battle tests.
Safely push the current branch when a remote exists, with mandatory pre-commit/tests/CI verification gates before push.
Deep review of a branch vs main with severity-first focus on evidence-backed critical red flags and serious issues before merge. Use when asked to assess readiness to merge or to audit branch risk, and explicitly report when no high-confidence critical findings are present.
Analyze current branch vs main, understand all diffs and intent, and produce a concise lower-case bullet list suitable for a PR summary, including nuances and gotchas.
Thoroughly evaluate one or more GPT Pro reports item by item, plus any user comments or handling preferences supplied with them; independently verify each recommendation, build a high-confidence plan of change, refine that plan through at least three explicit rounds, then execute it end-to-end with verification, battle testing, documentation updates, checkpoint commits, and safe pushes until the repo is clean.
Deep, meticulous investigation of a problem, issue, or topic by forming hypotheses, gathering evidence, and testing empirically. Use when the user asks to investigate, deep dive, research, debug complex behavior, understand a codebase thoroughly, or build high confidence in an explanation or solution.
Use when the user asks to create, scaffold, or edit Jupyter notebooks (`.ipynb`) for experiments, explorations, or tutorials; prefer the bundled templates and run the helper script `new_notebook.py` to generate a clean starting notebook.
Create and maintain scientific/empirical experiment or investigation reports in Notion via MCP from freeform artifacts. Default to direct Notion editing with a Claude-assisted wording/structure pass when available; support local-first draft QA before publishing when explicitly requested. Use the report shape that best communicates the evidence instead of forcing one narrative format.
Autonomously maintain repository documentation from the active conversation across any project. Use when the user asks to update docs from chat context, capture explicit or inferred high-confidence decisions, consolidate contradictions, reduce ambiguity, reorganize doc structure, and preserve durable knowledge for future contributors.
Create a comprehensive, high-conviction change plan that improves the codebase using all available context and decisions. Only include plan items you are highly confident in; if anything is unclear or low conviction, run targeted investigation first and ask for clarification only when still blocked.
Use when the task requires automating a real browser from the terminal (navigation, form filling, snapshots, screenshots, data extraction, UI-flow debugging) via `playwright-cli` or the bundled wrapper script.
Initialize docs/plan/decisions conventions plus note-routing and orchestration defaults in a repo; create structure if missing; no-op if already set up.
Summarize complex information from any source into concise, decision-ready briefs. Use when asked to "summarize" work, discussions, research, plans, tickets, incidents, meetings, audits, reviews, or project status while preserving background context, evidence when available, reasoning, pros/cons, and clearly stating when no critical red flags are evidenced.
Verify correctness of recent code changes, decisions, plans, or outputs by running checks/tests and gathering evidence. Use when the user asks to confirm, validate, double-check, or assess whether recent work (including plans) is correct, complete, or meets requirements, especially after edits, bug fixes, refactors, or discussions.
Use only when the user explicitly asks to stage, commit, push, and open a GitHub pull request in one flow using the GitHub CLI (`gh`).
Export statements, bills, reports, and similar user documents from official web portals using browser automation, completeness tracking, and local-only artifact handling.
Assess a codebase for potentially dangerous or malicious behavior before running it. Use when the user wants a safety audit of an untrusted repo, scripts, installers, build/test pipelines, or dependencies to decide whether to run locally or only in a sandbox/container.
Explain the current topic in HFT/quant/trader terms (PnL, risk, exposure, execution, microstructure, latency, liquidity, limits, failure modes) while preserving all important details and nuances. Use for quant/trading projects when the user wants a trader-perspective explanation of technical work (code, model/ML components, math/stats, infrastructure, incidents, experiments, research results, design decisions), especially when live-trading step-by-step behavior matters (inflight fills, cancel/replace races, queue position), or to translate non-trading terminology into what matters for running a book.
Compact the current conversation into a handoff document for another agent to pick up.
Run the Codex custom review feature from the CLI for arbitrary review instructions. Use when the user asks to use /review, custom review, or Codex review without tying the review to commits, uncommitted changes, or a base branch; prefer multicodex exec review with no explicit account and fall back to codex exec review only when multicodex is unavailable.
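The long-running-task wait described earlier in this list (poll every N seconds, default 120, hard timeout cap of 8 hours, proceed only after completion is confirmed) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `wait_for_completion`, `get_status`, `validate`, the status labels, and the return values are all hypothetical names chosen for the example.

```python
import time
from typing import Callable

# Hypothetical status labels; a real scheduler or CI system will use its own.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_completion(
    get_status: Callable[[], str],      # assumed callback returning the current status
    validate: Callable[[], bool],       # assumed callback checking logs/outputs/results
    poll_seconds: float = 120,          # default polling interval from the description
    timeout_seconds: float = 8 * 3600,  # hard cap of 8 hours from the description
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Block until the task reaches a terminal state or the timeout expires.

    Returns "validated" only when the task completed and validation passed;
    otherwise returns "unvalidated", the failing terminal status, or "timeout".
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            if status != "completed":
                return status
            return "validated" if validate() else "unvalidated"
        remaining = deadline - clock()
        if remaining <= 0:
            return "timeout"
        # Never sleep past the hard deadline.
        sleep(min(poll_seconds, remaining))
```

The `sleep` and `clock` parameters are injected only so the loop can be exercised without real waiting; a caller would normally leave them at their defaults and supply `get_status` and `validate` for the system being watched.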