
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when Claude needs to work with professional documents (.docx files) for: (1) creating new documents, (2) modifying or editing content, (3) working with tracked changes, (4) adding comments, or any other document task.
Create an active, dataset-driven AgentJet swarm client. Write agent_roll.py and agent_run.py that iterate through a dataset, execute agent workflows, and compute rewards for reinforcement learning training with AgentJet Swarm.
Download per-step time-series metric data (reward, entropy, response length, etc.) from a SwanLab cloud run URL as a pandas.DataFrame. Use when the user provides a SwanLab URL and wants to fetch or analyze training curves.
Your task is to investigate the chat template of the given model: go to its tokenizer config and check whether the following behavior exists: > > Remove history <think> blocks from the input when applying the chat template to convert messages. > This behavior makes RL training slower. If it exists, change the chat template to disable it. Do not do this in place; instead, create another model. E.g., "/mnt/data_cpfs/xielipeng.xlp/models/Qwen3-8B" -> "/mnt
Install AgentJet client for connecting to a swarm server. Use when the user only needs to run the AgentJet client (not a swarm server) and does not need to run models locally, e.g. on a laptop. Installs basic requirements via `pip install -e .`.
Convert skills in non-standard formats to the standard Agent Skills `SKILL.md` format. Validates YAML frontmatter (name, description, license, compatibility, metadata, allowed-tools), directory structure (SKILL.md, scripts/, references/, assets/), and best practices. Use when the user asks to normalize, validate, or fix a skill.
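The frontmatter check described above can be sketched as follows. This is a minimal illustration, not the skill's actual validator: the function names are hypothetical, the allowed keys are taken from the description, and treating `name` and `description` as the required keys is an assumption. It reads only top-level `key: value` lines with the standard library; a real validator would use a YAML parser.

```python
from typing import List

# Keys named in the skill description above.
ALLOWED_KEYS = {"name", "description", "license", "compatibility", "metadata", "allowed-tools"}
# Assumption: these two keys are treated as mandatory.
REQUIRED_KEYS = {"name", "description"}


def read_frontmatter_keys(skill_md: str) -> List[str]:
    """Return the top-level keys found between the opening and closing '---' lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    keys: List[str] = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys are unindented 'key: value' lines; indented (nested) lines are skipped.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    raise ValueError("frontmatter is missing its closing '---' line")


def check_frontmatter(skill_md: str) -> List[str]:
    """Return a list of problems; an empty list means no problems were found."""
    keys = read_frontmatter_keys(skill_md)
    problems = [f"missing required key: {k}" for k in sorted(REQUIRED_KEYS - set(keys))]
    problems += [f"unknown key: {k}" for k in keys if k not in ALLOWED_KEYS]
    return problems
```

Calling `check_frontmatter` on the text of a `SKILL.md` file returns human-readable problems that a normalization step could then fix one by one.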
Train complex blackbox agents (agents without clear reward signals) using AgentJet. Write dataset collectors, episode runners with LLM-as-Judge reward functions, and integrate with the AgentJet training loop.
--- name: auto-research-blueprint-execute-swarm description: Execute AgentJet reinforcement learning experiments using experiment blueprints in swarm mode. Handles full lifecycle: generate blueprint if needed, launch experiment in tmux, monitor progress, analyze errors, collect results, and write finish flag. Use when the user wants to run or debug AgentJet training experiments. --- ## Your task 0. If the user has not provided an experiment blueprint, generate one 1. Run the experiment according to the experiment blueprint 2. Wait for the experiment to finish or time out 3. If the experiment fails, try to fix it and record the trial-and-error process in the designated location (in exp_result_dir, create a
Install AgentJet swarm server using Conda. Handles Python 3.10 environment creation, dependency installation with the verl training backbone, flash-attn compilation, and optional PyPI mirror for China users.
Install and run the AgentJet Swarm Server in a Docker container with NVIDIA GPU support. Use when the user wants to deploy a swarm server on a GPU machine via Docker, including GPU driver setup, Docker mirror configuration, model weight mounting, and server startup.
Map VERL training configuration to AgentJet configuration. Find VERL config in verl_default.yaml, check for existing mappings in config_auto_convertion_verl.jsonc, add new mappings to ajet_default.yaml and the conversion schema, and optionally add parameters to AgentJetJob.
Monitor training progress by reading tmux output at exponentially increasing intervals (30s, 1min, 2min, 4min, 8min, 16min), analyze logs when anomalies occur, and provide fix suggestions.
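The polling schedule above doubles the wait after every check. A minimal sketch of that schedule follows; the function names are hypothetical, and staying at 16 minutes once the last listed interval is reached is an assumption, since the description does not say what happens afterwards.

```python
from typing import Iterator, List


def backoff_intervals(first: float = 30.0, factor: float = 2.0, cap: float = 960.0) -> Iterator[float]:
    """Yield wait times in seconds: 30, 60, 120, ... up to the cap (16 min)."""
    wait = first
    while True:
        yield min(wait, cap)
        # Assumption: once the cap is reached, keep polling at the cap.
        wait = min(wait * factor, cap)


def first_intervals(count: int) -> List[float]:
    """Return the first `count` wait times of the schedule."""
    schedule = backoff_intervals()
    return [next(schedule) for _ in range(count)]


# first_intervals(7) -> [30.0, 60.0, 120.0, 240.0, 480.0, 960.0, 960.0]
```

A monitoring loop would sleep for each yielded value, read the tmux output, and reset the generator if it wants to return to frequent checks after an anomaly.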
How `max_env_worker` caps the "Running Episodes" gauge, and how `AgentJetJob` relates to the YAML config.
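One way to picture such a cap is a semaphore that limits how many episodes run at once. The sketch below is an illustration only, not AgentJet's implementation: `run_episodes` is a hypothetical function, the episodes are dummies, and only the parameter name `max_env_worker` comes from the description above.

```python
import asyncio


async def run_episodes(num_episodes: int, max_env_worker: int) -> int:
    """Run dummy episodes and return the peak number that ran at the same time."""
    slots = asyncio.Semaphore(max_env_worker)  # at most max_env_worker episodes hold a slot
    running = 0
    peak = 0

    async def episode() -> None:
        nonlocal running, peak
        async with slots:
            running += 1
            peak = max(peak, running)  # what a "Running Episodes" gauge would show
            await asyncio.sleep(0.01)  # stand-in for environment interaction
            running -= 1

    await asyncio.gather(*(episode() for _ in range(num_episodes)))
    return peak


peak = asyncio.run(run_episodes(num_episodes=20, max_env_worker=4))
# peak is 4: the gauge never exceeds the cap, even with 20 episodes queued
```

With fewer episodes than workers, the peak equals the number of episodes, so the cap only matters once the queue is longer than `max_env_worker`.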
Install AgentJet swarm server using the UV package manager. Handles virtual environment creation with Python 3.10, dependency installation with the verl training backbone, flash-attn compilation, and optional PyPI mirror for China users.
--- name: auto-research-blueprint-execute-classic description: Execute AgentJet reinforcement learning experiments using experiment blueprints in classic (non-swarm) mode. Handles full lifecycle: launch experiment in tmux, monitor progress, analyze errors, collect results, and write finish flag. Use when the user wants to run AgentJet training experiments without the swarm distributed framework. --- ## Your task 1. Run the experiment according to the experiment blueprint 2. Wait for the experiment to finish or time out 3. If the experiment fails, try to fix it and record the trial-and-error process in the designated location (create a document in exp_result_dir); if it cannot be fixed
Build custom LLM evaluation pipelines using the OpenJudge framework. Covers selecting and configuring graders (LLM-based, function-based, agentic), running batch evaluations with GradingRunner, combining scores with aggregators, applying evaluation strategies (voting, average), auto-generating graders from data, and analyzing results (pairwise win rates, statistics, validation metrics). Use when the user wants to evaluate LLM outputs, compare multiple models, design scoring criteria, or build an automated evaluation system.
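The evaluation strategies and the pairwise analysis named above reduce to small computations. The sketch below uses plain Python and does not call the OpenJudge API; all function names are hypothetical, and counting a tie as half a win is an assumption.

```python
from collections import Counter
from statistics import mean
from typing import Sequence


def average_score(scores: Sequence[float]) -> float:
    """Average strategy: combine repeated grader scores into their mean."""
    return mean(scores)


def majority_vote(labels: Sequence[str]) -> str:
    """Voting strategy: return the most frequent label among repeated gradings."""
    return Counter(labels).most_common(1)[0][0]


def pairwise_win_rate(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Fraction of samples on which model A outscores model B (ties count as half a win)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both models need the same non-zero number of scores")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)


# pairwise_win_rate([5, 3, 4, 2], [4, 3, 1, 5]) -> 0.625 (two wins, one tie, one loss)
```

Averaging suits numeric scores, while voting suits categorical verdicts such as pass/fail. The win rate compares two models sample by sample rather than by their overall means.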
Create a passive swarm client that waits for user input instead of iterating through a dataset by itself.