Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

modelscope/train-complex-blackbox

Name: train-complex-blackbox
Author: modelscope

ajet/copilot/train-complex-blackbox/SKILL.md

npx skillsauth add modelscope/agentjet train-complex-blackbox

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

0. Ask user for API key + model (or API key + base url + model) for debugging

This is not 100% necessary, but it can help a lot in debugging in step 1. If user has not given a API, ask user to give your one.

By default, the code you write should be located at ./tutorial/opencode_build_xxxxxx/*.py

1. Initial Programming

Writing dataset collector (`get_training_dataset_item_list.py`)

get_training_dataset_item_list.py: Returns a list of training data items. Maybe a list of training tasks, each item is a string identifier of a training task, or a dict containing necessary information for the training task.

Episode Runner (`run_episode_once.py`)

run_episode_once.py:
- Argument Parser: takes (training data item identifier + api-key + base-url) as input, model-name is not required, you can make up a model name because we ignore it.
- Execute the agent: read the document of the agent user asked you to train, figure out how to execute the agent. In most cases you can use subprocess to start a commandline process to execute the agent, your biggest issue is to figure out how to pass the training data item identifier, api-key and base-url to that commandline process. You can also use python code to execute the agent if you think it's more convenient.
- Reward: extract / compute the reward/score for the agent's output. Some agents have clear reward sigal, but others don't.
  - clear reward signal: take that down as the reward, no need to do extra reward engineering.
  - no clear reward signal: you need to design a reward function to compute the reward/score for the agent's output. You can use another LLM to help you design the reward function, or you can design it by yourself if you have domain knowledge.

Test

Remember to test these two parts before moving to step 2, make sure they work as expected.

2. Writing training code

This part is easy, simply follow this template and change the necessary part such as dataset path, model name, etc.

agent_roll.py

# -*- coding: utf-8 -*-

import os
import re
import requests
from textwrap import dedent
from ajet.schema.task import Task, WorkflowOutput
from ajet.copilot.job import AgentJetJob
from ajet.task_reader import RouterTaskReader
from ajet.utils.thread_executors import PeriodicDrainThreadPoolExecutor
from ajet.tuner_lib.as_oai_baseurl_apikey import OpenaiBaseUrlAndApiKey
from ajet.default_config.ajet_config_schema import AjetTaskReader, HuggingfaceDatRepo
from ajet.tuner_lib.experimental.swarm_client import SwarmClient

# python -m tutorial.example_math_swarm.math

GRPO_N = 4  # grpo group size
NUM_EPOCH = 10000
AJET_SWARM_URL = os.getenv("AJET_SWARM_URL", "http://localhost:10086")
REMOTE_MODEL_PATH = os.getenv("REMOTE_MODEL_PATH", "/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct")
REMOTE_BATCH_SIZE = 32
REMOTE_ALLOCATE_GPU_PER_NODE = 8

def main():

    # Handshake with swarm remote, then send training param to swarm remote (such as model to be trained, algorithm, etc)
    dataset = RouterTaskReader(
        reader_type = "huggingface_dat_repo",
        reader_config = AjetTaskReader(
            huggingface_dat_repo = HuggingfaceDatRepo(
                dataset_path = '/mnt/data_cpfs/model_cache/modelscope/dataset/openai/gsm8k/main',
                # dataset_path = "/root/agentjet/benchmark_datasets/dataset/gsm8k/socratic",
                # dataset_path = "openai/gsm8k",
                # dataset_name = "main",
            )
        )
    )
    # Load the CountDown dataset
    # print(f"Loading dataset from: {LOCAL_DATASET_PATH}")
    # dataset = RouterTaskReader(
    #     reader_type="jsonl_dataset_file",
    #     reader_config=AjetTaskReader(
    #         jsonl_dataset_file=JsonlDatasetFile(
    #             training=JsonlTrainingFp(file_path=LOCAL_DATASET_PATH)
    #         )
    #     ),
    # )

    # Hand shake with remote swarm server
    swarm_worker = SwarmClient(AJET_SWARM_URL)
    ajet_job = AgentJetJob(
        experiment_name="math_gsm8k_grpo",
        algorithm="grpo",
        n_gpu=REMOTE_ALLOCATE_GPU_PER_NODE,
        model=REMOTE_MODEL_PATH,
        batch_size=REMOTE_BATCH_SIZE,
        num_repeat=GRPO_N,
        # LoRA parameters (optional, for parameter-efficient fine-tuning):
        # lora_rank=8,           # Set > 0 to enable LoRA training (default: 0 = disabled)
        # lora_alpha=16,         # LoRA alpha scaling factor (default: 16)
        # lora_target_modules="all-linear",  # Target modules for LoRA (default: "all-linear")
        # Full argument list: run `help(AgentJetJob)` or check `ajet/copilot/job.py`
    )
    print(ajet_job.config.to_dict())
    swarm_worker.auto_sync_train_config_and_start_engine(
        ajet_job,
        force_restart=True,
    )

    def rollout(task):
        # begin episode
        episode_uuid, api_baseurl_key = swarm_worker.begin_episode(discard_episode_timeout=60)
        # execute agent ( base_url = api_baseurl_key.base_url, api_key = api_baseurl_key.api_key )
        workflow_output = execute_agent(task, api_baseurl_key)  # reward is in `workflow_output`
        # report output back to swarm remote
        swarm_worker.end_episode(task, episode_uuid, workflow_output)
        return

    executor = PeriodicDrainThreadPoolExecutor(workers=GRPO_N * REMOTE_BATCH_SIZE, auto_retry=True)
    for _ in range(NUM_EPOCH):
        for _, task in enumerate(dataset.generate_training_tasks()):
            for _ in range(GRPO_N):
                executor.submit_with_periodic_drain(fn=rollout, task=task)

    return None


def execute_agent(task: Task, api_baseurl_key: OpenaiBaseUrlAndApiKey):
    ....
    raw_reward: float = ...  # compute the reward for the agent's output
    return WorkflowOutput(reward=raw_reward, metadata={"important_metadata": important_metadata})


if __name__ == "__main__":
    main()

It is very clear now, your job in step 2 is to:

use get_training_dataset_item_list.py to generate List[Task] (from ajet.schema.task import Task)
use run_episode_once.py to execute a single episode and place it in execute_agent function

3. Simplify your code and fix bugs

before moving to step 4, you can simplify your code and fix bugs to make sure it can run smoothly.

4. Training

Finally, you can start training.

Run ajet-swarm start to start training server (if the user has already installed agentjet swarm environment), if the user has docker environment, you can also refer to docs/en/ajet-swarm-docker.md to start a AgentSwarm docker container. If the user can provider the ssh connection to the GPU server / cluster, you can send the ajet-swarm start command to the remote server via ssh to start the swarm server, the port forward 10086 port (default agentjet swarm port) to user local machine.

Create a duplication of agent_roll.py named agent_roll_one_episode_debug.py, and modify it to only run one episode, this can help you debug whether the episode runner and reward function work as expected.

After the server side is ready, run

python /path/to/agent_roll_one_episode_debug.py

watch console log to see if the episode can be executed successfully and reward can be computed correctly.

If anything goes wrong, keep server running, rewrite and fix agent_roll_one_episode_debug.py, and run it again until it can run one episode successfully.

Next, patch agent_roll.py if there are any bugs discorvered via the debugging of agent_roll_one_episode_debug.py, and then run

python /path/to/agent_roll.py

to start the training!

modelscope/train-complex-blackbox

ajet/copilot/train-complex-blackbox/SKILL.md

Train complex blackbox agents (agents without clear reward signals) using AgentJet. Write dataset collectors, episode runners with LLM-as-Judge reward functions, and integrate with the AgentJet training loop.

209 stars

documentation

Updated May 27, 2026

$ install --global

skillsauth

npx skillsauth add modelscope/agentjet train-complex-blackbox

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 27, 2026, 8:16 AM98.4s1 file scanned

SKILL.md

name:: train-complex-blackbox
description:: Train complex blackbox agents (agents without clear reward signals) using AgentJet. Write dataset collectors, episode runners with LLM-as-Judge reward functions, and integrate with the AgentJet training loop.
license:: Complete terms in LICENSE.txt

0. Ask user for API key + model (or API key + base url + model) for debugging

This is not 100% necessary, but it can help a lot in debugging in step 1. If user has not given a API, ask user to give your one.

By default, the code you write should be located at ./tutorial/opencode_build_xxxxxx/*.py

1. Initial Programming

Writing dataset collector (`get_training_dataset_item_list.py`)

get_training_dataset_item_list.py: Returns a list of training data items. Maybe a list of training tasks, each item is a string identifier of a training task, or a dict containing necessary information for the training task.

Episode Runner (`run_episode_once.py`)

run_episode_once.py:
- Argument Parser: takes (training data item identifier + api-key + base-url) as input, model-name is not required, you can make up a model name because we ignore it.
- Execute the agent: read the document of the agent user asked you to train, figure out how to execute the agent. In most cases you can use subprocess to start a commandline process to execute the agent, your biggest issue is to figure out how to pass the training data item identifier, api-key and base-url to that commandline process. You can also use python code to execute the agent if you think it's more convenient.
- Reward: extract / compute the reward/score for the agent's output. Some agents have clear reward sigal, but others don't.
  - clear reward signal: take that down as the reward, no need to do extra reward engineering.
  - no clear reward signal: you need to design a reward function to compute the reward/score for the agent's output. You can use another LLM to help you design the reward function, or you can design it by yourself if you have domain knowledge.

Test

Remember to test these two parts before moving to step 2, make sure they work as expected.

2. Writing training code

This part is easy, simply follow this template and change the necessary part such as dataset path, model name, etc.

agent_roll.py

# -*- coding: utf-8 -*-

import os
import re
import requests
from textwrap import dedent
from ajet.schema.task import Task, WorkflowOutput
from ajet.copilot.job import AgentJetJob
from ajet.task_reader import RouterTaskReader
from ajet.utils.thread_executors import PeriodicDrainThreadPoolExecutor
from ajet.tuner_lib.as_oai_baseurl_apikey import OpenaiBaseUrlAndApiKey
from ajet.default_config.ajet_config_schema import AjetTaskReader, HuggingfaceDatRepo
from ajet.tuner_lib.experimental.swarm_client import SwarmClient

# python -m tutorial.example_math_swarm.math

GRPO_N = 4  # grpo group size
NUM_EPOCH = 10000
AJET_SWARM_URL = os.getenv("AJET_SWARM_URL", "http://localhost:10086")
REMOTE_MODEL_PATH = os.getenv("REMOTE_MODEL_PATH", "/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen2.5-7B-Instruct")
REMOTE_BATCH_SIZE = 32
REMOTE_ALLOCATE_GPU_PER_NODE = 8

def main():

    # Handshake with swarm remote, then send training param to swarm remote (such as model to be trained, algorithm, etc)
    dataset = RouterTaskReader(
        reader_type = "huggingface_dat_repo",
        reader_config = AjetTaskReader(
            huggingface_dat_repo = HuggingfaceDatRepo(
                dataset_path = '/mnt/data_cpfs/model_cache/modelscope/dataset/openai/gsm8k/main',
                # dataset_path = "/root/agentjet/benchmark_datasets/dataset/gsm8k/socratic",
                # dataset_path = "openai/gsm8k",
                # dataset_name = "main",
            )
        )
    )
    # Load the CountDown dataset
    # print(f"Loading dataset from: {LOCAL_DATASET_PATH}")
    # dataset = RouterTaskReader(
    #     reader_type="jsonl_dataset_file",
    #     reader_config=AjetTaskReader(
    #         jsonl_dataset_file=JsonlDatasetFile(
    #             training=JsonlTrainingFp(file_path=LOCAL_DATASET_PATH)
    #         )
    #     ),
    # )

    # Hand shake with remote swarm server
    swarm_worker = SwarmClient(AJET_SWARM_URL)
    ajet_job = AgentJetJob(
        experiment_name="math_gsm8k_grpo",
        algorithm="grpo",
        n_gpu=REMOTE_ALLOCATE_GPU_PER_NODE,
        model=REMOTE_MODEL_PATH,
        batch_size=REMOTE_BATCH_SIZE,
        num_repeat=GRPO_N,
        # LoRA parameters (optional, for parameter-efficient fine-tuning):
        # lora_rank=8,           # Set > 0 to enable LoRA training (default: 0 = disabled)
        # lora_alpha=16,         # LoRA alpha scaling factor (default: 16)
        # lora_target_modules="all-linear",  # Target modules for LoRA (default: "all-linear")
        # Full argument list: run `help(AgentJetJob)` or check `ajet/copilot/job.py`
    )
    print(ajet_job.config.to_dict())
    swarm_worker.auto_sync_train_config_and_start_engine(
        ajet_job,
        force_restart=True,
    )

    def rollout(task):
        # begin episode
        episode_uuid, api_baseurl_key = swarm_worker.begin_episode(discard_episode_timeout=60)
        # execute agent ( base_url = api_baseurl_key.base_url, api_key = api_baseurl_key.api_key )
        workflow_output = execute_agent(task, api_baseurl_key)  # reward is in `workflow_output`
        # report output back to swarm remote
        swarm_worker.end_episode(task, episode_uuid, workflow_output)
        return

    executor = PeriodicDrainThreadPoolExecutor(workers=GRPO_N * REMOTE_BATCH_SIZE, auto_retry=True)
    for _ in range(NUM_EPOCH):
        for _, task in enumerate(dataset.generate_training_tasks()):
            for _ in range(GRPO_N):
                executor.submit_with_periodic_drain(fn=rollout, task=task)

    return None


def execute_agent(task: Task, api_baseurl_key: OpenaiBaseUrlAndApiKey):
    ....
    raw_reward: float = ...  # compute the reward for the agent's output
    return WorkflowOutput(reward=raw_reward, metadata={"important_metadata": important_metadata})


if __name__ == "__main__":
    main()

It is very clear now, your job in step 2 is to:

use get_training_dataset_item_list.py to generate List[Task] (from ajet.schema.task import Task)
use run_episode_once.py to execute a single episode and place it in execute_agent function

3. Simplify your code and fix bugs

before moving to step 4, you can simplify your code and fix bugs to make sure it can run smoothly.

4. Training

Finally, you can start training.

After the server side is ready, run

python /path/to/agent_roll_one_episode_debug.py

watch console log to see if the episode can be executed successfully and reward can be computed correctly.

If anything goes wrong, keep server running, rewrite and fix agent_roll_one_episode_debug.py, and run it again until it can run one episode successfully.

Next, patch agent_roll.py if there are any bugs discorvered via the debugging of agent_roll_one_episode_debug.py, and then run

python /path/to/agent_roll.py

to start the training!

Related Skills

modelscope/swarm-configuration

data-ai

VerifiedTrustedCommunity

How `max_env_worker` caps the "Running Episodes" gauge, and how `AgentJetJob` relates to the YAML config.

209SKILL.mdUpdated May 27, 2026

modelscope/swarm-configuration

modelscope/skill-normalizer

tools

VerifiedTrustedCommunity

Convert skills in non-standard formats to the standard Agent Skills `SKILL.md` format. Validates YAML frontmatter (name, description, license, compatibility, metadata, allowed-tools), directory structure (SKILL.md, scripts/, references/, assets/), and best practices. Use when the user asks to normalize, validate, or fix a skill.

209SKILL.mdUpdated May 27, 2026

modelscope/skill-normalizer

modelscope/download-from-swanlab-url

devops

VerifiedTrustedCommunity

Download per-step time-series metric data (reward, entropy, response length, etc.) from a SwanLab cloud run URL as a pandas.DataFrame. Use when the user provides a SwanLab URL and wants to fetch or analyze training curves.

209SKILL.mdUpdated May 27, 2026

modelscope/download-from-swanlab-url

modelscope/ajet/copilot/create-keep-think-model-chat-template

development

VerifiedTrustedCommunity

Your task is to investigate the chat template of given model, go to its tokenizer config and check whether the following behavior exists: > > Remove history <think> block from the input when apply chat template when converting messages. > This behavior will make RL training slower, if this behavior exists, please change the chat template to forbid such behavior. You must not do this in-place, instead, please create another model. E.g., "/mnt/data_cpfs/xielipeng.xlp/models/Qwen3-8B" -> "/mnt

209SKILL.mdUpdated May 27, 2026

modelscope/ajet/copilot/create-keep-think-model-chat-template

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/modelscope/agentjet.git

# Copy into Claude Code skills folder (global)
cp -r agentjet/ajet/copilot/train-complex-blackbox ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

modelscope/agentjet

209 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

modelscope/train-complex-blackbox

$ install --global

Security Scan Results

SKILL.md

0. Ask user for API key + model (or API key + base url + model) for debugging

1. Initial Programming

Writing dataset collector (get_training_dataset_item_list.py)

Episode Runner (run_episode_once.py)

Test

2. Writing training code

3. Simplify your code and fix bugs

4. Training

Related Skills

modelscope/swarm-configuration

modelscope/skill-normalizer

modelscope/download-from-swanlab-url

modelscope/ajet/copilot/create-keep-think-model-chat-template

modelscope/train-complex-blackbox

$ install --global

Security Scan Results

SKILL.md

0. Ask user for API key + model (or API key + base url + model) for debugging

1. Initial Programming

Writing dataset collector (get_training_dataset_item_list.py)

Episode Runner (run_episode_once.py)

Test

2. Writing training code

3. Simplify your code and fix bugs

4. Training

Related Skills

modelscope/swarm-configuration

modelscope/skill-normalizer

modelscope/download-from-swanlab-url

modelscope/ajet/copilot/create-keep-think-model-chat-template

Writing dataset collector (`get_training_dataset_item_list.py`)

Episode Runner (`run_episode_once.py`)

Writing dataset collector (`get_training_dataset_item_list.py`)

Episode Runner (`run_episode_once.py`)