Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

huggingface/train-sentence-transformers

Name: train-sentence-transformers
Author: huggingface

skills/train-sentence-transformers/SKILL.md

npx skillsauth add huggingface/skills train-sentence-transformers

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Train a sentence-transformers Model

This SKILL.md is a router, not a manual. It tells you which references and example scripts to load for your task. The actual content — recommended losses, evaluators, training-script structure, model selection, training-arg knobs, troubleshooting — lives in references/ and scripts/.

Do not synthesize a training script from this file alone. Open the per-type production template (scripts/train_<type>_example.py) and copy it as your starting point. The templates contain load-bearing scaffolding (autocast helper, model-card class, logger silencing list, force=True, seed, TF32, version-compatible imports, named-evaluator metric handling) that prior agent runs have repeatedly missed when rolling their own from a synthesized snippet.

1. Identify the model type

| Tag | Class | What it does | When to pick | |---|---|---|---| | [SentenceTransformer] | SentenceTransformer (bi-encoder) | Maps each input to a fixed-dim dense vector | Retrieval, similarity, clustering, classification, paraphrase mining, dedup | | [CrossEncoder] | CrossEncoder (reranker) | Scores (query, passage) pairs jointly | Two-stage retrieval (rerank top-100 from bi-encoder), pair classification | | [SparseEncoder] | SparseEncoder (SPLADE) | Sparse vectors over the vocabulary | Learned-sparse retrieval, inverted-index backends (Elasticsearch / OpenSearch / Lucene) |

Tiebreakers when the request is ambiguous: "embedding model" / "vector search" / "similarity" → [SentenceTransformer]. "rerank" / "ranker" / "two-stage" → [CrossEncoder]. "SPLADE" / "sparse" / "inverted index" → [SparseEncoder]. If still unclear, ask.

2. Required reading

Read these in full before writing any code. Do not triage by perceived relevance.

Per-type — always required

[SentenceTransformer]

references/losses_sentence_transformer.md — loss-to-data-shape mapping; BatchSamplers.NO_DUPLICATES requirement for MNRL-family; Cached* ↔ gradient_checkpointing incompatibility.
references/evaluators_sentence_transformer.md — evaluator-to-task mapping; metric_for_best_model key construction (named vs unnamed); per-evaluator primary_metric values.
references/model_architectures.md — encoder vs decoder vs static vs Router pipelines; pooling rules (mean / cls / lasttoken); auto-mean-pooling behavior for fresh-start MLM bases.
scripts/train_sentence_transformer_example.py — production template; copy this as your starting point.

[CrossEncoder]

references/losses_cross_encoder.md — pointwise / pairwise / listwise / distillation; pos_weight derivation; activation_fn=Identity() mandatory for non-BCE losses (silent eval-rank collapse otherwise).
references/evaluators_cross_encoder.md — CrossEncoderRerankingEvaluator recipe; named-evaluator key format eval_{name}_{primary_metric}.
scripts/train_cross_encoder_example.py — production template; copy this as your starting point.

[SparseEncoder]

references/losses_sparse_encoder.md — SpladeLoss wrapper requirement; FLOPS regularizer weights; smoke-test active-dim ramp behavior.
references/evaluators_sparse_encoder.md — SparseNanoBEIREvaluator (English-only) and the in-domain alternative; eval_{name}_{primary_metric} key format.
scripts/train_sparse_encoder_example.py — production template; copy this as your starting point.

Cross-cutting — always required (regardless of task)

references/training_args.md — TrainingArguments knobs, precision rules (load fp32 + autocast bf16/fp16; never torch_dtype=bfloat16), warmup_steps (float) vs deprecated warmup_ratio, save_steps must be a multiple of eval_steps for load_best_model_at_end, schedulers, HPO, tracker, resume, hub-push variants.
references/dataset_formats.md — column-matching rules (label name auto-detection; column-order-not-name); reshaping recipes; hard-negative mining options.
references/base_model_selection.md — discovery commands; per-type model namespaces; ModernBERT-family max_seq_length=8192 trap; datasets >= 4 script-loader rejection; non-English starting-point shortcuts.
references/troubleshooting.md — symptom-indexed failure recipes. Skim the section headings on every run, even a healthy one; the "Metrics don't improve" and "Hub push fails" entries cover bugs that bite frequently and are cheaper to recognize before they fire than to debug after.

Cross-cutting — load when applicable

references/hardware_guide.md — VRAM sizing, multi-GPU, FSDP / DeepSpeed, HF Jobs flavors. Required for >24GB models, multi-GPU, or HF Jobs runs.
references/hf_jobs_execution.md — required when running on HF Jobs.
references/prompts_and_instructions.md — required when using prompt-tuned bases (E5, BGE, GTE, Qwen3-Embedding, Instructor, Nomic, etc.) or adding query: / passage: style prefixes.

Variant scripts (open when the task matches)

[SentenceTransformer] scripts/train_sentence_transformer_<matryoshka|multi_dataset|with_lora|distillation|make_multilingual|static_embedding>_example.py.
[CrossEncoder] scripts/train_cross_encoder_<distillation|listwise>_example.py.
[SparseEncoder] scripts/train_sparse_encoder_distillation_example.py.
Hard-negative mining CLI — scripts/mine_hard_negatives.py.

3. Defaults

Override only if the user specifies otherwise:

Local execution. Pitch HF Jobs only if local hardware can't fit the job.
Single run. After it completes, propose experimentation if the user would benefit (weak/marginal verdict, "see how high you can push it" framing, etc.). Iteration rules in references/training_args.md (Experimentation section).
Public Hub push at end-of-run, wrapped in try-except. On HF Jobs (ephemeral env) ALSO enable in-trainer push (push_to_hub=True + hub_strategy="every_save"); details in references/hf_jobs_execution.md.

4. Constraints the produced script must satisfy

These are non-negotiable contracts. Implementation lives in the production templates and references — do not reinvent.

Capture the pre-training evaluator score as baseline_eval before trainer.train().
Emit a single end-of-run line: VERDICT: WIN|MARGINAL|REGRESSION | score=... | baseline=... | delta=.... A monitor scrapes for this.
Silence httpx, httpcore, huggingface_hub, urllib3, filelock, fsspec to WARNING (otherwise HF download URLs flood the agent's context).
Tee logs to logs/{RUN_NAME}.log.
End with model.push_to_hub(...) wrapped in try/except.
Smoke-test before any long run (max_steps=1 + tiny dataset slice). The production templates show one common pattern (SMOKE_TEST env var).
[CrossEncoder] Include EarlyStoppingCallback(patience>=3) — CE rerankers often peak mid-training and regress.
[SparseEncoder] Log query_active_dims / corpus_active_dims on the verdict line; high nDCG with collapsed sparsity is not a win. The keys come back name-prefixed (e.g. ..._query_active_dims); use suffix matching to pluck them — see the SPARSE production template for the exact pattern.

5. Workflow

Identify the model type (§1). Ask if ambiguous.
Load the §2 required-reading files for that type.
Open scripts/train_<type>_example.py and copy it as your starting point.
Replace MODEL_NAME, DATASET_NAME, RUN_NAME, the loss, and the evaluator with the user's task. Cross-check loss/data-shape match against references/losses_<type>.md; cross-check the metric_for_best_model key against references/evaluators_<type>.md (named evaluators format the key as eval_{name}_{primary_metric}).
Smoke-test (max_steps=1).
Run.
After the run, append to logs/experiments.md and propose iteration if the verdict is weak/marginal.

Prerequisites

pip install "sentence-transformers[train]>=5.0"        # add [train,image] / [audio] / [video] for [SentenceTransformer] multimodal
pip install trackio                                    # optional tracker; or wandb / tensorboard / mlflow
hf auth login                                          # or set HF_TOKEN with write scope (for Hub push)

GPU strongly recommended. CPU works only for demos and [SentenceTransformer] StaticEmbedding.

huggingface/train-sentence-transformers

skills/train-sentence-transformers/SKILL.md

Train or fine-tune sentence-transformers models across `SentenceTransformer` (bi-encoder; dense or static embedding model; for retrieval, similarity, clustering, classification, paraphrase mining, dedup, multimodal), `CrossEncoder` (reranker; pair scoring for two-stage retrieval / pair classification), and `SparseEncoder` (SPLADE, sparse embedding model; for learned-sparse retrieval). Covers loss selection, hard-negative mining, evaluators, distillation, LoRA, Matryoshka, and Hugging Face Hub publishing. Use for any sentence-transformers training task.

10,411 stars

development

Updated May 8, 2026

$ install --global

skillsauth

npx skillsauth add huggingface/skills train-sentence-transformers

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 8, 2026, 4:00 AM49.2s28 files scanned

SKILL.md

name:: train-sentence-transformers
description:: Train or fine-tune sentence-transformers models across `SentenceTransformer` (bi-encoder; dense or static embedding model; for retrieval, similarity, clustering, classification, paraphrase mining, dedup, multimodal), `CrossEncoder` (reranker; pair scoring for two-stage retrieval / pair classification), and `SparseEncoder` (SPLADE, sparse embedding model; for learned-sparse retrieval). Covers loss selection, hard-negative mining, evaluators, distillation, LoRA, Matryoshka, and Hugging Face Hub publishing. Use for any sentence-transformers training task.

Train a sentence-transformers Model

1. Identify the model type

2. Required reading

Read these in full before writing any code. Do not triage by perceived relevance.

Per-type — always required

[SentenceTransformer]

references/losses_sentence_transformer.md — loss-to-data-shape mapping; BatchSamplers.NO_DUPLICATES requirement for MNRL-family; Cached* ↔ gradient_checkpointing incompatibility.
references/evaluators_sentence_transformer.md — evaluator-to-task mapping; metric_for_best_model key construction (named vs unnamed); per-evaluator primary_metric values.
references/model_architectures.md — encoder vs decoder vs static vs Router pipelines; pooling rules (mean / cls / lasttoken); auto-mean-pooling behavior for fresh-start MLM bases.
scripts/train_sentence_transformer_example.py — production template; copy this as your starting point.

[CrossEncoder]

references/losses_cross_encoder.md — pointwise / pairwise / listwise / distillation; pos_weight derivation; activation_fn=Identity() mandatory for non-BCE losses (silent eval-rank collapse otherwise).
references/evaluators_cross_encoder.md — CrossEncoderRerankingEvaluator recipe; named-evaluator key format eval_{name}_{primary_metric}.
scripts/train_cross_encoder_example.py — production template; copy this as your starting point.

[SparseEncoder]

references/losses_sparse_encoder.md — SpladeLoss wrapper requirement; FLOPS regularizer weights; smoke-test active-dim ramp behavior.
references/evaluators_sparse_encoder.md — SparseNanoBEIREvaluator (English-only) and the in-domain alternative; eval_{name}_{primary_metric} key format.
scripts/train_sparse_encoder_example.py — production template; copy this as your starting point.

Cross-cutting — always required (regardless of task)

references/training_args.md — TrainingArguments knobs, precision rules (load fp32 + autocast bf16/fp16; never torch_dtype=bfloat16), warmup_steps (float) vs deprecated warmup_ratio, save_steps must be a multiple of eval_steps for load_best_model_at_end, schedulers, HPO, tracker, resume, hub-push variants.
references/dataset_formats.md — column-matching rules (label name auto-detection; column-order-not-name); reshaping recipes; hard-negative mining options.
references/base_model_selection.md — discovery commands; per-type model namespaces; ModernBERT-family max_seq_length=8192 trap; datasets >= 4 script-loader rejection; non-English starting-point shortcuts.
references/troubleshooting.md — symptom-indexed failure recipes. Skim the section headings on every run, even a healthy one; the "Metrics don't improve" and "Hub push fails" entries cover bugs that bite frequently and are cheaper to recognize before they fire than to debug after.

Cross-cutting — load when applicable

references/hardware_guide.md — VRAM sizing, multi-GPU, FSDP / DeepSpeed, HF Jobs flavors. Required for >24GB models, multi-GPU, or HF Jobs runs.
references/hf_jobs_execution.md — required when running on HF Jobs.
references/prompts_and_instructions.md — required when using prompt-tuned bases (E5, BGE, GTE, Qwen3-Embedding, Instructor, Nomic, etc.) or adding query: / passage: style prefixes.

Variant scripts (open when the task matches)

[SentenceTransformer] scripts/train_sentence_transformer_<matryoshka|multi_dataset|with_lora|distillation|make_multilingual|static_embedding>_example.py.
[CrossEncoder] scripts/train_cross_encoder_<distillation|listwise>_example.py.
[SparseEncoder] scripts/train_sparse_encoder_distillation_example.py.
Hard-negative mining CLI — scripts/mine_hard_negatives.py.

3. Defaults

Override only if the user specifies otherwise:

Local execution. Pitch HF Jobs only if local hardware can't fit the job.
Single run. After it completes, propose experimentation if the user would benefit (weak/marginal verdict, "see how high you can push it" framing, etc.). Iteration rules in references/training_args.md (Experimentation section).
Public Hub push at end-of-run, wrapped in try-except. On HF Jobs (ephemeral env) ALSO enable in-trainer push (push_to_hub=True + hub_strategy="every_save"); details in references/hf_jobs_execution.md.

4. Constraints the produced script must satisfy

These are non-negotiable contracts. Implementation lives in the production templates and references — do not reinvent.

Capture the pre-training evaluator score as baseline_eval before trainer.train().
Emit a single end-of-run line: VERDICT: WIN|MARGINAL|REGRESSION | score=... | baseline=... | delta=.... A monitor scrapes for this.
Silence httpx, httpcore, huggingface_hub, urllib3, filelock, fsspec to WARNING (otherwise HF download URLs flood the agent's context).
Tee logs to logs/{RUN_NAME}.log.
End with model.push_to_hub(...) wrapped in try/except.
Smoke-test before any long run (max_steps=1 + tiny dataset slice). The production templates show one common pattern (SMOKE_TEST env var).
[CrossEncoder] Include EarlyStoppingCallback(patience>=3) — CE rerankers often peak mid-training and regress.
[SparseEncoder] Log query_active_dims / corpus_active_dims on the verdict line; high nDCG with collapsed sparsity is not a win. The keys come back name-prefixed (e.g. ..._query_active_dims); use suffix matching to pluck them — see the SPARSE production template for the exact pattern.

5. Workflow

Identify the model type (§1). Ask if ambiguous.
Load the §2 required-reading files for that type.
Open scripts/train_<type>_example.py and copy it as your starting point.
Replace MODEL_NAME, DATASET_NAME, RUN_NAME, the loss, and the evaluator with the user's task. Cross-check loss/data-shape match against references/losses_<type>.md; cross-check the metric_for_best_model key against references/evaluators_<type>.md (named evaluators format the key as eval_{name}_{primary_metric}).
Smoke-test (max_steps=1).
Run.
After the run, append to logs/experiments.md and propose iteration if the verdict is weak/marginal.

Prerequisites

pip install "sentence-transformers[train]>=5.0"        # add [train,image] / [audio] / [video] for [SentenceTransformer] multimodal
pip install trackio                                    # optional tracker; or wandb / tensorboard / mlflow
hf auth login                                          # or set HF_TOKEN with write scope (for Hub push)

GPU strongly recommended. CPU works only for demos and [SentenceTransformer] StaticEmbedding.

Related Skills

huggingface/huggingface-spaces

development

VerifiedTrustedOfficial

Build, deploy, and maintain applications on Hugging Face Spaces — Gradio / Docker / Static SDKs, ZeroGPU and dedicated hardware, model loading, debugging, buckets, inference providers, community grants. Use whenever the user asks to create or host an app on Hugging Face, port code onto ZeroGPU, fix a Space that won't build or run, or otherwise work with `hf spaces …`, `@spaces.GPU`, Space README frontmatter, or the `spaces` Python package.

10,878SKILL.mdUpdated Jun 6, 2026

huggingface/huggingface-spaces

huggingface/huggingface-lora-space-builder

development

VerifiedTrustedOfficial

Build and publish a Gradio demo on Hugging Face Spaces for a user-provided LoRA. Use when someone asks to create, generate, ship, or publish a Space, demo, Gradio app, or playground for a LoRA — including LoRAs for Qwen-Image, Qwen-Image-Edit, LTX-Video, Wan, FLUX, SDXL, or other diffusion base models. Also triggers when someone describes a LoRA they trained or hosts on the Hub and wants to share it. Covers picking the right base pipeline and `diffusers` inference recipe, designing a UI tailored to the LoRA's task and inputs (Union/multi-task control, edit, video, image, etc.), respecting model-card recommendations (trigger words, steps, guidance, LoRA scale, example inputs), and shipping to ZeroGPU hardware as a private Space by default.

10,824SKILL.mdUpdated Jun 9, 2026

huggingface/huggingface-lora-space-builder

huggingface/hf-cli

tools

VerifiedTrustedOfficial

Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing models, datasets, spaces, buckets, repos, papers, jobs, and more on the Hugging Face Hub. Use when: handling authentication; managing local cache; managing Hugging Face Buckets; running or scheduling jobs on Hugging Face infrastructure; managing Hugging Face repos; discussions and pull requests; browsing models, datasets and spaces; reading, searching, or browsing academic papers; managing collections; querying datasets; configuring spaces; setting up webhooks; or deploying and managing HF Inference Endpoints. Make sure to use this skill whenever the user mentions 'hf', 'huggingface', 'Hugging Face', 'huggingface-cli', or 'hugging face cli', or wants to do anything related to the Hugging Face ecosystem and to AI and ML in general. Also use for cloud storage needs like training checkpoints, data pipelines, or agent traces. Use even if the user doesn't explicitly ask for a CLI command. Replaces the deprecated `huggingface-cli`.

10,787SKILL.mdUpdated Mar 15, 2026

huggingface/hf-cloud-serving-image-selection

tools

VerifiedTrustedOfficial

Pick the right serving container for a SageMaker model deployment and find its current image URI. Use this skill whenever about to deploy a model to a SageMaker endpoint and an image URI needs to be chosen — including when the user says "deploy this LLM", "host this HuggingFace model", "serve this fine-tuned model", "deploy this embedding model", "host a reranker", "serve a sentence-transformers model", or when about to hardcode any container URI in deployment code. HuggingFace-curated Deep Learning Containers are ALWAYS preferred: HuggingFace vLLM (LLMs and generative rerankers), HuggingFace vLLM-Omni (multimodal), TEI (embeddings/cross-encoder rerankers), HF Inference Toolkit (other transformers). Generic images (AWS vLLM, DJL-LMI, SGLang) are used only when no HuggingFace image is compatible — never merely because they carry a newer version. Never hardcode a container URI from memory and never default to TGI. Prevents stale-image failures and wrong-region URIs.

10,782SKILL.mdUpdated Jul 8, 2026

huggingface/hf-cloud-serving-image-selection

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/huggingface/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/skills/train-sentence-transformers ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

huggingface/skills

10,411 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT