.agents/skills/fiftyone-dataset-export/SKILL.md
Exports FiftyOne datasets to standard formats (COCO, YOLO, VOC, CVAT, CSV, etc.) and Hugging Face Hub. Use when converting datasets, exporting for training, creating archives, sharing data in specific formats, or publishing datasets to Hugging Face.
ALWAYS follow these rules:
set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")
Before exporting, present:
Different formats support different label types:
| Format | Label Types |
|--------|-------------|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
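The format table can be turned into a small lookup when deciding which formats to offer for a given label field. This is a minimal sketch: `FORMAT_LABEL_TYPES` and `formats_supporting` are hypothetical names, and the mapping is transcribed from the table rather than read from FiftyOne.

```python
# Label types per format, transcribed from the table above (not queried from FiftyOne).
FORMAT_LABEL_TYPES = {
    "COCO": {"detections", "segmentations", "keypoints"},
    "YOLO": {"detections"},
    "VOC": {"detections"},
    "CVAT": {"classifications", "detections", "polylines", "keypoints"},
    "Image Classification Directory Tree": {"classification"},
}


def formats_supporting(label_type):
    """Return the formats from the table that list the given label type."""
    matches = [
        name for name, types in FORMAT_LABEL_TYPES.items() if label_type in types
    ]
    # The table lists CSV as supporting all (custom) fields, so it always qualifies.
    return sorted(matches + ["CSV"])


print(formats_supporting("keypoints"))
```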
Always use absolute paths for export directories:
params={
"export_dir": {"absolute_path": "/path/to/export"}
}
Check if export directory exists before exporting. If it does, ask user whether to overwrite.
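One way to apply both path rules before calling the operator is a small pre-flight helper. This is a sketch only: `check_export_dir` is a hypothetical name and uses just the standard library; it does not call FiftyOne.

```python
from pathlib import Path


def check_export_dir(path):
    """Return (absolute_path, needs_confirmation) for a candidate export directory."""
    # Expand "~" and resolve relative segments so the operator receives an absolute path.
    absolute = Path(path).expanduser().resolve()
    # If the path already exists, the user should be asked whether to overwrite it.
    needs_confirmation = absolute.exists()
    return str(absolute), needs_confirmation
```

If `needs_confirmation` is true, ask the user before exporting rather than overwriting silently.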
# Set context
set_context(dataset_name="my-dataset")
# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")
Identify:
# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")
Before exporting, confirm with the user:
Dataset: my-dataset (5,000 samples)
Media type: image
Available label fields:
- ground_truth (Detections)
- predictions (Detections)
Export options:
- Format: COCO (recommended for detections)
- Export directory: /path/to/export
- Label field: ground_truth
Proceed with export?
Export media and labels:
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/export"},
"label_field": "ground_truth"
}
)
Export labels only (no media copy):
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "COCO",
"labels_path": {"absolute_path": "/path/to/labels.json"},
"label_field": "ground_truth"
}
)
Export media only (no labels):
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_ONLY",
"export_dir": {"absolute_path": "/path/to/media"}
}
)
After export, verify the output:
ls -la /path/to/export
Report exported file count and structure to user.
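Where a shell listing is not enough, the file count can be gathered programmatically. The following is a minimal sketch using only the standard library; `summarize_export` is a hypothetical helper, not part of FiftyOne.

```python
from collections import Counter
from pathlib import Path


def summarize_export(export_dir):
    """Count the files under export_dir, grouped by lowercase file extension."""
    root = Path(export_dir)
    counts = Counter(
        p.suffix.lower() or "(no extension)"
        for p in root.rglob("*")
        if p.is_file()
    )
    return dict(counts)
```

The returned dictionary (for example, counts of `.jpg` and `.json` files) can be reported to the user alongside the directory listing.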
| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |
| Format | dataset_type Value | Media Type | Labels-Only |
|--------|----------------------|------------|-------------|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |
| Format | dataset_type Value | Label Types | Labels-Only |
|--------|----------------------|-------------|-------------|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |
| Format | dataset_type Value | Best For | Labels-Only |
|--------|----------------------|----------|-------------|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |
Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).
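The note above can be enforced before the operator is called. This sketch assumes the "Labels-Only" columns in the tables are accurate; `MEDIA_REQUIRED_FORMATS` and `validate_export_type` are hypothetical names.

```python
# Formats marked "Labels-Only: No" in the tables above.
MEDIA_REQUIRED_FORMATS = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def validate_export_type(dataset_type, export_type):
    """Raise ValueError if a labels-only export is requested for a media-bound format."""
    if export_type == "LABELS_ONLY" and dataset_type in MEDIA_REQUIRED_FORMATS:
        raise ValueError(
            f'{dataset_type} cannot be exported labels-only; use "MEDIA_AND_LABELS"'
        )
    return export_type
```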
| export_type Value | Description |
|---------------------|-------------|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |
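Because the path parameter changes with the export type, a small builder can keep the params consistent. This is a sketch under the assumption that the parameter shapes match the examples in this document; `build_export_params` is a hypothetical helper, and "FILEPATHS_ONLY" is left out because its path parameter is not shown here.

```python
def build_export_params(export_type, path, dataset_type=None, label_field=None):
    """Build a params dict for the three export types demonstrated in this document."""
    if export_type not in {"MEDIA_AND_LABELS", "LABELS_ONLY", "MEDIA_ONLY"}:
        raise ValueError(f"export_type not covered by this sketch: {export_type}")
    # Labels-only exports take labels_path; the other two take export_dir.
    path_key = "labels_path" if export_type == "LABELS_ONLY" else "export_dir"
    params = {"export_type": export_type, path_key: {"absolute_path": path}}
    if dataset_type is not None:
        params["dataset_type"] = dataset_type
    if label_field is not None:
        params["label_field"] = label_field
    return params
```

The result can be passed as `params` to `execute_operator`; checking `get_operator_schema` first remains the authoritative way to confirm parameter names.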
Export from different sources:
| target Value | Description |
|----------------|-------------|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |
For training with frameworks that use COCO format:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/coco_export"},
"label_field": "ground_truth"
}
)
Output structure:
coco_export/
├── data/
│ ├── image1.jpg
│ └── image2.jpg
└── labels.json
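To sanity-check a COCO export, the labels file can be summarized with the standard library. This sketch relies on the standard COCO top-level keys (`images`, `annotations`, `categories`); `summarize_coco` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_coco(labels_path):
    """Return image/annotation counts and category names from a COCO labels file."""
    data = json.loads(Path(labels_path).read_text())
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "categories": [c["name"] for c in data.get("categories", [])],
    }
```

Comparing the image count against the number of files in `data/` is a quick way to spot an incomplete export.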
For training YOLOv5/v8 models:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "YOLOv5",
"export_dir": {"absolute_path": "/path/to/yolo_export"},
"label_field": "ground_truth"
}
)
Output structure:
yolo_export/
├── images/
│ └── train/
│ └── image1.jpg
├── labels/
│ └── train/
│ └── image1.txt
└── dataset.yaml
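A quick consistency check for this layout is to list images that have no matching label file. The sketch below assumes the `images/<split>` and `labels/<split>` layout shown above; `images_without_labels` and `IMAGE_EXTENSIONS` are hypothetical names. A missing label file is not necessarily an error, since it may simply mean the image has no objects.

```python
from pathlib import Path

# Extensions treated as images in this sketch; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def images_without_labels(export_dir, split="train"):
    """Return sorted image filenames in a split that lack a same-named .txt label file."""
    root = Path(export_dir)
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(
        p.name
        for p in (root / "images" / split).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in label_stems
    )
```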
Export only a subset of samples:
# Set context
set_context(dataset_name="my-dataset")
# Filter samples in the App
set_view(tags=["validated"])
# Export the filtered view
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"target": "CURRENT_VIEW",
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "COCO",
"export_dir": {"absolute_path": "/path/to/validated_export"},
"label_field": "ground_truth"
}
)
When media should stay in place:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "COCO",
"labels_path": {"absolute_path": "/path/to/annotations.json"},
"label_field": "ground_truth"
}
)
For image classification datasets:
set_context(dataset_name="my-classification-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "Image Classification Directory Tree",
"export_dir": {"absolute_path": "/path/to/classification_export"},
"label_field": "ground_truth"
}
)
Output structure:
classification_export/
├── cat/
│ ├── cat1.jpg
│ └── cat2.jpg
└── dog/
├── dog1.jpg
└── dog2.jpg
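Since each class becomes a folder, per-class counts can be read straight from the directory tree. This is a minimal standard-library sketch; `class_counts` is a hypothetical helper.

```python
from pathlib import Path


def class_counts(export_dir):
    """Map each class folder name to the number of files it contains."""
    root = Path(export_dir)
    return {
        d.name: sum(1 for p in d.iterdir() if p.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

For the tree above this would report two files each for `cat` and `dog`, which makes class imbalance easy to spot before training.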
For analysis in spreadsheets:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "LABELS_ONLY",
"dataset_type": "CSV",
"labels_path": {"absolute_path": "/path/to/data.csv"},
"csv_fields": ["filepath", "ground_truth.detections.label"]
}
)
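Before handing the CSV to a spreadsheet, it can be previewed with the standard library. This sketch assumes the file has a header row; the exact column names produced by the export are not specified here, and `preview_csv` is a hypothetical helper.

```python
import csv


def preview_csv(csv_path, limit=5):
    """Return (column_names, first `limit` rows as dicts) from a CSV with a header row."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
        return reader.fieldnames, rows
```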
For complete dataset backup including all metadata:
set_context(dataset_name="my-dataset")
execute_operator(
operator_uri="@voxel51/io/export_samples",
params={
"export_type": "MEDIA_AND_LABELS",
"dataset_type": "FiftyOne Dataset",
"export_dir": {"absolute_path": "/path/to/backup"}
}
)
Output structure:
backup/
├── metadata.json
├── samples.json
├── data/
│ └── ...
├── annotations/
├── brain/
└── evaluations/
For more control, guide users to use the Python SDK directly:
import fiftyone as fo
import fiftyone.types as fot
# Load dataset
dataset = fo.load_dataset("my-dataset")
# Export to COCO format
dataset.export(
export_dir="/path/to/export",
dataset_type=fot.COCODetectionDataset,
label_field="ground_truth",
)
# Export labels only
dataset.export(
labels_path="/path/to/labels.json",
dataset_type=fot.COCODetectionDataset,
label_field="ground_truth",
)
# Export a filtered view
view = dataset.match_tags("validated")
view.export(
export_dir="/path/to/validated",
dataset_type=fot.YOLOv5Dataset,
label_field="ground_truth",
)
Python SDK dataset types:
- fot.COCODetectionDataset - COCO format
- fot.YOLOv4Dataset - YOLOv4 format
- fot.YOLOv5Dataset - YOLOv5 format
- fot.VOCDetectionDataset - Pascal VOC format
- fot.KITTIDetectionDataset - KITTI format
- fot.CVATImageDataset - CVAT image format
- fot.CVATVideoDataset - CVAT video format
- fot.TFObjectDetectionDataset - TensorFlow Object Detection format
- fot.ImageClassificationDirectoryTree - Classification folder structure
- fot.VideoClassificationDirectoryTree - Video classification folders
- fot.TFImageClassificationDataset - TensorFlow classification format
- fot.ImageSegmentationDirectory - Segmentation masks
- fot.CSVDataset - CSV format
- fot.GeoJSONDataset - GeoJSON format
- fot.FiftyOneDataset - Native FiftyOne format
For complete HF Hub export documentation, see HF-HUB-EXPORT.md.
Quick reference:
| Method | Use Case |
|--------|----------|
| push_to_hub() | Personal accounts, simple upload |
| Manual upload | Organizations, private org repos |
Quick start:
import fiftyone as fo
from fiftyone.utils.huggingface import push_to_hub
# Load the dataset to publish
dataset = fo.load_dataset("my-dataset")
# Personal account
push_to_hub(dataset, repo_name="my-dataset", private=False)
# With options
push_to_hub(
dataset,
repo_name="my-dataset",
description="My dataset description",
license="apache-2.0",
private=True,
)
IMPORTANT: Always generate a dataset card and get user approval for it before uploading. See HF-HUB-EXPORT.md for complete documentation, including authentication setup, dataset card workflow, parameters reference, use cases, and troubleshooting.
Error: "Export directory already exists"
- Add "overwrite": true to params
Error: "Label field not found"
- Use dataset_summary() to see available label fields
Error: "Unsupported label type for format"
Error: "Permission denied"
Export is slow
- Use the LABELS_ONLY export type
Use dataset_summary() to know what fields and label types exist.
For labels-only exports, use labels_path, not export_dir.