skills/adonaivera/fiftyone-find-duplicates/SKILL.md
Find duplicate or near-duplicate images in FiftyOne datasets using brain similarity computation. Use when users want to deduplicate datasets, find similar images, cluster visually similar content, or remove redundant samples. Requires FiftyOne MCP server with @voxel51/brain plugin installed.
Find and remove duplicate or near-duplicate images using FiftyOne's brain similarity operators. Uses deep learning embeddings to identify visually similar images.
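Use this skill when users want to deduplicate datasets, find similar images, cluster visually similar content, or remove redundant samples.
Prerequisites: FiftyOne MCP server with the @voxel51/brain plugin installed and enabled.
ALWAYS follow these rules: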
set_context(dataset_name="my-dataset")
Brain operators are delegated and require the app:
launch_app()
Wait 5-10 seconds for initialization.
# List all brain operators
list_operators(builtin_only=False)
# Get schema for specific operator
get_operator_schema(operator_uri="@voxel51/brain/compute_similarity")
execute_operator(
operator_uri="@voxel51/brain/compute_similarity",
params={"brain_key": "img_sim", "model": "mobilenet-v2-imagenet-torch"}
)
close_app()
# Set context
set_context(dataset_name="my-dataset")
# Launch app (required for brain operators)
launch_app()
# Check if brain plugin is available
list_plugins(enabled=True)
# If not installed:
download_plugin(
url_or_repo="voxel51/fiftyone-plugins",
plugin_names=["@voxel51/brain"]
)
enable_plugin(plugin_name="@voxel51/brain")
# List all available operators
list_operators(builtin_only=False)
# Get schema for compute_similarity
get_operator_schema(operator_uri="@voxel51/brain/compute_similarity")
# Get schema for find_near_duplicates
get_operator_schema(operator_uri="@voxel51/brain/find_near_duplicates")
# Execute operator to compute embeddings
execute_operator(
operator_uri="@voxel51/brain/compute_similarity",
params={
"brain_key": "img_duplicates",
"model": "mobilenet-v2-imagenet-torch"
}
)
execute_operator(
operator_uri="@voxel51/brain/find_near_duplicates",
params={
"similarity_index": "img_duplicates",
"threshold": 0.3
}
)
Threshold guidelines (distance-based, lower = more similar):
- 0.1 = Very similar (near-exact duplicates)
- 0.3 = Near duplicates (recommended default)
- 0.5 = Similar images
- 0.7 = Loosely similar

This operator creates two saved views automatically:
- near duplicates: all samples that are near duplicates
- representatives of near duplicates: one representative from each group

After finding duplicates, use set_view to display them in the FiftyOne App:
Option A: Filter by near_dup_id field
# Show all samples that have a near_dup_id (all duplicates)
set_view(exists=["near_dup_id"])
Option B: Show specific duplicate group
# Show samples with a specific duplicate group ID
set_view(filters={"near_dup_id": 1})
Option C: Load saved view (if available)
# Load the automatically created saved view
set_view(view_name="near duplicates")
Option D: Clear filter to show all samples
clear_view()
The find_near_duplicates operator adds a near_dup_id field to samples. Samples with the same ID are duplicates of each other.
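To make the threshold and the near_dup_id grouping concrete, here is a minimal pure-Python sketch. It is not the brain plugin's implementation. It assumes cosine distance and transitive grouping via union-find, and the function names and toy embeddings are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def group_near_duplicates(embeddings, threshold=0.3):
    """Map sample ID -> group ID for samples within `threshold` of another.

    Samples with no neighbor under the threshold get no entry.
    Assumption: groups are transitive (a~b and b~c puts a, b, c together).
    """
    ids = list(embeddings)
    parent = {sid: sid for sid in ids}

    def find(sid):
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving
            sid = parent[sid]
        return sid

    # Link every pair of samples closer than the threshold
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_distance(embeddings[a], embeddings[b]) <= threshold:
                parent[find(b)] = find(a)

    members = defaultdict(list)
    for sid in ids:
        members[find(sid)].append(sid)

    # Number only the groups that contain more than one sample
    near_dup_id = {}
    next_id = 1
    for group in members.values():
        if len(group) > 1:
            for sid in group:
                near_dup_id[sid] = next_id
            next_id += 1
    return near_dup_id


# Toy 2-D embeddings (hypothetical values, not model output)
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.05],
    "c": [0.0, 1.0],
    "d": [0.05, 0.99],
    "e": [-1.0, 0.0],
}
groups = group_near_duplicates(toy_embeddings, threshold=0.3)

# Keep the first sample seen in each group as its representative
representatives = {}
for sample_id, group_id in groups.items():
    representatives.setdefault(group_id, sample_id)
```

In this toy data, "a" and "b" share one group, "c" and "d" share another, and "e" has no near duplicate. Raising the threshold merges more samples into each group, which is why a lower value is the safer starting point before deleting anything.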
Option A: Use deduplicate operator (keeps one representative per group)
execute_operator(
operator_uri="@voxel51/brain/deduplicate_near_duplicates",
params={}
)
Option B: Manual deletion from App UI
Run set_view(exists=["near_dup_id"]) to show duplicates, delete the unwanted samples in the App UI, then run close_app().
| Tool | Description |
|------|-------------|
| set_view(exists=[...]) | Filter samples where field(s) have non-None values |
| set_view(filters={...}) | Filter samples by exact field values |
| set_view(tags=[...]) | Filter samples by tags |
| set_view(sample_ids=[...]) | Select specific sample IDs |
| set_view(view_name="...") | Load a saved view by name |
| clear_view() | Clear filters, show all samples |
Use list_operators() to discover operators and get_operator_schema() to see their parameters:
| Operator | Description |
|----------|-------------|
| @voxel51/brain/compute_similarity | Compute embeddings and similarity index |
| @voxel51/brain/find_near_duplicates | Find near-duplicate samples |
| @voxel51/brain/deduplicate_near_duplicates | Delete duplicates, keep representatives |
| @voxel51/brain/find_exact_duplicates | Find exact duplicate media files |
| @voxel51/brain/deduplicate_exact_duplicates | Delete exact duplicates |
| @voxel51/brain/compute_uniqueness | Compute uniqueness scores |
For accidentally duplicated files (identical bytes):
set_context(dataset_name="my-dataset")
launch_app()
execute_operator(
operator_uri="@voxel51/brain/find_exact_duplicates",
params={}
)
execute_operator(
operator_uri="@voxel51/brain/deduplicate_exact_duplicates",
params={}
)
close_app()
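For intuition about what "identical bytes" means, the sketch below groups files by a content hash. It only illustrates the idea and is not the operator's implementation. The choice of SHA-256 and the helper names are assumptions, and the files are throwaway stand-ins for images.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths):
    """Group paths by content hash; return only groups with 2+ files."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[file_digest(path)].append(str(path))
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


# Demo with throwaway files standing in for images
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "cat.jpg").write_bytes(b"fake image bytes")
    (tmp / "cat_copy.jpg").write_bytes(b"fake image bytes")
    (tmp / "dog.jpg").write_bytes(b"different bytes")

    duplicate_groups = find_exact_duplicates(sorted(tmp.iterdir()))
    duplicate_names = [[Path(p).name for p in group] for group in duplicate_groups]
```

A byte-level match like this misses resized or re-encoded copies. Those need the near-duplicate workflow.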
For visually similar but not identical images:
set_context(dataset_name="my-dataset")
launch_app()
# Compute embeddings
execute_operator(
operator_uri="@voxel51/brain/compute_similarity",
params={"brain_key": "near_dups", "model": "mobilenet-v2-imagenet-torch"}
)
# Find duplicates
execute_operator(
operator_uri="@voxel51/brain/find_near_duplicates",
params={"similarity_index": "near_dups", "threshold": 0.3}
)
# View duplicates in the App
set_view(exists=["near_dup_id"])
# After review, deduplicate
execute_operator(
operator_uri="@voxel51/brain/deduplicate_near_duplicates",
params={}
)
# Clear view and close
clear_view()
close_app()
Find images similar to a specific sample:
set_context(dataset_name="my-dataset")
launch_app()
execute_operator(
operator_uri="@voxel51/brain/compute_similarity",
params={"brain_key": "search"}
)
execute_operator(
operator_uri="@voxel51/brain/sort_by_similarity",
params={
"brain_key": "search",
"query_id": "sample_id_here",
"k": 20
}
)
close_app()
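The query_id and k parameters above can be read as a nearest-neighbor lookup. The following sketch shows that reading on toy vectors. Cosine distance, the exclusion of the query sample from its own results, and the function name are all assumptions of this example, not documented operator behavior.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_samples(embeddings, query_id, k):
    """Return up to k sample IDs closest to the query, nearest first."""
    query = embeddings[query_id]
    # Assumption: the query sample itself is left out of the results
    others = [sid for sid in embeddings if sid != query_id]
    others.sort(key=lambda sid: cosine_distance(query, embeddings[sid]))
    return others[:k]


# Toy 2-D embeddings (hypothetical values, not model output)
toy_embeddings = {
    "q": [1.0, 0.0],
    "x": [0.9, 0.1],
    "y": [0.5, 0.5],
    "z": [0.0, 1.0],
}
top_two = nearest_samples(toy_embeddings, query_id="q", k=2)
```

Here "x" ranks ahead of "y" because it points in nearly the same direction as the query, and "z" is cut off by k=2.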
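Error: "No executor available"
- Affects delegated brain operators such as find_near_duplicates and deduplicate_near_duplicates. Launch the app with launch_app() before executing them.

Error: "Brain key not found"
- Run compute_similarity first with a brain_key.

Error: "Operator not found"
- Install and enable the plugin with download_plugin() and enable_plugin().

Error: "Missing dependency" (e.g., torch, tensorflow)
- The response includes missing_package and install_command:

{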
"error_type": "missing_dependency",
"missing_package": "torch",
"install_command": "pip install torch"
}
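A caller could turn that response into an actionable hint along these lines. This is a sketch that assumes the response arrives as a JSON string with exactly the three fields shown above. The helper name and the fallback wording are hypothetical.

```python
import json


def describe_missing_dependency(response_text):
    """Return an install hint for a missing-dependency response, else None."""
    payload = json.loads(response_text)
    if payload.get("error_type") != "missing_dependency":
        return None
    package = payload.get("missing_package", "unknown package")
    command = payload.get("install_command")
    if command:
        return f"Missing {package}; try: {command}"
    return f"Missing {package}; no install command provided"


# Same fields as the example response above
example = (
    '{"error_type": "missing_dependency", '
    '"missing_package": "torch", '
    '"install_command": "pip install torch"}'
)
hint = describe_missing_dependency(example)
```

Showing the command to the user, rather than running it automatically, leaves the choice of environment and package version with them.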
Similarity computation is slow
- Use a lightweight model such as mobilenet-v2-imagenet-torch

- Use list_operators() and get_operator_schema() to get current operator names and parameters
- brain_key

Embedding computation time:
Memory requirements:
Copyright 2017-2025, Voxel51, Inc. Apache 2.0 License