skills/lance-user-guide/SKILL.md
Guide Code Agents to help Lance users write/read datasets and build/choose indices. Use when a user asks how to use Lance (Python/Rust/CLI), how to write_dataset/open/scan, how to build vector indexes (IVF_PQ, IVF_HNSW_*), how to build scalar indexes (BTREE, BITMAP, LABEL_LIST, NGRAM, INVERTED, BLOOMFILTER, RTREE, etc.), how to combine filters with vector search, or how to debug indexing and scan performance.
Use this skill to answer questions about:
Do not use this skill for:
Python:
pip install pylance
Verify:
python -c "import lance; print(lance.__version__)"
Rust:
cargo add lance
Or add it to Cargo.toml (choose an appropriate version for your project):
[dependencies]
lance = "x.y"
From source (this repository):
maturin develop -m python/Cargo.toml
Collect the minimum information required to avoid wrong guidance:
If the user does not specify a language, default to Python examples and provide a short mapping to Rust concepts.
references/index-selection.md and confirm constraints.
Prefer lance.write_dataset for most user workflows.
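import lance
import pyarrow as pa
# Fixed-size list column: each row is a 3-dimensional float32 vector.
vectors = pa.array(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    type=pa.list_(pa.float32(), 3),
)
table = pa.table({"id": [1, 2], "vector": vectors, "category": ["a", "b"]})
# Pick one mode per write.
# create: raises if a dataset already exists at the URI.
ds = lance.write_dataset(table, "my-data.lance", mode="create")
# append: adds the rows to the existing dataset as a new version.
ds = lance.write_dataset(table, "my-data.lance", mode="append")
# overwrite: replaces the data with a new version.
ds = lance.write_dataset(table, "my-data.lance", mode="overwrite")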
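A post-write sanity check can be scripted around the two calls the validation checklist relies on, `count_rows()` and `schema`. The following is a minimal sketch, not part of the Lance API: `check_dataset` is a hypothetical helper, and it assumes only that the object passed in exposes `count_rows()` and a `schema` with a `names` attribute, as a dataset opened with `lance.dataset(uri)` does.

```python
def check_dataset(ds, expected_rows, expected_columns):
    """Return a list of problems; an empty list means the write looks sane.

    `ds` is expected to behave like a dataset opened with lance.dataset(uri).
    """
    problems = []
    rows = ds.count_rows()
    if rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {rows}")
    # schema.names lists the top-level column names.
    missing = [c for c in expected_columns if c not in ds.schema.names]
    if missing:
        problems.append(f"missing columns: {missing}")
    return problems
```

Right after the `mode="create"` write, `check_dataset(lance.dataset("my-data.lance"), 2, ["id", "vector", "category"])` would be expected to return an empty list; after an append the expected row count grows accordingly.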
Validation checklist:
- lance.dataset(uri).count_rows()
- lance.dataset(uri).schema
Notes:
- storage_options={...} when writing to an object store URI.
- commit_lock and point them to the user guide.
Use lance.dataset + scanner(...) for pushdowns (projection, filter, limit, nearest).
import lance
ds = lance.dataset("my-data.lance")
tbl = ds.scanner(
    columns=["id", "category"],
    filter="category = 'a' and id >= 10",
    limit=100,
).to_table()
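When a filter value comes from user input, hand-built filter strings are a common source of broken queries. The sketch below assumes the filter follows standard SQL quoting, where a single quote inside a string literal is escaped by doubling it. `sql_literal` and `eq_filter` are hypothetical helpers for illustration, not Lance functions.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal for use in a filter string."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Standard SQL escapes a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported filter value: {type(value).__name__}")


def eq_filter(column, value):
    """Build a `column = literal` predicate (hypothetical helper)."""
    return f"{column} = {sql_literal(value)}"


print(eq_filter("category", "a"))     # category = 'a'
print(eq_filter("category", "it's"))  # category = 'it''s'
```

The resulting string can be passed as `filter=` to `scanner(...)`. Printing it first also gives the exact filter text to share when debugging.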
Validation checklist:
- scanner(...) call that reproduces it.
- filter string and whether prefilter is enabled (when using nearest).
Run vector search with scanner(nearest=...) or to_table(nearest=...).
import lance
import numpy as np
ds = lance.dataset("my-data.lance")
q = np.array([1.0, 2.0, 3.0], dtype=np.float32)
tbl = ds.to_table(nearest={"column": "vector", "q": q, "k": 10})
If combining a filter with vector search, decide whether the filter must run before the vector query:
- prefilter=True when the filter is highly selective and correctness (top-k among filtered rows) matters.
- prefilter=False when the filter is not very selective and speed matters, and accept that results can be fewer than k.
tbl = ds.scanner(
    nearest={"column": "vector", "q": q, "k": 10},
    filter="category = 'a'",
    prefilter=True,
).to_table()
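The difference between the two settings is easiest to see on a toy example. The sketch below is a pure-Python brute-force illustration of the two orderings (filter then top-k, versus top-k then filter). It does not use Lance and does not reflect how Lance implements either path; `top_k` and `search` are hypothetical names.

```python
def top_k(rows, query, k):
    """Brute-force k nearest rows by squared L2 distance."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row["vector"], query))
    return sorted(rows, key=dist)[:k]


def search(rows, query, k, predicate, prefilter):
    if prefilter:
        # Filter first, then take the top-k among the matching rows.
        return top_k([r for r in rows if predicate(r)], query, k)
    # Take the top-k first, then drop rows that fail the filter.
    return [r for r in top_k(rows, query, k) if predicate(r)]


rows = [
    {"id": 1, "vector": [0.0, 0.0], "category": "b"},
    {"id": 2, "vector": [0.1, 0.0], "category": "b"},
    {"id": 3, "vector": [1.0, 1.0], "category": "a"},
    {"id": 4, "vector": [2.0, 2.0], "category": "a"},
]


def is_a(row):
    return row["category"] == "a"


pre = search(rows, [0.0, 0.0], k=2, predicate=is_a, prefilter=True)
post = search(rows, [0.0, 0.0], k=2, predicate=is_a, prefilter=False)
print([r["id"] for r in pre])   # [3, 4]
print([r["id"] for r in post])  # []
```

Here the two nearest rows overall both have category "b", so filtering after the search returns nothing, while filtering before the search returns the two nearest rows in category "a".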
Create a vector index with LanceDataset.create_index(...).
Start with a minimal working configuration:
ds = lance.dataset("my-data.lance")
ds = ds.create_index(
    "vector",
    index_type="IVF_PQ",
    target_partition_size=8192,
    num_sub_vectors=16,  # must evenly divide the vector dimension
)
Then verify:
- ds.describe_indices() (preferred) or ds.list_indices() (can be expensive)
- nearest query that uses the index
For parameter selection and tuning, consult references/index-selection.md.
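One check that can be done before calling `create_index` is whether a given `num_sub_vectors` fits the vector dimension. The sketch below assumes two rules of thumb: `num_sub_vectors` must divide the dimension evenly, and a sub-vector width (`dimension / num_sub_vectors`) that is a multiple of 8 is preferred. Treat the second rule as an assumption to confirm against references/index-selection.md. `candidate_num_sub_vectors` is a hypothetical helper.

```python
def candidate_num_sub_vectors(dimension, preferred_width=8):
    """List num_sub_vectors values that split `dimension` into equal sub-vectors.

    Assumption: each sub-vector width (dimension // num_sub_vectors) should be
    a multiple of `preferred_width`.
    """
    candidates = []
    for n in range(1, dimension + 1):
        if dimension % n != 0:
            continue  # sub-vectors would not all have the same length
        width = dimension // n
        if width % preferred_width == 0:
            candidates.append(n)
    return candidates


print(candidate_num_sub_vectors(128))  # [1, 2, 4, 8, 16]
```

An empty result (for example with a 3-dimensional toy vector) signals that the preferred width cannot be met and the parameters need a closer look.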
Scalar indices speed up scans with filters. Use create_scalar_index for a stable entry point.
ds = lance.dataset("my-data.lance")
ds.create_scalar_index("category", "BTREE", replace=True)
Then verify:
- ds.describe_indices()
- scanner(filter=...) query
To choose a scalar index type (BTREE vs BITMAP vs LABEL_LIST vs NGRAM vs INVERTED, etc.), consult references/index-selection.md.
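To confirm that a scalar index actually helps a given filter, the same scan can be timed with and without it. This is a minimal sketch: `time_scan` and `compare_scalar_index` are hypothetical helpers, and they assume only that `ds.scanner(...)` accepts `filter` and `use_scalar_index` and that `to_table()` returns an object with `num_rows`.

```python
import time


def time_scan(ds, **scanner_kwargs):
    """Run one scan and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    table = ds.scanner(**scanner_kwargs).to_table()
    return table.num_rows, time.perf_counter() - start


def compare_scalar_index(ds, filter_expr):
    """Time the same filter with and without scalar index use."""
    rows_with, secs_with = time_scan(ds, filter=filter_expr, use_scalar_index=True)
    rows_without, secs_without = time_scan(ds, filter=filter_expr, use_scalar_index=False)
    if rows_with != rows_without:
        raise ValueError("row counts differ between the two scans")
    return {
        "rows": rows_with,
        "with_index_s": secs_with,
        "without_index_s": secs_without,
    }
```

A call such as `compare_scalar_index(lance.dataset("my-data.lance"), "category = 'a'")` gives a rough comparison. Timings from a single run are noisy, so repeat the call a few times before drawing conclusions.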
- prefilter=True if the user expects top-k among filtered rows.
- num_sub_vectors.
- dimension / num_sub_vectors (see references/index-selection.md).
- use_scalar_index=False).
When answering API questions, confirm the exact signature and docstrings locally:
- python/python/lance/dataset.py (write_dataset, LanceDataset.scanner)
- python/python/lance/dataset.py (create_index)
- python/python/lance/dataset.py (create_scalar_index)
Use targeted search:
rg -n "def write_dataset\\b|def create_index\\b|def create_scalar_index\\b|def scanner\\b" python/python/lance/dataset.py
- references/index-selection.md
- references/io-cheatsheet.md
- scripts/python_end_to_end.py