semantic-grep/SKILL.md
In-process semantic search over text files or in-memory strings, using Gemini embeddings via the CF AI Gateway. Use when user wants fuzzy/conceptual search where exact-keyword grep would miss — "sessions discussing regulatory constraints", "code about retry logic", "notes mentioning burnout even if the word isn't there". Complements searching-codebases (regex/AST) and extracting-keywords (YAKE). Do NOT use when an exact string/regex match is what's wanted — grep/rg wins on speed and precision there.
jina-grep-style semantic search, done in-process via Python rather than as an external CLI. Embeds query + corpus chunks with gemini-embedding-001, ranks by cosine similarity, returns grep-format output.
The core trade-off (lifted from jina-grep-cli's own docs and validated in testing):
| Task | Tool |
|------|------|
| Known exact string, filename, or regex | grep / rg / searching-codebases |
| "What files discuss concept X" when X may not appear verbatim | semantic-grep |
| Hybrid: prefilter with grep, rerank by concept | grep → rerank_candidates() |
Regression test result (workshop session corpus, 135 docs):
Rule: when the user query reads like a named entity or keyword, try grep first. Only reach for semantic-grep when paraphrase/concept matching is actually needed.
Credentials via proxy.env (Cloudflare AI Gateway w/ BYOK — same pattern as invoking-gemini):
CF_ACCOUNT_ID=...
CF_GATEWAY_ID=...
CF_API_TOKEN=...
Direct-API fallback: GOOGLE_API_KEY or GEMINI_API_KEY env var. No dependencies beyond requests + numpy.
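The lookup order described above (gateway credentials from proxy.env first, then a direct API key from the environment) might be sketched as follows. This is a minimal illustration, not the skill's actual loader: `parse_env_text` and `resolve_credentials` are hypothetical names, and checking GOOGLE_API_KEY before GEMINI_API_KEY is an assumption.

```python
import os

CF_KEYS = ("CF_ACCOUNT_ID", "CF_GATEWAY_ID", "CF_API_TOKEN")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_credentials(env_text: str, environ=None) -> tuple[str, dict[str, str]]:
    """Return ("gateway", creds) when all CF_* values are present, else ("direct", creds)."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_text(env_text)
    if all(file_values.get(k) for k in CF_KEYS):
        return "gateway", {k: file_values[k] for k in CF_KEYS}
    # Assumed precedence: GOOGLE_API_KEY is checked before GEMINI_API_KEY.
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        if environ.get(name):
            return "direct", {"api_key": environ[name]}
    raise RuntimeError("no gateway credentials in proxy.env and no API key in the environment")
```

Passing the environment as a parameter keeps the function easy to exercise without touching real credentials.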
import sys
sys.path.insert(0, '/mnt/skills/user/semantic-grep/scripts')
from semantic_grep import semantic_grep, format_grep
# Directory of .txt files
results = semantic_grep("error handling under load", "/path/to/notes",
                        top_k=5, granularity="paragraph")
print(format_grep(results))
# notes/incidents.txt:42: When the queue depth exceeds... [0.71]
# notes/postmortem.txt:8: Under sustained traffic we saw... [0.68]
semantic_grep(query, corpus, *, top_k=10, threshold=None, ...)
Main search function.

- query (str): the search query (embedded with RETRIEVAL_QUERY task type)
- corpus (str | Path | list[Chunk]): a file, directory, or pre-chunked list
- top_k (int | None): max results; None = all above threshold
- threshold (float | None): cosine similarity cutoff; None = no filter (top_k only)
- granularity ("paragraph" | "line"): how to chunk files (default paragraph)
- include (str): filename-glob filter when corpus is a directory (default "*.txt"). Matches against Path.name only, not the full path: "*.md" works, "docs/*.md" does not.
- model (str): default "gemini-embedding-001"
- dim (int): 128 / 768 / 1536 / 3072 (default 768; MRL-truncated + renormalized)
- task ("text" | "code"): selects text vs code task types

Returns list[Match] where Match has path, line, text, score.
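To make the interaction of top_k and threshold concrete, here is a minimal sketch of cosine ranking over already-computed embeddings. It is an illustration under stated assumptions, not the module's code: `rank_by_cosine` is a hypothetical name, and the vectors are assumed to be non-zero.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs, top_k=10, threshold=None):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    q = np.asarray(query_vec, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    # Normalize both sides so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = list(np.argsort(-scores))
    if threshold is not None:
        # Drop everything below the cutoff before applying top_k.
        order = [i for i in order if scores[i] >= threshold]
    if top_k is not None:
        order = order[:top_k]
    return [(int(i), float(scores[i])) for i in order]
```

With top_k=None and a threshold set, every chunk at or above the cutoff is returned; with threshold=None, only the top_k limit applies.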
load_corpus(path, *, include="*.txt", granularity="paragraph") -> list[Chunk]
Load and chunk a file or directory without embedding. Useful for inspecting what gets embedded before paying for the API call.
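The chunking rule itself is not shown here, so the following is only a sketch of what paragraph granularity could look like, assuming paragraphs are separated by blank lines and each chunk records the 1-based line where it starts. `chunk_paragraphs` is a hypothetical helper, not part of the module.

```python
def chunk_paragraphs(text: str) -> list[tuple[int, str]]:
    """Split text on blank lines; return (1-based start line, paragraph text) pairs."""
    chunks = []
    current = []
    start = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if start is None:
                start = lineno  # first non-blank line of a new paragraph
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            chunks.append((start, "\n".join(current)))
            current, start = [], None
    if current:
        chunks.append((start, "\n".join(current)))
    return chunks
```

Keeping the start line with each chunk is what allows grep-style `path:line:` output later.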
embed_batch(texts, task_type, *, model, dim, group_size=100) -> np.ndarray
Lower-level: embed a list of strings directly via :batchEmbedContents. Returns (N, dim) float32 array, rows normalized when dim < 3072.
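Two pieces of this description lend themselves to a short sketch: splitting the input into groups of group_size, and truncating full-width vectors to dim followed by renormalization (the MRL step). The helper names below are hypothetical and no API call is made; the functions operate on vectors that are assumed to have been fetched already.

```python
import numpy as np


def iter_groups(items, group_size=100):
    """Yield consecutive slices of at most group_size items (one slice per request)."""
    for i in range(0, len(items), group_size):
        yield items[i:i + group_size]


def truncate_and_renormalize(vectors, dim):
    """Keep the first dim components of each row, then rescale rows to unit length."""
    v = np.asarray(vectors, dtype=np.float32)[:, :dim]
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # leave all-zero rows untouched
    return (v / norms).astype(np.float32)
```

Renormalizing after truncation matters because a prefix of a unit vector is generally shorter than unit length, which would skew dot-product scores.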
format_grep(matches, *, max_text_chars=200, show_score=True) -> str
Format matches as grep output: path:line: snippet [score].
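The output shape can be illustrated with a small stand-in formatter. This is a sketch, not the module's format_grep: the `Match` dataclass mirrors the four fields listed above, while whitespace collapsing, the trailing "..." on truncation, and two-decimal scores are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Match:
    path: str
    line: int
    text: str
    score: float


def format_grep_sketch(matches, max_text_chars=200, show_score=True):
    """Render matches as `path:line: snippet [score]`, one per line."""
    lines = []
    for m in matches:
        snippet = " ".join(m.text.split())  # collapse newlines so each match stays on one line
        if len(snippet) > max_text_chars:
            snippet = snippet[:max_text_chars] + "..."
        suffix = f" [{m.score:.2f}]" if show_score else ""
        lines.append(f"{m.path}:{m.line}: {snippet}{suffix}")
    return "\n".join(lines)
```

Because the prefix is plain `path:line:`, the output can be fed to editors and tools that already understand grep results.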
The highest-leverage use isn't naive full-corpus semantic search — it's hybrid retrieval: fast coarse filter → semantic rerank.
import subprocess
from semantic_grep import Chunk, semantic_grep, format_grep
# Stage 1: fast exact/regex prefilter with rg
result = subprocess.run(
    ["rg", "-n", "--no-heading", "error|fail|timeout", "logs/"],
    capture_output=True, text=True,
)
# Parse `path:line:text` into Chunks, skipping lines that don't fit that shape
chunks = []
for raw in result.stdout.splitlines():
    parts = raw.split(":", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        continue
    path, line, text = parts
    chunks.append(Chunk(path=path, line=int(line), text=text))
# Stage 2: semantic rerank on the prefiltered subset
ranked = semantic_grep("intermittent queue saturation during peak traffic",
                       chunks, top_k=10)
print(format_grep(ranked))
This is how you scale past the "embed the whole corpus every call" limit without needing a vector DB. The exact-match stage cheaply cuts millions of lines to thousands; semantic reranks those.
- task="text": query → RETRIEVAL_QUERY, docs → RETRIEVAL_DOCUMENT. Asymmetric; documented to outperform symmetric encoding for retrieval.
- task="code": query → CODE_RETRIEVAL_QUERY, docs → RETRIEVAL_DOCUMENT. Use when searching code with natural-language queries.

Use SEMANTIC_SIMILARITY (symmetric) only if you're doing pairwise similarity, not retrieval. This module doesn't expose that path yet.
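The pairing of query and document task types can be written as a small lookup table. The sketch below assumes the "text" and "code" values of the task parameter map onto the pairs described above; `TASK_TYPES` and `task_types_for` are illustrative names, not the module's internals.

```python
# (query task type, document task type) per value of the `task` parameter
TASK_TYPES = {
    "text": ("RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT"),
    "code": ("CODE_RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT"),
}


def task_types_for(task: str) -> tuple[str, str]:
    """Return the (query, document) task types for a given task name."""
    try:
        return TASK_TYPES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_TYPES)}"
        ) from None
```

In both cases only the query side changes; documents are always embedded as RETRIEVAL_DOCUMENT.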
gemini-embedding-001 (GA since Feb 2026):
gemini-embedding-2-preview (March 2026) is multimodal and currently top of MTEB. Set model="gemini-embedding-2-preview" to opt in once the preview stabilizes.
Limitations:

- semantic_grep pre-allocates an (N, dim) float32 array; 1M chunks at dim=768 ≈ 3GB. Caller is responsible for sane chunk counts. load_corpus also follows symlinks via rglob, which is fine in a trusted single-user container but not for untrusted paths.
- group_size=100 per HTTP call; groups run serially. For >1K chunks, add asyncio; not needed yet.
- Overlaps with invoking-gemini; should be factored up when invoking-gemini adds embedding support. Tracked as followup.

Related skills:

- invoking-gemini: sibling; handles Gemini text + image generation through the same CF gateway. Shares credential pattern.
- searching-codebases: regex/AST search. Use first when the query is a known pattern.
- extracting-keywords: YAKE keyword extraction; orthogonal, but pairs well for building query terms from a long prompt.
- exploring-codebases: for understanding repo structure. Semantic-grep doesn't replace AST-based navigation.

Conceptually inspired by jina-grep-cli: we kept the retrieval shape (grep-compatible output, asymmetric query/doc embeddings, threshold + top-k) but swapped the MLX/Apple-Silicon backend for a portable Gemini API call. The original's pipe-mode rerank pattern is the most generalizable idea it contributes and is preserved here.
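As a back-of-envelope check on the memory figure quoted for 1M chunks at dim=768, the arithmetic is simply rows × columns × 4 bytes for float32, and the number of serial HTTP calls is the chunk count divided by group_size, rounded up. The helper names are illustrative only.

```python
import math


def embedding_matrix_bytes(n_chunks: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of a dense (n_chunks, dim) matrix; float32 uses 4 bytes per value."""
    return n_chunks * dim * bytes_per_value


def request_count(n_chunks: int, group_size: int = 100) -> int:
    """Number of batch requests needed when each carries at most group_size texts."""
    return math.ceil(n_chunks / group_size)


size = embedding_matrix_bytes(1_000_000, 768)
print(f"{size / 1e9:.2f} GB across {request_count(1_000_000)} requests")
# prints: 3.07 GB across 10000 requests
```

Running the same estimate on a prefiltered candidate set shows why the hybrid grep-then-rerank pattern keeps both memory and request counts small.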