Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

oaustegard/semantic-grep

Name: semantic-grep
Author: oaustegard

semantic-grep/SKILL.md

npx skillsauth add oaustegard/claude-skills semantic-grep

Semantic Grep

jina-grep-style semantic search, done in-process via Python rather than as an external CLI. Embeds query + corpus chunks with gemini-embedding-2, ranks by cosine similarity, returns grep-format output.

When Semantic Search Helps

The core trade-off (lifted from jina-grep-cli's own docs and validated in testing):

| Task | Tool |
|------|------|
| Known exact string, filename, or regex | grep / rg / searching-codebases |
| "What files discuss concept X" when X may not appear verbatim | semantic-grep |
| Hybrid: prefilter with grep, rerank by concept | grep → rerank_candidates() |

Regression test result (workshop session corpus, 135 docs):

"handling regulatory constraints" → top hit "Engineering AI Systems Under Sovereignty Constraints" (0.67). ✓
"sessions about GEPA" → top hit "Gemma, DeepMind's Family of Open Models" (0.69). ✗ — false positive on phonetic neighbor. GEPA is mentioned verbatim in one session description; grep would find it correctly.

Rule: when the user query reads like a named entity or keyword, try grep first. Only reach for semantic-grep when paraphrase/concept matching is actually needed.
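
The rule above can be approximated with a small routing check before any embedding call is made. This is only a sketch: `prefer_grep` and its three signals (quotes or regex syntax, all-caps acronyms, identifiers containing `_` or `.`) are assumptions for illustration and are not part of the skill.

```python
import re

# Hypothetical heuristic, not part of semantic-grep: guess whether a query
# is an exact-match lookup (try grep first) or a concept (semantic search).
_ACRONYM = re.compile(r"\b[A-Z]{2,}[0-9]*\b")
_IDENTIFIER = re.compile(r"\b\w+[_.]\w+\b")
_REGEX_META = re.compile(r"[\\^$*+?\[\]{}|()]")


def prefer_grep(query: str) -> bool:
    """Return True when the query looks like a keyword or named entity."""
    if '"' in query or _REGEX_META.search(query):
        return True  # quoted phrase or regex syntax
    if _ACRONYM.search(query) or _IDENTIFIER.search(query):
        return True  # acronym such as GEPA, or a code identifier
    return False


print(prefer_grep("sessions about GEPA"))              # True
print(prefer_grep("handling regulatory constraints"))  # False
```

A check like this will misroute some queries (any all-caps word counts as an acronym), so it is best treated as a default that the caller can override.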

Setup

Credentials via proxy.env (Cloudflare AI Gateway w/ BYOK — same pattern as invoking-gemini):

CF_ACCOUNT_ID=...
CF_GATEWAY_ID=...
CF_API_TOKEN=...

Direct-API fallback: GOOGLE_API_KEY or GEMINI_API_KEY env var. No dependencies beyond requests + numpy.
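
To see which credential path a given environment would use, a check along these lines may help. It is a sketch under assumptions: `credential_mode` is a hypothetical helper, it assumes the variables from proxy.env have already been loaded into the environment, and it assumes the gateway is preferred whenever all three CF_* values are present.

```python
import os
from typing import Mapping, Optional

GATEWAY_VARS = ("CF_ACCOUNT_ID", "CF_GATEWAY_ID", "CF_API_TOKEN")
DIRECT_VARS = ("GOOGLE_API_KEY", "GEMINI_API_KEY")


def credential_mode(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "gateway", "direct", or None (hypothetical helper)."""
    # Assumption: the gateway wins when all three CF_* values are set.
    if all(env.get(name) for name in GATEWAY_VARS):
        return "gateway"
    # Otherwise fall back to either direct-API key.
    if any(env.get(name) for name in DIRECT_VARS):
        return "direct"
    return None


print(credential_mode({"GEMINI_API_KEY": "example"}))  # direct
```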

Quick Start

import sys
sys.path.insert(0, '/mnt/skills/user/semantic-grep/scripts')
from semantic_grep import semantic_grep, format_grep

# Directory of .txt files
results = semantic_grep("error handling under load", "/path/to/notes",
                        top_k=5, granularity="paragraph")
print(format_grep(results))
# notes/incidents.txt:42:  When the queue depth exceeds... [0.71]
# notes/postmortem.txt:8:  Under sustained traffic we saw... [0.68]

Core API

`semantic_grep(query, corpus, *, top_k=10, threshold=None, ...)`

Main search function.

query (str) — the search query (embedded with RETRIEVAL_QUERY task type)
corpus (str | Path | list[Chunk]) — a file, directory, or pre-chunked list
top_k (int | None) — max results; None = all above threshold
threshold (float | None) — cosine similarity cutoff; None = no filter (top_k only)
granularity ("paragraph" | "line") — how to chunk files (default paragraph)
include (str) — filename-glob filter when corpus is a directory (default "*.txt"). Matches against Path.name only, not the full path — "*.md" works, "docs/*.md" does not.
model (str) — default "gemini-embedding-2". gemini-embedding-001 is retired (text-only) and warns if passed explicitly.
dim (int) — 128 / 768 / 1536 / 3072 (default 768; MRL-truncated + renormalized)
task ("text" | "code") — selects text vs code task types

Returns list[Match] where Match has path, line, text, score.
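
How `top_k` and `threshold` interact is easiest to see on toy vectors. The following is a minimal sketch of cosine ranking with NumPy, not the skill's actual implementation; `rank_by_cosine` is a hypothetical name, and it assumes no input vector has zero length.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs, top_k=10, threshold=None):
    """Return (index, score) pairs, best first (illustrative only)."""
    q = np.asarray(query_vec, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    # Normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    order = list(np.argsort(-scores))  # indices, highest score first
    if threshold is not None:
        order = [i for i in order if scores[i] >= threshold]
    if top_k is not None:
        order = order[:top_k]
    return [(int(i), float(scores[i])) for i in order]


docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
# top_k only: keep the two best regardless of score.
print(rank_by_cosine([1.0, 0.0], docs, top_k=2))
# threshold only: keep everything scoring at least 0.5.
print(rank_by_cosine([1.0, 0.0], docs, top_k=None, threshold=0.5))
```

Both calls return documents 0 and 1 here, but for different reasons: the first stops after two results, the second drops document 2 because its score falls below the cutoff.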

`load_corpus(path, *, include="*.txt", granularity="paragraph") -> list[Chunk]`

Load and chunk a file or directory without embedding. Useful for inspecting what gets embedded before paying for the API call.
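
Two behaviours described here, the name-only `include` match and paragraph chunking, can be sketched with the standard library. The function names are hypothetical, and the chunking rule (blank lines separate paragraphs, each chunk reports its 1-based starting line) is an assumption about how the skill behaves rather than a copy of its code.

```python
from fnmatch import fnmatch
from pathlib import Path


def matches_include(path, pattern="*.txt"):
    """Match the glob against the file name only, never the full path."""
    return fnmatch(Path(path).name, pattern)


def paragraph_chunks(text):
    """Yield (start_line, paragraph) pairs; blank lines end a paragraph."""
    start, buf = None, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            if start is None:
                start = lineno  # first line of a new paragraph
            buf.append(line)
        elif buf:
            yield start, "\n".join(buf)
            start, buf = None, []
    if buf:  # flush the last paragraph if the text has no trailing blank line
        yield start, "\n".join(buf)


print(matches_include("docs/guide.md", "*.md"))       # True
print(matches_include("docs/guide.md", "docs/*.md"))  # False
print(list(paragraph_chunks("a\nb\n\n\nc\n")))        # [(1, 'a\nb'), (5, 'c')]
```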

`embed_batch(texts, task_type, *, model, dim, group_size=100) -> np.ndarray`

Lower-level: embed a list of strings directly via :batchEmbedContents. Returns (N, dim) float32 array, rows normalized when dim < 3072.
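
The grouping behaviour can be illustrated without touching the network. In this sketch `embed_in_groups` is a hypothetical stand-in for the loop inside `embed_batch`, and `fake_embed_group` replaces the real HTTP call; only the slicing and the `(N, dim)` float32 result shape are taken from the description above.

```python
import numpy as np


def embed_in_groups(texts, embed_group, dim, group_size=100):
    """Embed texts group by group into one (N, dim) float32 array."""
    out = np.empty((len(texts), dim), dtype=np.float32)
    for start in range(0, len(texts), group_size):
        group = texts[start:start + group_size]
        # One call per group; groups run one after another.
        out[start:start + len(group)] = embed_group(group)
    return out


calls = []


def fake_embed_group(group):
    """Stand-in for the API call: records the group size, returns ones."""
    calls.append(len(group))
    return np.ones((len(group), 8), dtype=np.float32)


vectors = embed_in_groups([f"text {i}" for i in range(250)],
                          fake_embed_group, dim=8)
print(vectors.shape, calls)  # (250, 8) [100, 100, 50]
```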

`format_grep(matches, *, max_text_chars=200, show_score=True) -> str`

Format matches as grep output: path:line: snippet [score].
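
For readers who want to post-process results themselves, the output shape can be reproduced roughly as follows. This is a sketch, not the skill's `format_grep`: the local `Match` dataclass only mirrors the four documented fields, and collapsing whitespace and printing two decimal places are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Local stand-in with the documented fields: path, line, text, score."""
    path: str
    line: int
    text: str
    score: float


def format_grep_sketch(matches, max_text_chars=200, show_score=True):
    """Render matches as `path:line: snippet [score]`, one per line."""
    lines = []
    for m in matches:
        # Assumption: collapse newlines so each match stays on one line.
        snippet = " ".join(m.text.split())[:max_text_chars]
        suffix = f" [{m.score:.2f}]" if show_score else ""
        lines.append(f"{m.path}:{m.line}: {snippet}{suffix}")
    return "\n".join(lines)


hit = Match("notes/incidents.txt", 42, "When the queue\ndepth exceeds", 0.714)
print(format_grep_sketch([hit]))
# notes/incidents.txt:42: When the queue depth exceeds [0.71]
```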

Pipe-mode Rerank Pattern

The highest-leverage use isn't naive full-corpus semantic search — it's hybrid retrieval: fast coarse filter → semantic rerank.

import subprocess
from semantic_grep import Chunk, semantic_grep, format_grep

# Stage 1: fast exact/regex prefilter with rg
result = subprocess.run(
    ["rg", "-n", "--no-heading", "error|fail|timeout", "logs/"],
    capture_output=True, text=True,
)

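# Parse `path:line:text` into Chunks, skipping any line that doesn't fit that shape
chunks = []
for raw in result.stdout.splitlines():
    parts = raw.split(":", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        continue
    path, line, text = parts
    chunks.append(Chunk(path=path, line=int(line), text=text))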

# Stage 2: semantic rerank on the prefiltered subset
ranked = semantic_grep("intermittent queue saturation during peak traffic",
                       chunks, top_k=10)
print(format_grep(ranked))

This is how you scale past the "embed the whole corpus every call" limit without needing a vector DB. The exact-match stage cheaply cuts millions of lines to thousands; the semantic stage then reranks only those.

Task Types (Gemini)

text mode (default): query → RETRIEVAL_QUERY, docs → RETRIEVAL_DOCUMENT. Asymmetric — documented to outperform symmetric encoding for retrieval.
code mode: query → CODE_RETRIEVAL_QUERY, docs → RETRIEVAL_DOCUMENT. Use when searching code with natural-language queries.

Use SEMANTIC_SIMILARITY (symmetric) only if you're doing pairwise sim, not retrieval. This module doesn't expose that path yet.
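
The two modes amount to a small lookup from the `task` argument to a (query, document) pair of task types. A sketch of that mapping, with `task_types_for` as a hypothetical helper name:

```python
# (query task type, document task type) per mode, as listed above.
TASK_TYPES = {
    "text": ("RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT"),
    "code": ("CODE_RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT"),
}


def task_types_for(task: str):
    """Return the (query, document) task types for a mode."""
    try:
        return TASK_TYPES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_TYPES)}"
        ) from None


query_type, doc_type = task_types_for("code")
print(query_type, doc_type)  # CODE_RETRIEVAL_QUERY RETRIEVAL_DOCUMENT
```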

Model Notes

gemini-embedding-2 (GA since 2026-04-22): general-purpose and multimodal. Verified 2026-07-21 via the CF gateway: text, image, and audio all embed to the same space at the requested dim, L2-normalized. The retired gemini-embedding-001 was text-only and rejected non-text input with HTTP 400.

2,048 input token limit per text. Longer texts are truncated at ~8K chars (approximation).
Matryoshka (MRL) — 3072 native dims, safely truncatable to 1536/768/256/128.
3072 is auto-normalized; lower dims need client-side renorm (handled here).
Pricing: $0.15 / 1M input tokens. 135 medium paragraphs ≈ 15K tokens ≈ $0.002 per query.
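
The truncate-then-renormalize step and the pricing arithmetic above can be checked with a few lines of NumPy. This is a sketch with hypothetical function names; the random vectors stand in for real embeddings, and the price is the figure quoted above.

```python
import numpy as np


def truncate_and_renormalize(vectors, dim):
    """Keep the first `dim` components, then rescale each row to unit length."""
    v = np.asarray(vectors, dtype=np.float32)[:, :dim]
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.clip(norms, 1e-12, None)  # guard against zero vectors


def embedding_cost_usd(tokens, usd_per_million=0.15):
    """Input-token cost at the quoted per-million rate."""
    return tokens * usd_per_million / 1_000_000


rng = np.random.default_rng(0)
full = rng.normal(size=(4, 3072))  # stand-in for 3072-dim embeddings
small = truncate_and_renormalize(full, 768)
print(small.shape)                           # (4, 768)
print(f"{embedding_cost_usd(15_000):.5f}")   # 0.00225
```

Without the renormalization, dot products between truncated vectors would no longer be cosine similarities, which is why the lower dims need the client-side step.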

gemini-embedding-2-preview (March 2026) is multimodal and currently top of MTEB. Set model="gemini-embedding-2-preview" to opt in once the preview stabilizes.

Limitations (v0.1.1)

No persistent index. Every call re-embeds the corpus. Fine for <~1K chunks; prohibitive for real knowledge bases. Phase 2: cache embeddings by content hash.
Token budget is approximated by char count (×1.5). Conservative for mixed-script text; over-truncates English slightly. Real tokenizer would use the Gemini tokenizer endpoint but costs an extra call per embed.
Batch bulk-failure diagnostic. If one text in a group of 100 overflows or is rejected by safety filters, the whole batch fails and the 99 good ones are lost. No per-index fallback yet.
No memory ceiling on corpus size. semantic_grep pre-allocates (N, dim) float32; 1M chunks at dim=768 ≈ 3GB. Caller is responsible for sane chunk counts. load_corpus also follows symlinks via rglob — fine in a trusted single-user container, not for untrusted paths.
Sequential batch groups. group_size=100 per HTTP call; groups run serially. For >1K chunks, add asyncio — not needed yet.
No CLI shim. Called as a Python module, not a subprocess. Per design: "within an LLM rather than calling out to one."
Embedding function lives here, not in invoking-gemini. Should be factored up when invoking-gemini adds embedding support. Tracked as followup.
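
The "cache embeddings by content hash" idea from the first limitation could look roughly like the following. It is a sketch of one possible design, not planned code: `EmbeddingCache` is a hypothetical class, the cache is in-memory only (persisting it to disk is left out), and `fake_embed` stands in for the real embedding call.

```python
import hashlib

import numpy as np


class EmbeddingCache:
    """In-memory cache keyed by content hash (hypothetical; not in the skill)."""

    def __init__(self, embed_fn, model, dim):
        self._embed_fn = embed_fn  # callable: list[str] -> (N, dim) array
        self._model, self._dim = model, dim
        self._store = {}

    def _key(self, text):
        # Include model and dim so a config change never reuses stale vectors.
        payload = f"{self._model}:{self._dim}:{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def embed(self, texts):
        if not texts:
            return np.empty((0, self._dim), dtype=np.float32)
        keys = [self._key(t) for t in texts]
        # Only unseen texts are sent out; duplicates collapse to one entry.
        missing = {k: t for k, t in zip(keys, texts) if k not in self._store}
        if missing:
            vectors = self._embed_fn(list(missing.values()))
            for k, vec in zip(missing, vectors):
                self._store[k] = np.asarray(vec, dtype=np.float32)
        return np.stack([self._store[k] for k in keys])


seen = []


def fake_embed(texts):
    """Stand-in for the API call: records what was actually embedded."""
    seen.append(list(texts))
    return np.ones((len(texts), 4), dtype=np.float32)


cache = EmbeddingCache(fake_embed, model="gemini-embedding-2", dim=4)
first = cache.embed(["a", "b", "a"])
second = cache.embed(["b", "c"])
print(seen)  # [['a', 'b'], ['c']]
```

With a cache like this, a repeated query over an unchanged corpus would only pay for embedding the query itself.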

Related Skills

invoking-gemini — sibling; handles Gemini text + image generation through the same CF gateway. Shares credential pattern.
searching-codebases — regex/AST search. Use first when the query is a known pattern.
extracting-keywords — YAKE keyword extraction; orthogonal, but pairs well for building query terms from a long prompt.
exploring-codebases — for understanding repo structure. Semantic-grep doesn't replace AST-based navigation.

Attribution

Conceptually inspired by jina-grep-cli — we kept the retrieval shape (grep-compatible output, asymmetric query/doc embeddings, threshold + top-k) but swapped the MLX/Apple-Silicon backend for a portable Gemini API call. The original's pipe-mode rerank pattern is the most generalizable idea it contributes and is preserved here.

In-process semantic search over text files or in-memory strings, using Gemini embeddings via the CF AI Gateway. Use when user wants fuzzy/conceptual search where exact-keyword grep would miss — "sessions discussing regulatory constraints", "code about retry logic", "notes mentioning burnout even if the word isn't there". Complements searching-codebases (regex/AST) and extracting-keywords (YAKE). Do NOT use when an exact string/regex match is what's wanted — grep/rg wins on speed and precision there.

132 stars

development

Updated Jul 23, 2026

$ install --global

skillsauth

npx skillsauth add oaustegard/claude-skills semantic-grep

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What it checks | Status | Percentage |
|---------|----------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jul 23, 2026, 6:35 AM (139.8s, 4 files scanned)

SKILL.md

name: semantic-grep
description: In-process semantic search over text files or in-memory strings, using Gemini embeddings via the CF AI Gateway. Use when user wants fuzzy/conceptual search where exact-keyword grep would miss: "sessions discussing regulatory constraints", "code about retry logic", "notes mentioning burnout even if the word isn't there". Complements searching-codebases (regex/AST) and extracting-keywords (YAKE). Do NOT use when an exact string/regex match is what's wanted; grep/rg wins on speed and precision there.
version: 0.2.0

Related Skills

oaustegard/writing-instructions

development

Verified, Trusted, Community

Write effective instructions for Claude: project instructions, standalone prompts, and skill content. Use when users need help writing prompts, setting up project instructions, choosing between instruction formats, or improving how they communicate with Claude. Covers writing principles, model-aware calibration, and format selection. For building and testing complete skills, use skill-creator instead.

134 | SKILL.md | Updated Jul 26, 2026

oaustegard/finding-skills

data-ai

Verified, Trusted, Community

Discover and load skills on demand from /mnt/skills/user/. Use when you need a capability but don't know which skill provides it, when the boot-emitted skill list is names-only and you need a full description, or when you want to list the catalog. Verbs are list (names only), search (rank by name/description match against a query), and show (emit the full SKILL.md for a named skill).

134 | SKILL.md | Updated Jul 26, 2026

oaustegard/transcribing-images

documentation

Verified, Trusted, Community

Reads the visual content of slides, pages, and images the way a human would, not just their embedded text. Use when a PPTX or PDF has image slides, screenshots, charts, scanned figures, or flattened-to-image layouts that the built-in pptx/pdf skills read as empty; when asked to transcribe, describe, OCR, or extract what is shown in an image, slide deck, or document page; or when embedded-text extraction returned little or nothing from a visually rich file. Triggers on 'read this deck', 'what's on these slides', 'transcribe', 'OCR', 'extract text from image', 'describe this chart/diagram', .pptx/.pdf/.png/.jpg with visual content.

134 | SKILL.md | Updated Jul 26, 2026

oaustegard/svg-portrait-mode

development

Verified, Trusted, Community

Portrait Mode for SVGs: foveated vectorization with 4-zone selective detail. Combines vision annotations, MediaPipe segmentation/landmarks, and optional saliency. Like phone portrait mode, but vectorized. Use when vectorizing a portrait or photo where subject detail should outrank background detail.

134 | SKILL.md | Updated Jul 26, 2026

Install manually

# Clone the repo
git clone https://github.com/oaustegard/claude-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills/semantic-grep ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

oaustegard/claude-skills

132 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
