synalinks-knowledge/SKILL.md
Use when working with Synalinks KnowledgeBase (DuckDB-backed), EmbedKnowledge, UpdateKnowledge, RetrieveKnowledge, StampKnowledge, RAG pipelines, hybrid / fulltext / similarity search, default-EmbeddingModel configuration, or document extraction-and-storage flows.
Build retrieval-augmented programs over a unified DuckDB knowledge base. Supports BM25 full-text, vector similarity, and hybrid (RRF) search.
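The following is a minimal pure-Python sketch of Okapi BM25 scoring, for intuition about the full-text side. It assumes naive lowercase whitespace tokenization (no stemming or stop-word removal) and the commonly used parameters k1=1.2 and b=0.75. DuckDB's full-text extension does its own tokenization and indexing, so this illustrates the ranking idea, not DuckDB's implementation.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenization (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "invoice INV-42 total due",
    "python is a programming language",
    "python tutorial for beginners",
]
scores = bm25_scores("python language", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # python is a programming language
```

The document that matches both query terms outranks the one that matches only "python". A document that matches no query term scores exactly zero.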
import synalinks
import asyncio


class Document(synalinks.DataModel):
    id: str = synalinks.Field(description="Document ID")
    title: str = synalinks.Field(description="Document title")
    content: str = synalinks.Field(description="Document content")


class Query(synalinks.DataModel):
    query: str = synalinks.Field(description="User query")


class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(description="Answer based on retrieved context")


async def main():
    lm = synalinks.LanguageModel(model="openai/gpt-4o-mini")
    em = synalinks.EmbeddingModel(model="openai/text-embedding-3-small")

    # docs.db is expected to already hold Document records (e.g. stored with UpdateKnowledge)
    knowledge_base = synalinks.KnowledgeBase(
        uri="duckdb://./docs.db",
        data_models=[Document],
        embedding_model=em,
        metric="cosine",
    )

    inputs = synalinks.Input(data_model=Query)
    # The LM writes a search query; hybrid search returns the top 5 records
    context = await synalinks.RetrieveKnowledge(
        knowledge_base=knowledge_base,
        language_model=lm,
        search_type="hybrid",
        k=5,
        return_inputs=True,
    )(inputs)
    answer = await synalinks.Generator(
        data_model=Answer,
        language_model=lm,
        instructions="Answer based on the retrieved context only. If irrelevant, say you don't know.",
    )(context)

    rag = synalinks.Program(inputs=inputs, outputs=answer, name="rag_qa")
    result = await rag(Query(query="What is Python?"))
    print(result.prettify_json())


asyncio.run(main())
knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://./my_database.db",  # or duckdb://:memory:
    data_models=[Document, Invoice],  # one table per DataModel; Invoice is another user-defined DataModel
    embedding_model=em,  # optional; required only for similarity/hybrid search
    metric="cosine",  # "cosine" | "l2seq" | "ip"
    wipe_on_start=False,  # if True, clear the database on init
)
The first field of each DataModel is the primary key.
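The `metric` option selects how vectors are compared. Below is a pure-Python sketch of the three comparisons. It assumes `"l2seq"` denotes squared Euclidean (L2) distance and `"ip"` the inner (dot) product. It shows the math only, not how DuckDB's vector index computes these values.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Dot product divided by the product of the norms (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2_distance(a: list, b: list) -> float:
    """Sum of squared component differences (0.0 = identical vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a: list, b: list) -> float:
    """Raw dot product; unlike cosine, it grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
print(cosine_similarity(a, b))    # 1.0
print(squared_l2_distance(a, b))  # 1.0
print(inner_product(a, b))        # 2.0
```

Cosine ignores magnitude, which makes it a common choice for text embeddings. Inner product ranks vectors the same way as cosine only when the embeddings are unit-normalized.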
Use EmbedKnowledge to generate embeddings for a DataModel so it can be searched by similarity.
embedded = await synalinks.EmbedKnowledge(
    embedding_model=em,
    in_mask=["content"],  # keep only the field(s) to embed
    # out_mask=["id"],  # OR exclude fields instead
)(inputs)
After masking, exactly one field should remain — the field that gets embedded.
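The masking rule can be modeled with plain dictionaries. In this hypothetical sketch, `apply_masks` and `field_to_embed` are illustrative helpers, not Synalinks functions. `in_mask` keeps the listed fields, `out_mask` drops them, and any result other than exactly one surviving field is rejected. With the quick start's `Document` fields, `out_mask=["id"]` alone would leave both `title` and `content`.

```python
from typing import Optional


def apply_masks(
    record: dict,
    in_mask: Optional[list] = None,
    out_mask: Optional[list] = None,
) -> dict:
    """Hypothetical helper: keep only in_mask fields, else drop out_mask fields."""
    if in_mask is not None:
        return {k: v for k, v in record.items() if k in in_mask}
    if out_mask is not None:
        return {k: v for k, v in record.items() if k not in out_mask}
    return dict(record)


def field_to_embed(masked: dict) -> str:
    """Return the single remaining field name; fail if 0 or 2+ fields are left."""
    if len(masked) != 1:
        raise ValueError(f"expected exactly one field after masking, got {sorted(masked)}")
    return next(iter(masked))


doc = {"id": "d1", "title": "Intro", "content": "Python is a language."}
print(field_to_embed(apply_masks(doc, in_mask=["content"])))       # content
print(field_to_embed(apply_masks(doc, out_mask=["id", "title"])))  # content

try:
    field_to_embed(apply_masks(doc, out_mask=["id"]))  # title AND content remain
except ValueError as err:
    print(err)
```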
UpdateKnowledge upserts a record into the knowledge base, using the DataModel's first field as the primary key.
stored = await synalinks.UpdateKnowledge(
    knowledge_base=knowledge_base,
)(extracted_data)
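The upsert rule can be sketched with an in-memory dictionary standing in for a DuckDB table. `InMemoryTable` and the dataclass `Document` below are hypothetical stand-ins, not Synalinks APIs. The sketch shows one thing only: the first declared field is the key, so writing a record whose key already exists replaces that row instead of adding a new one.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Document:
    id: str  # first declared field -> primary key
    title: str
    content: str


class InMemoryTable:
    """Hypothetical stand-in for one knowledge-base table."""

    def __init__(self, model_cls):
        self.pk = fields(model_cls)[0].name  # primary key = first declared field
        self.rows = {}

    def upsert(self, record) -> None:
        row = asdict(record)
        self.rows[row[self.pk]] = row  # insert, or overwrite the row with the same key


table = InMemoryTable(Document)
table.upsert(Document(id="d1", title="Draft", content="v1"))
table.upsert(Document(id="d1", title="Final", content="v2"))  # same key: replaced
table.upsert(Document(id="d2", title="Other", content="x"))
print(len(table.rows), table.rows["d1"]["title"])  # 2 Final
```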
RetrieveKnowledge performs LM-driven retrieval: the LM generates a search query, then the knowledge base is queried with it.
results = await synalinks.RetrieveKnowledge(
    knowledge_base=knowledge_base,
    language_model=lm,
    search_type="hybrid",  # "similarity" | "fulltext" | "hybrid"
    k=10,
    return_inputs=True,  # forward inputs alongside retrieved context
    return_query=True,  # include the LM-generated query in output
)(inputs)
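Conceptually, retrieval runs in two steps: the LM turns the inputs into a search query, then that query goes to the chosen search. The sketch below models that flow with plain callables. `retrieve`, `fake_lm` and `fake_search` are hypothetical names. The output keys (`result`, `search_query`) only approximate what `return_inputs` and `return_query` add; they are not the module's real output schema.

```python
from typing import Callable


def retrieve(
    inputs: dict,
    generate_query: Callable[[dict], str],
    search: Callable[[str, int], list],
    k: int = 10,
    return_inputs: bool = True,
    return_query: bool = True,
) -> dict:
    """Hypothetical two-step retrieval: LM-written query, then knowledge-base search."""
    query = generate_query(inputs)          # step 1: the LM writes the search query
    output = {"result": search(query, k)}   # step 2: run the search with it
    if return_query:
        output["search_query"] = query
    if return_inputs:
        output.update(inputs)               # forward the original fields
    return output


CORPUS = ["python is a language", "duckdb is a database"]


def fake_lm(inputs: dict) -> str:
    # Pretend the LM distilled the question into keywords.
    return "python language"


def fake_search(query: str, k: int) -> list:
    # Naive keyword match standing in for fulltext/similarity/hybrid search.
    terms = query.split()
    return [doc for doc in CORPUS if any(t in doc.split() for t in terms)][:k]


output = retrieve({"query": "What is Python?"}, fake_lm, fake_search, k=5)
print(output)
```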
| Type | Backend | When to use |
|------|---------|-------------|
| fulltext | DuckDB BM25 | Exact terms, codes, IDs, named entities |
| similarity | Vector (cosine/l2seq/ip) | Semantic matches, paraphrases |
| hybrid | Reciprocal Rank Fusion of both | Default — best general-purpose |
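Reciprocal Rank Fusion merges the two ranked lists by rank alone, so BM25 scores and vector distances never need to share a scale. Each document scores the sum of 1 / (k_rank + rank) over the lists it appears in. The sketch below assumes 1-based ranks and k_rank=60 (the value used in the `hybrid_search` call). It illustrates the technique and is not Synalinks' own implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k_rank: int = 60) -> list:
    """Fuse several best-first lists of document IDs into one list using RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] += 1.0 / (k_rank + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


fulltext = ["d3", "d1", "d2"]    # e.g. BM25 order
similarity = ["d1", "d4", "d3"]  # e.g. vector-similarity order
print(reciprocal_rank_fusion([fulltext, similarity]))  # ['d1', 'd3', 'd4', 'd2']
```

`d1` edges out `d3` because it places well in both lists (ranks 2 and 1, versus 1 and 3). Documents found by only one search still appear, just lower.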
The KnowledgeBase can also be searched directly, without an LM:
# Full-text (BM25)
results = await knowledge_base.fulltext_search("query", k=10)
# Vector similarity
results = await knowledge_base.similarity_search("query", k=10)
# Hybrid (RRF)
results = await knowledge_base.hybrid_search("query", k=10, k_rank=60)
# Lookup by primary key
record = await knowledge_base.get("id_value")
# Paginated scan
records = await knowledge_base.getall(
    Document.to_symbolic_data_model(),
    limit=50,
    offset=0,
)
# Raw SQL (params is a list bound to ? placeholders, in order)
results = await knowledge_base.query(
    "SELECT * FROM Invoice WHERE total > ?",
    params=[100.0],
)
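The `?` placeholders above use the qmark parameter style, which Python's built-in `sqlite3` module also uses. The sketch below uses `sqlite3` as a portable stand-in for DuckDB; that substitution, the table contents, and the column names are assumptions made for illustration. It shows positional binding, plus the LIMIT/OFFSET paging that `limit`/`offset` arguments like those of `getall` conventionally correspond to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (id TEXT PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?)",
    [("inv-1", 50.0), ("inv-2", 150.0), ("inv-3", 300.0), ("inv-4", 120.0)],
)

# Positional binding: each ? is filled from the params list, in order.
rows = conn.execute(
    "SELECT id FROM Invoice WHERE total > ? AND total < ? ORDER BY id",
    [100.0, 250.0],
).fetchall()
print(rows)  # [('inv-2',), ('inv-4',)]

# Paging: second page with 2 rows per page (offset = page_index * page_size).
page_size, page_index = 2, 1
page = conn.execute(
    "SELECT id FROM Invoice ORDER BY id LIMIT ? OFFSET ?",
    [page_size, page_index * page_size],
).fetchall()
print(page)  # [('inv-3',), ('inv-4',)]
conn.close()
```

Binding values instead of formatting them into the SQL string avoids quoting bugs and SQL injection. A stable `ORDER BY` keeps pages from overlapping between calls.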
Pipe a Generator into UpdateKnowledge to extract structured data and persist it:
class DocumentText(synalinks.DataModel):
    text: str = synalinks.Field(description="Raw document text")


inputs = synalinks.Input(data_model=DocumentText)
extracted = await synalinks.Generator(
    data_model=Invoice,
    language_model=lm,
    instructions="Extract invoice information from the document.",
)(inputs)
stored = await synalinks.UpdateKnowledge(knowledge_base=knowledge_base)(extracted)
ingest = synalinks.Program(inputs=inputs, outputs=stored, name="invoice_ingest")
When embedding_model=None is passed to EmbedKnowledge / RetrieveKnowledge / KnowledgeBase (or ops.embedding), the framework resolves the default at call time:
synalinks.set_default_embedding_model("openai/text-embedding-3-small")
# String identifiers persist into the on-disk config; instances do not.
# Later, anywhere in the program:
em = synalinks.default_embedding_model() # returns the configured instance, or None
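Resolving the default "at call time" means a module built with `embedding_model=None` uses whatever default is set when the module runs, not when it is constructed. Below is a hypothetical pure-Python sketch of that pattern. `_DEFAULT`, `set_default`, `get_default` and `EmbedStep` are illustrative names, not Synalinks internals.

```python
from typing import Optional

_DEFAULT: Optional[str] = None  # module-level slot, like a global config entry


def set_default(model: str) -> None:
    global _DEFAULT
    _DEFAULT = model


def get_default() -> Optional[str]:
    return _DEFAULT


class EmbedStep:
    """Hypothetical module that defers choosing its embedding model."""

    def __init__(self, *, embedding_model: Optional[str] = None):
        self.embedding_model = embedding_model  # None means "use the default later"

    def __call__(self, text: str) -> str:
        model = self.embedding_model or get_default()  # resolved on every call
        if model is None:
            raise ValueError("no embedding model given and no default configured")
        return f"embedded {text!r} with {model}"


step = EmbedStep()  # built before any default exists
set_default("openai/text-embedding-3-small")
print(step("hello"))  # embedded 'hello' with openai/text-embedding-3-small
```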
EmbeddingModel accepts **default_kwargs forwarded to every call, and fallback= accepts a string, dict, or EmbeddingModel instance:
em = synalinks.EmbeddingModel(
    model="openai/text-embedding-3-small",
    dimensions=512,  # forwarded to litellm.aembedding
    fallback="ollama/mxbai-embed-large",  # str / dict / EmbeddingModel
)
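`fallback=` follows a familiar pattern: try the primary model, and if it fails, retry the same request against the backup. The sketch below shows that pattern with asyncio stand-ins. `embed_with_fallback`, `flaky_primary` and `local_backup` are hypothetical, not Synalinks or LiteLLM APIs. The sketch does not cover which errors actually trigger Synalinks' fallback. It also forwards the same keyword arguments to both calls, which a real backup model might not accept.

```python
import asyncio


async def embed_with_fallback(texts, primary, fallback=None, **default_kwargs):
    """Call primary(texts, **default_kwargs); if it raises, retry once with fallback."""
    try:
        return await primary(texts, **default_kwargs)
    except Exception:  # broad on purpose for the sketch
        if fallback is None:
            raise
        return await fallback(texts, **default_kwargs)


async def flaky_primary(texts, **kwargs):
    raise ConnectionError("provider unavailable")  # simulate an outage


async def local_backup(texts, dimensions=3, **kwargs):
    # Toy "embedding": one vector per text, sized by the forwarded kwarg.
    return [[float(len(t))] * dimensions for t in texts]


vectors = asyncio.run(
    embed_with_fallback(["hi", "hello"], flaky_primary, local_backup, dimensions=2)
)
print(vectors)  # [[2.0, 2.0], [5.0, 5.0]]
```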
The knowledge modules (EmbedKnowledge, UpdateKnowledge, RetrieveKnowledge, StampKnowledge) all use keyword-only constructors (def __init__(self, *, ...)). Always pass arguments by name:
# Correct
synalinks.UpdateKnowledge(knowledge_base=kb)
# Wrong — TypeError
synalinks.UpdateKnowledge(kb)
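The bare `*` in the signature is what causes that TypeError: every parameter after it can only be passed by name. The plain-Python sketch below uses a hypothetical `UpdateStep` class to reproduce the behavior.

```python
class UpdateStep:
    """Hypothetical class with a keyword-only constructor, like the knowledge modules."""

    def __init__(self, *, knowledge_base, name=None):
        self.knowledge_base = knowledge_base
        self.name = name


kb = object()  # placeholder for a KnowledgeBase
ok = UpdateStep(knowledge_base=kb)  # accepted: passed by name

try:
    UpdateStep(kb)  # rejected: positional argument
except TypeError as err:
    print(type(err).__name__)  # TypeError
```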
For program.predict(...), see synalinks-training.
Pass tools = [synalinks.Tool(knowledge_base.fulltext_search), ...] to a FunctionCallingAgent for tool-driven retrieval (see synalinks-agents).