skills/rag-pipeline-dotnet/SKILL.md
Implements RAG (Retrieval-Augmented Generation) pipelines using Microsoft Semantic Kernel for .NET applications with federal compliance and air-gapped deployment support. Use when building RAG .NET, Semantic Kernel RAG, vector search .NET, document QA .NET, knowledge base .NET, AI search .NET, embedding pipeline, or retrieval-augmented generation in C#. Do NOT use when the application stack is Python — use rag-pipeline-python instead; do NOT use outside federal or .NET-primary environments.
npx skillsauth add michaelalber/ai-toolkit rag-pipeline-dotnetInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
"RAG combines the power of large language models with your organization's specific knowledge, providing accurate, contextual responses grounded in your data."
"The quality of your RAG system is bounded by the quality of your retrieval, not the quality of your generation model." -- Jerry Liu, creator of LlamaIndex
This skill guides implementation of RAG pipelines using Microsoft Semantic Kernel for .NET applications, with federal compliance considerations and air-gapped deployment patterns.
RAG is the primary pattern for grounding LLM responses in organizational knowledge. Rather than fine-tuning models on proprietary data, RAG retrieves relevant documents at query time and injects them as context for generation. Microsoft Semantic Kernel serves as the .NET orchestration layer, providing abstractions over embedding models, vector stores, and chat completion services.
.NET is the right choice for RAG when:
Non-Negotiable Constraints:
| # | Principle | Description | Priority |
|---|-----------|-------------|----------|
| 1 | Retrieval Quality Over Generation Quality | The ceiling of RAG output is set by retrieval, not generation. Measure precision@k and relevance scores before tuning prompts or switching LLMs. Irrelevant context degrades output more than no context at all. | Critical |
| 2 | Chunk Size Optimization | Chunks must be self-contained units of meaning sized to fit the embedding model's context window. Too large and they exceed token limits (silent truncation). Too small and they lose semantic coherence. Match chunk size to corpus type and model capacity. | Critical |
| 3 | Embedding Model Selection | The embedding model determines the retrieval ceiling. Evaluate on domain-specific queries, not general benchmarks. See references/embedding-models.md for options and performance characteristics. | Critical |
| 4 | Vector Store Selection | Choose based on deployment environment, compliance requirements, and operational capacity. Azure AI Search for FedRAMP cloud; Qdrant or pgvector for air-gapped. See references/vector-store-options.md. | High |
| 5 | Semantic + Keyword Hybrid Search | Pure vector similarity misses exact-match queries (error codes, part numbers). Combine semantic search with keyword (BM25) search when the corpus contains identifiers, acronyms, or technical terms. Azure AI Search supports this natively. | High |
| 6 | Prompt Engineering for Grounded Responses | The system prompt must constrain the LLM to answer only from provided context. Include explicit instructions to cite sources and to say "I don't know" when context is insufficient. | Critical |
| 7 | Citation and Provenance | Every generated answer must trace back to specific source chunks. Include document ID, section, and relevance score in the response. This is a NIST AI RMF transparency requirement for federal systems. | High |
| 8 | Hallucination Detection | Monitor for answers that contain claims not present in retrieved context. Implement post-generation verification: compare answer claims against source chunk content. Flag and log divergences. | High |
| 9 | Federal Data Handling | Validate data classification before ingestion. CUI requires access controls and audit trails. Classified data is never eligible for RAG. See references/federal-ai-compliance.md. | Critical |
| 10 | Air-Gapped Deployment | Support disconnected environments with Ollama (local LLM + embeddings) and on-premise vector stores (Qdrant, pgvector). No external API calls. FIPS compliance depends on host OS configuration. | High |
Use search_knowledge (grounded-code-mcp) to ground decisions in authoritative references.
| Query | When to Call |
|-------|--------------|
| search_knowledge("Semantic Kernel vector store memory connector") | During CONFIGURE phase — verify Semantic Kernel API patterns for memory and connectors |
| search_knowledge("RAG retrieval augmented generation chunking embedding") | During INGEST/CHUNK phases — ground chunk size and overlap decisions |
| search_knowledge("vector similarity search embedding model selection") | During INDEX phase — verify embedding model selection criteria |
| search_knowledge("NIST AI RMF federal compliance audit logging") | During federal deployment configuration — verify NIST AI RMF transparency requirements |
| search_knowledge("FedRAMP Azure Government CUI data classification") | During federal compliance review — verify CUI handling and authorized service requirements |
| search_knowledge("retrieval precision recall evaluation RAG metrics") | During EVALUATE phase — confirm retrieval evaluation metrics and thresholds |
| search_knowledge("ASP.NET Core dependency injection CancellationToken") | During CONFIGURE phase — verify .NET DI and async patterns for the pipeline |
Protocol: Search before configuring the Semantic Kernel pipeline, before selecting vector stores or embedding models, and before implementing federal compliance features. Cite the source in the Pipeline Configuration Summary and Deployment Checklist.
+-----------------------------------------------------------------------+
| .NET RAG PIPELINE WORKFLOW |
| |
| +-----------+ +--------+ +-------+ +----------+ +----------+|
| |1.CONFIGURE|-->|2.INGEST|-->|3.INDEX|-->|4.RETRIEVE|-->|5.EVALUATE||
| +-----------+ +--------+ +-------+ +----------+ +----------+|
| | | | |
| | v | |
| | +----------+ | |
| | | GENERATE | | |
| | +----------+ | |
| | | |
| +--------------------------------------------------------+ |
| (iterate if needed) |
+-----------------------------------------------------------------------+
Set up Semantic Kernel with the chosen LLM provider, embedding model, and vector store. The flow is: User Query -> Query Embedding -> Vector Search -> Relevant Chunks -> LLM Generation -> Response.
Project Setup:
<Project Sdk="Microsoft.NET.Sdk.Web">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
</PropertyGroup>
<ItemGroup>
<!-- Semantic Kernel Core -->
<PackageReference Include="Microsoft.SemanticKernel" Version="1.25.0" />
<!-- Memory/Vector Store Connectors -->
<PackageReference Include="Microsoft.SemanticKernel.Connectors.AzureAISearch" Version="1.25.0-alpha" />
<PackageReference Include="Microsoft.SemanticKernel.Connectors.Qdrant" Version="1.25.0-alpha" />
<PackageReference Include="Microsoft.SemanticKernel.Connectors.Postgres" Version="1.25.0-alpha" />
<!-- Document Processing -->
<PackageReference Include="Microsoft.SemanticKernel.Plugins.Document" Version="1.25.0-alpha" />
<!-- AI Providers -->
<PackageReference Include="Microsoft.SemanticKernel.Connectors.AzureOpenAI" Version="1.25.0" />
<PackageReference Include="Microsoft.SemanticKernel.Connectors.OpenAI" Version="1.25.0" />
</ItemGroup>
</Project>
Program.cs Setup (Azure OpenAI):
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton(sp =>
{
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
deploymentName: builder.Configuration["AzureOpenAI:ChatDeployment"]!,
endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);
kernelBuilder.AddAzureOpenAITextEmbeddingGeneration(
deploymentName: builder.Configuration["AzureOpenAI:EmbeddingDeployment"]!,
endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);
return kernelBuilder.Build();
});
// Vector Store -- choose one (see references/vector-store-options.md)
builder.Services.AddSingleton<ISemanticTextMemory>(sp =>
{
var embeddingGenerator = sp.GetRequiredService<Kernel>()
.GetRequiredService<ITextEmbeddingGenerationService>();
return new MemoryBuilder()
.WithAzureAISearchMemoryStore(
builder.Configuration["AzureSearch:Endpoint"]!,
builder.Configuration["AzureSearch:ApiKey"]!)
.WithTextEmbeddingGeneration(embeddingGenerator).Build();
});
builder.Services.AddScoped<IRagService, RagService>();
Air-Gapped Configuration (Ollama):
kernelBuilder.AddOllamaChatCompletion(
modelId: "llama3", endpoint: new Uri("http://localhost:11434"));
kernelBuilder.AddOllamaTextEmbeddingGeneration(
modelId: "nomic-embed-text", endpoint: new Uri("http://localhost:11434"));
appsettings.json:
{
"AzureOpenAI": {
"Endpoint": "https://your-resource.openai.azure.com/",
"ApiKey": "your-api-key",
"ChatDeployment": "gpt-4",
"EmbeddingDeployment": "text-embedding-ada-002"
},
"AzureSearch": {
"Endpoint": "https://your-search.search.windows.net",
"ApiKey": "your-search-key",
"IndexName": "documents"
},
"Rag": { "ChunkSize": 500, "ChunkOverlap": 100, "TopK": 5, "MinRelevanceScore": 0.7 }
}
Load documents, validate content extraction, and split into semantically coherent chunks. See the full RagService implementation below for the IngestDocumentAsync, ChunkText, and SearchAsync methods.
Embeddings are generated automatically by Semantic Kernel's ISemanticTextMemory during SaveInformationAsync. The vector store connector handles index creation and upsert. See references/vector-store-options.md and references/embedding-models.md.
Process user queries through embedding, similarity search (with MinRelevanceScore threshold), and context assembly. The SearchAsync method returns ranked results by relevance.
Before deploying, test retrieval quality with representative queries. Verify that retrieved chunks are relevant and that generated answers are grounded in context. See the Output Templates section for evaluation report format.
// Services/RagService.cs
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.ChatCompletion;
namespace MyApp.Services;
public interface IRagService
{
Task<string> AskAsync(string question, string collection, CancellationToken ct = default);
Task IngestDocumentAsync(string documentPath, string collection, CancellationToken ct = default);
Task<IEnumerable<MemoryQueryResult>> SearchAsync(string query, string collection,
int limit = 5, CancellationToken ct = default);
}
public class RagService : IRagService
{
private readonly Kernel _kernel;
private readonly ISemanticTextMemory _memory;
private readonly ILogger<RagService> _logger;
private readonly RagSettings _settings;
public RagService(Kernel kernel, ISemanticTextMemory memory,
IOptions<RagSettings> settings, ILogger<RagService> logger)
{
_kernel = kernel; _memory = memory;
_settings = settings.Value; _logger = logger;
}
public async Task<string> AskAsync(string question, string collection, CancellationToken ct = default)
{
_logger.LogInformation("RAG query: {Question}", question);
var searchResults = await SearchAsync(question, collection, _settings.TopK, ct);
var relevantChunks = searchResults.ToList();
if (!relevantChunks.Any())
return "I don't have enough information to answer that question.";
var context = BuildContext(relevantChunks);
var response = await GenerateResponseAsync(question, context, ct);
_logger.LogInformation("RAG response generated with {ChunkCount} context chunks", relevantChunks.Count);
return response;
}
public async Task IngestDocumentAsync(string documentPath, string collection, CancellationToken ct = default)
{
_logger.LogInformation("Ingesting document: {Path}", documentPath);
var content = await File.ReadAllTextAsync(documentPath, ct);
var chunks = ChunkText(content, _settings.ChunkSize, _settings.ChunkOverlap);
var documentId = Path.GetFileNameWithoutExtension(documentPath);
var tasks = chunks.Select((chunk, index) =>
_memory.SaveInformationAsync(collection: collection, text: chunk,
id: $"{documentId}_{index}",
description: $"Chunk {index} from {documentPath}",
cancellationToken: ct));
await Task.WhenAll(tasks);
_logger.LogInformation("Ingested {ChunkCount} chunks from {Path}", chunks.Count, documentPath);
}
public async Task<IEnumerable<MemoryQueryResult>> SearchAsync(
string query, string collection, int limit = 5, CancellationToken ct = default)
{
var results = new List<MemoryQueryResult>();
await foreach (var result in _memory.SearchAsync(
collection: collection, query: query, limit: limit,
minRelevanceScore: _settings.MinRelevanceScore, cancellationToken: ct))
results.Add(result);
return results;
}
private static List<string> ChunkText(string text, int chunkSize, int overlap)
{
var chunks = new List<string>();
var sentences = text.Split(new[] { ". ", ".\n", ".\r\n" }, StringSplitOptions.RemoveEmptyEntries);
var currentChunk = new StringBuilder();
var overlapBuffer = new Queue<string>();
foreach (var sentence in sentences)
{
if (currentChunk.Length + sentence.Length > chunkSize && currentChunk.Length > 0)
{
chunks.Add(currentChunk.ToString().Trim());
currentChunk.Clear();
while (overlapBuffer.Count > 0 && currentChunk.Length < overlap)
currentChunk.Append(overlapBuffer.Dequeue()).Append(". ");
}
currentChunk.Append(sentence).Append(". ");
overlapBuffer.Enqueue(sentence);
while (overlapBuffer.Count > 3) overlapBuffer.Dequeue();
}
if (currentChunk.Length > 0) chunks.Add(currentChunk.ToString().Trim());
return chunks;
}
private static string BuildContext(IEnumerable<MemoryQueryResult> results)
{
var sb = new StringBuilder();
sb.AppendLine("Relevant information:\n");
foreach (var result in results)
{
sb.AppendLine($"[Source: {result.Metadata.Description}]");
sb.AppendLine(result.Metadata.Text); sb.AppendLine();
}
return sb.ToString();
}
private async Task<string> GenerateResponseAsync(string question, string context, CancellationToken ct)
{
var chatService = _kernel.GetRequiredService<IChatCompletionService>();
var chatHistory = new ChatHistory();
chatHistory.AddSystemMessage("""
You are a helpful assistant that answers questions based on the provided context.
Only use information from the context to answer. If the context doesn't contain
relevant information, say so. Cite your sources when possible.
""");
chatHistory.AddUserMessage($"""
Context:
{context}
Question: {question}
Please answer the question based on the context above.
""");
var response = await chatService.GetChatMessageContentAsync(chatHistory, cancellationToken: ct);
return response.Content ?? "Unable to generate response.";
}
}
public class RagSettings
{
public int ChunkSize { get; set; } = 500;
public int ChunkOverlap { get; set; } = 100;
public int TopK { get; set; } = 5;
public double MinRelevanceScore { get; set; } = 0.7;
}
// Features/Rag/RagEndpoints.cs
public static class RagEndpoints
{
public static IEndpointRouteBuilder MapRagEndpoints(this IEndpointRouteBuilder app)
{
var group = app.MapGroup("/api/rag")
.WithTags("RAG")
.RequireAuthorization();
group.MapPost("/ask", AskQuestion)
.WithSummary("Ask a question using RAG")
.Accepts<AskRequest>("application/json")
.Produces<AskResponse>(200);
group.MapPost("/ingest", IngestDocument)
.WithSummary("Ingest a document into the knowledge base")
.Accepts<IngestRequest>("application/json")
.Produces(204);
group.MapPost("/search", SearchDocuments)
.WithSummary("Search for relevant documents")
.Accepts<SearchRequest>("application/json")
.Produces<SearchResponse>(200);
return app;
}
private static async Task<IResult> AskQuestion(
AskRequest request, IRagService ragService, CancellationToken ct)
{
var answer = await ragService.AskAsync(request.Question, request.Collection, ct);
return Results.Ok(new AskResponse(answer));
}
private static async Task<IResult> IngestDocument(
IngestRequest request, IRagService ragService, CancellationToken ct)
{
await ragService.IngestDocumentAsync(request.DocumentPath, request.Collection, ct);
return Results.NoContent();
}
private static async Task<IResult> SearchDocuments(
SearchRequest request, IRagService ragService, CancellationToken ct)
{
var results = await ragService.SearchAsync(request.Query, request.Collection, request.Limit, ct);
return Results.Ok(new SearchResponse(results.Select(r => new SearchResult(
r.Metadata.Id, r.Metadata.Text, r.Relevance))));
}
}
public record AskRequest(string Question, string Collection);
public record AskResponse(string Answer);
public record IngestRequest(string DocumentPath, string Collection);
public record SearchRequest(string Query, string Collection, int Limit = 5);
public record SearchResponse(IEnumerable<SearchResult> Results);
public record SearchResult(string Id, string Text, double Relevance);
For federal deployments, wrap the base RagService with classification validation and audit logging. See references/federal-ai-compliance.md for full implementation details including:
.azure.us) with authorized servicespublic class FederalRagService : RagService
{
public override async Task IngestDocumentAsync(string documentPath, string collection, CancellationToken ct)
{
var classification = await ValidateClassificationAsync(documentPath);
if (classification == DataClassification.Classified)
throw new InvalidOperationException("Classified documents cannot be processed by RAG system");
if (classification == DataClassification.CUI)
{
collection = $"cui_{collection}";
_logger.LogWarning("Processing CUI document: {Path}", documentPath);
}
await base.IngestDocumentAsync(documentPath, collection, ct);
await _auditService.LogDocumentIngestionAsync(documentPath, classification);
}
}
Maintain state across conversation turns:
<rag-dotnet-state>
mode: [CONFIGURE | INGEST | INDEX | RETRIEVE | EVALUATE]
vector_store: [azure-ai-search | qdrant | chromadb | pgvector | none]
embedding_model: [text-embedding-3-small | nomic-embed-text | mxbai-embed-large | none]
generation_model: [gpt-4 | llama3 | none]
documents_ingested: [count or none]
index_built: [true | false]
retrieval_tested: [true | false]
federal_compliant: [true | false | n/a]
last_action: [what was just done]
next_action: [what should happen next]
</rag-dotnet-state>
Example:
<rag-dotnet-state>
mode: RETRIEVE
vector_store: azure-ai-search
embedding_model: text-embedding-3-small
generation_model: gpt-4
documents_ingested: 150
index_built: true
retrieval_tested: false
federal_compliant: true
last_action: Completed document ingestion for policies collection
next_action: Run retrieval evaluation with 10 representative queries
</rag-dotnet-state>
## RAG Implementation: [Project Name]
**Vector Store**: [Azure AI Search | Qdrant | pgvector] | **LLM**: [Azure OpenAI | Ollama]
**Embedding**: [text-embedding-3-small | nomic-embed-text] | **Chunk**: [500/100]
| Collection | Documents | Purpose | | Endpoint | Description |
|------------|-----------|---------|---|----------|-------------|
| policies | 50 | Policies | | POST /api/rag/ask | Question answering |
| procedures | 120 | SOPs | | POST /api/rag/ingest | Document ingestion |
| | | | | POST /api/rag/search | Similarity search |
## Ingestion Report
**Date**: [date] | **Collection**: [name] | **Embedding Model**: [model]
**Documents Processed**: [count] | **Chunks Generated**: [count] | **Avg Chunk Size**: [chars]
| Document | Chunks | Status |
|----------|--------|--------|
| [filename] | [count] | Success/Failed |
## Retrieval Evaluation Report
**Date**: [date] | **Embedding Model**: [model] | **Vector Store**: [store]
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Precision@5 | [X.XX] | >= 0.70 | PASS/FAIL |
| Avg Relevance | [X.XX] | >= 0.75 | PASS/FAIL |
| Queries with 0 results | [count] | 0 | PASS/FAIL |
## RAG Deployment Checklist
- [ ] Vector store connection verified
- [ ] Embedding and generation models accessible and tested
- [ ] All documents ingested successfully
- [ ] Retrieval evaluation passed (precision@k >= 0.70)
- [ ] Out-of-scope queries return "I don't know" (not hallucination)
- [ ] Authentication, authorization, and audit logging configured
- [ ] Federal compliance verified (if applicable)
- [ ] Air-gapped fallback tested (if applicable)
- Security: [JWT | Azure AD] | Classification: [Handled | N/A] | Audit: [Enabled | Disabled]
Before tuning prompts or switching generation models, verify retrieval quality first. Poor retrieval means poor answers regardless of LLM quality.
STOP! Before tuning generation:
1. Run SearchAsync with 10+ representative queries
2. Inspect relevance scores -- are they above MinRelevanceScore?
3. Spot-check 3-5 retrieved chunks manually -- are they actually relevant?
4. If retrieval is poor, fix chunking or embedding model FIRST
5. Only after retrieval is solid should you tune generation prompts
One chunk size does not fit all document types. PDFs, Markdown, and code require different strategies. Always verify chunk coherence before embedding.
MANDATORY before indexing:
1. Inspect 5-10 chunks from different document types
2. Verify chunks are self-contained (not mid-sentence splits)
3. Verify chunks fit within embedding model context window
4. Adjust ChunkSize and ChunkOverlap based on inspection
The system prompt must instruct the LLM to cite source chunks. Uncited responses cannot be verified and are a compliance failure in federal contexts.
MANDATORY in system prompt:
- "Cite the source document for each claim"
- "If context is insufficient, say so explicitly"
- "Do not generate information not present in the context"
A failed connection during batch ingestion can leave the index in a partial state. Always write and remove a health-check record (_healthcheck collection) before starting batch ingestion. Abort on any connection failure.
For federal systems: use Azure Government endpoints (.azure.us), verify FedRAMP authorization of all services, enable FIPS mode on host OS for air-gapped Ollama deployments, and document FIPS compliance status in the deployment checklist.
| Anti-Pattern | Why It Fails | Correct Approach | |-------------|-------------|------------------| | Using the generation model for embedding | Chat/completion models produce different vector spaces than embedding models; retrieval quality collapses | Use a dedicated embedding model (text-embedding-3-small, nomic-embed-text) | | Single chunk size for all document types | PDFs, Markdown, code, and prose have different structural boundaries; one size splits content at wrong points | Use document-type-specific chunking strategies; inspect chunks before embedding | | No retrieval evaluation | Generation quality is bounded by retrieval; tuning prompts on bad retrieval is wasted effort | Evaluate precision@k and relevance scores with representative queries before touching generation | | Ignoring context window limits | Stuffing too many chunks into the LLM prompt dilutes relevant information and may exceed token limits | Calculate token budget: prompt + context + expected response must fit within model context window | | Storing PII/CUI without classification | Federal compliance violation; data spillage risk if unclassified and classified data share a collection | Validate data classification before ingestion; separate CUI into dedicated collections with access controls | | Treating RAG as magic search | RAG is not a search engine -- it is a knowledge-grounded generation pattern; expecting keyword-search behavior leads to frustration | Set user expectations; implement hybrid search for keyword needs; document what RAG can and cannot do | | Hardcoding embedding model without benchmarking | Different models have different strengths; what works for English prose may fail for technical documents | Benchmark 2-3 embedding models on domain-specific queries before committing to one | | No citation or source attribution | Users cannot verify answers; compliance failure in federal contexts; trust erosion | Include source document ID, chunk description, and relevance score in every response | | Batch ingestion without connection validation | Partial index state if connection fails mid-batch; silent data loss | Test vector store connection before starting ingestion; implement retry with idempotent IDs | | Using MinRelevanceScore of 0.0 | Returns every chunk regardless of relevance, flooding the LLM context with noise | Set MinRelevanceScore to 0.7+ and tune based on evaluation results |
Problem: precision@k < 0.50 -- retrieved chunks are not relevant
Actions:
1. Inspect retrieved chunks manually for 3-5 queries
2. Check if chunks are too large (exceeding embedding context) or too small (losing coherence)
3. Try a different embedding model (ada-002 -> text-embedding-3-small, or MiniLM -> nomic)
4. For exact-match queries (error codes, IDs), add hybrid search (BM25 + vector)
5. Increase TopK + add re-ranking; re-evaluate after each single change
Problem: Vector store rejects embeddings or returns garbage after model change
Actions:
1. Verify dimension match (text-embedding-3-small:1536, nomic:768, mxbai:1024)
2. If you changed models, you MUST rebuild the entire index -- drop, recreate, re-embed
3. Never mix embeddings from different models in the same collection
Problem: Vector store unreachable or returning connection errors
Actions:
1. Verify network connectivity and authentication credentials
2. Azure AI Search: verify service running and index exists
3. Qdrant: check Docker (docker ps, docker logs qdrant)
4. pgvector: verify PostgreSQL running and extension installed
5. Implement circuit breaker; return graceful "service unavailable"
Problem: Ollama models not responding or producing low-quality embeddings
Actions:
1. Verify Ollama running: curl http://localhost:11434/api/tags
2. Confirm model pulled (ollama list) and VRAM available (~1-2GB for embeddings)
3. Try larger model (nomic-embed-text -> mxbai-embed-large) for quality
4. Increase num_ctx for generation quality; verify local vector store accessible
5. Test full pipeline end-to-end before deploying to disconnected network
rag-pipeline-python)The Python counterpart to this skill, using LangChain and Ollama. Use when the team's primary language is Python or when leveraging the broader LangChain ecosystem. The core RAG principles (retrieval quality over generation quality, chunking optimization, evaluation before deployment) are identical across both skills.
ollama-model-workflow)Use to select, pull, and benchmark local models for air-gapped RAG deployments. Key integration points:
nomic-embed-text vs mxbai-embed-large on domain corpusnum_ctx to expected retrieval context size plus prompt overheaddotnet-security-review-federal)Use for security and compliance review of RAG pipeline code. Key areas:
mcp-server-scaffold)RAG pipelines are natural backends for MCP server tools. Use the mcp-server-scaffold skill to expose search and question-answering as MCP tools that other AI agents can invoke.
references/federal-ai-compliance.md - Federal compliance requirements (NIST AI RMF, FedRAMP, CUI, audit logging)references/vector-store-options.md - Vector store comparison (Azure AI Search, Qdrant, ChromaDB, pgvector)references/embedding-models.md - Embedding model options and performance characteristicsdevelopment
Federal / government security overlay applied ON TOP OF a base language security review (dotnet/python/php/rust/react). Language-agnostic: adds NIST SP 800-53 control mapping, FIPS 140-2/3 cryptographic compliance (with a per-language crypto table), CUI handling, EO 14028 supply-chain requirements, and DOE Order 205.1B, and emits POA&M-ready findings with FIPS 199 impact levels. Use for federal/DOE/DOD/national-laboratory systems. Triggers on "federal security review", "NIST compliance", "NIST 800-53", "FISMA", "CUI", "FIPS audit", "DOE security", "POA&M", "ATO review". Do NOT use alone — run the matching <lang>-security-review FIRST; this overlay maps and extends it.
tools
OWASP-based security review of React / TypeScript front-end applications. Detects the framework (Vite/CRA/Next), entry points, and data flows, scans against the OWASP Top 10 (2025) mapped to React client-side patterns (XSS via raw HTML, URL/protocol injection, secrets in the bundle, insecure token storage, dependency CVEs, missing CSP, open redirects), and produces a manager-friendly executive summary plus a graded technical findings table. Use to audit React code for vulnerabilities. Triggers on "react security review", "frontend security audit", "audit react for vulnerabilities", "owasp react", "react xss", "react security posture", "npm audit review". For federal / gov / DOE / NIST / FIPS / CUI context, run security-review-federal after this base review. Do NOT use to grade architecture/structure — use react-architecture-checklist.
tools
Analyzes legacy React codebases and produces actionable modernization plans. Primary migration paths include class components to function components + hooks, Create React App to Vite, React 16/17 to 18 to 19, JavaScript to TypeScript, Enzyme to React Testing Library, legacy Redux to Redux Toolkit / Zustand / Context, and deprecated lifecycle/API removal. Does NOT perform the migration — assesses, quantifies risk, and plans. Triggers on phrases like "modernize react", "class to hooks", "upgrade react", "migrate CRA to vite", "react legacy migration", "react 17 to 18", "react js to typescript", "react technical debt", "enzyme to RTL".
development
Scaffolds feature-based React / TypeScript architecture using feature folders, presentational + container components, custom hooks, a typed data layer, and structural CQRS (query hooks vs mutation hooks). React analog of dotnet-vertical-slice and python-feature-slice — no DI framework; uses props/context for dependency injection and a query cache for server state. Use when creating feature-based React projects, adding React features, organizing components by feature rather than by technical type, or scaffolding a feature's data layer. Triggers on phrases like "scaffold react feature", "create react slice", "react feature folder", "react vertical slice", "add react feature", "react feature architecture", "organize react by feature".