giuseppe-trisciuoglio/langchain4j-vector-stores-configuration

Name: langchain4j-vector-stores-configuration
Author: giuseppe-trisciuoglio

plugins/developer-kit-java/skills/langchain4j-vector-stores-configuration/SKILL.md

npx skillsauth add giuseppe-trisciuoglio/developer-kit langchain4j-vector-stores-configuration

LangChain4J Vector Stores Configuration

Configure vector stores for Retrieval-Augmented Generation applications with LangChain4J.

Overview

LangChain4J provides a unified abstraction for vector stores (PostgreSQL/pgvector, Pinecone, MongoDB Atlas, Milvus, Neo4j) with builder-based configuration, metadata filtering, and hybrid search support.

When to Use

Configuring vector stores for semantic search and RAG applications
Setting up embedding storage with metadata filtering and hybrid search
Optimizing vector database performance for production AI workloads

Instructions

Set Up Basic Vector Store

Configure an embedding store for vector operations:

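@Bean
public EmbeddingStore<TextSegment> embeddingStore() {
    return PgVectorEmbeddingStore.builder()
        .host("localhost")
        .port(5432)
        .database("vectordb")
        .user("username")      // placeholder; load real credentials from configuration
        .password("password")  // placeholder
        .table("embeddings")
        .dimension(1536)       // must equal the embedding model's output size (1536 for OpenAI text-embedding-3-small)
        .createTable(true)
        .useIndex(true)        // creates an IVFFlat index on the embedding column
        .indexListSize(100)    // number of IVFFlat lists, expected when useIndex is true; 100 is an illustrative value
        .build();
}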

Validation Workflow

Follow this workflow to ensure correct vector store setup:

1. Configure: Build the embedding store with required dimensions and connection parameters
2. Test connection: Verify store connectivity with a health check before ingesting data
3. Validate dimensions: Confirm embedding model dimensions match store configuration
4. Ingest test data: Add a small batch of test documents to verify ingestion works
5. Run test query: Execute a sample semantic search to confirm retrieval accuracy
6. Proceed to production: Only after all steps pass, proceed with full data ingestion
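
The dimension check in this workflow can be made mechanical. The following is a minimal sketch in plain Java with no LangChain4J dependency; DimensionCheck and requireMatch are hypothetical names, and the probe vector stands in for whatever the embedding model returns for a short test string.

```java
// Hypothetical helper, not part of LangChain4J: fails fast when the
// embedding model and the store disagree on vector length.
final class DimensionCheck {

    private DimensionCheck() {}

    /**
     * @param storeDimension dimension the store was configured with
     * @param probeVector    an embedding produced by the model for a test string
     * @return the validated dimension
     */
    static int requireMatch(int storeDimension, float[] probeVector) {
        if (probeVector.length != storeDimension) {
            throw new IllegalStateException(
                "Embedding model returned " + probeVector.length
                + " dimensions but the store expects " + storeDimension);
        }
        return storeDimension;
    }
}
```

Running a check like this at startup, before any ingestion, surfaces a mismatch as a clear error instead of a failed insert later on.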

Configure Multiple Vector Stores

Use different stores for different use cases:

@Configuration
public class MultiVectorStoreConfiguration {

    // Connection settings are omitted for brevity; both builders need them
    // in a real configuration.

    @Bean
    @Qualifier("documentsStore")
    public EmbeddingStore<TextSegment> documentsEmbeddingStore() {
        return PgVectorEmbeddingStore.builder()
            // host, port, database, user, and password go here
            .table("document_embeddings")
            .dimension(1536)
            .build();
    }

    @Bean
    @Qualifier("chatHistoryStore")
    public EmbeddingStore<TextSegment> chatHistoryEmbeddingStore() {
        return MongoDbEmbeddingStore.builder()
            // client, database name, and index name go here
            .collectionName("chat_embeddings")
            .build();
    }
}

Implement Document Ingestion

Use EmbeddingStoreIngestor for automated document processing:

@Bean
public EmbeddingStoreIngestor embeddingStoreIngestor(
        EmbeddingStore<TextSegment> embeddingStore,
        EmbeddingModel embeddingModel) {

    return EmbeddingStoreIngestor.builder()
        .documentSplitter(DocumentSplitters.recursive(
            300,  // maxSegmentSizeInTokens
            20,   // maxOverlapSizeInTokens
            new OpenAiTokenizer(GPT_3_5_TURBO)
        ))
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build();
}

Set Up Metadata Filtering

Configure metadata-based filtering capabilities:

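// Static imports assumed:
//   dev.langchain4j.store.embedding.filter.Filter.and
//   dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey

// MongoDB with metadata field mapping
IndexMapping indexMapping = IndexMapping.builder()
    .dimension(1536)
    .metadataFieldNames(Set.of("category", "source", "created_date", "author"))
    .build();

// Search with metadata filters.
// Filter values are strings or numbers, so the date is compared as an ISO-8601
// string (this assumes created_date is stored as yyyy-MM-dd).
EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
    .queryEmbedding(queryEmbedding)
    .maxResults(10)
    .filter(and(
        metadataKey("category").isEqualTo("technical_docs"),
        metadataKey("created_date").isGreaterThan(LocalDate.now().minusMonths(6).toString())
    ))
    .build();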
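
To my knowledge LangChain4J metadata values are limited to strings, UUIDs, and numeric types, so dates are commonly stored as ISO-8601 strings. The plain-Java sketch below uses a hypothetical IsoDateFilter class and no LangChain4J types; it only illustrates why that convention works: yyyy-MM-dd strings sort lexicographically in chronological order, so a string comparison behaves like a date comparison.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration, not a LangChain4J API.
final class IsoDateFilter {

    private IsoDateFilter() {}

    /** Keeps records in the given category whose created_date is after the cutoff. */
    static List<Map<String, String>> filter(
            List<Map<String, String>> records, String category, LocalDate cutoff) {
        String cutoffIso = cutoff.toString(); // LocalDate prints as yyyy-MM-dd
        return records.stream()
            .filter(r -> category.equals(r.get("category")))
            // Lexicographic order equals chronological order for yyyy-MM-dd strings.
            .filter(r -> r.get("created_date") != null
                && r.get("created_date").compareTo(cutoffIso) > 0)
            .collect(Collectors.toList());
    }
}
```

The same reasoning breaks down for formats such as dd/MM/yyyy, which is why the storage format matters more than the filter syntax.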

Configure Production Settings

Implement connection pooling:

@Bean
public EmbeddingStore<TextSegment> optimizedPgVectorStore() {
    HikariConfig hikariConfig = new HikariConfig();
    hikariConfig.setJdbcUrl("jdbc:postgresql://localhost:5432/vectordb");
    hikariConfig.setUsername("username");    // placeholder
    hikariConfig.setPassword("password");    // placeholder
    hikariConfig.setMaximumPoolSize(20);
    hikariConfig.setMinimumIdle(5);
    hikariConfig.setConnectionTimeout(30000); // milliseconds

    DataSource dataSource = new HikariDataSource(hikariConfig);

    // DataSource-based builder: datasourceBuilder() in the LangChain4J releases
    // this was checked against; verify the name for your version.
    return PgVectorEmbeddingStore.datasourceBuilder()
        .datasource(dataSource)
        .table("embeddings")
        .dimension(1536)
        .useIndex(true)
        .indexListSize(100) // illustrative value; expected when useIndex is true
        .build();
}

Implement Health Checks

Monitor vector store connectivity:

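@Component
public class VectorStoreHealthIndicator implements HealthIndicator {

    private final EmbeddingStore<TextSegment> embeddingStore;

    public VectorStoreHealthIndicator(EmbeddingStore<TextSegment> embeddingStore) {
        this.embeddingStore = embeddingStore;
    }

    @Override
    public Health health() {
        try {
            // Probe with a non-zero vector of the store's dimension; an all-zero
            // vector has no direction, which cosine-based stores may handle poorly.
            float[] probe = new float[1536];
            Arrays.fill(probe, 1.0f);

            embeddingStore.search(EmbeddingSearchRequest.builder()
                .queryEmbedding(Embedding.from(probe))
                .maxResults(1)
                .build());

            return Health.up()
                .withDetail("store", embeddingStore.getClass().getSimpleName())
                .build();
        } catch (Exception e) {
            return Health.down()
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}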

Examples

Basic RAG Application Setup

@Configuration
public class SimpleRagConfig {

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return PgVectorEmbeddingStore.builder()
            .host("localhost")
            // port, user, and password are omitted for brevity; the builder needs them
            .database("rag_db")
            .table("documents")
            .dimension(1536)
            .build();
    }

    @Bean
    public ChatLanguageModel chatModel() {
        // Read the key from the environment rather than hard-coding it
        return OpenAiChatModel.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .build();
    }
}

Semantic Search Service

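@Service
public class SemanticSearchService {

    private final EmbeddingStore<TextSegment> store;
    private final EmbeddingModel embeddingModel;

    public SemanticSearchService(EmbeddingStore<TextSegment> store,
                                 EmbeddingModel embeddingModel) {
        this.store = store;
        this.embeddingModel = embeddingModel;
    }

    public List<String> search(String query, int maxResults) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(maxResults)
            .minScore(0.75) // relevance score threshold in [0, 1]
            .build();

        // Return the text of each matching segment, best match first
        return store.search(request).matches().stream()
            .map(match -> match.embedded().text())
            .toList();
    }
}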
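
The minScore(0.75) threshold is a relevance score in [0, 1], not a raw cosine similarity. As far as I understand LangChain4J's convention, cosine similarity is mapped to relevance as (cosine + 1) / 2, so 0.75 corresponds to a cosine similarity of 0.5. The sketch below is plain Java with a hypothetical ScoreMath class and only illustrates the arithmetic under that assumption.

```java
// Hypothetical helper illustrating the score arithmetic; not a LangChain4J class.
final class ScoreMath {

    private ScoreMath() {}

    /** Cosine similarity of two equal-length, non-zero vectors, in [-1, 1]. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Assumed mapping from cosine similarity in [-1, 1] to relevance in [0, 1]. */
    static double relevance(double cosine) {
        return (cosine + 1.0) / 2.0;
    }
}
```

Under this mapping, orthogonal vectors (cosine 0) already score 0.5, so a threshold below 0.5 filters out very little.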

Production Setup with Monitoring

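@Configuration
public class ProductionVectorStoreConfig {

    @Bean
    public EmbeddingStore<TextSegment> vectorStore(
            @Value("${vector.store.host}") String host,
            MeterRegistry meterRegistry) {

        // Port, credentials, table, and dimension are omitted for brevity;
        // the builder needs them in a real configuration.
        EmbeddingStore<TextSegment> store = PgVectorEmbeddingStore.builder()
            .host(host)
            .database("production_vectors")
            .useIndex(true)
            .indexListSize(200) // number of IVFFlat lists
            .build();

        // MonitoredEmbeddingStore is assumed to be an application-defined decorator
        // that records Micrometer metrics; it is not shown here.
        return new MonitoredEmbeddingStore<>(store, meterRegistry);
    }
}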

Best Practices

Choose the Right Vector Store

For Development:

Use InMemoryEmbeddingStore for local development and testing
Fast setup, no external dependencies
Data lost on application restart

For Production:

PostgreSQL + pgvector: Excellent for existing PostgreSQL environments
Pinecone: Managed service, good for rapid prototyping
MongoDB Atlas: Good integration with existing MongoDB applications
Milvus/Zilliz: High performance for large-scale deployments

Configure Appropriate Index Types

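Choose an index type based on recall and latency requirements. The IndexType constants below follow the Milvus store builder; other stores expose different indexing options: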

// For high recall requirements
.indexType(IndexType.FLAT)  // Exact search, slower but accurate

// For balanced performance
.indexType(IndexType.IVF_FLAT)  // Good balance of speed and accuracy

// For high-speed approximate search
.indexType(IndexType.HNSW)  // Fastest, slightly less accurate
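
For intuition about the trade-off: a flat index amounts to scoring every stored vector against the query, which is exact but linear in collection size, while IVF and HNSW indexes avoid the full scan at some cost in recall. The following is a minimal plain-Java sketch of the exhaustive scan. FlatSearch and topK are hypothetical names, and dot-product scoring is used for brevity (it equals cosine similarity when vectors are normalized).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of exhaustive ("flat") search; not a vector store API.
final class FlatSearch {

    private FlatSearch() {}

    /** Returns the positions of the k stored vectors with the highest score, best first. */
    static List<Integer> topK(List<float[]> stored, float[] query, int k) {
        double[] scores = new double[stored.size()];
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            scores[i] = dot(stored.get(i), query); // every vector is scored: O(n * d)
            positions.add(i);
        }
        // Highest score first.
        positions.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        int limit = Math.max(0, Math.min(k, positions.size()));
        return new ArrayList<>(positions.subList(0, limit));
    }

    private static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The loop touches every stored vector on every query, which is the cost that approximate indexes are designed to avoid.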

Optimize Vector Dimensions

Match embedding dimensions to your model:

// OpenAI text-embedding-3-small
.dimension(1536)

// OpenAI text-embedding-3-large
.dimension(3072)

// Sentence Transformers
.dimension(384)  // all-MiniLM-L6-v2
.dimension(768)  // all-mpnet-base-v2
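
Dimension also drives storage cost: each component is typically a 4-byte float, so raw vector data takes roughly dimension × 4 bytes per embedding, before index and metadata overhead. A small sketch of that arithmetic follows; VectorStorage is a hypothetical helper, and the figures ignore whatever overhead a particular store adds.

```java
// Hypothetical helper for back-of-the-envelope sizing; ignores index and metadata overhead.
final class VectorStorage {

    private VectorStorage() {}

    /** Raw bytes needed to hold vectorCount embeddings of the given dimension as 4-byte floats. */
    static long rawBytes(int dimension, long vectorCount) {
        long bytesPerVector = Math.multiplyExact((long) dimension, (long) Float.BYTES);
        return Math.multiplyExact(bytesPerVector, vectorCount);
    }
}
```

For example, one million 1536-dimensional vectors come to 6,144,000,000 bytes (about 6.1 GB) of raw data, while the same number of 384-dimensional vectors needs a quarter of that.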

Implement Batch Operations

Use batch operations for better performance:

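@Service
public class BatchEmbeddingService {

    private static final int BATCH_SIZE = 100;

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;

    public BatchEmbeddingService(EmbeddingModel embeddingModel,
                                 EmbeddingStore<TextSegment> embeddingStore) {
        this.embeddingModel = embeddingModel;
        this.embeddingStore = embeddingStore;
    }

    public void addDocumentsBatch(List<Document> documents) {
        // Lists.partition comes from Guava (com.google.common.collect.Lists)
        for (List<Document> batch : Lists.partition(documents, BATCH_SIZE)) {
            List<TextSegment> segments = batch.stream()
                .map(doc -> TextSegment.from(doc.text(), doc.metadata()))
                .collect(Collectors.toList());

            // One embedding call and one store write per batch
            List<Embedding> embeddings = embeddingModel.embedAll(segments)
                .content();

            embeddingStore.addAll(embeddings, segments);
        }
    }
}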
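
Lists.partition in the service above comes from Guava. Where Guava is not on the classpath, a partition helper is a few lines of plain Java. Batches.partition below is a hypothetical stand-in that behaves similarly for this use, except that it copies each batch instead of returning views.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a list-partitioning utility.
final class Batches {

    private Batches() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

The last batch may be smaller than batchSize, and an empty input yields no batches, so the calling loop needs no special cases.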

Secure Configuration

Protect sensitive configuration:

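// Resolved from Spring's Environment; the VECTOR_STORE_API_KEY environment
// variable can supply this property. Defaults to null when unset.
@Value("${vector.store.api.key:#{null}}")
private String apiKey;

// Validate configuration at startup so a missing key fails fast
@PostConstruct
public void validateConfiguration() {
    if (apiKey == null || apiKey.isBlank()) {
        throw new IllegalStateException("Vector store API key must be configured");
    }
}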
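
Outside Spring, the same fail-fast check can be written against the process environment. The sketch below is plain Java; RequiredSettings and the variable name VECTOR_STORE_API_KEY are hypothetical.

```java
import java.util.Map;

// Hypothetical helper: reads a required setting or fails with a clear message.
final class RequiredSettings {

    private RequiredSettings() {}

    /** Returns the value for name, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(name + " must be configured");
        }
        return value;
    }
}
```

A typical call would be RequiredSettings.require(System.getenv(), "VECTOR_STORE_API_KEY") during startup; taking the map as a parameter keeps the check testable without touching the real environment.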

Constraints and Warnings

Vector dimensions must match the embedding model; mismatched dimensions will cause errors.
Large vector collections require proper indexing configuration for acceptable search performance.
Embedding generation can be expensive; implement batching and caching strategies.
Different vector stores have different distance metric support; verify compatibility.
Connection pooling is critical for production deployments to prevent connection exhaustion.
Metadata filtering capabilities vary between vector store implementations.
Vector stores consume significant memory; monitor resource usage in production.
Migration between vector store providers may require re-embedding all documents.
Batch operations are more efficient than single-document operations.
Always validate configuration during application startup to fail fast.

Provides configuration patterns for LangChain4J vector stores in RAG applications. Use when building semantic search, integrating vector databases (PostgreSQL/pgvector, Pinecone, MongoDB, Milvus, Neo4j), implementing embedding storage/retrieval, setting up hybrid search, or optimizing vector database performance for production AI applications.

193 stars

development

Updated Apr 5, 2026

Install globally with skillsauth (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add giuseppe-trisciuoglio/developer-kit langchain4j-vector-stores-configuration

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 5, 2026, 1:23 PM (5.9s, 3 files scanned)

SKILL.md

name: langchain4j-vector-stores-configuration
description: Provides configuration patterns for LangChain4J vector stores in RAG applications. Use when building semantic search, integrating vector databases (PostgreSQL/pgvector, Pinecone, MongoDB, Milvus, Neo4j), implementing embedding storage/retrieval, setting up hybrid search, or optimizing vector database performance for production AI applications.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep

Install manually

# Clone the repo
git clone https://github.com/giuseppe-trisciuoglio/developer-kit.git

# Copy into Claude Code skills folder (global)
cp -r developer-kit/plugins/developer-kit-java/skills/langchain4j-vector-stores-configuration ~/.claude/skills/

Repository

giuseppe-trisciuoglio/developer-kit

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT