Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) solution that handles data ingestion, embedding generation, vector storage, retrieval with reranking, source attribution, and session context management.

Overview

What It Does

Amazon Bedrock Knowledge Bases provides:

Data Ingestion: Automatically process documents from S3, web, Confluence, SharePoint, Salesforce
Embedding Generation: Convert text to vectors using foundation models
Vector Storage: Store embeddings in multiple vector database options
Retrieval: Semantic and hybrid search with metadata filtering
Generation: RAG workflows with source attribution
Session Management: Multi-turn conversations with context
Chunking Strategies: Fixed, semantic, hierarchical, and custom chunking

When to Use This Skill

Use this skill when you need to:

Build RAG applications for document Q&A
Implement semantic search over enterprise knowledge
Create chatbots with knowledge bases
Integrate retrieval with Bedrock Agents
Configure optimal chunking strategies
Query documents with source attribution
Manage multi-turn conversations with context
Optimize RAG performance and cost

Key Capabilities

Multiple Vector Store Options: OpenSearch, S3 Vectors, Neptune, Pinecone, MongoDB, Redis
Flexible Data Sources: S3, web crawlers, Confluence, SharePoint, Salesforce
Advanced Chunking: Fixed-size, semantic, hierarchical, custom Lambda
Hybrid Search: Combine semantic (vector) and keyword search
Session Management: Built-in conversation context tracking
GraphRAG: Relationship-aware retrieval with Neptune Analytics
Cost Optimization: S3 Vectors for up to 90% storage savings

Quick Start

Basic RAG Workflow

import boto3
import json

# Initialize clients
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 1. Create Knowledge Base
kb_response = bedrock_agent.create_knowledge_base(
    name='enterprise-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
            'vectorIndexName': 'bedrock-knowledge-base-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)

knowledge_base_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {knowledge_base_id}")

# 2. Add S3 Data Source
ds_response = bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    description='Company documents from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['documents/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)

data_source_id = ds_response['dataSource']['dataSourceId']

# 3. Start Ingestion
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Initial document ingestion'
)

print(f"Ingestion Job ID: {ingestion_response['ingestionJob']['ingestionJobId']}")

# 4. Query with Retrieve and Generate
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is our vacation policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            }
        }
    }
)

print(f"Answer: {response['output']['text']}")
print(f"\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']['s3Location']['uri']}")

Vector Store Options

1. Amazon OpenSearch Serverless

Best for: Production RAG applications with auto-scaling requirements

Benefits:

Fully managed, serverless operation
Auto-scaling compute and storage
High availability with multi-AZ deployment
Fast query performance

Configuration:

storageConfiguration={
    'type': 'OPENSEARCH_SERVERLESS',
    'opensearchServerlessConfiguration': {
        'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
        'vectorIndexName': 'bedrock-knowledge-base-index',
        'fieldMapping': {
            'vectorField': 'bedrock-knowledge-base-default-vector',
            'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
            'metadataField': 'AMAZON_BEDROCK_METADATA'
        }
    }
}

2. Amazon S3 Vectors (Preview)

Best for: Cost-optimized, large-scale RAG applications

Benefits:

Up to 90% cost reduction for vector storage
Built-in vector support in S3
Subsecond query performance
Massive scale and durability

Ideal Use Cases:

Large document collections (millions of chunks)
Cost-sensitive applications
Archival knowledge bases
Low-to-medium QPS workloads

Configuration:

storageConfiguration={
    'type': 'S3_VECTORS',
    's3VectorsConfiguration': {
        'bucketArn': 'arn:aws:s3:::my-vector-bucket',
        'prefix': 'vectors/'
    }
}

Limitations:

Still in preview (no CloudFormation/CDK support yet)
Not suitable for high QPS, millisecond-latency requirements
Best for cost optimization over ultra-low latency

3. Amazon Neptune Analytics (GraphRAG)

Best for: Interconnected knowledge domains requiring relationship-aware retrieval

Benefits:

Automatic graph creation linking related content
Improved retrieval accuracy through relationships
Comprehensive responses leveraging knowledge graph
Explainable results with relationship context

Use Cases:

Legal document analysis with case precedents
Scientific research with paper citations
Product catalogs with dependencies
Organizational knowledge with team relationships

Configuration:

storageConfiguration={
    'type': 'NEPTUNE_ANALYTICS',
    'neptuneAnalyticsConfiguration': {
        'graphArn': 'arn:aws:neptune-graph:us-east-1:123456789012:graph/g-12345678',
        'vectorSearchConfiguration': {
            'vectorField': 'embedding'
        }
    }
}

4. Amazon OpenSearch Service Managed Cluster

Best for: Existing OpenSearch infrastructure, advanced customization

Configuration:

storageConfiguration={
    'type': 'OPENSEARCH_SERVICE',
    'opensearchServiceConfiguration': {
        'clusterArn': 'arn:aws:es:us-east-1:123456789012:domain/my-domain',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

5. Third-Party Vector Databases

Pinecone:

storageConfiguration={
    'type': 'PINECONE',
    'pineconeConfiguration': {
        'connectionString': 'https://my-index-abc123.svc.us-west1-gcp.pinecone.io',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pinecone-api-key',
        'namespace': 'bedrock-kb',
        'fieldMapping': {
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

MongoDB Atlas:

storageConfiguration={
    'type': 'MONGODB_ATLAS',
    'mongoDbAtlasConfiguration': {
        'endpoint': 'https://cluster0.mongodb.net',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mongodb-creds',
        'databaseName': 'bedrock_kb',
        'collectionName': 'vectors',
        'vectorIndexName': 'vector_index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Redis Enterprise Cloud:

storageConfiguration={
    'type': 'REDIS_ENTERPRISE_CLOUD',
    'redisEnterpriseCloudConfiguration': {
        'endpoint': 'redis-12345.c1.us-east-1-2.ec2.cloud.redislabs.com:12345',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:redis-creds',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Data Source Configuration

1. Amazon S3

Supported File Types: PDF, TXT, MD, HTML, DOC, DOCX, CSV, XLS, XLSX

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-technical-docs',
    description='Technical documentation from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['docs/technical/', 'docs/manuals/'],
            'exclusionPrefixes': ['docs/archive/']
        }
    }
)

2. Web Crawler

Automatic website scraping and indexing:

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='company-website',
    description='Public company website content',
    dataSourceConfiguration={
        'type': 'WEB',
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {'url': 'https://www.example.com/docs'},
                        {'url': 'https://www.example.com/blog'}
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 300  # Pages per minute
                }
            }
        }
    }
)

3. Confluence

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='confluence-wiki',
    description='Company Confluence knowledge base',
    dataSourceConfiguration={
        'type': 'CONFLUENCE',
        'confluenceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.atlassian.net/wiki',
                'hostType': 'SAAS',
                'authType': 'BASIC',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Space',
                                'inclusionFilters': ['Engineering', 'Product'],
                                'exclusionFilters': ['Archive']
                            }
                        ]
                    }
                }
            }
        }
    }
)

4. SharePoint

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='sharepoint-docs',
    description='SharePoint document library',
    dataSourceConfiguration={
        'type': 'SHAREPOINT',
        'sharePointConfiguration': {
            'sourceConfiguration': {
                'siteUrls': [
                    'https://company.sharepoint.com/sites/Engineering',
                    'https://company.sharepoint.com/sites/Product'
                ],
                'tenantId': 'tenant-id',
                'domain': 'company',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:sharepoint-creds'
            }
        }
    }
)

5. Salesforce

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='salesforce-knowledge',
    description='Salesforce knowledge articles',
    dataSourceConfiguration={
        'type': 'SALESFORCE',
        'salesforceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.my.salesforce.com',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Knowledge',
                                'inclusionFilters': ['Product_Documentation', 'Support_Articles']
                            }
                        ]
                    }
                }
            }
        }
    }
)

Chunking Strategies

1. Fixed-Size Chunking

Best for: Simple documents with uniform structure

How it works: Splits text into chunks of fixed token size with overlap

Parameters:

maxTokens: 200-8192 tokens (typically 512-1024)
overlapPercentage: 10-50% (typically 20%)

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'FIXED_SIZE',
        'fixedSizeChunkingConfiguration': {
            'maxTokens': 512,
            'overlapPercentage': 20
        }
    }
}

Use Cases:

Blog posts and articles
Technical documentation with consistent formatting
FAQs and Q&A content
Simple text files

Pros:

Fast and predictable
No additional costs
Easy to tune

Cons:

May split semantic units awkwardly
Doesn't respect document structure
Can break context mid-sentence

2. Semantic Chunking

Best for: Documents without clear boundaries (legal, technical, academic)

How it works: Uses sentence similarity to group related content

Parameters:

maxTokens: 20-8192 tokens (typically 300-500)
bufferSize: Number of neighboring sentences (default: 1)
breakpointPercentileThreshold: Similarity threshold (recommended: 95%)

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 300,
            'bufferSize': 1,
            'breakpointPercentileThreshold': 95
        }
    }
}

Use Cases:

Legal documents and contracts
Academic papers
Technical specifications
Medical records
Research reports

Pros:

Preserves semantic meaning
Better context preservation
Improved retrieval accuracy

Cons:

Additional cost (foundation model usage)
Slower ingestion
Less predictable chunk sizes

Cost Consideration: Semantic chunking uses foundation models for similarity analysis, incurring additional costs beyond storage and retrieval.

3. Hierarchical Chunking

Best for: Complex documents with nested structure

How it works: Creates parent and child chunks; retrieves child, returns parent for context

Parameters:

levelConfigurations: Array of chunk sizes (parent → child)
overlapTokens: Overlap between chunks

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'HIERARCHICAL',
        'hierarchicalChunkingConfiguration': {
            'levelConfigurations': [
                {
                    'maxTokens': 1500  # Parent chunk (comprehensive context)
                },
                {
                    'maxTokens': 300   # Child chunk (focused retrieval)
                }
            ],
            'overlapTokens': 60
        }
    }
}

Use Cases:

Technical manuals with sections and subsections
Academic papers with abstract, sections, and subsections
Legal documents with articles and clauses
Product documentation with categories and details

How Retrieval Works:

Query matches against child chunks (fast, focused)
Returns parent chunks (comprehensive context)
Best of both: precision retrieval + complete context

Pros:

Optimal balance of precision and context
Excellent for nested documents
Better accuracy for complex queries

Cons:

More complex configuration
Larger storage footprint
Requires understanding of document structure

4. Custom Chunking (Lambda)

Best for: Specialized domain logic, custom parsing requirements

How it works: Invoke Lambda function for custom chunking logic

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'NONE'  # Custom via Lambda
    },
    'customTransformationConfiguration': {
        'intermediateStorage': {
            's3Location': {
                'uri': 's3://my-kb-bucket/intermediate/'
            }
        },
        'transformations': [
            {
                'stepToApply': 'POST_CHUNKING',
                'transformationFunction': {
                    'transformationLambdaConfiguration': {
                        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:custom-chunker'
                    }
                }
            }
        ]
    }
}

Example Lambda Handler:

# Lambda function for custom chunking
import json

def lambda_handler(event, context):
    """
    Custom chunking logic for specialized documents

    Input: event contains document content and metadata
    Output: array of chunks with text and metadata
    """

    # Extract document content
    document = event['document']
    content = document['content']
    metadata = document.get('metadata', {})

    # Custom chunking logic (example: split by custom delimiter)
    chunks = []
    sections = content.split('---SECTION---')

    for idx, section in enumerate(sections):
        if section.strip():
            chunks.append({
                'text': section.strip(),
                'metadata': {
                    **metadata,
                    'chunk_id': f'section_{idx}',
                    'chunk_type': 'custom_section'
                }
            })

    return {
        'chunks': chunks
    }

Use Cases:

Medical records with structured sections (SOAP notes)
Financial documents with tables and calculations
Code documentation with code blocks and explanations
Domain-specific formats (HL7, FHIR, etc.)

Pros:

Complete control over chunking logic
Can handle any document format
Integrate domain expertise

Cons:

Requires Lambda development and maintenance
Additional operational complexity
Harder to debug and iterate

Chunking Strategy Selection Guide

| Document Type | Recommended Strategy | Rationale | |--------------|---------------------|-----------| | Blog posts, articles | Fixed-size | Simple, uniform structure | | Legal documents | Semantic | Preserve legal reasoning flow | | Technical manuals | Hierarchical | Nested sections and subsections | | Academic papers | Hierarchical | Abstract, sections, subsections | | FAQs | Fixed-size | Independent Q&A pairs | | Medical records | Custom Lambda | Structured sections (SOAP, HL7) | | Code documentation | Custom Lambda | Code blocks + explanations | | Product catalogs | Fixed-size | Uniform product descriptions | | Research reports | Semantic | Preserve research narrative |

Retrieval Operations

1. Retrieve API (Retrieval Only)

Returns raw retrieved chunks without generation.

Use Cases:

Custom generation logic
Debugging retrieval quality
Building custom RAG pipelines
Integrating with non-Bedrock models

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'What are the benefits of hierarchical chunking?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID',  # SEMANTIC, HYBRID
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'technical_guide'
                        }
                    },
                    {
                        'greaterThan': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    }
                ]
            }
        }
    }
)

# Process retrieved chunks
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text']}")
    print(f"Location: {result['location']}")
    print(f"Metadata: {result.get('metadata', {})}")
    print("---")

2. Retrieve and Generate API (RAG)

Returns generated response with source attribution.

Use Cases:

Complete RAG workflows
Question answering
Document summarization
Chatbots with knowledge bases

response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'Explain semantic chunking benefits and when to use it'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.7,
                        'maxTokens': 2048,
                        'topP': 0.9
                    }
                },
                'promptTemplate': {
                    'textPromptTemplate': '''You are a helpful assistant. Answer the user's question based on the provided context.

Context: $search_results$

Question: $query$

Answer:'''
                }
            }
        }
    }
)

print(f"Generated Response: {response['output']['text']}")
print(f"\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']}")
        print(f"    Relevance Score: {reference.get('score', 'N/A')}")

3. Multi-Turn Conversations with Session Management

Bedrock automatically manages conversation context across turns.

# First turn - creates session automatically
response1 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is Amazon Bedrock Knowledge Bases?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)

session_id = response1['sessionId']
print(f"Session ID: {session_id}")
print(f"Response: {response1['output']['text']}\n")

# Follow-up turn - reuse session for context
response2 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What chunking strategies does it support?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id  # Continue conversation with context
)

print(f"Follow-up Response: {response2['output']['text']}")

# Third turn
response3 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'Which strategy would you recommend for legal documents?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id
)

print(f"Third Response: {response3['output']['text']}")

4. Advanced Metadata Filtering

Filter retrieval by metadata attributes for precision.

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'Security best practices for production deployments'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 10,
            'overrideSearchType': 'HYBRID',
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'security_guide'
                        }
                    },
                    {
                        'greaterThanOrEquals': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    },
                    {
                        'in': {
                            'key': 'category',
                            'value': ['production', 'security', 'compliance']
                        }
                    }
                ]
            }
        }
    }
)

Supported Filter Operators:

equals: Exact match
notEquals: Not equal
greaterThan, greaterThanOrEquals: Numeric comparison
lessThan, lessThanOrEquals: Numeric comparison
in: Match any value in array
notIn: Not match any value in array
startsWith: String prefix match
andAll: Combine filters with AND
orAll: Combine filters with OR

Ingestion Management

1. Start Ingestion Job

ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Monthly document sync',
    clientToken='unique-idempotency-token-123'
)

job_id = ingestion_response['ingestionJob']['ingestionJobId']
print(f"Ingestion Job ID: {job_id}")

2. Monitor Ingestion Job

# Get job status
job_status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    ingestionJobId=job_id
)

print(f"Status: {job_status['ingestionJob']['status']}")
print(f"Started: {job_status['ingestionJob']['startedAt']}")
print(f"Updated: {job_status['ingestionJob']['updatedAt']}")

if 'statistics' in job_status['ingestionJob']:
    stats = job_status['ingestionJob']['statistics']
    print(f"Documents Scanned: {stats['numberOfDocumentsScanned']}")
    print(f"Documents Indexed: {stats['numberOfDocumentsIndexed']}")
    print(f"Documents Failed: {stats['numberOfDocumentsFailed']}")

# Wait for completion
import time

while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        ingestionJobId=job_id
    )

    current_status = status['ingestionJob']['status']

    if current_status in ['COMPLETE', 'FAILED']:
        print(f"Ingestion job {current_status}")
        break

    print(f"Status: {current_status}, waiting...")
    time.sleep(30)

3. List Ingestion Jobs

list_response = bedrock_agent.list_ingestion_jobs(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    maxResults=50
)

for job in list_response['ingestionJobSummaries']:
    print(f"Job ID: {job['ingestionJobId']}")
    print(f"Status: {job['status']}")
    print(f"Started: {job['startedAt']}")
    print(f"Updated: {job['updatedAt']}")
    print("---")

Integration with Bedrock Agents

1. Agent with Knowledge Base Action

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Create agent with knowledge base
agent_response = bedrock_agent.create_agent(
    agentName='customer-support-agent',
    description='Customer support agent with knowledge base access',
    instruction='''You are a customer support agent. When answering questions:
1. Search the knowledge base for relevant information
2. Provide accurate answers based on retrieved context
3. Cite your sources
4. Admit when you don't know something''',
    foundationModel='anthropic.claude-3-sonnet-20240229-v1:0',
    agentResourceRoleArn='arn:aws:iam::123456789012:role/BedrockAgentRole'
)

agent_id = agent_response['agent']['agentId']

# Associate knowledge base with agent
kb_association = bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB123456',
    description='Company documentation knowledge base',
    knowledgeBaseState='ENABLED'
)

# Prepare and create alias
bedrock_agent.prepare_agent(agentId=agent_id)

alias_response = bedrock_agent.create_agent_alias(
    agentId=agent_id,
    agentAliasName='production',
    description='Production alias'
)

agent_alias_id = alias_response['agentAlias']['agentAliasId']

# Invoke agent (automatically queries knowledge base)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent_runtime.invoke_agent(
    agentId=agent_id,
    agentAliasId=agent_alias_id,
    sessionId='session-123',
    inputText='What is our return policy for defective products?'
)

for event in response['completion']:
    if 'chunk' in event:
        chunk = event['chunk']
        print(chunk['bytes'].decode())

2. Agent with Multiple Knowledge Bases

# Associate multiple knowledge bases
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-PRODUCT-DOCS',
    description='Product documentation'
)

bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-SUPPORT-ARTICLES',
    description='Support knowledge articles'
)

bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-COMPANY-POLICIES',
    description='Company policies and procedures'
)

# Agent automatically searches all knowledge bases and combines results

Best Practices

1. Chunking Strategy Selection

Decision Framework:

Simple, uniform documents → Fixed-size chunking
- Blog posts, articles, simple FAQs
- Fast, predictable, cost-effective
Documents without clear boundaries → Semantic chunking
- Legal documents, contracts, academic papers
- Preserves semantic meaning, better accuracy
- Consider additional cost
Nested, hierarchical documents → Hierarchical chunking
- Technical manuals, product docs, research papers
- Best balance of precision and context
- Optimal for complex structures
Specialized formats → Custom Lambda chunking
- Medical records (HL7, FHIR), code docs, custom formats
- Complete control, domain expertise
- Higher operational complexity

Tuning Guidelines:

Fixed-size: Start with 512 tokens, 20% overlap
Semantic: Start with 300 tokens, bufferSize=1, threshold=95%
Hierarchical: Parent 1500 tokens, child 300 tokens, overlap 60 tokens
Custom: Test extensively with domain experts

2. Retrieval Optimization

Number of Results:

Start with 5-10 results
Increase if answers lack detail
Decrease if too much noise

Search Type:

SEMANTIC: Pure vector similarity (faster, good for conceptual queries)
HYBRID: Vector + keyword (better recall, recommended for production)

Use Hybrid Search when:

Queries contain specific terms or names
Need to match exact keywords
Domain has specialized vocabulary

Use Semantic Search when:

Purely conceptual queries
Prioritizing speed over perfect recall
Well-embedded domain knowledge

Metadata Filters:

Always use when applicable
Dramatically improves precision
Reduces retrieval latency
Examples: document_type, publish_date, category, author

3. Cost Optimization

S3 Vectors:

Use for large-scale knowledge bases (millions of chunks)
Up to 90% cost savings vs. OpenSearch
Ideal for cost-sensitive applications
Trade-off: Slightly higher latency

Semantic Chunking:

Incurs foundation model costs during ingestion
Consider cost vs. accuracy benefit
May not be worth it for simple documents
Best for complex, high-value content

Ingestion Frequency:

Schedule ingestion during off-peak hours
Use incremental updates when possible
Don't re-ingest unchanged documents

Model Selection:

Use smaller embedding models when accuracy permits
Titan Embed Text v2 is cost-effective
Consider Cohere Embed for multilingual

Token Usage:

Monitor generation token usage
Set appropriate maxTokens limits
Use prompt templates to control verbosity

4. Session Management

Always Reuse Sessions:

Pass sessionId for follow-up turns
Bedrock handles context automatically
No manual conversation history needed

Session Lifecycle:

Sessions expire after inactivity (default: 60 minutes)
Create new session for unrelated conversations
Use unique sessionId per user/conversation

Context Limits:

Monitor conversation length
Long sessions may hit context limits
Consider summarization for very long conversations

5. GraphRAG with Neptune

When to Use:

Interconnected knowledge domains
Relationship-aware queries
Need for explainability
Complex knowledge graphs

Benefits:

Automatic graph creation
Improved accuracy through relationships
Comprehensive answers
Explainable results

Considerations:

Higher setup complexity
Neptune Analytics costs
Best for domains with rich relationships

6. Data Source Management

S3 Best Practices:

Organize with clear prefixes
Use inclusion/exclusion filters
Maintain consistent metadata
Version documents when updating

Web Crawler:

Set appropriate rate limits
Use robots.txt for guidance
Monitor for broken links
Schedule regular re-crawls

Confluence/SharePoint:

Filter by spaces/sites
Exclude archived content
Use fine-grained permissions
Schedule incremental syncs

Metadata Enrichment:

Add custom metadata to documents
Include: document_type, publish_date, category, author, version
Enables powerful filtering
Improves retrieval precision

7. Monitoring and Debugging

Enable CloudWatch Logs:

# Monitor retrieval quality
# Track: query latency, retrieval scores, generation quality
# Set alarms for: high latency, low scores, high error rates

Test Retrieval Quality:

# Use retrieve API to debug
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={'text': 'test query'}
)

# Analyze retrieval scores
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content preview: {result['content']['text'][:200]}")

Common Issues:

Low Retrieval Scores:
- Check chunking strategy
- Verify embedding model
- Ensure documents are properly ingested
- Consider semantic or hierarchical chunking
Irrelevant Results:
- Add metadata filters
- Use hybrid search
- Refine chunking strategy
- Increase numberOfResults
Missing Information:
- Verify data source configuration
- Check ingestion job status
- Ensure documents are not excluded by filters
- Increase numberOfResults
Slow Retrieval:
- Use metadata filters to narrow scope
- Optimize vector database configuration
- Consider S3 Vectors for cost over latency
- Reduce numberOfResults

8. Security Best Practices

IAM Permissions:

Use least privilege for Knowledge Base role
Separate roles for data sources, ingestion, retrieval
Enable VPC endpoints for private connectivity

Data Encryption:

All data encrypted at rest (AWS KMS)
Data encrypted in transit (TLS)
Use customer-managed KMS keys for compliance

Access Control:

Use IAM policies to control who can query
Implement fine-grained access control
Monitor access with CloudTrail

PII Handling:

Use Bedrock Guardrails for PII redaction
Implement data masking for sensitive fields
Consider custom Lambda for advanced PII handling

Complete Production Example

End-to-End RAG Application

import boto3
import json
from typing import List, Dict, Optional

class BedrockKnowledgeBaseRAG:
    """Production RAG application with Amazon Bedrock Knowledge Bases"""

    def __init__(self, region_name: str = 'us-east-1'):
        self.bedrock_agent = boto3.client('bedrock-agent', region_name=region_name)
        self.bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region_name)

    def create_knowledge_base(
        self,
        name: str,
        description: str,
        role_arn: str,
        vector_store_config: Dict,
        embedding_model: str = 'amazon.titan-embed-text-v2:0'
    ) -> str:
        """Create knowledge base with vector store"""

        response = self.bedrock_agent.create_knowledge_base(
            name=name,
            description=description,
            roleArn=role_arn,
            knowledgeBaseConfiguration={
                'type': 'VECTOR',
                'vectorKnowledgeBaseConfiguration': {
                    'embeddingModelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{embedding_model}'
                }
            },
            storageConfiguration=vector_store_config
        )

        return response['knowledgeBase']['knowledgeBaseId']

    def add_s3_data_source(
        self,
        knowledge_base_id: str,
        name: str,
        bucket_arn: str,
        inclusion_prefixes: List[str],
        chunking_strategy: str = 'FIXED_SIZE',
        chunking_config: Optional[Dict] = None
    ) -> str:
        """Add S3 data source with chunking configuration"""

        if chunking_config is None:
            chunking_config = {
                'maxTokens': 512,
                'overlapPercentage': 20
            }

        vector_ingestion_config = {
            'chunkingConfiguration': {
                'chunkingStrategy': chunking_strategy
            }
        }

        if chunking_strategy == 'FIXED_SIZE':
            vector_ingestion_config['chunkingConfiguration']['fixedSizeChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'SEMANTIC':
            vector_ingestion_config['chunkingConfiguration']['semanticChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'HIERARCHICAL':
            vector_ingestion_config['chunkingConfiguration']['hierarchicalChunkingConfiguration'] = chunking_config

        response = self.bedrock_agent.create_data_source(
            knowledgeBaseId=knowledge_base_id,
            name=name,
            description=f'S3 data source: {name}',
            dataSourceConfiguration={
                'type': 'S3',
                's3Configuration': {
                    'bucketArn': bucket_arn,
                    'inclusionPrefixes': inclusion_prefixes
                }
            },
            vectorIngestionConfiguration=vector_ingestion_config
        )

        return response['dataSource']['dataSourceId']

    def ingest_data(self, knowledge_base_id: str, data_source_id: str) -> str:
        """Start ingestion job and wait for completion"""

        import time

        # Start ingestion
        response = self.bedrock_agent.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            description='Automated ingestion'
        )

        job_id = response['ingestionJob']['ingestionJobId']

        # Wait for completion
        while True:
            status_response = self.bedrock_agent.get_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id,
                ingestionJobId=job_id
            )

            status = status_response['ingestionJob']['status']

            if status == 'COMPLETE':
                print(f"Ingestion completed successfully")
                if 'statistics' in status_response['ingestionJob']:
                    stats = status_response['ingestionJob']['statistics']
                    print(f"Documents indexed: {stats.get('numberOfDocumentsIndexed', 0)}")
                break
            elif status == 'FAILED':
                print(f"Ingestion failed")
                break

            print(f"Ingestion status: {status}")
            time.sleep(30)

        return job_id

    def query(
        self,
        knowledge_base_id: str,
        query: str,
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
        num_results: int = 5,
        search_type: str = 'HYBRID',
        metadata_filter: Optional[Dict] = None,
        session_id: Optional[str] = None
    ) -> Dict:
        """Query knowledge base with retrieve and generate"""

        retrieval_config = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': num_results,
                        'overrideSearchType': search_type
                    }
                },
                'generationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'maxTokens': 2048
                        }
                    }
                }
            }
        }

        # Add metadata filter if provided
        if metadata_filter:
            retrieval_config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = metadata_filter

        # Build request
        request = {
            'input': {'text': query},
            'retrieveAndGenerateConfiguration': retrieval_config
        }

        # Add session if provided
        if session_id:
            request['sessionId'] = session_id

        response = self.bedrock_agent_runtime.retrieve_and_generate(**request)

        return {
            'answer': response['output']['text'],
            'citations': response.get('citations', []),
            'session_id': response['sessionId']
        }

    def multi_turn_conversation(
        self,
        knowledge_base_id: str,
        queries: List[str],
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
    ) -> List[Dict]:
        """Execute multi-turn conversation with context"""

        session_id = None
        conversation = []

        for query in queries:
            result = self.query(
                knowledge_base_id=knowledge_base_id,
                query=query,
                model_arn=model_arn,
                session_id=session_id
            )

            session_id = result['session_id']

            conversation.append({
                'query': query,
                'answer': result['answer'],
                'citations': result['citations']
            })

        return conversation


# Example Usage
if __name__ == '__main__':
    rag = BedrockKnowledgeBaseRAG(region_name='us-east-1')

    # Create knowledge base
    kb_id = rag.create_knowledge_base(
        name='production-docs-kb',
        description='Production documentation knowledge base',
        role_arn='arn:aws:iam::123456789012:role/BedrockKBRole',
        vector_store_config={
            'type': 'OPENSEARCH_SERVERLESS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
                'vectorIndexName': 'bedrock-kb-index',
                'fieldMapping': {
                    'vectorField': 'bedrock-knowledge-base-default-vector',
                    'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                    'metadataField': 'AMAZON_BEDROCK_METADATA'
                }
            }
        }
    )

    # Add data source
    ds_id = rag.add_s3_data_source(
        knowledge_base_id=kb_id,
        name='technical-docs',
        bucket_arn='arn:aws:s3:::my-docs-bucket',
        inclusion_prefixes=['docs/'],
        chunking_strategy='HIERARCHICAL',
        chunking_config={
            'levelConfigurations': [
                {'maxTokens': 1500},
                {'maxTokens': 300}
            ],
            'overlapTokens': 60
        }
    )

    # Ingest data
    rag.ingest_data(kb_id, ds_id)

    # Single query
    result = rag.query(
        knowledge_base_id=kb_id,
        query='What are the best practices for RAG applications?',
        metadata_filter={
            'equals': {
                'key': 'document_type',
                'value': 'best_practices'
            }
        }
    )

    print(f"Answer: {result['answer']}")
    print(f"\nSources:")
    for citation in result['citations']:
        for ref in citation['retrievedReferences']:
            print(f"  - {ref['location']}")

    # Multi-turn conversation
    conversation = rag.multi_turn_conversation(
        knowledge_base_id=kb_id,
        queries=[
            'What is hierarchical chunking?',
            'When should I use it?',
            'What are the configuration parameters?'
        ]
    )

    for turn in conversation:
        print(f"\nQ: {turn['query']}")
        print(f"A: {turn['answer']}")

Related Skills

Amazon Bedrock Core Skills

bedrock-guardrails: Content safety, PII redaction, hallucination detection
bedrock-agents: Agentic workflows with tool use and knowledge bases
bedrock-flows: Visual workflow builder for generative AI
bedrock-model-customization: Fine-tuning, reinforcement fine-tuning, distillation
bedrock-prompt-management: Prompt versioning and deployment

AWS Infrastructure Skills

opensearch-serverless: Vector database configuration and management
neptune-analytics: GraphRAG configuration and queries
s3-management: S3 bucket configuration for data sources and vectors
iam-bedrock: IAM roles and policies for Knowledge Bases

Observability Skills

cloudwatch-bedrock-monitoring: Monitor Knowledge Bases metrics and logs
bedrock-cost-optimization: Track and optimize Knowledge Bases costs

Additional Resources

Official Documentation

Amazon Bedrock Knowledge Bases
Knowledge Bases User Guide
Chunking Strategies
Boto3 Knowledge Bases API

Best Practices

Building Cost-Effective RAG with S3 Vectors
Advanced Parsing and Chunking

Research Document

/mnt/c/data/github/skrillz/AMAZON-BEDROCK-COMPREHENSIVE-RESEARCH-2025.md - Section 2 (Complete Knowledge Bases research)

Amazon Bedrock Knowledge Bases

Overview

What It Does

Amazon Bedrock Knowledge Bases provides:

Data Ingestion: Automatically process documents from S3, web, Confluence, SharePoint, Salesforce
Embedding Generation: Convert text to vectors using foundation models
Vector Storage: Store embeddings in multiple vector database options
Retrieval: Semantic and hybrid search with metadata filtering
Generation: RAG workflows with source attribution
Session Management: Multi-turn conversations with context
Chunking Strategies: Fixed, semantic, hierarchical, and custom chunking

When to Use This Skill

Use this skill when you need to:

Build RAG applications for document Q&A
Implement semantic search over enterprise knowledge
Create chatbots with knowledge bases
Integrate retrieval with Bedrock Agents
Configure optimal chunking strategies
Query documents with source attribution
Manage multi-turn conversations with context
Optimize RAG performance and cost

Key Capabilities

Multiple Vector Store Options: OpenSearch, S3 Vectors, Neptune, Pinecone, MongoDB, Redis
Flexible Data Sources: S3, web crawlers, Confluence, SharePoint, Salesforce
Advanced Chunking: Fixed-size, semantic, hierarchical, custom Lambda
Hybrid Search: Combine semantic (vector) and keyword search
Session Management: Built-in conversation context tracking
GraphRAG: Relationship-aware retrieval with Neptune Analytics
Cost Optimization: S3 Vectors for up to 90% storage savings

Quick Start

Basic RAG Workflow

import boto3
import json

# Initialize clients
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 1. Create Knowledge Base
kb_response = bedrock_agent.create_knowledge_base(
    name='enterprise-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
            'vectorIndexName': 'bedrock-knowledge-base-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)

knowledge_base_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {knowledge_base_id}")

# 2. Add S3 Data Source
ds_response = bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    description='Company documents from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['documents/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)

data_source_id = ds_response['dataSource']['dataSourceId']

# 3. Start Ingestion
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Initial document ingestion'
)

print(f"Ingestion Job ID: {ingestion_response['ingestionJob']['ingestionJobId']}")

# 4. Query with Retrieve and Generate
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is our vacation policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            }
        }
    }
)

print(f"Answer: {response['output']['text']}")
print(f"\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']['s3Location']['uri']}")

Vector Store Options

1. Amazon OpenSearch Serverless

Best for: Production RAG applications with auto-scaling requirements

Benefits:

Fully managed, serverless operation
Auto-scaling compute and storage
High availability with multi-AZ deployment
Fast query performance

Configuration:

storageConfiguration={
    'type': 'OPENSEARCH_SERVERLESS',
    'opensearchServerlessConfiguration': {
        'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
        'vectorIndexName': 'bedrock-knowledge-base-index',
        'fieldMapping': {
            'vectorField': 'bedrock-knowledge-base-default-vector',
            'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
            'metadataField': 'AMAZON_BEDROCK_METADATA'
        }
    }
}

2. Amazon S3 Vectors (Preview)

Best for: Cost-optimized, large-scale RAG applications

Benefits:

Up to 90% cost reduction for vector storage
Built-in vector support in S3
Subsecond query performance
Massive scale and durability

Ideal Use Cases:

Large document collections (millions of chunks)
Cost-sensitive applications
Archival knowledge bases
Low-to-medium QPS workloads

Configuration:

storageConfiguration={
    'type': 'S3_VECTORS',
    's3VectorsConfiguration': {
        'bucketArn': 'arn:aws:s3:::my-vector-bucket',
        'prefix': 'vectors/'
    }
}

Limitations:

Still in preview (no CloudFormation/CDK support yet)
Not suitable for high QPS, millisecond-latency requirements
Best for cost optimization over ultra-low latency

3. Amazon Neptune Analytics (GraphRAG)

Best for: Interconnected knowledge domains requiring relationship-aware retrieval

Benefits:

Automatic graph creation linking related content
Improved retrieval accuracy through relationships
Comprehensive responses leveraging knowledge graph
Explainable results with relationship context

Use Cases:

Legal document analysis with case precedents
Scientific research with paper citations
Product catalogs with dependencies
Organizational knowledge with team relationships

Configuration:

storageConfiguration={
    'type': 'NEPTUNE_ANALYTICS',
    'neptuneAnalyticsConfiguration': {
        'graphArn': 'arn:aws:neptune-graph:us-east-1:123456789012:graph/g-12345678',
        'vectorSearchConfiguration': {
            'vectorField': 'embedding'
        }
    }
}

4. Amazon OpenSearch Service Managed Cluster

Best for: Existing OpenSearch infrastructure, advanced customization

Configuration:

storageConfiguration={
    'type': 'OPENSEARCH_SERVICE',
    'opensearchServiceConfiguration': {
        'clusterArn': 'arn:aws:es:us-east-1:123456789012:domain/my-domain',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

5. Third-Party Vector Databases

Pinecone:

storageConfiguration={
    'type': 'PINECONE',
    'pineconeConfiguration': {
        'connectionString': 'https://my-index-abc123.svc.us-west1-gcp.pinecone.io',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pinecone-api-key',
        'namespace': 'bedrock-kb',
        'fieldMapping': {
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

MongoDB Atlas:

storageConfiguration={
    'type': 'MONGODB_ATLAS',
    'mongoDbAtlasConfiguration': {
        'endpoint': 'https://cluster0.mongodb.net',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mongodb-creds',
        'databaseName': 'bedrock_kb',
        'collectionName': 'vectors',
        'vectorIndexName': 'vector_index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Redis Enterprise Cloud:

storageConfiguration={
    'type': 'REDIS_ENTERPRISE_CLOUD',
    'redisEnterpriseCloudConfiguration': {
        'endpoint': 'redis-12345.c1.us-east-1-2.ec2.cloud.redislabs.com:12345',
        'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:redis-creds',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Data Source Configuration

1. Amazon S3

Supported File Types: PDF, TXT, MD, HTML, DOC, DOCX, CSV, XLS, XLSX

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-technical-docs',
    description='Technical documentation from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['docs/technical/', 'docs/manuals/'],
            'exclusionPrefixes': ['docs/archive/']
        }
    }
)

2. Web Crawler

Automatic website scraping and indexing:

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='company-website',
    description='Public company website content',
    dataSourceConfiguration={
        'type': 'WEB',
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {'url': 'https://www.example.com/docs'},
                        {'url': 'https://www.example.com/blog'}
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 300  # Pages per minute
                }
            }
        }
    }
)

3. Confluence

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='confluence-wiki',
    description='Company Confluence knowledge base',
    dataSourceConfiguration={
        'type': 'CONFLUENCE',
        'confluenceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.atlassian.net/wiki',
                'hostType': 'SAAS',
                'authType': 'BASIC',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Space',
                                'inclusionFilters': ['Engineering', 'Product'],
                                'exclusionFilters': ['Archive']
                            }
                        ]
                    }
                }
            }
        }
    }
)

4. SharePoint

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='sharepoint-docs',
    description='SharePoint document library',
    dataSourceConfiguration={
        'type': 'SHAREPOINT',
        'sharePointConfiguration': {
            'sourceConfiguration': {
                'siteUrls': [
                    'https://company.sharepoint.com/sites/Engineering',
                    'https://company.sharepoint.com/sites/Product'
                ],
                'tenantId': 'tenant-id',
                'domain': 'company',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:sharepoint-creds'
            }
        }
    }
)

5. Salesforce

bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='salesforce-knowledge',
    description='Salesforce knowledge articles',
    dataSourceConfiguration={
        'type': 'SALESFORCE',
        'salesforceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'https://company.my.salesforce.com',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'arn:aws:secretsmanager:us-east-1:123456789012:secret:salesforce-creds'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'Knowledge',
                                'inclusionFilters': ['Product_Documentation', 'Support_Articles']
                            }
                        ]
                    }
                }
            }
        }
    }
)

Chunking Strategies

1. Fixed-Size Chunking

Best for: Simple documents with uniform structure

How it works: Splits text into chunks of fixed token size with overlap

Parameters:

maxTokens: 200-8192 tokens (typically 512-1024)
overlapPercentage: 10-50% (typically 20%)

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'FIXED_SIZE',
        'fixedSizeChunkingConfiguration': {
            'maxTokens': 512,
            'overlapPercentage': 20
        }
    }
}

Use Cases:

Blog posts and articles
Technical documentation with consistent formatting
FAQs and Q&A content
Simple text files

Pros:

Fast and predictable
No additional costs
Easy to tune

Cons:

May split semantic units awkwardly
Doesn't respect document structure
Can break context mid-sentence

2. Semantic Chunking

Best for: Documents without clear boundaries (legal, technical, academic)

How it works: Uses sentence similarity to group related content

Parameters:

maxTokens: 20-8192 tokens (typically 300-500)
bufferSize: Number of neighboring sentences (default: 1)
breakpointPercentileThreshold: Similarity threshold (recommended: 95%)

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'SEMANTIC',
        'semanticChunkingConfiguration': {
            'maxTokens': 300,
            'bufferSize': 1,
            'breakpointPercentileThreshold': 95
        }
    }
}

Use Cases:

Legal documents and contracts
Academic papers
Technical specifications
Medical records
Research reports

Pros:

Preserves semantic meaning
Better context preservation
Improved retrieval accuracy

Cons:

Additional cost (foundation model usage)
Slower ingestion
Less predictable chunk sizes

Cost Consideration: Semantic chunking uses foundation models for similarity analysis, incurring additional costs beyond storage and retrieval.

3. Hierarchical Chunking

Best for: Complex documents with nested structure

How it works: Creates parent and child chunks; retrieves child, returns parent for context

Parameters:

levelConfigurations: Array of chunk sizes (parent → child)
overlapTokens: Overlap between chunks

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'HIERARCHICAL',
        'hierarchicalChunkingConfiguration': {
            'levelConfigurations': [
                {
                    'maxTokens': 1500  # Parent chunk (comprehensive context)
                },
                {
                    'maxTokens': 300   # Child chunk (focused retrieval)
                }
            ],
            'overlapTokens': 60
        }
    }
}

Use Cases:

Technical manuals with sections and subsections
Academic papers with abstract, sections, and subsections
Legal documents with articles and clauses
Product documentation with categories and details

How Retrieval Works:

Query matches against child chunks (fast, focused)
Returns parent chunks (comprehensive context)
Best of both: precision retrieval + complete context

Pros:

Optimal balance of precision and context
Excellent for nested documents
Better accuracy for complex queries

Cons:

More complex configuration
Larger storage footprint
Requires understanding of document structure

4. Custom Chunking (Lambda)

Best for: Specialized domain logic, custom parsing requirements

How it works: Invoke Lambda function for custom chunking logic

Configuration:

vectorIngestionConfiguration={
    'chunkingConfiguration': {
        'chunkingStrategy': 'NONE'  # Custom via Lambda
    },
    'customTransformationConfiguration': {
        'intermediateStorage': {
            's3Location': {
                'uri': 's3://my-kb-bucket/intermediate/'
            }
        },
        'transformations': [
            {
                'stepToApply': 'POST_CHUNKING',
                'transformationFunction': {
                    'transformationLambdaConfiguration': {
                        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:custom-chunker'
                    }
                }
            }
        ]
    }
}

Example Lambda Handler:

# Lambda function for custom chunking
import json

def lambda_handler(event, context):
    """
    Custom chunking logic for specialized documents

    Input: event contains document content and metadata
    Output: array of chunks with text and metadata
    """

    # Extract document content
    document = event['document']
    content = document['content']
    metadata = document.get('metadata', {})

    # Custom chunking logic (example: split by custom delimiter)
    chunks = []
    sections = content.split('---SECTION---')

    for idx, section in enumerate(sections):
        if section.strip():
            chunks.append({
                'text': section.strip(),
                'metadata': {
                    **metadata,
                    'chunk_id': f'section_{idx}',
                    'chunk_type': 'custom_section'
                }
            })

    return {
        'chunks': chunks
    }

Use Cases:

Medical records with structured sections (SOAP notes)
Financial documents with tables and calculations
Code documentation with code blocks and explanations
Domain-specific formats (HL7, FHIR, etc.)

Pros:

Complete control over chunking logic
Can handle any document format
Integrate domain expertise

Cons:

Requires Lambda development and maintenance
Additional operational complexity
Harder to debug and iterate

Chunking Strategy Selection Guide

Retrieval Operations

1. Retrieve API (Retrieval Only)

Returns raw retrieved chunks without generation.

Use Cases:

Custom generation logic
Debugging retrieval quality
Building custom RAG pipelines
Integrating with non-Bedrock models

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'What are the benefits of hierarchical chunking?'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 5,
            'overrideSearchType': 'HYBRID',  # SEMANTIC, HYBRID
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'technical_guide'
                        }
                    },
                    {
                        'greaterThan': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    }
                ]
            }
        }
    }
)

# Process retrieved chunks
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content: {result['content']['text']}")
    print(f"Location: {result['location']}")
    print(f"Metadata: {result.get('metadata', {})}")
    print("---")

2. Retrieve and Generate API (RAG)

Returns generated response with source attribution.

Use Cases:

Complete RAG workflows
Question answering
Document summarization
Chatbots with knowledge bases

response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'Explain semantic chunking benefits and when to use it'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.7,
                        'maxTokens': 2048,
                        'topP': 0.9
                    }
                },
                'promptTemplate': {
                    'textPromptTemplate': '''You are a helpful assistant. Answer the user's question based on the provided context.

Context: $search_results$

Question: $query$

Answer:'''
                }
            }
        }
    }
)

print(f"Generated Response: {response['output']['text']}")
print(f"\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        print(f"  - {reference['location']}")
        print(f"    Relevance Score: {reference.get('score', 'N/A')}")

3. Multi-Turn Conversations with Session Management

Bedrock automatically manages conversation context across turns.

# First turn - creates session automatically
response1 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is Amazon Bedrock Knowledge Bases?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)

session_id = response1['sessionId']
print(f"Session ID: {session_id}")
print(f"Response: {response1['output']['text']}\n")

# Follow-up turn - reuse session for context
response2 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What chunking strategies does it support?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id  # Continue conversation with context
)

print(f"Follow-up Response: {response2['output']['text']}")

# Third turn
response3 = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'Which strategy would you recommend for legal documents?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB123456',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    },
    sessionId=session_id
)

print(f"Third Response: {response3['output']['text']}")

4. Advanced Metadata Filtering

Filter retrieval by metadata attributes for precision.

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={
        'text': 'Security best practices for production deployments'
    },
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 10,
            'overrideSearchType': 'HYBRID',
            'filter': {
                'andAll': [
                    {
                        'equals': {
                            'key': 'document_type',
                            'value': 'security_guide'
                        }
                    },
                    {
                        'greaterThanOrEquals': {
                            'key': 'publish_year',
                            'value': 2024
                        }
                    },
                    {
                        'in': {
                            'key': 'category',
                            'value': ['production', 'security', 'compliance']
                        }
                    }
                ]
            }
        }
    }
)

Supported Filter Operators:

equals: Exact match
notEquals: Not equal
greaterThan, greaterThanOrEquals: Numeric comparison
lessThan, lessThanOrEquals: Numeric comparison
in: Match any value in array
notIn: Not match any value in array
startsWith: String prefix match
andAll: Combine filters with AND
orAll: Combine filters with OR

Ingestion Management

1. Start Ingestion Job

ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Monthly document sync',
    clientToken='unique-idempotency-token-123'
)

job_id = ingestion_response['ingestionJob']['ingestionJobId']
print(f"Ingestion Job ID: {job_id}")

2. Monitor Ingestion Job

# Get job status
job_status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    ingestionJobId=job_id
)

print(f"Status: {job_status['ingestionJob']['status']}")
print(f"Started: {job_status['ingestionJob']['startedAt']}")
print(f"Updated: {job_status['ingestionJob']['updatedAt']}")

if 'statistics' in job_status['ingestionJob']:
    stats = job_status['ingestionJob']['statistics']
    print(f"Documents Scanned: {stats['numberOfDocumentsScanned']}")
    print(f"Documents Indexed: {stats['numberOfDocumentsIndexed']}")
    print(f"Documents Failed: {stats['numberOfDocumentsFailed']}")

# Wait for completion
import time

while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        ingestionJobId=job_id
    )

    current_status = status['ingestionJob']['status']

    if current_status in ['COMPLETE', 'FAILED']:
        print(f"Ingestion job {current_status}")
        break

    print(f"Status: {current_status}, waiting...")
    time.sleep(30)

3. List Ingestion Jobs

list_response = bedrock_agent.list_ingestion_jobs(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    maxResults=50
)

for job in list_response['ingestionJobSummaries']:
    print(f"Job ID: {job['ingestionJobId']}")
    print(f"Status: {job['status']}")
    print(f"Started: {job['startedAt']}")
    print(f"Updated: {job['updatedAt']}")
    print("---")

Integration with Bedrock Agents

1. Agent with Knowledge Base Action

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Create agent with knowledge base
agent_response = bedrock_agent.create_agent(
    agentName='customer-support-agent',
    description='Customer support agent with knowledge base access',
    instruction='''You are a customer support agent. When answering questions:
1. Search the knowledge base for relevant information
2. Provide accurate answers based on retrieved context
3. Cite your sources
4. Admit when you don't know something''',
    foundationModel='anthropic.claude-3-sonnet-20240229-v1:0',
    agentResourceRoleArn='arn:aws:iam::123456789012:role/BedrockAgentRole'
)

agent_id = agent_response['agent']['agentId']

# Associate knowledge base with agent
kb_association = bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB123456',
    description='Company documentation knowledge base',
    knowledgeBaseState='ENABLED'
)

# Prepare and create alias
bedrock_agent.prepare_agent(agentId=agent_id)

alias_response = bedrock_agent.create_agent_alias(
    agentId=agent_id,
    agentAliasName='production',
    description='Production alias'
)

agent_alias_id = alias_response['agentAlias']['agentAliasId']

# Invoke agent (automatically queries knowledge base)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

response = bedrock_agent_runtime.invoke_agent(
    agentId=agent_id,
    agentAliasId=agent_alias_id,
    sessionId='session-123',
    inputText='What is our return policy for defective products?'
)

for event in response['completion']:
    if 'chunk' in event:
        chunk = event['chunk']
        print(chunk['bytes'].decode())

2. Agent with Multiple Knowledge Bases

# Associate multiple knowledge bases
bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-PRODUCT-DOCS',
    description='Product documentation'
)

bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-SUPPORT-ARTICLES',
    description='Support knowledge articles'
)

bedrock_agent.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion='DRAFT',
    knowledgeBaseId='KB-COMPANY-POLICIES',
    description='Company policies and procedures'
)

# Agent automatically searches all knowledge bases and combines results

Best Practices

1. Chunking Strategy Selection

Decision Framework:

Simple, uniform documents → Fixed-size chunking
- Blog posts, articles, simple FAQs
- Fast, predictable, cost-effective
Documents without clear boundaries → Semantic chunking
- Legal documents, contracts, academic papers
- Preserves semantic meaning, better accuracy
- Consider additional cost
Nested, hierarchical documents → Hierarchical chunking
- Technical manuals, product docs, research papers
- Best balance of precision and context
- Optimal for complex structures
Specialized formats → Custom Lambda chunking
- Medical records (HL7, FHIR), code docs, custom formats
- Complete control, domain expertise
- Higher operational complexity

Tuning Guidelines:

Fixed-size: Start with 512 tokens, 20% overlap
Semantic: Start with 300 tokens, bufferSize=1, threshold=95%
Hierarchical: Parent 1500 tokens, child 300 tokens, overlap 60 tokens
Custom: Test extensively with domain experts

2. Retrieval Optimization

Number of Results:

Start with 5-10 results
Increase if answers lack detail
Decrease if too much noise

Search Type:

SEMANTIC: Pure vector similarity (faster, good for conceptual queries)
HYBRID: Vector + keyword (better recall, recommended for production)

Use Hybrid Search when:

Queries contain specific terms or names
Need to match exact keywords
Domain has specialized vocabulary

Use Semantic Search when:

Purely conceptual queries
Prioritizing speed over perfect recall
Well-embedded domain knowledge

Metadata Filters:

Always use when applicable
Dramatically improves precision
Reduces retrieval latency
Examples: document_type, publish_date, category, author

3. Cost Optimization

S3 Vectors:

Use for large-scale knowledge bases (millions of chunks)
Up to 90% cost savings vs. OpenSearch
Ideal for cost-sensitive applications
Trade-off: Slightly higher latency

Semantic Chunking:

Incurs foundation model costs during ingestion
Consider cost vs. accuracy benefit
May not be worth it for simple documents
Best for complex, high-value content

Ingestion Frequency:

Schedule ingestion during off-peak hours
Use incremental updates when possible
Don't re-ingest unchanged documents

Model Selection:

Use smaller embedding models when accuracy permits
Titan Embed Text v2 is cost-effective
Consider Cohere Embed for multilingual

Token Usage:

Monitor generation token usage
Set appropriate maxTokens limits
Use prompt templates to control verbosity

4. Session Management

Always Reuse Sessions:

Pass sessionId for follow-up turns
Bedrock handles context automatically
No manual conversation history needed

Session Lifecycle:

Sessions expire after inactivity (default: 60 minutes)
Create new session for unrelated conversations
Use unique sessionId per user/conversation

Context Limits:

Monitor conversation length
Long sessions may hit context limits
Consider summarization for very long conversations

5. GraphRAG with Neptune

When to Use:

Interconnected knowledge domains
Relationship-aware queries
Need for explainability
Complex knowledge graphs

Benefits:

Automatic graph creation
Improved accuracy through relationships
Comprehensive answers
Explainable results

Considerations:

Higher setup complexity
Neptune Analytics costs
Best for domains with rich relationships

6. Data Source Management

S3 Best Practices:

Organize with clear prefixes
Use inclusion/exclusion filters
Maintain consistent metadata
Version documents when updating

Web Crawler:

Set appropriate rate limits
Use robots.txt for guidance
Monitor for broken links
Schedule regular re-crawls

Confluence/SharePoint:

Filter by spaces/sites
Exclude archived content
Use fine-grained permissions
Schedule incremental syncs

Metadata Enrichment:

Add custom metadata to documents
Include: document_type, publish_date, category, author, version
Enables powerful filtering
Improves retrieval precision

7. Monitoring and Debugging

Enable CloudWatch Logs:

# Monitor retrieval quality
# Track: query latency, retrieval scores, generation quality
# Set alarms for: high latency, low scores, high error rates

Test Retrieval Quality:

# Use retrieve API to debug
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId='KB123456',
    retrievalQuery={'text': 'test query'}
)

# Analyze retrieval scores
for result in response['retrievalResults']:
    print(f"Score: {result['score']}")
    print(f"Content preview: {result['content']['text'][:200]}")

Common Issues:

Low Retrieval Scores:
- Check chunking strategy
- Verify embedding model
- Ensure documents are properly ingested
- Consider semantic or hierarchical chunking
Irrelevant Results:
- Add metadata filters
- Use hybrid search
- Refine chunking strategy
- Increase numberOfResults
Missing Information:
- Verify data source configuration
- Check ingestion job status
- Ensure documents are not excluded by filters
- Increase numberOfResults
Slow Retrieval:
- Use metadata filters to narrow scope
- Optimize vector database configuration
- Consider S3 Vectors for cost over latency
- Reduce numberOfResults

8. Security Best Practices

IAM Permissions:

Use least privilege for Knowledge Base role
Separate roles for data sources, ingestion, retrieval
Enable VPC endpoints for private connectivity

Data Encryption:

All data encrypted at rest (AWS KMS)
Data encrypted in transit (TLS)
Use customer-managed KMS keys for compliance

Access Control:

Use IAM policies to control who can query
Implement fine-grained access control
Monitor access with CloudTrail

PII Handling:

Use Bedrock Guardrails for PII redaction
Implement data masking for sensitive fields
Consider custom Lambda for advanced PII handling

Complete Production Example

End-to-End RAG Application

import boto3
import json
from typing import List, Dict, Optional

class BedrockKnowledgeBaseRAG:
    """Production RAG application with Amazon Bedrock Knowledge Bases"""

    def __init__(self, region_name: str = 'us-east-1'):
        self.bedrock_agent = boto3.client('bedrock-agent', region_name=region_name)
        self.bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region_name)

    def create_knowledge_base(
        self,
        name: str,
        description: str,
        role_arn: str,
        vector_store_config: Dict,
        embedding_model: str = 'amazon.titan-embed-text-v2:0'
    ) -> str:
        """Create knowledge base with vector store"""

        response = self.bedrock_agent.create_knowledge_base(
            name=name,
            description=description,
            roleArn=role_arn,
            knowledgeBaseConfiguration={
                'type': 'VECTOR',
                'vectorKnowledgeBaseConfiguration': {
                    'embeddingModelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{embedding_model}'
                }
            },
            storageConfiguration=vector_store_config
        )

        return response['knowledgeBase']['knowledgeBaseId']

    def add_s3_data_source(
        self,
        knowledge_base_id: str,
        name: str,
        bucket_arn: str,
        inclusion_prefixes: List[str],
        chunking_strategy: str = 'FIXED_SIZE',
        chunking_config: Optional[Dict] = None
    ) -> str:
        """Add S3 data source with chunking configuration"""

        if chunking_config is None:
            chunking_config = {
                'maxTokens': 512,
                'overlapPercentage': 20
            }

        vector_ingestion_config = {
            'chunkingConfiguration': {
                'chunkingStrategy': chunking_strategy
            }
        }

        if chunking_strategy == 'FIXED_SIZE':
            vector_ingestion_config['chunkingConfiguration']['fixedSizeChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'SEMANTIC':
            vector_ingestion_config['chunkingConfiguration']['semanticChunkingConfiguration'] = chunking_config
        elif chunking_strategy == 'HIERARCHICAL':
            vector_ingestion_config['chunkingConfiguration']['hierarchicalChunkingConfiguration'] = chunking_config

        response = self.bedrock_agent.create_data_source(
            knowledgeBaseId=knowledge_base_id,
            name=name,
            description=f'S3 data source: {name}',
            dataSourceConfiguration={
                'type': 'S3',
                's3Configuration': {
                    'bucketArn': bucket_arn,
                    'inclusionPrefixes': inclusion_prefixes
                }
            },
            vectorIngestionConfiguration=vector_ingestion_config
        )

        return response['dataSource']['dataSourceId']

    def ingest_data(self, knowledge_base_id: str, data_source_id: str) -> str:
        """Start ingestion job and wait for completion"""

        import time

        # Start ingestion
        response = self.bedrock_agent.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            description='Automated ingestion'
        )

        job_id = response['ingestionJob']['ingestionJobId']

        # Wait for completion
        while True:
            status_response = self.bedrock_agent.get_ingestion_job(
                knowledgeBaseId=knowledge_base_id,
                dataSourceId=data_source_id,
                ingestionJobId=job_id
            )

            status = status_response['ingestionJob']['status']

            if status == 'COMPLETE':
                print(f"Ingestion completed successfully")
                if 'statistics' in status_response['ingestionJob']:
                    stats = status_response['ingestionJob']['statistics']
                    print(f"Documents indexed: {stats.get('numberOfDocumentsIndexed', 0)}")
                break
            elif status == 'FAILED':
                print(f"Ingestion failed")
                break

            print(f"Ingestion status: {status}")
            time.sleep(30)

        return job_id

    def query(
        self,
        knowledge_base_id: str,
        query: str,
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
        num_results: int = 5,
        search_type: str = 'HYBRID',
        metadata_filter: Optional[Dict] = None,
        session_id: Optional[str] = None
    ) -> Dict:
        """Query knowledge base with retrieve and generate"""

        retrieval_config = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': num_results,
                        'overrideSearchType': search_type
                    }
                },
                'generationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'maxTokens': 2048
                        }
                    }
                }
            }
        }

        # Add metadata filter if provided
        if metadata_filter:
            retrieval_config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = metadata_filter

        # Build request
        request = {
            'input': {'text': query},
            'retrieveAndGenerateConfiguration': retrieval_config
        }

        # Add session if provided
        if session_id:
            request['sessionId'] = session_id

        response = self.bedrock_agent_runtime.retrieve_and_generate(**request)

        return {
            'answer': response['output']['text'],
            'citations': response.get('citations', []),
            'session_id': response['sessionId']
        }

    def multi_turn_conversation(
        self,
        knowledge_base_id: str,
        queries: List[str],
        model_arn: str = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
    ) -> List[Dict]:
        """Execute multi-turn conversation with context"""

        session_id = None
        conversation = []

        for query in queries:
            result = self.query(
                knowledge_base_id=knowledge_base_id,
                query=query,
                model_arn=model_arn,
                session_id=session_id
            )

            session_id = result['session_id']

            conversation.append({
                'query': query,
                'answer': result['answer'],
                'citations': result['citations']
            })

        return conversation


# Example Usage
if __name__ == '__main__':
    rag = BedrockKnowledgeBaseRAG(region_name='us-east-1')

    # Create knowledge base
    kb_id = rag.create_knowledge_base(
        name='production-docs-kb',
        description='Production documentation knowledge base',
        role_arn='arn:aws:iam::123456789012:role/BedrockKBRole',
        vector_store_config={
            'type': 'OPENSEARCH_SERVERLESS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
                'vectorIndexName': 'bedrock-kb-index',
                'fieldMapping': {
                    'vectorField': 'bedrock-knowledge-base-default-vector',
                    'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                    'metadataField': 'AMAZON_BEDROCK_METADATA'
                }
            }
        }
    )

    # Add data source
    ds_id = rag.add_s3_data_source(
        knowledge_base_id=kb_id,
        name='technical-docs',
        bucket_arn='arn:aws:s3:::my-docs-bucket',
        inclusion_prefixes=['docs/'],
        chunking_strategy='HIERARCHICAL',
        chunking_config={
            'levelConfigurations': [
                {'maxTokens': 1500},
                {'maxTokens': 300}
            ],
            'overlapTokens': 60
        }
    )

    # Ingest data
    rag.ingest_data(kb_id, ds_id)

    # Single query
    result = rag.query(
        knowledge_base_id=kb_id,
        query='What are the best practices for RAG applications?',
        metadata_filter={
            'equals': {
                'key': 'document_type',
                'value': 'best_practices'
            }
        }
    )

    print(f"Answer: {result['answer']}")
    print(f"\nSources:")
    for citation in result['citations']:
        for ref in citation['retrievedReferences']:
            print(f"  - {ref['location']}")

    # Multi-turn conversation
    conversation = rag.multi_turn_conversation(
        knowledge_base_id=kb_id,
        queries=[
            'What is hierarchical chunking?',
            'When should I use it?',
            'What are the configuration parameters?'
        ]
    )

    for turn in conversation:
        print(f"\nQ: {turn['query']}")
        print(f"A: {turn['answer']}")

Related Skills

Amazon Bedrock Core Skills

bedrock-guardrails: Content safety, PII redaction, hallucination detection
bedrock-agents: Agentic workflows with tool use and knowledge bases
bedrock-flows: Visual workflow builder for generative AI
bedrock-model-customization: Fine-tuning, reinforcement fine-tuning, distillation
bedrock-prompt-management: Prompt versioning and deployment

AWS Infrastructure Skills

opensearch-serverless: Vector database configuration and management
neptune-analytics: GraphRAG configuration and queries
s3-management: S3 bucket configuration for data sources and vectors
iam-bedrock: IAM roles and policies for Knowledge Bases

Observability Skills

cloudwatch-bedrock-monitoring: Monitor Knowledge Bases metrics and logs
bedrock-cost-optimization: Track and optimize Knowledge Bases costs

Additional Resources

Official Documentation

Amazon Bedrock Knowledge Bases
Knowledge Bases User Guide
Chunking Strategies
Boto3 Knowledge Bases API

Best Practices

Building Cost-Effective RAG with S3 Vectors
Advanced Parsing and Chunking

Research Document

/mnt/c/data/github/skrillz/AMAZON-BEDROCK-COMPREHENSIVE-RESEARCH-2025.md - Section 2 (Complete Knowledge Bases research)

Adoption

adaptationio/bedrock-knowledge-bases

$ install --global

Security Scan Results

SKILL.md

Amazon Bedrock Knowledge Bases

Overview

What It Does

When to Use This Skill

Key Capabilities

Quick Start

Basic RAG Workflow

Vector Store Options

1. Amazon OpenSearch Serverless

2. Amazon S3 Vectors (Preview)

3. Amazon Neptune Analytics (GraphRAG)

4. Amazon OpenSearch Service Managed Cluster

5. Third-Party Vector Databases

Data Source Configuration

1. Amazon S3

2. Web Crawler

3. Confluence

4. SharePoint

5. Salesforce

Chunking Strategies

1. Fixed-Size Chunking

2. Semantic Chunking

3. Hierarchical Chunking

4. Custom Chunking (Lambda)

Chunking Strategy Selection Guide

Retrieval Operations

1. Retrieve API (Retrieval Only)

2. Retrieve and Generate API (RAG)

3. Multi-Turn Conversations with Session Management

4. Advanced Metadata Filtering

Ingestion Management

1. Start Ingestion Job

2. Monitor Ingestion Job

3. List Ingestion Jobs

Integration with Bedrock Agents

1. Agent with Knowledge Base Action

2. Agent with Multiple Knowledge Bases

Best Practices

1. Chunking Strategy Selection

2. Retrieval Optimization

3. Cost Optimization

4. Session Management

5. GraphRAG with Neptune

6. Data Source Management

7. Monitoring and Debugging

8. Security Best Practices

Complete Production Example

End-to-End RAG Application

Related Skills

Amazon Bedrock Core Skills

AWS Infrastructure Skills

Observability Skills

Additional Resources

Official Documentation

Best Practices

Research Document

Related Skills

adaptationio/ttyd-remote-terminal-wsl2

adaptationio/tri-ai-collaboration

adaptationio/todo-management

adaptationio/testing-workflow

adaptationio/bedrock-knowledge-bases

$ install --global

Security Scan Results

SKILL.md

Amazon Bedrock Knowledge Bases

Overview

What It Does

When to Use This Skill

Key Capabilities

Quick Start

Basic RAG Workflow

Vector Store Options

1. Amazon OpenSearch Serverless

2. Amazon S3 Vectors (Preview)