name: enterprise-search
description: Design cross-tool knowledge retrieval strategies, architect enterprise search systems, and tune relevance models. Use when building internal search experiences, consolidating knowledge across tools, or improving search result quality.
tags: [search, knowledge-management, information-retrieval]

Enterprise Search

Business-oriented framework for designing cross-tool knowledge retrieval, architecting enterprise search systems, and tuning relevance models. Focused on strategy and requirements — for technical implementation, see hybrid-search-implementation and similarity-search-patterns in the ai-ml domain.

Use this skill when

Designing an enterprise search strategy across multiple internal tools (Confluence, Slack, Drive, SharePoint, GitHub)
Choosing between federated, centralized, or hybrid search architectures
Defining relevance tuning requirements and quality metrics
Building a knowledge taxonomy or metadata schema for searchable content
Creating search UX requirements for internal portals
Evaluating search quality and measuring improvement

Do not use this skill when

Implementing vector search or embeddings at code level (use hybrid-search-implementation)
Building similarity search with specific vector databases (use similarity-search-patterns)
Optimizing web SEO for external search engines (use seo-audit)
Building RAG pipelines for LLM applications (use RAG skills in ai-ml domain)

Instructions

1. Audit current state: inventory all content sources, volumes, and access patterns.
2. Choose architecture: federated, centralized, or hybrid based on your constraints.
3. Design taxonomy: define metadata schema, facets, and tagging standards.
4. Define relevance model: scoring factors, boosting rules, and personalization signals.
5. Set quality metrics: establish baselines and targets for search quality.
6. Design search UX: autocomplete, facets, snippets, and result presentation.

Search Architecture Patterns

Architecture Comparison

| Pattern | How It Works | Pros | Cons | Best For |
|---------|-------------|------|------|----------|
| Federated | Query multiple sources in real-time, merge results | No data duplication, real-time freshness | Slower, limited cross-source ranking | Small orgs (<500 people), few sources |
| Centralized | Ingest all content into single search index | Best relevance, fastest queries, unified ranking | Data duplication, sync complexity, stale content risk | Large orgs, search-critical workflows |
| Hybrid | Centralized index for primary sources + federated for long-tail | Balanced cost vs. quality | Most complex to maintain | Mid-to-large orgs with diverse source landscape |
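To make the federated pattern concrete, here is a minimal Python sketch, assuming each source exposes a function that returns document IDs in its own rank order. The adapter functions, the `federated_search` name, and the use of reciprocal rank fusion for merging are illustrative assumptions, not something this skill prescribes. Raw scores from different sources are not comparable, so the merge relies on rank positions only, which is the limited cross-source ranking trade-off noted for this pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical adapter type: takes a query, returns doc IDs in the source's own rank order.
SourceFn = Callable[[str], list[str]]


def federated_search(query: str, sources: dict[str, SourceFn], k: int = 60) -> list[str]:
    """Query every source in parallel, then merge by reciprocal rank (1 / (k + rank))."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        ranked_lists = {name: future.result() for name, future in futures.items()}

    scores: dict[str, float] = {}
    for name, doc_ids in ranked_lists.items():
        for rank, doc_id in enumerate(doc_ids, start=1):
            key = f"{name}:{doc_id}"  # prefix with the source so IDs never collide
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)

    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda key: (-scores[key], key))


if __name__ == "__main__":
    demo_sources = {
        "wiki": lambda q: ["deploy-guide", "release-checklist"],
        "chat": lambda q: ["thread-123"],
    }
    print(federated_search("deploy", demo_sources))
```

A centralized index avoids this step entirely because all documents are scored by one ranking function.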

Connector Architecture

| Source | Connector Type | Sync Method | Typical Latency |
|--------|---------------|-------------|-----------------|
| Confluence / Wiki | REST API | Incremental (webhook + poll) | Near real-time |
| Slack / Teams | Events API | Streaming | Real-time |
| Google Drive | Drive API + Changes API | Incremental | 5-15 min |
| SharePoint | Graph API | Delta query | 5-15 min |
| GitHub | Webhooks + REST API | Event-driven | Near real-time |
| Jira / Linear | REST API + Webhooks | Incremental | Near real-time |
| Email | Graph API / Gmail API | Incremental | 15-30 min |
| Database | CDC (Change Data Capture) | Streaming | Near real-time |
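The incremental and delta sync methods above share one idea: remember a cursor and fetch only what changed since. The sketch below is a simplified illustration, assuming a hypothetical `Change` record and an in-memory list standing in for a source API response. Real connectors often use an opaque change token supplied by the source rather than a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Change:
    """Hypothetical change record returned by a source connector."""
    doc_id: str
    updated_at: datetime


def incremental_sync(changes: list[Change], cursor: datetime) -> tuple[list[Change], datetime]:
    """Return the changes newer than the cursor (oldest first) and the advanced cursor."""
    fresh = sorted(
        (change for change in changes if change.updated_at > cursor),
        key=lambda change: change.updated_at,
    )
    new_cursor = fresh[-1].updated_at if fresh else cursor
    return fresh, new_cursor


if __name__ == "__main__":
    last_sync = datetime(2026, 2, 1, tzinfo=timezone.utc)
    feed = [
        Change("doc-1", datetime(2026, 1, 20, tzinfo=timezone.utc)),
        Change("doc-2", datetime(2026, 2, 10, tzinfo=timezone.utc)),
    ]
    to_index, last_sync = incremental_sync(feed, last_sync)
    print([change.doc_id for change in to_index], last_sync.isoformat())
```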

Access Control

Critical requirement: Search results must respect source-level permissions.

| Approach | How It Works | Trade-off |
|----------|-------------|-----------|
| Early binding | Filter at index time (only index what user can access) | Secure but requires per-user indices or ACL tagging |
| Late binding | Filter at query time (check permissions on each result) | Simpler indexing but slower queries at scale |
| Hybrid | Group-based ACL at index + user-level check at query | Best balance for most orgs |
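The hybrid approach can be sketched in two stages. This is a simplified illustration, assuming each hit is a dict carrying the `access_groups` tags written at index time and that a hypothetical `can_read(user, doc_id)` callback asks the source system for the final decision. In a real engine the first stage would normally run as a filter inside the query rather than in application code.

```python
from typing import Callable, Iterable


def filter_by_permission(
    hits: Iterable[dict],
    user: str,
    user_groups: set[str],
    can_read: Callable[[str, str], bool],
) -> list[dict]:
    """Apply a coarse group filter first, then a per-user check on the survivors."""
    # Stage 1: group-based ACL tags attached at index time.
    coarse = [hit for hit in hits if user_groups & set(hit["access_groups"])]
    # Stage 2: user-level check at query time (hypothetical callback to the source).
    return [hit for hit in coarse if can_read(user, hit["id"])]


if __name__ == "__main__":
    results = [
        {"id": "adr-7", "access_groups": ["engineering"]},
        {"id": "salary-bands", "access_groups": ["hr"]},
    ]
    allowed = filter_by_permission(results, "dana", {"engineering"}, lambda u, d: True)
    print([hit["id"] for hit in allowed])
```

Running the cheap group filter first keeps the number of expensive per-user checks small.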

Knowledge Taxonomy Design

Metadata Schema

Every indexed document should carry these metadata fields:

| Field | Type | Purpose | Example |
|-------|------|---------|---------|
| title | string | Primary display and search field | "Q4 Revenue Report" |
| source | enum | Origin system | confluence, slack, drive, github |
| content_type | enum | Document classification | document, conversation, code, ticket |
| team | string | Owning team or department | "Engineering", "Sales" |
| created_at | datetime | For freshness scoring | 2026-01-15T10:30:00Z |
| updated_at | datetime | For freshness and deduplication | 2026-02-28T14:00:00Z |
| author | string | For personalization and credibility | "[email protected]" |
| access_groups | list[string] | For permission filtering | ["engineering", "all-staff"] |
| tags | list[string] | For faceted navigation | ["architecture", "adr", "database"] |
| status | enum | Content lifecycle | draft, published, archived |
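One way to pin this schema down is a typed record that every connector must produce. The sketch below mirrors the fields listed above. The class names, the `to_index_payload` method, and the sample author address are hypothetical, and the enum members are limited to the example values given in the schema.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class Source(str, Enum):
    CONFLUENCE = "confluence"
    SLACK = "slack"
    DRIVE = "drive"
    GITHUB = "github"


class ContentType(str, Enum):
    DOCUMENT = "document"
    CONVERSATION = "conversation"
    CODE = "code"
    TICKET = "ticket"


class Status(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


@dataclass
class IndexedDocument:
    title: str
    source: Source
    content_type: ContentType
    team: str
    created_at: datetime
    updated_at: datetime
    author: str
    access_groups: list[str]
    tags: list[str]
    status: Status

    def to_index_payload(self) -> dict:
        """Serialize enums and datetimes into plain JSON-friendly values."""
        payload = asdict(self)
        payload["source"] = self.source.value
        payload["content_type"] = self.content_type.value
        payload["status"] = self.status.value
        payload["created_at"] = self.created_at.isoformat()
        payload["updated_at"] = self.updated_at.isoformat()
        return payload


if __name__ == "__main__":
    doc = IndexedDocument(
        title="Q4 Revenue Report",
        source=Source.DRIVE,
        content_type=ContentType.DOCUMENT,
        team="Sales",
        created_at=datetime(2026, 1, 15, 10, 30, tzinfo=timezone.utc),
        updated_at=datetime(2026, 2, 28, 14, 0, tzinfo=timezone.utc),
        author="author@example.com",  # hypothetical address
        access_groups=["all-staff"],
        tags=["finance"],
        status=Status.PUBLISHED,
    )
    print(doc.to_index_payload())
```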

Tagging Standards

| Rule | Rationale |
|------|-----------|
| Use controlled vocabulary (not free-text tags) | Prevents tag proliferation and inconsistency |
| Max 5 tags per document | Forces specificity over over-tagging |
| Tags use kebab-case | Consistency with URLs and search queries |
| Review tag taxonomy quarterly | Remove unused tags, merge synonyms |
| Auto-tag where possible | Use classification models to suggest tags on creation |
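The first three rules are mechanical enough to enforce at ingestion time. A minimal validator sketch follows. The function name, the error message wording, and the sample vocabulary are assumptions for illustration.

```python
import re

MAX_TAGS = 5
# Lowercase alphanumeric words joined by single hyphens, e.g. "search-quality".
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_tags(tags: list[str], vocabulary: set[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the tags are acceptable."""
    errors = []
    if len(tags) > MAX_TAGS:
        errors.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if not KEBAB_CASE.fullmatch(tag):
            errors.append(f"not kebab-case: {tag}")
        elif tag not in vocabulary:
            errors.append(f"not in controlled vocabulary: {tag}")
    return errors


if __name__ == "__main__":
    vocab = {"architecture", "adr", "database"}  # hypothetical controlled vocabulary
    print(validate_tags(["architecture", "Data Base", "misc"], vocab))
```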

Content Freshness Policies

| Content Type | Freshness Target | Stale Threshold | Action When Stale |
|-------------|-----------------|-----------------|-------------------|
| Documentation | Updated quarterly | >6 months | Flag for review |
| Meeting notes | Permanent | N/A | Reduce ranking weight over time |
| Code / PRs | Always current (live sync) | N/A | N/A |
| Tickets / Issues | Live sync | N/A | Archive closed items after 12 months |
| Policies / Runbooks | Updated semi-annually | >12 months | Alert content owner |
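The two stale thresholds above translate into a simple scheduled check. This sketch assumes 6 months is approximated as 183 days and 12 months as 365 days. The content-type keys and action names are hypothetical labels, not part of the policy itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Content types with a stale threshold; other types have none (N/A in the policy).
STALE_POLICY = {
    "documentation": (timedelta(days=183), "flag_for_review"),
    "policy_runbook": (timedelta(days=365), "alert_content_owner"),
}


def stale_action(content_type: str, updated_at: datetime, now: datetime) -> Optional[str]:
    """Return the action to take if the document is stale, otherwise None."""
    policy = STALE_POLICY.get(content_type)
    if policy is None:
        return None
    threshold, action = policy
    return action if now - updated_at > threshold else None


if __name__ == "__main__":
    today = datetime(2026, 3, 1, tzinfo=timezone.utc)
    last_edit = datetime(2025, 6, 1, tzinfo=timezone.utc)
    print(stale_action("documentation", last_edit, today))
```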

Relevance Tuning Framework

Scoring Factors

| Factor | Weight | Description |
|--------|--------|-------------|
| Text relevance (BM25) | 40% | Keyword match quality: title, body, tags |
| Freshness | 20% | More recent content ranked higher (decay function) |
| Popularity | 15% | View count, link count, citation count |
| Personalization | 15% | User's team, recent searches, frequently accessed sources |
| Source authority | 10% | Official docs > Slack messages > personal notes |
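As a worked illustration of how these weights combine, the sketch below computes a weighted sum, assuming every signal has already been normalized to the range 0 to 1 (raw BM25 scores are unbounded, so they need normalizing first). The exponential decay with a 90-day half-life is an assumed choice for the freshness decay function, not a recommendation from this skill.

```python
WEIGHTS = {
    "text_relevance": 0.40,
    "freshness": 0.20,
    "popularity": 0.15,
    "personalization": 0.15,
    "source_authority": 0.10,
}


def freshness_score(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(signals: dict[str, float]) -> float:
    """Weighted sum of normalized signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    signals = {
        "text_relevance": 0.8,
        "freshness": freshness_score(age_days=30),
        "popularity": 0.4,
        "personalization": 0.5,
        "source_authority": 1.0,
    }
    print(round(combined_score(signals), 3))
```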

Field Boosting

| Field | Boost Factor | Rationale |
|-------|-------------|-----------|
| Title | 3.0x | Titles are the strongest relevance signal |
| Tags | 2.0x | Curated metadata is high-signal |
| Headings (H1-H3) | 1.5x | Section headers indicate topic boundaries |
| Body text | 1.0x | Baseline: full content match |
| Comments | 0.5x | Noisy, often tangential |
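In Elasticsearch or OpenSearch, per-field boosts like these are commonly expressed with the `field^boost` syntax of a `multi_match` query. The sketch below only builds the query body as a Python dict. The index field names (`title`, `tags`, `headings`, `body`, `comments`) are assumptions about how the content was mapped, and no client call is shown.

```python
FIELD_BOOSTS = {
    "title": 3.0,
    "tags": 2.0,
    "headings": 1.5,
    "body": 1.0,
    "comments": 0.5,
}


def build_boosted_query(user_query: str) -> dict:
    """Build a multi_match query body with one boosted entry per field."""
    fields = [f"{field}^{boost:g}" for field, boost in FIELD_BOOSTS.items()]
    return {"query": {"multi_match": {"query": user_query, "fields": fields}}}


if __name__ == "__main__":
    print(build_boosted_query("auth service runbook"))
```

Keeping the boosts in one dictionary makes them easy to adjust during relevance tuning without touching query code.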

Query Understanding

| Technique | Purpose | Example |
|-----------|---------|---------|
| Synonym expansion | Match equivalent terms | "deploy" → "deploy, release, ship" |
| Spell correction | Handle typos | "kuberntes" → "kubernetes" |
| Intent classification | Route to specialized search | "how do I deploy" → tutorial filter |
| Entity recognition | Boost specific entities | "John's PR for auth" → person + code filter |
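Two of these techniques, synonym expansion and intent classification, can be illustrated with a deliberately naive sketch. The synonym table, the prefix rule for spotting tutorial intent, and the function names are all assumptions. Production systems typically use curated synonym files and trained classifiers instead.

```python
# Hypothetical synonym table; in practice this comes from a curated vocabulary.
SYNONYMS = {"deploy": ["release", "ship"]}

# Hypothetical rule: questions phrased this way are routed to tutorial content.
TUTORIAL_PREFIXES = ("how do i", "how to")


def expand_query(query: str) -> list[str]:
    """Return the query terms with known synonyms inserted after each term."""
    terms: list[str] = []
    for token in query.lower().split():
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms


def classify_intent(query: str) -> str:
    """Return 'tutorial' for how-to style questions, otherwise 'general'."""
    return "tutorial" if query.lower().startswith(TUTORIAL_PREFIXES) else "general"


if __name__ == "__main__":
    print(expand_query("deploy api"))
    print(classify_intent("how do I deploy"))
```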

Search Quality Metrics

Core Metrics

| Metric | Formula | Target | How to Measure |
|--------|---------|--------|---------------|
| MRR (Mean Reciprocal Rank) | Average of 1/rank of first relevant result | >0.6 | Relevance judgments on sample queries |
| NDCG@10 | Normalized discounted cumulative gain at position 10 | >0.7 | Graded relevance judgments |
| Precision@5 | % of top 5 results that are relevant | >60% | Binary relevance judgments |
| Zero-Result Rate | % of queries returning no results | <5% | Log analysis |
| Click-Through Rate | % of searches that result in a click | >40% | Click tracking |
| Query Reformulation Rate | % of searches followed by a refined query | <20% | Session analysis |
| Time to Result | p50 and p95 query latency | p50 <200ms, p95 <1s | Infrastructure monitoring |
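The three judgment-based metrics can be computed directly from graded relevance lists. The sketch below makes several assumptions: each query is represented as a list of grades on a 0 to 3 scale in ranked order, a grade of 1 or higher counts as relevant, the gain is `2**grade - 1` (one common variant), and the ideal ordering for NDCG is taken from the judged results only rather than from every relevant document in the corpus.

```python
import math


def reciprocal_rank(grades: list[int], threshold: int = 1) -> float:
    """1/rank of the first result graded at or above the threshold, else 0."""
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: list[list[int]]) -> float:
    """Average reciprocal rank over all judged queries."""
    return sum(reciprocal_rank(grades) for grades in queries) / len(queries)


def precision_at_k(grades: list[int], k: int = 5, threshold: int = 1) -> float:
    """Share of the top k positions holding a relevant result."""
    return sum(grade >= threshold for grade in grades[:k]) / k


def dcg(grades: list[int]) -> float:
    """Discounted cumulative gain with gain 2**grade - 1 and a log2 discount."""
    return sum((2**grade - 1) / math.log2(rank + 1) for rank, grade in enumerate(grades, start=1))


def ndcg_at_k(grades: list[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the best possible ordering."""
    ideal = dcg(sorted(grades, reverse=True)[:k])
    return dcg(grades[:k]) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    judged = [[0, 3, 1, 0, 2], [2, 0, 0, 0, 0]]
    print(mean_reciprocal_rank(judged))
    print([precision_at_k(grades) for grades in judged])
    print([round(ndcg_at_k(grades), 3) for grades in judged])
```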

Quality Improvement Loop

1. Sample 100 queries weekly from search logs
2. Have 2+ raters judge relevance of top 10 results (0-3 scale)
3. Calculate MRR, NDCG@10, Precision@5
4. Identify failure patterns (categories of bad results)
5. Adjust relevance model (boosting, synonyms, freshness weights)
6. A/B test changes against baseline
7. Repeat monthly

Search UX Patterns

| Pattern | Purpose | Implementation Notes |
|---------|---------|---------------------|
| Autocomplete | Reduce typing, guide to known content | Suggest from titles, tags, and popular queries |
| Faceted navigation | Filter by source, type, team, date | Show counts per facet; update dynamically |
| Snippets / Highlights | Show matching content in context | Highlight query terms in 2-3 sentence excerpts |
| Related queries | Help users refine or explore | "People also searched for..." based on co-occurrence |
| Source badges | Indicate content origin | Confluence icon, Slack icon, etc. |
| Freshness indicator | Show content age | "Updated 2 days ago" vs. "Updated 2 years ago" |
| "Did you mean?" | Handle typos gracefully | Only suggest when confidence >80% |
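The "Did you mean?" confidence rule can be approximated with Python's standard `difflib` module. This is a rough sketch that treats the string similarity ratio as a stand-in for confidence, which is an assumption. A production system would more likely derive confidence from query logs or a language model. The vocabulary list and function name are hypothetical.

```python
import difflib
from typing import Optional


def did_you_mean(term: str, vocabulary: list[str], min_confidence: float = 0.8) -> Optional[str]:
    """Suggest the closest known term, or None if nothing is similar enough."""
    if term in vocabulary:
        return None  # already a known term, so no correction is needed
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=min_confidence)
    return matches[0] if matches else None


if __name__ == "__main__":
    known_terms = ["kubernetes", "terraform", "postgres"]
    print(did_you_mean("kuberntes", known_terms))
```

Returning `None` below the threshold keeps the UI from showing low-quality suggestions.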

Output Template: Enterprise Search Requirements Document

# Enterprise Search Requirements — [Project Name]

## Current State
- **Content sources:** [list with estimated volumes]
- **Current search tools:** [what people use today]
- **Top pain points:** [from user interviews]

## Architecture Decision
- **Pattern:** [Federated / Centralized / Hybrid]
- **Rationale:** [why this pattern]
- **Search platform:** [Elasticsearch, Typesense, Algolia, Vespa, etc.]

## Scope (Phase 1)
- **Sources to index:** [list with priority]
- **Content types:** [documents, conversations, code, tickets]
- **Users:** [target audience and access model]

## Relevance Model
- **Scoring factors:** [weights per factor]
- **Field boosting:** [title, tags, headings, body]
- **Freshness decay:** [function and parameters]

## Quality Targets
| Metric | Baseline | Target |
|--------|----------|--------|
| MRR | [current] | [goal] |
| Zero-result rate | [current] | <5% |
| p95 latency | [current] | <1s |

## Roadmap
- Phase 1: [Core sources, basic search] — [timeline]
- Phase 2: [Additional sources, relevance tuning] — [timeline]
- Phase 3: [Personalization, AI-powered features] — [timeline]

Common Mistakes

Indexing everything without curation — more content does not mean better search; noisy sources dilute quality
Ignoring access control — leaking confidential documents through search is a security incident
No freshness weighting — returning 3-year-old docs before this week's update frustrates users
Not measuring search quality — if you don't measure MRR/NDCG, you can't improve
Building search without user research — understand what people actually search for before designing the system
Treating search as a one-time project — relevance tuning is ongoing; plan for continuous improvement

Additional Resources

Related skills: hybrid-search-implementation (ai-ml — technical implementation), similarity-search-patterns (ai-ml — vector search)
Elasticsearch / OpenSearch — open-source search engines
Algolia — managed search platform
Vespa — open-source search and recommendation engine
