plugins/faos-coo/skills/enterprise-search/SKILL.md
<!-- AUTO-GENERATED by export-plugins.py — DO NOT EDIT -->
---
name: enterprise-search
description: Design cross-tool knowledge retrieval strategies, architect enterprise search systems, and tune relevance models. Use when building internal search experiences, consolidating knowledge across tools, or improving search result quality.
tags: [search, knowledge-management, information-retrieval]
---

# Enterprise Search

Business-oriented framework for designing cross-tool knowledge retrieval, architecting enterprise search systems, and tuning relevance models. Focused on strategy and requirements — for technical implementation, see hybrid-search-implementation and similarity-search-patterns in the ai-ml domain.

See also: hybrid-search-implementation, similarity-search-patterns, seo-audit.

| Pattern | How It Works | Pros | Cons | Best For |
|---------|-------------|------|------|----------|
| Federated | Query multiple sources in real-time, merge results | No data duplication, real-time freshness | Slower, limited cross-source ranking | Small orgs (<500 people), few sources |
| Centralized | Ingest all content into single search index | Best relevance, fastest queries, unified ranking | Data duplication, sync complexity, stale content risk | Large orgs, search-critical workflows |
| Hybrid | Centralized index for primary sources + federated for long-tail | Balanced cost vs. quality | Most complex to maintain | Mid-to-large orgs with diverse source landscape |

| Source | Connector Type | Sync Method | Typical Latency |
|--------|---------------|-------------|-----------------|
| Confluence / Wiki | REST API | Incremental (webhook + poll) | Near real-time |
| Slack / Teams | Events API | Streaming | Real-time |
| Google Drive | Drive API + Changes API | Incremental | 5-15 min |
| SharePoint | Graph API | Delta query | 5-15 min |
| GitHub | Webhooks + REST API | Event-driven | Near real-time |
| Jira / Linear | REST API + Webhooks | Incremental | Near real-time |
| Email | Graph API / Gmail API | Incremental | 15-30 min |
| Database | CDC (Change Data Capture) | Streaming | Near real-time |

Critical requirement: Search results must respect source-level permissions.

| Approach | How It Works | Trade-off |
|----------|-------------|-----------|
| Early binding | Filter at index time (only index what user can access) | Secure but requires per-user indices or ACL tagging |
| Late binding | Filter at query time (check permissions on each result) | Simpler indexing but slower queries at scale |
| Hybrid | Group-based ACL at index + user-level check at query | Best balance for most orgs |

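
The hybrid approach in the table above can be sketched as a two-stage filter. This is a minimal illustration, not a reference implementation: `IndexedDoc`, `denied_users`, and `visible_results` are hypothetical names, and the per-user deny set stands in for a live permission lookup against the source system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    # Group ACL written at index time (the access_groups metadata field).
    access_groups: frozenset
    # Hypothetical stand-in for a query-time, user-level permission check.
    denied_users: frozenset = frozenset()


def visible_results(hits, user_id, user_groups):
    """Return only the hits this user is allowed to see, preserving order."""
    groups = set(user_groups)
    visible = []
    for doc in hits:
        # Stage 1: cheap group intersection against index-time ACL tags.
        if not (doc.access_groups & groups):
            continue
        # Stage 2: user-level check at query time (assumed deny list here).
        if user_id in doc.denied_users:
            continue
        visible.append(doc)
    return visible
```

In a real deployment, stage 1 would normally be pushed into the search engine as a filter clause so that unauthorized documents never leave the index, leaving stage 2 to run over a small result page.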
Every indexed document should carry these metadata fields:
| Field | Type | Purpose | Example |
|-------|------|---------|---------|
| title | string | Primary display and search field | "Q4 Revenue Report" |
| source | enum | Origin system | confluence, slack, drive, github |
| content_type | enum | Document classification | document, conversation, code, ticket |
| team | string | Owning team or department | "Engineering", "Sales" |
| created_at | datetime | For freshness scoring | 2026-01-15T10:30:00Z |
| updated_at | datetime | For freshness and deduplication | 2026-02-28T14:00:00Z |
| author | string | For personalization and credibility | "user@example.com" |
| access_groups | list[string] | For permission filtering | ["engineering", "all-staff"] |
| tags | list[string] | For faceted navigation | ["architecture", "adr", "database"] |
| status | enum | Content lifecycle | draft, published, archived |
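
One way to make the field table concrete is a typed record. The sketch below mirrors the fields and example enum values listed above; the class names and the timestamp check in `__post_init__` are assumptions added for illustration, and a real schema would likely carry more enum members.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Source(str, Enum):
    CONFLUENCE = "confluence"
    SLACK = "slack"
    DRIVE = "drive"
    GITHUB = "github"


class ContentType(str, Enum):
    DOCUMENT = "document"
    CONVERSATION = "conversation"
    CODE = "code"
    TICKET = "ticket"


class Status(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


@dataclass
class DocumentMetadata:
    title: str
    source: Source
    content_type: ContentType
    team: str
    created_at: datetime
    updated_at: datetime
    author: str
    access_groups: list
    tags: list
    status: Status

    def __post_init__(self):
        # Assumed sanity check: an update cannot precede creation.
        if self.updated_at < self.created_at:
            raise ValueError("updated_at precedes created_at")
```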

| Rule | Rationale |
|------|-----------|
| Use controlled vocabulary (not free-text tags) | Prevents tag proliferation and inconsistency |
| Max 5 tags per document | Forces specificity over over-tagging |
| Tags use kebab-case | Consistency with URLs and search queries |
| Review tag taxonomy quarterly | Remove unused tags, merge synonyms |
| Auto-tag where possible | Use classification models to suggest tags on creation |

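
The first three tagging rules are mechanical enough to enforce at ingestion time. A minimal sketch follows; `validate_tags` and the `vocabulary` argument are hypothetical names, and the kebab-case pattern assumes lowercase ASCII letters and digits only.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "search-ux".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_TAGS = 5


def validate_tags(tags, vocabulary):
    """Return a list of rule violations; an empty list means the tags pass."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if not KEBAB_CASE.match(tag):
            problems.append(f"not kebab-case: {tag!r}")
        elif tag not in vocabulary:
            problems.append(f"not in controlled vocabulary: {tag!r}")
    return problems
```

Returning violations as a list, instead of raising on the first one, lets an ingestion pipeline report every problem with a document in a single pass.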

| Content Type | Freshness Target | Stale Threshold | Action When Stale |
|-------------|-----------------|-----------------|-------------------|
| Documentation | Updated quarterly | >6 months | Flag for review |
| Meeting notes | Permanent | N/A | Reduce ranking weight over time |
| Code / PRs | Always current (live sync) | N/A | N/A |
| Tickets / Issues | Live sync | N/A | Archive closed items after 12 months |
| Policies / Runbooks | Updated semi-annually | >12 months | Alert content owner |

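
The two rows with an explicit stale threshold can be turned into a scheduled check. This is a sketch under stated assumptions: the content-type keys are hypothetical, "6 months" is approximated as 183 days and "12 months" as 365 days, and content types without a threshold simply return no action.

```python
from datetime import datetime, timedelta

# (stale threshold, action) per content type; keys are illustrative.
STALE_RULES = {
    "documentation": (timedelta(days=183), "flag for review"),
    "policy": (timedelta(days=365), "alert content owner"),
}


def stale_action(content_type, updated_at, now):
    """Return the action for a stale document, or None if it is still fresh."""
    rule = STALE_RULES.get(content_type)
    if rule is None:
        return None  # no stale threshold defined for this content type
    threshold, action = rule
    return action if now - updated_at > threshold else None
```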

| Factor | Weight | Description |
|--------|--------|-------------|
| Text relevance (BM25) | 40% | Keyword match quality: title, body, tags |
| Freshness | 20% | More recent content ranked higher (decay function) |
| Popularity | 15% | View count, link count, citation count |
| Personalization | 15% | User's team, recent searches, frequently accessed sources |
| Source authority | 10% | Official docs > Slack messages > personal notes |

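
The weights above combine into a single score as a weighted sum. The sketch below assumes every signal has already been normalized to the range 0 to 1 (raw BM25 scores are unbounded, so they need scaling first) and uses an exponential half-life decay for freshness; the 90-day half-life and all names are illustrative assumptions, since the source does not fix a decay function.

```python
WEIGHTS = {
    "text": 0.40,
    "freshness": 0.20,
    "popularity": 0.15,
    "personalization": 0.15,
    "authority": 0.10,
}


def freshness_score(age_days, half_life_days=90.0):
    """Exponential decay: 1.0 for new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(signals):
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
```

For example, a document with a perfect text match and nothing else scores 0.40, so strong keyword relevance alone cannot outrank a slightly weaker match that is also fresh and popular.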

| Field | Boost Factor | Rationale |
|-------|-------------|-----------|
| Title | 3.0x | Titles are the strongest relevance signal |
| Tags | 2.0x | Curated metadata is high-signal |
| Headings (H1-H3) | 1.5x | Section headers indicate topic boundaries |
| Body text | 1.0x | Baseline: full content match |
| Comments | 0.5x | Noisy, often tangential |

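
On platforms with Elasticsearch-style query syntax, field boosts like these are commonly expressed with the `field^boost` notation in a `multi_match` query. The sketch below only builds the request body as a plain dictionary; the index field names (`title`, `tags`, `headings`, `body`, `comments`) are assumptions about how the documents were mapped.

```python
FIELD_BOOSTS = {
    "title": 3.0,
    "tags": 2.0,
    "headings": 1.5,
    "body": 1.0,
    "comments": 0.5,
}


def build_query(text):
    """Build a multi_match request body with per-field boosts."""
    # ":g" drops trailing zeros, so 3.0 becomes "3" and 1.5 stays "1.5".
    fields = [f"{name}^{boost:g}" for name, boost in FIELD_BOOSTS.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}
```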

| Technique | Purpose | Example |
|-----------|---------|---------|
| Synonym expansion | Match equivalent terms | "deploy" → "deploy, release, ship" |
| Spell correction | Handle typos | "kuberntes" → "kubernetes" |
| Intent classification | Route to specialized search | "how do I deploy" → tutorial filter |
| Entity recognition | Boost specific entities | "John's PR for auth" → person + code filter |

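
Synonym expansion, the first technique in the table, can be sketched with a hand-maintained dictionary. The `SYNONYMS` map and `expand_query` function are hypothetical; production systems usually configure this inside the search engine's analyzer instead of in application code.

```python
# Illustrative synonym map; a real one would come from a curated taxonomy.
SYNONYMS = {
    "deploy": ["release", "ship"],
}


def expand_query(query):
    """Return one list of alternatives per query term, original term first."""
    expanded = []
    for term in query.lower().split():
        expanded.append([term, *SYNONYMS.get(term, [])])
    return expanded
```

Each inner list is meant to be OR-ed together by the search backend, while the outer list keeps the original term order.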

| Metric | Formula | Target | How to Measure |
|--------|---------|--------|---------------|
| MRR (Mean Reciprocal Rank) | Average of 1/rank of first relevant result | >0.6 | Relevance judgments on sample queries |
| NDCG@10 | Normalized discounted cumulative gain at position 10 | >0.7 | Graded relevance judgments |
| Precision@5 | % of top 5 results that are relevant | >60% | Binary relevance judgments |
| Zero-Result Rate | % of queries returning no results | <5% | Log analysis |
| Click-Through Rate | % of searches that result in a click | >40% | Click tracking |
| Query Reformulation Rate | % of searches followed by a refined query | <20% | Session analysis |
| Time to Result | p50 and p95 query latency | p50 <200ms, p95 <1s | Infrastructure monitoring |

1. Sample 100 queries weekly from search logs
2. Have 2+ raters judge relevance of top 10 results (0-3 scale)
3. Calculate MRR, NDCG@10, Precision@5
4. Identify failure patterns (categories of bad results)
5. Adjust relevance model (boosting, synonyms, freshness weights)
6. A/B test changes against baseline
7. Repeat monthly
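
Step 3 of the loop above can be computed directly from the rater grades. In this sketch each query is a list of 0-3 grades in ranked order. Two choices are assumptions, not taken from the source: a grade of 2 or higher counts as "relevant" for the binary metrics, and the ideal ordering for NDCG is derived from the judged results only.

```python
import math


def reciprocal_rank(grades, relevant_at=2):
    """1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, grade in enumerate(grades, start=1):
        if grade >= relevant_at:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries, relevant_at=2):
    """Average reciprocal rank over a non-empty list of judged queries."""
    return sum(reciprocal_rank(g, relevant_at) for g in queries) / len(queries)


def precision_at_k(grades, k=5, relevant_at=2):
    """Fraction of the top k results that are relevant."""
    return sum(grade >= relevant_at for grade in grades[:k]) / k


def dcg(grades):
    """Discounted cumulative gain with the common 2^grade - 1 gain."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades, start=1)
    )


def ndcg_at_k(grades, k=10):
    """DCG of the top k divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(grades, reverse=True)[:k])
    return dcg(grades[:k]) / ideal if ideal > 0 else 0.0
```

With two or more raters per query, the grades would first be merged (for example by averaging or majority vote) before being passed to these functions.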

| Pattern | Purpose | Implementation Notes |
|---------|---------|---------------------|
| Autocomplete | Reduce typing, guide to known content | Suggest from titles, tags, and popular queries |
| Faceted navigation | Filter by source, type, team, date | Show counts per facet; update dynamically |
| Snippets / Highlights | Show matching content in context | Highlight query terms in 2-3 sentence excerpts |
| Related queries | Help users refine or explore | "People also searched for..." based on co-occurrence |
| Source badges | Indicate content origin | Confluence icon, Slack icon, etc. |
| Freshness indicator | Show content age | "Updated 2 days ago" vs. "Updated 2 years ago" |
| "Did you mean?" | Handle typos gracefully | Only suggest when confidence >80% |

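
The "Did you mean?" row can be approximated with the standard library. The sketch below uses `difflib.get_close_matches` and treats its similarity ratio as the confidence value, which is an assumption: a string-similarity ratio is not a calibrated probability. The `did_you_mean` name and the vocabulary list are hypothetical.

```python
import difflib


def did_you_mean(term, vocabulary, min_confidence=0.8):
    """Suggest the closest known term, or None if nothing is close enough."""
    if term in vocabulary:
        return None  # already a known term, nothing to correct
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=min_confidence)
    return matches[0] if matches else None
```

The vocabulary would typically be built from indexed titles, tags, and popular queries, the same sources the autocomplete row draws on.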
# Enterprise Search Requirements — [Project Name]
## Current State
- **Content sources:** [list with estimated volumes]
- **Current search tools:** [what people use today]
- **Top pain points:** [from user interviews]
## Architecture Decision
- **Pattern:** [Federated / Centralized / Hybrid]
- **Rationale:** [why this pattern]
- **Search platform:** [Elasticsearch, Typesense, Algolia, Vespa, etc.]
## Scope (Phase 1)
- **Sources to index:** [list with priority]
- **Content types:** [documents, conversations, code, tickets]
- **Users:** [target audience and access model]
## Relevance Model
- **Scoring factors:** [weights per factor]
- **Field boosting:** [title, tags, headings, body]
- **Freshness decay:** [function and parameters]
## Quality Targets
| Metric | Baseline | Target |
|--------|----------|--------|
| MRR | [current] | [goal] |
| Zero-result rate | [current] | <5% |
| p95 latency | [current] | <1s |
## Roadmap
- Phase 1: [Core sources, basic search] — [timeline]
- Phase 2: [Additional sources, relevance tuning] — [timeline]
- Phase 3: [Personalization, AI-powered features] — [timeline]
Related skills: hybrid-search-implementation (ai-ml, technical implementation), similarity-search-patterns (ai-ml, vector search)