agent-skills/skills/hologres-uv-compute/SKILL.md
Hologres UV/PV computation using Dynamic Tables and RoaringBitmap for real-time deduplication at scale. Use for building incremental UV/PV pipelines, RoaringBitmap-based user deduplication, flexible time-range UV aggregation, and text-to-int UID encoding for bitmap compatibility. Triggers: "hologres uv", "hologres pv", "roaringbitmap", "rb_build_agg", "rb_or_agg", "去重", "UV计算", "用户去重", "bitmap去重", "实时UV", "hg_id_encoding"
npx skillsauth add aliyun/hologres-ai-plugins hologres-uv-computeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill requires hologres-cli to be installed first:
pip install hologres-cli
export HOLOGRES_SKILL=hologres-uv-compute
All SQL execution and Dynamic Table operations depend on hologres-cli commands (hologres sql run --write, hologres dt create).
Build real-time, incremental UV/PV computation pipelines using Dynamic Tables and RoaringBitmap in Hologres. This approach supports flexible time-range aggregation over billions of records with low latency.
| Traditional COUNT DISTINCT | RoaringBitmap + Dynamic Table |
|---------------------------|-------------------------------|
| Full scan on every query | Pre-aggregated bitmaps, incremental refresh |
| Slow with high-cardinality UIDs | Compressed bitmap, sub-second UV queries |
| Cannot merge across time ranges | RB_OR_AGG merges bitmaps for any date range |
| Heavy resource usage | Incremental computation, minimal resources |
-- 1. Enable RoaringBitmap extension
CREATE EXTENSION IF NOT EXISTS roaringbitmap;
-- 2. Create ODS detail table (source data)
BEGIN;
CREATE TABLE ods_app_detail (
uid int,
country text,
prov text,
city text,
ymd text NOT NULL
) LOGICAL PARTITION BY LIST (ymd);
CALL set_table_property('ods_app_detail', 'orientation', 'column');
CALL set_table_property('ods_app_detail', 'distribution_key', 'uid');
CALL set_table_property('ods_app_detail', 'clustering_key', 'ymd');
CALL set_table_property('ods_app_detail', 'event_time_column', 'ymd');
CALL set_table_property('ods_app_detail', 'bitmap_columns', 'country,prov,city,ymd');
COMMIT;
-- 3. Create DWS Dynamic Table (bitmap aggregation layer)
CREATE DYNAMIC TABLE dt_dws_app_rb (
country, prov, city, rb_uid, pv, ymd
)
LOGICAL PARTITION BY LIST (ymd)
WITH (
freshness = '5 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_partition_active_time = '2 days',
partition_key_time_format = 'YYYYMMDD'
)
AS
SELECT country, prov, city,
RB_BUILD_AGG(uid) AS rb_uid,
COUNT(1) AS pv,
ymd
FROM ods_app_detail
GROUP BY country, prov, city, ymd;
-- 4. Query UV/PV for a single day
SELECT country, prov, city,
RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS uv,
SUM(pv) AS pv
FROM dt_dws_app_rb
WHERE ymd = '20251223'
GROUP BY country, prov, city;
ODS (Detail) DWS (Bitmap Aggregation) Query
┌─────────────┐ Dynamic ┌──────────────────────┐ ┌─────────────┐
│ods_app_detail│──Table────>│ dt_dws_app_rb │───>│ RB_OR_AGG │
│ uid, dims, │ incremental│ rb_uid (bitmap), │ │ + CARDINALITY│
│ ymd │ refresh │ pv, dims, ymd │ │ = UV for any │
└─────────────┘ └──────────────────────┘ │ time range │
└─────────────┘
Data flow:
ods_app_detail (partitioned by day)dt_dws_app_rb incrementally aggregates UIDs into bitmaps per dimension per dayRB_OR_AGG for exact UVThe source table stores raw event data, partitioned by date.
BEGIN;
CREATE TABLE ods_app_detail (
uid int,
country text,
prov text,
city text,
ymd text NOT NULL
) LOGICAL PARTITION BY LIST (ymd);
CALL set_table_property('ods_app_detail', 'orientation', 'column');
CALL set_table_property('ods_app_detail', 'distribution_key', 'uid');
CALL set_table_property('ods_app_detail', 'clustering_key', 'ymd');
CALL set_table_property('ods_app_detail', 'event_time_column', 'ymd');
CALL set_table_property('ods_app_detail', 'bitmap_columns', 'country,prov,city,ymd');
COMMIT;
Key design choices:
| Property | Value | Reason |
|----------|-------|--------|
| orientation | column | Columnar storage for analytical queries |
| distribution_key | uid | Distribute by user for aggregation locality |
| clustering_key | ymd | Optimize time-range scans |
| event_time_column | ymd | Segment key for partition pruning |
| bitmap_columns | dimension columns | Accelerate dimension filtering |
The Dynamic Table pre-aggregates UIDs into RoaringBitmaps per dimension per day using incremental refresh.
CREATE DYNAMIC TABLE dt_dws_app_rb (
country, prov, city, rb_uid, pv, ymd
)
LOGICAL PARTITION BY LIST (ymd)
WITH (
freshness = '5 minutes',
auto_refresh_mode = 'incremental',
auto_refresh_partition_active_time = '2 days',
partition_key_time_format = 'YYYYMMDD'
)
AS
SELECT country, prov, city,
RB_BUILD_AGG(uid) AS rb_uid,
COUNT(1) AS pv,
ymd
FROM ods_app_detail
GROUP BY country, prov, city, ymd;
Key Dynamic Table parameters:
| Parameter | Value | Description |
|-----------|-------|-------------|
| freshness | 5 minutes | Target data freshness |
| auto_refresh_mode | incremental | Only compute new/changed data |
| auto_refresh_partition_active_time | 2 days | Only auto-refresh recent 2 days |
| partition_key_time_format | YYYYMMDD | Parse partition key as date |
Auto-refresh only covers active partitions. For historical data, trigger manually:
-- Full refresh a specific partition
REFRESH DYNAMIC TABLE dt_dws_app_rb
PARTITION(20251201)
WITH (refresh_mode = 'full');
-- Full refresh a date range (one by one)
REFRESH DYNAMIC TABLE dt_dws_app_rb
PARTITION(20251201)
WITH (refresh_mode = 'full');
SELECT country, prov, city,
RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS uv,
SUM(pv) AS pv
FROM dt_dws_app_rb
WHERE ymd = '20251223'
GROUP BY country, prov, city;
SELECT country, prov, city,
RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS uv,
SUM(pv) AS pv
FROM dt_dws_app_rb
WHERE ymd >= '20251201' AND ymd <= '20251231'
GROUP BY country, prov, city;
SELECT RB_CARDINALITY(RB_OR_AGG(rb_uid)) AS total_uv,
SUM(pv) AS total_pv
FROM dt_dws_app_rb
WHERE ymd >= '20251201' AND ymd <= '20251231';
How it works: RB_OR_AGG merges daily bitmaps with bitwise OR — a user appearing on multiple days is counted only once. RB_CARDINALITY returns the count of distinct bits (= distinct users).
When UIDs are text type (e.g., device IDs, UUIDs), they must be mapped to integers for RoaringBitmap. Two approaches:
-- Create mapping table
BEGIN;
CREATE TABLE uid_mapping (
uid text NOT NULL,
uid_int32 serial,
PRIMARY KEY (uid)
);
CALL set_table_property('uid_mapping', 'orientation', 'row');
CALL set_table_property('uid_mapping', 'distribution_key', 'uid');
CALL set_table_property('uid_mapping', 'clustering_key', 'uid');
COMMIT;
-- Use in Dynamic Table
CREATE DYNAMIC TABLE dt_dws_app_rb (...)
AS
SELECT country, prov, city,
RB_BUILD_AGG(m.uid_int32) AS uid_rb,
COUNT(1) AS pv, ymd
FROM ods_app_detail o
JOIN uid_mapping m ON o.uid_text = m.uid
GROUP BY country, prov, city, ymd;
Built-in function that auto-manages the mapping table.
-- Create mapping table (single text PK + one serial column)
BEGIN;
CREATE TABLE uid_mapping (
uid text NOT NULL,
uid_int32 serial,
PRIMARY KEY (uid)
);
CALL set_table_property('uid_mapping', 'orientation', 'row');
CALL set_table_property('uid_mapping', 'distribution_key', 'uid');
COMMIT;
-- Use hg_id_encoding_int4 in Dynamic Table query
CREATE DYNAMIC TABLE dt_dws_app_rb (...)
AS
SELECT country, prov, city,
RB_BUILD_AGG(hg_id_encoding_int4(uid_text, 'uid_mapping')) AS uid_rb,
COUNT(1) AS pv, ymd
FROM ods_app_detail
GROUP BY country, prov, city, ymd;
Constraints for hg_id_encoding_int4:
text PK column + one serial column| Function | Description | Example |
|----------|-------------|---------|
| RB_BUILD_AGG(int) | Aggregate integers into a bitmap | RB_BUILD_AGG(uid) |
| RB_OR_AGG(roaringbitmap) | Merge bitmaps (union / deduplicate) | RB_OR_AGG(rb_uid) |
| RB_AND_AGG(roaringbitmap) | Intersect bitmaps (common users) | RB_AND_AGG(rb_uid) |
| RB_CARDINALITY(roaringbitmap) | Count distinct elements in bitmap | RB_CARDINALITY(rb) |
| RB_OR(rb, rb) | Union two bitmaps | RB_OR(a, b) |
| RB_AND(rb, rb) | Intersect two bitmaps | RB_AND(a, b) |
| RB_ANDNOT(rb, rb) | Difference (in A but not in B) | RB_ANDNOT(a, b) |
| RB_TO_ARRAY(roaringbitmap) | Convert bitmap to integer array | RB_TO_ARRAY(rb) |
| RB_BUILD(int[]) | Build bitmap from integer array | RB_BUILD(ARRAY[1,2,3]) |
| RB_CONTAINS(rb, int) | Check if bitmap contains a value | RB_CONTAINS(rb, 42) |
| RB_IS_EMPTY(roaringbitmap) | Check if bitmap is empty | RB_IS_EMPTY(rb) |
For complete function reference, see references/roaringbitmap-functions.md.
| Document | Content | |----------|---------| | roaringbitmap-functions.md | Complete RoaringBitmap function reference | | dynamic-table-patterns.md | Dynamic Table configuration patterns for UV | | advanced-scenarios.md | Advanced UV scenarios: retention, funnel, cross-platform |
LOGICAL PARTITION BY LIST (ymd) on both ODS and DWS tables for time-range partition pruningauto_refresh_mode = 'incremental' to avoid recomputing all data on each refreshauto_refresh_partition_active_time to limit auto-refresh to recent partitions onlyREFRESH DYNAMIC TABLE ... PARTITION(...) WITH (refresh_mode = 'full')RB_OR_AGG for cross-day UV — it merges daily bitmaps for exact deduplication over any date rangehg_id_encoding_int4 (V4.1+) instead of manual mapping tablesdistribution_key = 'uid' on ODS table for aggregation localityCREATE EXTENSION IF NOT EXISTS roaringbitmap before any bitmap operationsdevelopment
Hologres Knowledge Base Management: Build search & RAG knowledge bases on Hologres using full-text inverted index (Tantivy + BM25), HGraph vector index, and holo-search-sdk. Covers create knowledge base → ingest documents (with embeddings) → vector / full-text / hybrid search → Q&A with LLM. Triggers: "知识库", "knowledge base", "RAG", "向量检索", "vector search", "全文检索", "fulltext search", "倒排索引", "BM25", "HGraph", "holo-search-sdk", "embedding", "混合检索", "hybrid search", "Hologres 搜索", "Hologres search", "tantivy", "jieba", "ik 分词", "向量索引"
tools
Hologres 实例内存使用率异常诊断技能。当用户提到内存打满、OOM、内存持续高位、Worker 内存不均、内存泄漏、内存倾斜、内存归因分析等场景时使用。 以 instance_id + 时间窗口为输入,自动完成内存水位形态判定(全局高 / 局部倾斜 / 持续不回落)、业务指标对齐、内存分类初筛(Query vs System/Cache)、 并沿 Query 主线、倾斜主线、Write/后台主线、System/元数据主线四大归因维度自动下钻,输出结构化的 Markdown 诊断报告与治理行动清单。 云监控数据通过 `hologres metric query` / `hologres metric latest` 获取;元仓与 PG 系统表数据通过 `hologres sql run` 获取; 内部工具数据(OOM/Jeprof/Coredump)通过 `holo oncall common` 获取,全程享有 hologres-cli 的安全护栏、JSON 结构化输出与自动错误重试能力。
tools
Hologres 实例 CPU 使用率异常诊断技能。当用户提到 CPU 打满、CPU 持续高位、Worker CPU 不均、负载诊断、CPU 归因分析、后台 Compaction 干扰等场景时使用。 以 instance_id + 时间窗口为输入,自动完成 CPU 状态分级(持续打满 / 持续高位 / 安全平稳)、四象限归因诊断(宏观定性 / 分布定位 / 查询归因 / 后台任务干扰),并输出结构化的 Markdown 诊断报告与治理行动清单。 云监控数据通过 `hologres metric query` / `hologres metric latest` 获取;元仓与 PG 系统表数据通过 `hologres sql run` 获取,全程享有 hologres-cli 的安全护栏、JSON 结构化输出与自动错误重试能力。
data-ai
Hologres 运维诊断日报生成技能。生成一份包含"诊断结论 + 根因解释 + 行动建议"的每日巡检报告, 覆盖实例健康、可用性、计算资源、SQL性能、成本治理、容量预测六大维度。 触发词:日报、每日巡检、daily report、运维日报、诊断日报、实例巡检报告、每日健康报告。