[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-772d582d-a6fd-4cbb-bdc4-0ad4eccd579e":3,"$fnJa3dE8iGQU4Zfs6I22GkXkPFW7Zvfz6PyN3huTubOQ":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"772d582d-a6fd-4cbb-bdc4-0ad4eccd579e","cc-skill-clickhouse-io","ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.","cat_life_career","mod_other","sickn33,other","---\nname: cc-skill-clickhouse-io\ndescription: \"ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.\"\nrisk: unknown\nsource: community\ndate_added: \"2026-02-27\"\n---\n\n# ClickHouse Analytics Patterns\n\nClickHouse-specific patterns for high-performance analytics and data engineering.\n\n## Overview\n\nClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). 
It's optimized for fast analytical queries on large datasets.\n\n**Key Features:**\n- Column-oriented storage\n- Data compression\n- Parallel query execution\n- Distributed queries\n- Real-time analytics\n\n## Table Design Patterns\n\n### MergeTree Engine (Most Common)\n\n```sql\nCREATE TABLE markets_analytics (\n    date Date,\n    market_id String,\n    market_name String,\n    volume UInt64,\n    trades UInt32,\n    unique_traders UInt32,\n    avg_trade_size Float64,\n    created_at DateTime\n) ENGINE = MergeTree()\nPARTITION BY toYYYYMM(date)\nORDER BY (date, market_id)\nSETTINGS index_granularity = 8192;\n```\n\n### ReplacingMergeTree (Deduplication)\n\n```sql\n-- For data that may have duplicates (e.g., from multiple sources)\nCREATE TABLE user_events (\n    event_id String,\n    user_id String,\n    event_type String,\n    timestamp DateTime,\n    properties String\n) ENGINE = ReplacingMergeTree()\nPARTITION BY toYYYYMM(timestamp)\nORDER BY (user_id, event_id, timestamp)\nPRIMARY KEY (user_id, event_id);\n```\n\n### AggregatingMergeTree (Pre-aggregation)\n\n```sql\n-- For maintaining aggregated metrics\nCREATE TABLE market_stats_hourly (\n    hour DateTime,\n    market_id String,\n    total_volume AggregateFunction(sum, UInt64),\n    total_trades AggregateFunction(count, UInt32),\n    unique_users AggregateFunction(uniq, String)\n) ENGINE = AggregatingMergeTree()\nPARTITION BY toYYYYMM(hour)\nORDER BY (hour, market_id);\n\n-- Query aggregated data\nSELECT\n    hour,\n    market_id,\n    sumMerge(total_volume) AS volume,\n    countMerge(total_trades) AS trades,\n    uniqMerge(unique_users) AS users\nFROM market_stats_hourly\nWHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)\nGROUP BY hour, market_id\nORDER BY hour DESC;\n```\n\n## Query Optimization Patterns\n\n### Efficient Filtering\n\n```sql\n-- ✅ GOOD: Use indexed columns first\nSELECT *\nFROM markets_analytics\nWHERE date >= '2025-01-01'\n  AND market_id = 'market-123'\n  AND volume > 1000\nORDER BY 
date DESC\nLIMIT 100;\n\n-- ❌ BAD: Filter on non-indexed columns first\nSELECT *\nFROM markets_analytics\nWHERE volume > 1000\n  AND market_name LIKE '%election%'\n  AND date >= '2025-01-01';\n```\n\n### Aggregations\n\n```sql\n-- ✅ GOOD: Use ClickHouse-specific aggregation functions\nSELECT\n    toStartOfDay(created_at) AS day,\n    market_id,\n    sum(volume) AS total_volume,\n    count() AS total_trades,\n    uniq(trader_id) AS unique_traders,\n    avg(trade_size) AS avg_size\nFROM trades\nWHERE created_at >= today() - INTERVAL 7 DAY\nGROUP BY day, market_id\nORDER BY day DESC, total_volume DESC;\n\n-- ✅ Use quantile for percentiles (more efficient than percentile)\nSELECT\n    quantile(0.50)(trade_size) AS median,\n    quantile(0.95)(trade_size) AS p95,\n    quantile(0.99)(trade_size) AS p99\nFROM trades\nWHERE created_at >= now() - INTERVAL 1 HOUR;\n```\n\n### Window Functions\n\n```sql\n-- Calculate running totals\nSELECT\n    date,\n    market_id,\n    volume,\n    sum(volume) OVER (\n        PARTITION BY market_id\n        ORDER BY date\n        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW\n    ) AS cumulative_volume\nFROM markets_analytics\nWHERE date >= today() - INTERVAL 30 DAY\nORDER BY market_id, date;\n```\n\n## Data Insertion Patterns\n\n### Bulk Insert (Recommended)\n\n```typescript\nimport { ClickHouse } from 'clickhouse'\n\nconst clickhouse = new ClickHouse({\n  url: process.env.CLICKHOUSE_URL,\n  port: 8123,\n  basicAuth: {\n    username: process.env.CLICKHOUSE_USER,\n    password: process.env.CLICKHOUSE_PASSWORD\n  }\n})\n\n\u002F\u002F ✅ Batch insert (efficient)\nasync function bulkInsertTrades(trades: Trade[]) {\n  const values = trades.map(trade => `(\n    '${trade.id}',\n    '${trade.market_id}',\n    '${trade.user_id}',\n    ${trade.amount},\n    '${trade.timestamp.toISOString()}'\n  )`).join(',')\n\n  await clickhouse.query(`\n    INSERT INTO trades (id, market_id, user_id, amount, timestamp)\n    VALUES ${values}\n  
`).toPromise()\n}\n\n\u002F\u002F ❌ Individual inserts (slow)\nasync function insertTrade(trade: Trade) {\n  \u002F\u002F Don't do this in a loop!\n  await clickhouse.query(`\n    INSERT INTO trades VALUES ('${trade.id}', ...)\n  `).toPromise()\n}\n```\n\n### Streaming Insert\n\n```typescript\n\u002F\u002F For continuous data ingestion\nimport { createWriteStream } from 'fs'\nimport { pipeline } from 'stream\u002Fpromises'\n\nasync function streamInserts() {\n  const stream = clickhouse.insert('trades').stream()\n\n  for await (const batch of dataSource) {\n    stream.write(batch)\n  }\n\n  await stream.end()\n}\n```\n\n## Materialized Views\n\n### Real-time Aggregations\n\n```sql\n-- Create materialized view for hourly stats\nCREATE MATERIALIZED VIEW market_stats_hourly_mv\nTO market_stats_hourly\nAS SELECT\n    toStartOfHour(timestamp) AS hour,\n    market_id,\n    sumState(amount) AS total_volume,\n    countState() AS total_trades,\n    uniqState(user_id) AS unique_users\nFROM trades\nGROUP BY hour, market_id;\n\n-- Query the materialized view\nSELECT\n    hour,\n    market_id,\n    sumMerge(total_volume) AS volume,\n    countMerge(total_trades) AS trades,\n    uniqMerge(unique_users) AS users\nFROM market_stats_hourly\nWHERE hour >= now() - INTERVAL 24 HOUR\nGROUP BY hour, market_id;\n```\n\n## Performance Monitoring\n\n### Query Performance\n\n```sql\n-- Check slow queries\nSELECT\n    query_id,\n    user,\n    query,\n    query_duration_ms,\n    read_rows,\n    read_bytes,\n    memory_usage\nFROM system.query_log\nWHERE type = 'QueryFinish'\n  AND query_duration_ms > 1000\n  AND event_time >= now() - INTERVAL 1 HOUR\nORDER BY query_duration_ms DESC\nLIMIT 10;\n```\n\n### Table Statistics\n\n```sql\n-- Check table sizes\nSELECT\n    database,\n    table,\n    formatReadableSize(sum(bytes)) AS size,\n    sum(rows) AS rows,\n    max(modification_time) AS latest_modification\nFROM system.parts\nWHERE active\nGROUP BY database, table\nORDER BY sum(bytes) 
DESC;\n```\n\n## Common Analytics Queries\n\n### Time Series Analysis\n\n```sql\n-- Daily active users\nSELECT\n    toDate(timestamp) AS date,\n    uniq(user_id) AS daily_active_users\nFROM events\nWHERE timestamp >= today() - INTERVAL 30 DAY\nGROUP BY date\nORDER BY date;\n\n-- Retention analysis\nSELECT\n    signup_date,\n    countIf(days_since_signup = 0) AS day_0,\n    countIf(days_since_signup = 1) AS day_1,\n    countIf(days_since_signup = 7) AS day_7,\n    countIf(days_since_signup = 30) AS day_30\nFROM (\n    SELECT\n        user_id,\n        min(toDate(timestamp)) AS signup_date,\n        toDate(timestamp) AS activity_date,\n        dateDiff('day', signup_date, activity_date) AS days_since_signup\n    FROM events\n    GROUP BY user_id, activity_date\n)\nGROUP BY signup_date\nORDER BY signup_date DESC;\n```\n\n### Funnel Analysis\n\n```sql\n-- Conversion funnel\nSELECT\n    countIf(step = 'viewed_market') AS viewed,\n    countIf(step = 'clicked_trade') AS clicked,\n    countIf(step = 'completed_trade') AS completed,\n    round(clicked \u002F viewed * 100, 2) AS view_to_click_rate,\n    round(completed \u002F clicked * 100, 2) AS click_to_completion_rate\nFROM (\n    SELECT\n        user_id,\n        session_id,\n        event_type AS step\n    FROM events\n    WHERE event_date = today()\n)\nGROUP BY session_id;\n```\n\n### Cohort Analysis\n\n```sql\n-- User cohorts by signup month\nSELECT\n    toStartOfMonth(signup_date) AS cohort,\n    toStartOfMonth(activity_date) AS month,\n    dateDiff('month', cohort, month) AS months_since_signup,\n    count(DISTINCT user_id) AS active_users\nFROM (\n    SELECT\n        user_id,\n        min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date,\n        toDate(timestamp) AS activity_date\n    FROM events\n)\nGROUP BY cohort, month, months_since_signup\nORDER BY cohort, months_since_signup;\n```\n\n## Data Pipeline Patterns\n\n### ETL Pattern\n\n```typescript\n\u002F\u002F Extract, Transform, Load\nasync 
function etlPipeline() {\n  \u002F\u002F 1. Extract from source\n  const rawData = await extractFromPostgres()\n\n  \u002F\u002F 2. Transform\n  const transformed = rawData.map(row => ({\n    date: new Date(row.created_at).toISOString().split('T')[0],\n    market_id: row.market_slug,\n    volume: parseFloat(row.total_volume),\n    trades: parseInt(row.trade_count, 10)\n  }))\n\n  \u002F\u002F 3. Load to ClickHouse\n  await bulkInsertToClickHouse(transformed)\n}\n\n\u002F\u002F Run periodically\nsetInterval(etlPipeline, 60 * 60 * 1000)  \u002F\u002F Every hour\n```\n\n### Change Data Capture (CDC)\n\n```typescript\n\u002F\u002F Listen to PostgreSQL changes and sync to ClickHouse\nimport { Client } from 'pg'\n\nconst pgClient = new Client({ connectionString: process.env.DATABASE_URL })\n\n\u002F\u002F The client must be connected before it can LISTEN\nawait pgClient.connect()\nawait pgClient.query('LISTEN market_updates')\n\npgClient.on('notification', async (msg) => {\n  const update = JSON.parse(msg.payload)\n\n  await clickhouse.insert('market_updates', [\n    {\n      market_id: update.id,\n      event_type: update.operation,  \u002F\u002F INSERT, UPDATE, DELETE\n      timestamp: new Date(),\n      data: JSON.stringify(update.new_data)\n    }\n  ])\n})\n```\n\n## Best Practices\n\n### 1. Partitioning Strategy\n- Partition by time (usually month or day)\n- Avoid too many partitions (each one adds parts and slows inserts and merges)\n- Use a Date column for the partition key\n\n### 2. Ordering Key\n- Put most frequently filtered columns first\n- Consider cardinality (lower-cardinality columns first generally prunes and compresses better)\n- Order impacts compression\n\n### 3. Data Types\n- Use smallest appropriate type (UInt32 vs UInt64)\n- Use LowCardinality for repeated strings\n- Use Enum for categorical data\n\n### 4. Avoid\n- SELECT * (specify columns)\n- FINAL (merge data before query instead)\n- Too many JOINs (denormalize for analytics)\n- Small frequent inserts (batch instead)\n\n### 5. 
Monitoring\n- Track query performance\n- Monitor disk usage\n- Check merge operations\n- Review slow query log\n\n**Remember**: ClickHouse excels at analytical workloads. Design tables for your query patterns, batch inserts, and leverage materialized views for real-time aggregations.\n\n## When to Use\nUse this skill when designing ClickHouse tables, optimizing analytical queries, or building ingestion pipelines like those described above.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,74,1155,"2026-05-16 13:10:10",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of Skill",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"b131fc6a-bf15-4f02-b95f-a1b8609c2163","1.0.0","cc-skill-clickhouse-io.zip",3972,"uploads\u002Fskills\u002F772d582d-a6fd-4cbb-bdc4-0ad4eccd579e\u002Fcc-skill-clickhouse-io.zip","643ef943a956f46071bfedaf31b6381c18c321c1c256ef598a0dc1edb6ccea01","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10462}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]