---
name: nosql-expert
description: "Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems."
risk: unknown
source: community
date_added: "2026-02-27"
---
# NoSQL Expert Patterns (Cassandra & DynamoDB)
## Overview
This skill provides professional mental models and design patterns for **distributed wide-column and key-value stores** (specifically Apache Cassandra and Amazon DynamoDB).
Unlike SQL databases, where you model data entities first, and unlike document stores such as MongoDB, these distributed systems require you to **model your queries first**.
## When to Use
- **Designing for Scale**: Moving beyond simple single-node databases to distributed clusters.
- **Technology Selection**: Evaluating or using **Cassandra**, **ScyllaDB**, or **DynamoDB**.
- **Performance Tuning**: Troubleshooting "hot partitions" or high latency in existing NoSQL systems.
- **Microservices**: Implementing "database-per-service" patterns where highly optimized reads are required.
## The Mental Shift: SQL vs. Distributed NoSQL
| Feature | SQL (Relational) | Distributed NoSQL (Cassandra/DynamoDB) |
| :--- | :--- | :--- |
| **Data modeling** | Model Entities + Relationships | Model **Queries** (Access Patterns) |
| **Joins** | CPU-intensive, at read time | **Pre-computed** (Denormalized) at write time |
| **Storage cost** | Expensive (minimize duplication) | Cheap (duplicate data for read speed) |
| **Consistency** | ACID (Strong) | **BASE (Eventual)** / Tunable |
| **Scalability** | Vertical (Bigger machine) | **Horizontal** (More nodes/shards) |
> **The Golden Rule:** In SQL, you design the data model to answer *any* query. In NoSQL, you design the data model to answer *specific* queries efficiently.
## Core Design Patterns
### 1. Query-First Modeling (Access Patterns)
You typically cannot "add a query later" without migration or creating a new table/index.
**Process:**
1. **List all Entities** (User, Order, Product).
2. **List all Access Patterns** ("Get User by Email", "Get Orders by User sorted by Date").
3. **Design Table(s)** specifically to serve those patterns with a single lookup.
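
The process above can be sketched as a small coverage check: write each access pattern down together with the table and keys that serve it, then flag any required query that has no home yet. This is a minimal illustration, not a modeling tool; the `AccessPattern` class, the `uncovered` helper, and the table names `users_by_email` and `orders_by_user` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AccessPattern:
    name: str                       # human-readable query description
    table: str                      # hypothetical table or index that serves it
    partition_key: str              # attribute used for the equality lookup
    sort_key: Optional[str] = None  # attribute used for ordering or ranges


# Step 2 and 3 of the process: each pattern is paired with its table design.
PATTERNS = [
    AccessPattern("Get User by Email", "users_by_email", "email"),
    AccessPattern("Get Orders by User sorted by Date",
                  "orders_by_user", "user_id", "order_date"),
]


def uncovered(required: List[str], patterns: List[AccessPattern]) -> List[str]:
    """Return the required queries that no table or index serves yet."""
    served = {p.name for p in patterns}
    return [q for q in required if q not in served]


missing = uncovered(["Get User by Email", "Get Orders by Status"], PATTERNS)
print(missing)
```

Any name that comes back from `uncovered` is a query that would need a new table or index before it can be served with a single lookup.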
### 2. The Partition Key is King
Data is distributed across physical nodes based on the **Partition Key (PK)**.
- **Goal:** Even distribution of data and traffic.
- **Anti-Pattern:** Using a low-cardinality PK (e.g., `status="active"` or `gender="m"`) creates **Hot Partitions**, limiting throughput to a single node's capacity.
- **Best Practice:** Use high-cardinality keys (User IDs, Device IDs, Composite Keys).
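
To see why cardinality matters, the sketch below hashes partition keys onto a fixed number of nodes and counts how many writes land on each. It is a simplified stand-in: real partitioners use token ranges or consistent hashing rather than a plain modulo, and `node_for` and `load_per_node` are hypothetical helpers.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index by hashing (simplified partitioner)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load_per_node(keys, node_count: int) -> Counter:
    """Count how many writes each node receives for the given keys."""
    return Counter(node_for(k, node_count) for k in keys)


NODES = 8

# Low cardinality: every write uses one of two status values.
low = load_per_node(
    ["active" if i % 10 else "inactive" for i in range(10_000)], NODES
)

# High cardinality: every write uses a distinct user ID.
high = load_per_node([f"user-{i}" for i in range(10_000)], NODES)

print("nodes used with status key:", len(low))
print("nodes used with user ID key:", len(high))
```

With only two distinct key values, at most two nodes can ever receive traffic no matter how large the cluster is, which is the hot-partition effect described above.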
### 3. Clustering / Sort Keys
Within a partition, data is sorted on disk by the **Clustering Key (Cassandra)** or **Sort Key (DynamoDB)**.
- This allows for efficient **Range Queries** (e.g., `WHERE user_id=X AND date > Y`).
- It effectively pre-sorts your data for specific retrieval requirements.
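
The following in-memory sketch shows why a sorted partition makes range queries cheap: rows are kept ordered by sort key, so a range read is a binary search followed by a contiguous slice. The `Partition` class and the sample dates are hypothetical and only model the idea, not either database's storage engine.

```python
import bisect
from collections import defaultdict


class Partition:
    """Rows of one partition, kept sorted by sort key."""

    def __init__(self):
        self.sort_keys = []  # sorted list of sort-key values
        self.values = []     # row payloads, aligned with sort_keys

    def put(self, sort_key, value):
        # Insert at the position that keeps sort_keys ordered.
        i = bisect.bisect_left(self.sort_keys, sort_key)
        self.sort_keys.insert(i, sort_key)
        self.values.insert(i, value)

    def greater_than(self, sort_key):
        # Binary search for the first key strictly greater than sort_key,
        # then return the contiguous tail.
        i = bisect.bisect_right(self.sort_keys, sort_key)
        return list(zip(self.sort_keys[i:], self.values[i:]))


table = defaultdict(Partition)  # partition key -> Partition
table["user-1"].put("2024-03-01", "order C")
table["user-1"].put("2024-01-15", "order A")
table["user-1"].put("2024-02-10", "order B")

# Equivalent in spirit to: WHERE user_id = 'user-1' AND date > '2024-02-01'
recent = table["user-1"].greater_than("2024-02-01")
print(recent)
```

ISO-8601 date strings are used as sort keys here because their lexicographic order matches chronological order.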
### 4. Single-Table Design (Adjacency Lists)
*Primary use: DynamoDB (but concepts apply elsewhere)*
Storing multiple entity types in one table to enable pre-joined reads.
| PK (Partition) | SK (Sort) | Data Fields... |
| :--- | :--- | :--- |
| `USER#123` | `PROFILE` | `{ name: "Ian", email: "..." }` |
| `USER#123` | `ORDER#998` | `{ total: 50.00, status: "shipped" }` |
| `USER#123` | `ORDER#999` | `{ total: 12.00, status: "pending" }` |
- **Query:** `PK="USER#123"`
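- **Result:** Fetches the User Profile AND all Orders in **one Query request**, as long as the result fits in a single page (DynamoDB returns at most 1 MB per Query call and paginates beyond that).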
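
A minimal in-memory sketch of the table above, assuming a hypothetical `query` helper that mimics a key condition of the form `PK = :pk AND begins_with(SK, :prefix)`. It is not a DynamoDB client; the second user (`USER#456`) is invented only to show that other partitions are left untouched.

```python
ITEMS = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Ian"},
    {"PK": "USER#123", "SK": "ORDER#998", "total": 50.00, "status": "shipped"},
    {"PK": "USER#123", "SK": "ORDER#999", "total": 12.00, "status": "pending"},
    {"PK": "USER#456", "SK": "PROFILE", "name": "Ana"},  # hypothetical extra row
]


def query(items, pk, sk_prefix=""):
    """Return the items of one partition, ordered by SK, optionally by SK prefix."""
    matches = [
        item for item in items
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["SK"])


# Profile and orders come back together from a single partition read.
everything = query(ITEMS, "USER#123")

# The sort-key prefix narrows the same partition to one entity type.
orders = query(ITEMS, "USER#123", "ORDER#")

print([item["SK"] for item in everything])
print([item["SK"] for item in orders])
```

Prefixing sort keys with the entity type (`ORDER#`, `PROFILE`) is what lets one partition hold several entity types and still be filtered by type.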
### 5. Denormalization & Duplication
Don't be afraid to store the same data in multiple tables to serve different query patterns.
- **Table A:** `users_by_id` (PK: uuid)
- **Table B:** `users_by_email` (PK: email)
*Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).*
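
The consistency burden mentioned in the trade-off can be sketched with two dictionaries standing in for the two tables. The helpers `create_user` and `change_email` are hypothetical; in a real system the two writes would go through whatever batch or transaction mechanism the database offers, and a failure between them has to be handled.

```python
import uuid

users_by_id = {}     # stands in for Table A (PK: uuid)
users_by_email = {}  # stands in for Table B (PK: email)


def create_user(name, email):
    """Write the same user row to both tables."""
    user_id = str(uuid.uuid4())
    row = {"id": user_id, "name": name, "email": email}
    users_by_id[user_id] = dict(row)
    users_by_email[email] = dict(row)
    return user_id


def change_email(user_id, new_email):
    """Changing a key attribute means rewriting the row in the other table."""
    row = users_by_id[user_id]
    old_email = row["email"]
    if new_email == old_email:
        return
    row["email"] = new_email
    users_by_email[new_email] = dict(row)  # insert under the new key first
    del users_by_email[old_email]          # then remove the stale copy


uid = create_user("Ian", "ian@example.com")
change_email(uid, "ian@example.org")
print(users_by_email["ian@example.org"]["id"] == uid)
```

Between the insert and the delete inside `change_email`, both email keys resolve to the user, which is the kind of temporary inconsistency the application has to tolerate.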
## Specific Guidance
### Apache Cassandra / ScyllaDB
- **Primary Key Structure:** `((Partition Key), Clustering Columns)`
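- **No Joins, Limited Aggregates:** CQL has no `JOIN`. `GROUP BY` and aggregate functions exist, but grouping is limited to partition key and clustering columns, and aggregating across partitions is expensive. Pre-calculate aggregates at write time instead, for example in a separate counter table.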
- **Avoid `ALLOW FILTERING`:** If you see this in production, the data model probably does not match the query. Without a partition key restriction, it forces a scan of the whole cluster.
- **Writes are Cheap:** Inserts and updates are appended to the commit log and an in-memory memtable, then flushed to immutable SSTables (an LSM tree). Worry less about write volume and more about read efficiency.
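- **Tombstones:** A delete writes a marker (a tombstone) instead of removing data, and reads must skip tombstones until compaction purges them after `gc_grace_seconds`. Avoid high-velocity delete patterns (like queues) in standard tables.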
### AWS DynamoDB
- **GSI (Global Secondary Index):** Use GSIs to create alternative views of your data (e.g., "Search Orders by Date" instead of by User).
  - *Note:* GSI reads are always eventually consistent; strongly consistent reads are not available on a GSI.
- **LSI (Local Secondary Index):** Sorts data differently *within* the same partition. Must be created at table creation time.
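- **WCU / RCU:** Understand the capacity modes (provisioned vs. on-demand) and how units are counted: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. Single-table design helps optimize consumed capacity units.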
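- **TTL:** Use a Time-To-Live attribute to expire old data automatically. TTL deletes consume no write capacity, but they run in the background, so expired items can remain readable for some time; filter on the expiry attribute if that matters.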
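
As a rough worked example of capacity arithmetic, the sketch below estimates units per request from item size, assuming the standard rules for non-transactional requests: reads are metered in 4 KB steps (eventually consistent reads cost half), and writes in 1 KB steps. The function names are hypothetical, and the numbers ignore transactional requests and index overhead.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers up to 4 KB per read
WRITE_UNIT_BYTES = 1024      # one WCU covers up to 1 KB per write


def read_capacity_units(item_bytes, strongly_consistent=True):
    """Estimate RCUs consumed by reading one item once."""
    units = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    # Eventually consistent reads are charged at half the rate.
    return units if strongly_consistent else units / 2


def write_capacity_units(item_bytes):
    """Estimate WCUs consumed by writing one item once."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))


# A 6,000-byte item spans two 4 KB read steps and six 1 KB write steps.
print(read_capacity_units(6000))          # strongly consistent read
print(read_capacity_units(6000, False))   # eventually consistent read
print(write_capacity_units(6000))
```

The asymmetry between read and write step sizes is one reason small items are cheaper to keep up to date than large ones.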
## Expert Checklist
Before finalizing your NoSQL schema:
- [ ] **Access Pattern Coverage:** Does every query pattern map to a specific table or index?
- [ ] **Cardinality Check:** Does the Partition Key have enough unique values to spread traffic evenly?
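- [ ] **Split Partition Risk:** For any single partition (e.g., a single user's orders), will it grow indefinitely? If so, "shard" the partition, e.g., `USER#123#2024-01`. In DynamoDB, an item collection is capped at 10 GB when the table has an LSI; in Cassandra, common guidance is to keep partitions far smaller, on the order of 100 MB.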
- [ ] **Consistency Requirement:** Can the application tolerate eventual consistency for this read pattern?
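
For the partition-growth item in the checklist, one common approach is to add a time bucket to the partition key, as in the `USER#123#2024-01` example. The sketch below builds such keys and lists the buckets a date-range read would need to visit. `bucketed_pk` and `buckets_between` are hypothetical helpers, and monthly buckets are an assumption; the right bucket size depends on write volume.

```python
from datetime import date


def bucketed_pk(user_id: str, day: date) -> str:
    """Build a partition key that includes a monthly bucket."""
    return f"USER#{user_id}#{day:%Y-%m}"


def buckets_between(user_id: str, start: date, end: date):
    """List the partition keys covering start through end, inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"USER#{user_id}#{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


print(bucketed_pk("123", date(2024, 1, 15)))
print(buckets_between("123", date(2023, 11, 20), date(2024, 1, 5)))
```

The cost of bucketing is that a read spanning several months becomes one query per bucket, so the bucket size should match the most common read window.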
## Common Anti-Patterns
❌ **Scatter-Gather:** Querying *all* partitions to find one item (Scan).
❌ **Hot Keys:** Putting all "Monday" data into one partition.
❌ **Relational Modeling:** Creating `Author` and `Book` tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books).
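
One way to defuse a hot key such as "Monday" is write sharding: append a suffix derived from the item so writes spread over several partitions. The sketch below is a minimal illustration with hypothetical names (`sharded_pk`, `all_shards`, `SHARD_COUNT`), and the shard count of 10 is an arbitrary assumption.

```python
import hashlib

SHARD_COUNT = 10  # assumed number of shards per logical key


def sharded_pk(base_key: str, item_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Derive a stable shard suffix from the item ID and append it to the key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shards(base_key: str, shard_count: int = SHARD_COUNT):
    """List every shard key a reader must query to see the whole logical key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


# Writes for the same day now land on up to SHARD_COUNT partitions.
keys = {sharded_pk("DAY#Monday", f"event-{i}") for i in range(1000)}
print(sorted(keys))

# Reads fan out over a known, bounded set of partitions.
print(all_shards("DAY#Monday"))
```

Reading the full logical key now takes one query per shard, which is a bounded fan-out rather than the unbounded scatter-gather listed above, but it is still a cost to weigh against the write relief.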
## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
Published: 5/16/2026
Provider: SkillOPIC
Source type: Imported
sickn33
coding