应用简介
管理LLM上下文窗口的策略包括
---
name: context-window-management
description: Strategies for managing LLM context windows including
summarization, trimming, routing, and avoiding context rot
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---
# Context Window Management
Strategies for managing LLM context windows including summarization, trimming, routing, and avoiding context rot
## Capabilities
- context-engineering
- context-summarization
- context-trimming
- context-routing
- token-counting
- context-prioritization
## Prerequisites
- Knowledge: LLM fundamentals, Tokenization basics, Prompt engineering
- Skills_recommended: prompt-engineering
## Scope
- Does_not_cover: RAG implementation details, Model fine-tuning, Embedding models
- Boundaries: Focus is context optimization, Covers strategies not specific implementations
## Ecosystem
### Primary_tools
- tiktoken - OpenAI's tokenizer for counting tokens
- LangChain - Framework with context management utilities
- Claude API - 200K+ context with caching support
## Patterns
### Tiered Context Strategy
Different strategies based on context size
**When to use**: Building any multi-turn conversation system
interface ContextTier {
maxTokens: number;
strategy: 'full' | 'summarize' | 'rag';
model: string;
}
const TIERS: ContextTier[] = [
{ maxTokens: 8000, strategy: 'full', model: 'claude-3-haiku' },
{ maxTokens: 32000, strategy: 'full', model: 'claude-3-5-sonnet' },
{ maxTokens: 100000, strategy: 'summarize', model: 'claude-3-5-sonnet' },
{ maxTokens: Infinity, strategy: 'rag', model: 'claude-3-5-sonnet' }
];
async function selectStrategy(messages: Message[]): ContextTier {
const tokens = await countTokens(messages);
for (const tier of TIERS) {
if (tokens <= tier.maxTokens) {
return tier;
}
}
return TIERS[TIERS.length - 1];
}
async function prepareContext(messages: Message[]): PreparedContext {
const tier = await selectStrategy(messages);
switch (tier.strategy) {
case 'full':
return { messages, model: tier.model };
case 'summarize':
const summary = await summarizeOldMessages(messages);
return { messages: [summary, ...recentMessages(messages)], model: tier.model };
case 'rag':
const relevant = await retrieveRelevant(messages);
return { messages: [...relevant, ...recentMessages(messages)], model: tier.model };
}
}
### Serial Position Optimization
Place important content at start and end
**When to use**: Constructing prompts with significant context
// LLMs weight beginning and end more heavily
// Structure prompts to leverage this
function buildOptimalPrompt(components: {
systemPrompt: string;
criticalContext: string;
conversationHistory: Message[];
currentQuery: string;
}): string {
// START: System instructions (always first)
const parts = [components.systemPrompt];
// CRITICAL CONTEXT: Right after system (high primacy)
if (components.criticalContext) {
parts.push(`## Key Context\n${components.criticalContext}`);
}
// MIDDLE: Conversation history (lower weight)
// Summarize if long, keep recent messages full
const history = components.conversationHistory;
if (history.length > 10) {
const oldSummary = summarize(history.slice(0, -5));
const recent = history.slice(-5);
parts.push(`## Earlier Conversation (Summary)\n${oldSummary}`);
parts.push(`## Recent Messages\n${formatMessages(recent)}`);
} else {
parts.push(`## Conversation\n${formatMessages(history)}`);
}
// END: Current query (high recency)
// Restate critical requirements here
parts.push(`## Current Request\n${components.currentQuery}`);
// FINAL: Reminder of key constraints
parts.push(`Remember: ${extractKeyConstraints(components.systemPrompt)}`);
return parts.join('\n\n');
}
### Intelligent Summarization
Summarize by importance, not just recency
**When to use**: Context exceeds optimal size
interface MessageWithMetadata extends Message {
importance: number; // 0-1 score
hasCriticalInfo: boolean; // User preferences, decisions
referenced: boolean; // Was this referenced later?
}
async function smartSummarize(
messages: MessageWithMetadata[],
targetTokens: number
): Message[] {
// Sort by importance, preserve order for tied scores
const sorted = [...messages].sort((a, b) =>
(b.importance + (b.hasCriticalInfo ? 0.5 : 0) + (b.referenced ? 0.3 : 0)) -
(a.importance + (a.hasCriticalInfo ? 0.5 : 0) + (a.referenced ? 0.3 : 0))
);
const keep: Message[] = [];
const summarizePool: Message[] = [];
let currentTokens = 0;
for (const msg of sorted) {
const msgTokens = await countTokens([msg]);
if (currentTokens + msgTokens < targetTokens * 0.7) {
keep.push(msg);
currentTokens += msgTokens;
} else {
summarizePool.push(msg);
}
}
// Summarize the low-importance messages
if (summarizePool.length > 0) {
const summary = await llm.complete(`
Summarize these messages, preserving:
- Any user preferences or decisions
- Key facts that might be referenced later
- The overall flow of conversation
Messages:
${formatMessages(summarizePool)}
`);
keep.unshift({ role: 'system', content: `[Earlier context: ${summary}]` });
}
// Restore original order
return keep.sort((a, b) => a.timestamp - b.timestamp);
}
### Token Budget Allocation
Allocate token budget across context components
**When to use**: Need predictable context management
interface TokenBudget {
system: number; // System prompt
criticalContext: number; // User prefs, key info
history: number; // Conversation history
query: number; // Current query
response: number; // Reserved for response
}
function allocateBudget(totalTokens: number): TokenBudget {
return {
system: Math.floor(totalTokens * 0.10), // 10%
criticalContext: Math.floor(totalTokens * 0.15), // 15%
history: Math.floor(totalTokens * 0.40), // 40%
query: Math.floor(totalTokens * 0.10), // 10%
response: Math.floor(totalTokens * 0.25), // 25%
};
}
async function buildWithBudget(
components: ContextComponents,
modelMaxTokens: number
): PreparedContext {
const budget = allocateBudget(modelMaxTokens);
// Truncate/summarize each component to fit budget
const prepared = {
system: truncateToTokens(components.system, budget.system),
criticalContext: truncateToTokens(
components.criticalContext, budget.criticalContext
),
history: await summarizeToTokens(components.history, budget.history),
query: truncateToTokens(components.query, budget.query),
};
// Reallocate unused budget
const used = await countTokens(Object.values(prepared).join('\n'));
const remaining = modelMaxTokens - used - budget.response;
if (remaining > 0) {
// Give extra to history (most valuable for conversation)
prepared.history = await summarizeToTokens(
components.history,
budget.history + remaining
);
}
return prepared;
}
## Validation Checks
### No Token Counting
Severity: WARNING
Message: Building context without token counting. May exceed model limits.
Fix action: Count tokens before sending, implement budget allocation
### Naive Message Truncation
Severity: WARNING
Message: Truncating messages without summarization. Critical context may be lost.
Fix action: Summarize old messages instead of simply removing them
### Hardcoded Token Limit
Severity: INFO
Message: Hardcoded token limit. Consider making configurable per model.
Fix action: Use model-specific limits from configuration
### No Context Management Strategy
Severity: WARNING
Message: LLM calls without context management strategy.
Fix action: Implement context management: budgets, summarization, or RAG
## Collaboration
### Delegation Triggers
- retrieval|rag|search -> rag-implementation (Need retrieval system)
- memory|persistence|remember -> conversation-memory (Need memory storage)
- cache|caching -> prompt-caching (Need caching optimization)
### Complete Context System
Skills: context-window-management, rag-implementation, conversation-memory, prompt-caching
Workflow:
```
1. Design context strategy
2. Implement RAG for large corpuses
3. Set up memory persistence
4. Add caching for performance
```
## Related Skills
Works well with: `rag-implementation`, `conversation-memory`, `prompt-caching`, `llm-npc-dialogue`
## When to Use
- User mentions or implies: context window
- User mentions or implies: token limit
- User mentions or implies: context management
- User mentions or implies: context engineering
- User mentions or implies: long context
- User mentions or implies: context overflow
## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
发布日期
5/16/2026
提供方
SkillOPIC
来源类型
导入
sickn33
coding
数据安全
使用 Skill 时,您的对话内容将被发送至 AI 模型进行处理。我们会严格保护您的隐私数据,不会将您的对话内容用于模型训练或分享给第三方。 以下为此 Skill 的数据处理说明。
此 Skill 将处理您的对话输入
您的消息将作为 Prompt 上下文发送至 AI 模型
所有通信均通过加密通道传输
对话记录仅保存在本地
您可以随时清除本地对话历史,清除后数据不可恢复
评分和评价
已验证评分
Skill 信息
了解此 Skill 的详细信息和功能特性
编程开发
后端开发
文件结构
SKILL.md9.2 KB
版本历史
- 公开
- 来源于用户导入
如需详细了解相关要求,请访问帮助中心,或给我们提交反馈信息