[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-194ce431-7204-4e52-8373-07dca5570420":3,"$fa9GIzuLqYUxQwImyCjfvKADxUJouzxUagNwt4lCh_eY":42},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":33},"194ce431-7204-4e52-8373-07dca5570420","prompt-caching","Caching strategies for LLM prompts, including Anthropic prompt","cat_coding_backend","mod_coding","sickn33,coding","---\nname: prompt-caching\ndescription: Caching strategies for LLM prompts including Anthropic prompt\n  caching, response caching, and CAG (Cache Augmented Generation)\nrisk: none\nsource: vibeship-spawner-skills (Apache 2.0)\ndate_added: 2026-02-27\n---\n\n# Prompt Caching\n\nCaching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation)\n\n## Capabilities\n\n- prompt-cache\n- response-cache\n- kv-cache\n- cag-patterns\n- cache-invalidation\n\n## Prerequisites\n\n- Knowledge: Caching fundamentals, LLM API usage, Hash functions\n- Skills_recommended: context-window-management\n\n## Scope\n\n- Does_not_cover: CDN caching, Database query caching, Static asset caching\n- Boundaries: Focus is LLM-specific caching, Covers prompt and response caching\n\n## Ecosystem\n\n### Primary_tools\n\n- Anthropic Prompt Caching - Native prompt caching in Claude API\n- Redis - In-memory cache for responses\n- OpenAI Caching - Automatic caching in OpenAI API\n\n## Patterns\n\n### Anthropic Prompt Caching\n\nUse Claude's native prompt caching for repeated prefixes\n\n**When to use**: You call the Claude API with stable system prompts or context\n\nimport Anthropic from '@anthropic-ai\u002Fsdk';\n\nconst client = new Anthropic();\n\n\u002F\u002F Cache the stable parts of your prompt\nasync function queryWithCaching(userQuery: string) {\n    const response = await client.messages.create({\n        
model: \"claude-sonnet-4-20250514\",\n        max_tokens: 1024,\n        system: [\n            {\n                type: \"text\",\n                text: LONG_SYSTEM_PROMPT,  \u002F\u002F Your detailed instructions\n                cache_control: { type: \"ephemeral\" }  \u002F\u002F Cache this!\n            },\n            {\n                type: \"text\",\n                text: KNOWLEDGE_BASE,  \u002F\u002F Large static context\n                cache_control: { type: \"ephemeral\" }\n            }\n        ],\n        messages: [\n            { role: \"user\", content: userQuery }  \u002F\u002F Dynamic part\n        ]\n    });\n\n    \u002F\u002F Check cache usage\n    console.log(`Cache read: ${response.usage.cache_read_input_tokens}`);\n    console.log(`Cache write: ${response.usage.cache_creation_input_tokens}`);\n\n    return response;\n}\n\n\u002F\u002F Cost savings: cache reads are billed at about 10% of the base input-token price (a 90% reduction), per Anthropic's published pricing; cache writes cost about 25% more than uncached input\n\u002F\u002F Latency savings: Up to 2x faster on cache hits\n\n### Response Caching\n\nCache full LLM responses for identical or similar queries\n\n**When to use**: The same queries are asked repeatedly\n\nimport { createHash } from 'crypto';\nimport Redis from 'ioredis';\n\n\u002F\u002F The local fallback URL is an assumption; REDIS_URL may be undefined at type level\nconst redis = new Redis(process.env.REDIS_URL ?? \"redis:\u002F\u002Flocalhost:6379\");\n\nclass ResponseCache {\n    private ttl = 3600;  \u002F\u002F 1 hour default\n\n    \u002F\u002F Exact match caching\n    async getCached(prompt: string): Promise\u003Cstring | null> {\n        const key = this.hashPrompt(prompt);\n        return await redis.get(`response:${key}`);\n    }\n\n    async setCached(prompt: string, response: string): Promise\u003Cvoid> {\n        const key = this.hashPrompt(prompt);\n        await redis.set(`response:${key}`, response, 'EX', this.ttl);\n    }\n\n    private hashPrompt(prompt: string): string {\n        return createHash('sha256').update(prompt).digest('hex');\n    }\n\n    \u002F\u002F Semantic similarity caching\n    async getSemanticallySimilar(\n        prompt: string,\n        threshold: number = 0.95\n    ): 
Promise\u003Cstring | null> {\n        const embedding = await embed(prompt);\n        const similar = await this.vectorCache.search(embedding, 1);\n\n        if (similar.length && similar[0].similarity > threshold) {\n            return await redis.get(`response:${similar[0].id}`);\n        }\n        return null;\n    }\n\n    \u002F\u002F Temperature-aware caching\n    async getCachedWithParams(\n        prompt: string,\n        params: { temperature: number; model: string }\n    ): Promise\u003Cstring | null> {\n        \u002F\u002F Only cache low-temperature responses\n        if (params.temperature > 0.5) return null;\n\n        const key = this.hashPrompt(\n            `${prompt}|${params.model}|${params.temperature}`\n        );\n        return await redis.get(`response:${key}`);\n    }\n}\n\n### Cache Augmented Generation (CAG)\n\nPre-cache documents in prompt instead of RAG retrieval\n\n**When to use**: Document corpus is stable and fits in context\n\n\u002F\u002F CAG: Pre-compute document context, cache in prompt\n\u002F\u002F Better than RAG when:\n\u002F\u002F - Documents are stable\n\u002F\u002F - Total fits in context window\n\u002F\u002F - Latency is critical\n\nclass CAGSystem {\n    private cachedContext: string | null = null;\n    private lastUpdate: number = 0;\n\n    async buildCachedContext(documents: Document[]): Promise\u003Cvoid> {\n        \u002F\u002F Pre-process and format documents\n        const formatted = documents.map(d =>\n            `## ${d.title}\\n${d.content}`\n        ).join('\\n\\n');\n\n        \u002F\u002F Store with timestamp\n        this.cachedContext = formatted;\n        this.lastUpdate = Date.now();\n    }\n\n    async query(userQuery: string): Promise\u003Cstring> {\n        \u002F\u002F Use cached context directly in prompt\n        const response = await client.messages.create({\n            model: \"claude-sonnet-4-20250514\",\n            max_tokens: 1024,\n            system: [\n                {\n              
      type: \"text\",\n                    text: \"You are a helpful assistant with access to the following documentation.\",\n                    cache_control: { type: \"ephemeral\" }\n                },\n                {\n                    type: \"text\",\n                    text: this.cachedContext!,  \u002F\u002F Pre-cached docs\n                    cache_control: { type: \"ephemeral\" }\n                }\n            ],\n            messages: [{ role: \"user\", content: userQuery }]\n        });\n\n        \u002F\u002F content is a union of block types; narrow to a text block before reading .text\n        const first = response.content[0];\n        return first?.type === \"text\" ? first.text : \"\";\n    }\n\n    \u002F\u002F Periodic refresh\n    async refreshIfNeeded(documents: Document[]): Promise\u003Cvoid> {\n        const stale = Date.now() - this.lastUpdate > 3600000;  \u002F\u002F 1 hour\n        if (stale) {\n            await this.buildCachedContext(documents);\n        }\n    }\n}\n\n\u002F\u002F CAG vs RAG decision matrix:\n\u002F\u002F | Factor           | CAG Better | RAG Better |\n\u002F\u002F |------------------|------------|------------|\n\u002F\u002F | Corpus size      | \u003C 100K tokens | > 100K tokens |\n\u002F\u002F | Update frequency | Low | High |\n\u002F\u002F | Latency needs    | Critical | Flexible |\n\u002F\u002F | Query specificity| General | Specific |\n\n## Sharp Edges\n\n### Cache miss causes latency spike with additional overhead\n\nSeverity: HIGH\n\nSituation: Responses on a cache miss are slower than they would be with no caching at all\n\nSymptoms:\n- Slow responses on cache miss\n- Cache hit rate below 50%\n- Higher latency than uncached\n\nWhy this breaks:\nThe cache check adds latency.\nThe cache write adds more latency.\nA miss plus this overhead costs more than not caching.\n\nRecommended fix:\n\n\u002F\u002F Optimize for cache misses, not just hits\n\nclass OptimizedCache {\n    async queryWithCache(prompt: string): Promise\u003Cstring> {\n        const cacheKey = this.hash(prompt);\n\n        \u002F\u002F Non-blocking cache check\n        const cachedPromise = this.cache.get(cacheKey);\n        \u002F\u002F Start the LLM call in parallel so a miss pays no extra wait\n        \u002F\u002F (trade-off: on a cache hit this request is still sent and billed)\n        const llmPromise = 
this.queryLLM(prompt);\n\n        \u002F\u002F Race: use cache if available before LLM returns\n        const cached = await Promise.race([\n            cachedPromise,\n            sleep(50).then(() => null)  \u002F\u002F 50ms cache timeout\n        ]);\n\n        if (cached) {\n            \u002F\u002F Cancel LLM request if possible\n            return cached;\n        }\n\n        \u002F\u002F Cache miss: continue with LLM\n        const response = await llmPromise;\n\n        \u002F\u002F Async cache write (don't block response)\n        this.cache.set(cacheKey, response).catch(console.error);\n\n        return response;\n    }\n}\n\n\u002F\u002F Alternative: Probabilistic caching\n\u002F\u002F Only cache if query matches known high-frequency patterns\nclass SelectiveCache {\n    private patterns: Map\u003Cstring, number> = new Map();\n\n    shouldCache(prompt: string): boolean {\n        const pattern = this.extractPattern(prompt);\n        const frequency = this.patterns.get(pattern) || 0;\n\n        \u002F\u002F Only cache high-frequency patterns\n        return frequency > 10;\n    }\n\n    recordQuery(prompt: string): void {\n        const pattern = this.extractPattern(prompt);\n        this.patterns.set(pattern, (this.patterns.get(pattern) || 0) + 1);\n    }\n}\n\n### Cached responses become incorrect over time\n\nSeverity: HIGH\n\nSituation: Users get outdated or wrong information from cache\n\nSymptoms:\n- Users report wrong information\n- Answers don't match current data\n- Complaints about outdated responses\n\nWhy this breaks:\nSource data changed.\nNo cache invalidation.\nLong TTLs for dynamic data.\n\nRecommended fix:\n\n\u002F\u002F Implement proper cache invalidation\n\nclass InvalidatingCache {\n    \u002F\u002F Version-based invalidation\n    private cacheVersion = 1;\n\n    getCacheKey(prompt: string): string {\n        return `v${this.cacheVersion}:${this.hash(prompt)}`;\n    }\n\n    invalidateAll(): void {\n        this.cacheVersion++;\n     
   \u002F\u002F Old keys automatically become orphaned\n    }\n\n    \u002F\u002F Content-hash invalidation\n    async setWithContentHash(\n        key: string,\n        response: string,\n        sourceContent: string\n    ): Promise\u003Cvoid> {\n        const contentHash = this.hash(sourceContent);\n        await this.cache.set(key, {\n            response,\n            contentHash,\n            timestamp: Date.now()\n        });\n    }\n\n    async getIfValid(\n        key: string,\n        currentSourceContent: string\n    ): Promise\u003Cstring | null> {\n        const cached = await this.cache.get(key);\n        if (!cached) return null;\n\n        \u002F\u002F Check if source content changed\n        const currentHash = this.hash(currentSourceContent);\n        if (cached.contentHash !== currentHash) {\n            await this.cache.delete(key);\n            return null;\n        }\n\n        return cached.response;\n    }\n\n    \u002F\u002F Event-based invalidation\n    onSourceUpdate(sourceId: string): void {\n        \u002F\u002F Invalidate all caches that used this source\n        this.invalidateByTag(`source:${sourceId}`);\n    }\n}\n\n### Prompt caching doesn't work due to prefix changes\n\nSeverity: MEDIUM\n\nSituation: Cache misses despite similar prompts\n\nSymptoms:\n- Cache hit rate lower than expected\n- Cache creation tokens high, read low\n- Similar prompts not hitting cache\n\nWhy this breaks:\nAnthropic caching requires exact prefix match.\nTimestamps or dynamic content in prefix.\nDifferent message order.\n\nRecommended fix:\n\n\u002F\u002F Structure prompts for optimal caching\n\nclass CacheOptimizedPrompts {\n    \u002F\u002F WRONG: Dynamic content in cached prefix\n    buildPromptBad(query: string): SystemMessage[] {\n        return [\n            {\n                type: \"text\",\n                text: `You are helpful. 
Current time: ${new Date()}`,  \u002F\u002F BREAKS CACHE!\n                cache_control: { type: \"ephemeral\" }\n            }\n        ];\n    }\n\n    \u002F\u002F RIGHT: Static prefix, dynamic at end\n    buildPromptGood(query: string): SystemMessage[] {\n        return [\n            {\n                type: \"text\",\n                text: STATIC_SYSTEM_PROMPT,  \u002F\u002F Never changes\n                cache_control: { type: \"ephemeral\" }\n            },\n            {\n                type: \"text\",\n                text: STATIC_KNOWLEDGE_BASE,  \u002F\u002F Rarely changes\n                cache_control: { type: \"ephemeral\" }\n            }\n            \u002F\u002F Dynamic content goes in messages, NOT system\n        ];\n    }\n\n    \u002F\u002F Prefix ordering matters\n    buildWithConsistentOrder(components: string[]): SystemMessage[] {\n        \u002F\u002F Sort components for consistent ordering\n        const sorted = [...components].sort();\n        return sorted.map((c, i) => ({\n            type: \"text\",\n            text: c,\n            cache_control: i === sorted.length - 1\n                ? { type: \"ephemeral\" }\n                : undefined  \u002F\u002F Only cache the full prefix\n        }));\n    }\n}\n\n## Validation Checks\n\n### Caching High Temperature Responses\n\nSeverity: WARNING\n\nMessage: Caching with high temperature. Responses are non-deterministic.\n\nFix action: Only cache responses with temperature \u003C= 0.5\n\n### Cache Without TTL\n\nSeverity: WARNING\n\nMessage: Cache without TTL. May serve stale data indefinitely.\n\nFix action: Set appropriate TTL based on data freshness requirements\n\n### Dynamic Content in Cached Prefix\n\nSeverity: WARNING\n\nMessage: Dynamic content in cached prefix. Will cause cache misses.\n\nFix action: Move dynamic content outside of cache_control blocks\n\n### No Cache Metrics\n\nSeverity: INFO\n\nMessage: Cache without hit\u002Fmiss tracking. 
Can't measure effectiveness.\n\nFix action: Add cache hit\u002Fmiss metrics and logging\n\n## Collaboration\n\n### Delegation Triggers\n\n- context window|token -> context-window-management (Need context optimization)\n- rag|retrieval -> rag-implementation (Need retrieval system)\n- memory -> conversation-memory (Need memory persistence)\n\n### High-Performance LLM System\n\nSkills: prompt-caching, context-window-management, rag-implementation\n\nWorkflow:\n\n```\n1. Analyze query patterns\n2. Implement prompt caching for stable prefixes\n3. Add response caching for frequent queries\n4. Consider CAG for stable document sets\n5. Monitor and optimize hit rates\n```\n\n## Related Skills\n\nWorks well with: `context-window-management`, `rag-implementation`, `conversation-memory`\n\n## When to Use\n- User mentions or implies: prompt caching\n- User mentions or implies: cache prompt\n- User mentions or implies: response cache\n- User mentions or implies: cag\n- User mentions or implies: cache augmented\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,143,788,"2026-05-16 13:35:29",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"编程开发","coding","mdi-code-braces","代码生成、调试、审查，提升开发效率",2,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":25,"skillCount":32,"createdAt":26},"Backend Development","backend","mdi-server","APIs, databases, and server-side architecture",296,[34],{"id":35,"skillId":4,"version":36,"fileName":37,"fileSize":38,"filePath":39,"fileHash":40,"manifest":41,"createdAt":19},"fa843126-c0c5-47d8-9978-f33786f2d9d8","1.0.0","prompt-caching.zip",4560,"uploads\u002Fskills\u002F194ce431-7204-4e52-8373-07dca5570420\u002Fprompt-caching.zip","f62dd8010701361200774b87d9619882196578f2fecc2d2ed83d3cb96a62e397","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":13427}]",{"code":43,"message":44,"data":45},200,"success",{"items":46,"stats":47,"page":50},[],{"averageRating":48,"totalRatings":48,"ratingCounts":49},0,[48,48,48,48,48],{"limit":51,"offset":48,"hasMore":52,"nextOffset":51,"ratedOnly":16},15,false]