---
name: llm-ops
description: "LLM Operations -- RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and production AI architectures."
risk: safe
source: community
date_added: '2026-03-06'
author: renat
tags:
- llm
- rag
- embeddings
- vector-db
- fine-tuning
tools:
- claude-code
- antigravity
- cursor
- gemini-cli
- codex-cli
---
# LLM-OPS -- Production AI
## Overview
LLM Operations: RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and production AI architectures. Activate for: implementing RAG, building an embedding pipeline, Pinecone/Chroma/pgvector, fine-tuning, prompt engineering, LLM cost reduction, evals, semantic caching, streaming, agents.
## When to Use This Skill
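- When you need to build or operate LLM systems in production: RAG, embedding pipelines, vector databases (Pinecone, Chroma, pgvector), fine-tuning, prompt engineering, cost reduction, evals, semantic caching, streaming, or agents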
## Do Not Use This Skill When
- The task is unrelated to LLM operations
- A simpler, more specific tool can handle the request
- The user needs general-purpose assistance without domain expertise
## How It Works
> The difference between an AI prototype and an AI product is operability.
> LLM-Ops is the engineering that makes AI reliable, scalable, and economical.
---
## Complete RAG Architecture
```
[Documents] -> [Chunking] -> [Embeddings] -> [Vector DB]
                                                  |
[Query] -> [Embed query] -> [Semantic Search] -> [Top K chunks]
                                                  |
                                       [LLM + Context] -> [Response]
```
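## Indexing Pipeline
```python
from anthropic import Anthropic
import chromadb

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
# The collection name is an assumption; the original never defines `collection`.
collection = chroma.get_or_create_collection(name="knowledge_base")


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks


def index_document(doc_id, content_text, metadata=None):
    """Chunk a document and upsert it; Chroma embeds the chunks itself."""
    chunks = chunk_text(content_text)
    if not chunks:
        return 0
    ids = [f"{doc_id}_chunk_{i}" for i in range(len(chunks))]
    # Every chunk carries a "source" so the query step can cite it.
    meta = {"source": doc_id, **(metadata or {})}
    collection.upsert(ids=ids, documents=chunks, metadatas=[meta] * len(chunks))
    return len(chunks)
```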
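To see what the overlap does, here is a minimal, self-contained sketch that runs the same word-window logic on a tiny input. The helper name `chunk_words` and the sizes (`chunk_size=5`, `overlap=2`) are illustrative assumptions, not values from this skill.

```python
def chunk_words(words, chunk_size, overlap):
    """Slide a window of chunk_size words forward by (chunk_size - overlap) words."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [words[i:i + chunk_size] for i in range(0, len(words), step)]


words = "a b c d e f g h i j".split()
chunks = chunk_words(words, chunk_size=5, overlap=2)
for c in chunks:
    print(" ".join(c))
```

Each chunk starts with the last two words of the previous one. The final window can be a short tail that its predecessor already covers, which some pipelines choose to drop.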
## RAG Query Pipeline
```python
def rag_query(query, top_k=5, system=None):
    """Retrieve the top_k closest chunks and answer the query from them."""
    results = collection.query(
        query_texts=[query], n_results=top_k,
        include=["documents", "metadatas", "distances"])
    context_parts = []
    for doc, meta, dist in zip(results["documents"][0],
                               results["metadatas"][0],
                               results["distances"][0]):
        # Smaller distance means a closer match; 1.5 is a loose cutoff to tune.
        if dist < 1.5:
            src = (meta or {}).get("source", "doc")
            context_parts.append(f"[Source: {src}]\n{doc}")
    context = "\n---\n".join(context_parts)
    response = client.messages.create(
        # The original ID "claude-opus-4-20250805" is not a published model ID;
        # check the current model list before relying on this one.
        model="claude-opus-4-1-20250805", max_tokens=1024,
        system=system or "Answer based on the context.",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\n{query}"}])
    return response.content[0].text
```
---
## Choosing a Vector DB
| DB | Best For | Hosting | Cost |
|----|----------|---------|------|
| Chroma | Development, local use | Self-hosted | Free |
| pgvector | Teams already on PostgreSQL | Self/Cloud | Free |
| Pinecone | Managed production | Cloud | USD 70+/month |
| Weaviate | Multi-modal | Self/Cloud | Free+ |
| Qdrant | High performance | Self/Cloud | Free+ |
## pgvector
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Approximate index for cosine distance; build it after loading data.
CREATE INDEX ON knowledge_embeddings
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- $1 is the query embedding, passed as a vector(1536) parameter.
-- Ordering by the distance operator (ascending) lets the index be used.
SELECT content, 1 - (embedding <=> $1) AS similarity
FROM knowledge_embeddings
ORDER BY embedding <=> $1
LIMIT 5;
```
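From Python, the query vector has to reach PostgreSQL in pgvector's text form (`[x1,x2,...]`). The sketch below only builds the SQL and its parameters; it does not open a connection. The helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver, which the original does not specify.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Parameterized query; %s is the placeholder style assumed here (psycopg).
SIMILARITY_SQL = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM knowledge_embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def build_similarity_query(embedding, limit=5):
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    literal = to_pgvector_literal(embedding)
    return SIMILARITY_SQL, (literal, literal, limit)


# A 3-dimensional toy vector; the real table expects 1536 dimensions.
sql, params = build_similarity_query([0.1, 0.2, 0.3], limit=3)
print(params)
```

Passing the vector as a parameter rather than interpolating it into the SQL string keeps the query safe to reuse and lets the driver handle quoting.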
---
## Elite Prompt Structure
Components of the Auri system prompt:
- Identity: Name (Auri), Tone (natural, warm, direct), Platform (Amazon Alexa)
- Rules: at most 3 short paragraphs, no markdown, conversational language
- Capabilities: business analysis, data-driven advice, creativity
- Limitations: no real-time internet access, no financial transactions
- Personalization: {user_name}, {user_preferences}, {relevant_history}
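One way to turn these components into a concrete system prompt is a template with the three personalization slots. This is a minimal sketch: the exact wording, the `build_system_prompt` helper, and the `"unknown"` fallback are assumptions, not the skill's actual prompt.

```python
SYSTEM_PROMPT_TEMPLATE = """\
Identity: You are Auri, a natural, warm, and direct assistant on Amazon Alexa.
Rules: At most 3 short paragraphs. No markdown. Conversational language.
Capabilities: business analysis, data-driven advice, creativity.
Limitations: no real-time internet access, no financial transactions.
User: {user_name}
Preferences: {user_preferences}
Relevant history: {relevant_history}
"""


def build_system_prompt(user_name, user_preferences, relevant_history):
    """Fill the personalization slots; empty values fall back to 'unknown'."""
    return SYSTEM_PROMPT_TEMPLATE.format(
        user_name=user_name or "unknown",
        user_preferences=user_preferences or "unknown",
        relevant_history=relevant_history or "unknown",
    )


# "Ana" and "short answers" are made-up sample values.
prompt = build_system_prompt("Ana", "short answers", "")
print(prompt)
```

Keeping the static parts (identity, rules, limitations) separate from the per-user slots makes it easier to cache or version the fixed portion of the prompt.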
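## Chain-of-Thought
```python
from anthropic import Anthropic

client = Anthropic()


def call_claude(prompt, model="claude-sonnet-4-5", max_tokens=1024):
    # Assumed helper: the original calls call_claude without defining it.
    response = client.messages.create(
        model=model, max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text


def cot_analysis(problem: str) -> str:
    steps = [
        "1. What exactly is being asked?",
        "2. What information is critical to solving it?",
        "3. What possible approaches exist?",
        "4. Which approach is best, and why?",
        "5. What risks or limitations exist?",
    ]
    prompt = f"Analyze step by step:\n\nPROBLEM: {problem}\n\n"
    prompt += "\n".join(steps) + "\n\nFinal answer (concise, for voice):"
    return call_claude(prompt)
```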
---
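## Semantic Cache
```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, similarity_threshold=0.95):
        self.threshold = similarity_threshold
        self.cache = {}  # embedding tuple -> (response, original query)

    def get_cached(self, query, embedding):
        for cached_emb, (response, _) in self.cache.items():
            if cosine_similarity(embedding, cached_emb) >= self.threshold:
                return response
        return None

    def set_cache(self, query, embedding, response):
        self.cache[tuple(embedding)] = (response, query)
```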
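The class above returns the first entry that clears the threshold. A variant that returns the closest entry instead can be sketched as follows; `best_cached` and the two-dimensional toy embeddings are illustrative assumptions, chosen so the hit and the miss are easy to check by hand.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_cached(entries, embedding, threshold=0.95):
    """Return the response of the most similar entry at or above threshold, else None.

    entries is a list of (embedding, response) pairs.
    """
    best_score, best_response = threshold, None
    for cached_emb, response in entries:
        score = cosine_similarity(embedding, cached_emb)
        if score >= best_score:
            best_score, best_response = score, response
    return best_response


entries = [([1.0, 0.0], "answer A"), ([0.0, 1.0], "answer B")]
hit = best_cached(entries, [0.99, 0.05])   # nearly parallel to the first entry
miss = best_cached(entries, [0.7, 0.7])    # 45 degrees from both entries
print(hit, miss)
```

Both versions scan every entry linearly, which is fine for a small in-process cache; a larger cache would usually delegate the lookup to the vector DB.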
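## Claude Cost Estimate
```python
# USD per million tokens. The original keys "claude-opus-4-20250805" and
# "claude-haiku-3-5" are not published model IDs; verify IDs and prices
# against the current model and price lists.
PRICING = {
    "claude-opus-4-1-20250805": {"input": 15.00, "output": 75.00},
    "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
}


def estimate_monthly_cost(model, avg_input, avg_output, req_per_day):
    """Estimate the 30-day cost from average tokens per request."""
    p = PRICING[model]
    # Input and output tokens are billed at different rates.
    daily = (avg_input * p["input"] + avg_output * p["output"]) * req_per_day / 1e6
    monthly = daily * 30
    return {"model": model, "monthly_cost": "USD %.2f" % monthly}
```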
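A worked example makes the arithmetic concrete. The sketch below bills input and output tokens at their own rates, using the claude-sonnet-4-5 prices from the table above; the traffic figures (2,000 input tokens, 500 output tokens, 1,000 requests per day) are made-up assumptions.

```python
# USD per million tokens, copied from the PRICING table in this section.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}  # claude-sonnet-4-5


def monthly_cost(avg_input, avg_output, req_per_day, price, days=30):
    """Bill input and output tokens at their own per-million-token rates."""
    per_request = (avg_input * price["input"] + avg_output * price["output"]) / 1e6
    return per_request * req_per_day * days


cost = monthly_cost(avg_input=2000, avg_output=500, req_per_day=1000,
                    price=PRICE_PER_MTOK)
print(f"USD {cost:.2f}")
```

Per request this is (2000 × 3.00 + 500 × 15.00) / 1,000,000 = USD 0.0135, so 1,000 requests a day for 30 days come to USD 405.00. Output tokens account for more than half the bill despite being a quarter of the input volume.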
---
## Evaluation Framework
```python
import json

from anthropic import Anthropic

client = Anthropic()


def evaluate_response(question, expected, actual, criteria):
    """Use an LLM judge to score `actual` against `expected` on each criterion."""
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    eval_prompt = (
        f"Evaluate the AI assistant's response.\n\n"
        f"QUESTION: {question}\nEXPECTED RESPONSE: {expected}\n"
        f"ACTUAL RESPONSE: {actual}\n\nCriteria:\n{criteria_text}\n\n"
        "Give a 0-10 score and a justification for each criterion. "
        "Reply with JSON only."
    )
    response = client.messages.create(
        # The original ID "claude-haiku-3-5" is not a published model ID.
        model="claude-3-5-haiku-20241022", max_tokens=1024,
        messages=[{"role": "user", "content": eval_prompt}]
    )
    # json.loads raises if the model wraps the JSON in prose or code fences.
    return json.loads(response.content[0].text)


AURI_EVALS = [
    {
        "question": "What are the main risks of starting a startup right now?",
        "criteria": ["precisao_factual", "relevancia", "clareza_para_voz"]
    },
]
```
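Judge models often wrap their JSON in a code fence or a sentence of preamble, which makes a bare `json.loads` fail. A tolerant parser can be sketched as below. The reply schema (`{criterion: {"score": n}}`) is an assumption, since the prompt only asks for "JSON", and both helper names are hypothetical.

```python
import json
import re


def parse_judge_json(text):
    """Extract the outermost JSON object from a judge reply, ignoring fences or prose."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge reply")
    return json.loads(match.group(0))


def mean_score(parsed):
    """Average the 'score' fields, assuming {criterion: {"score": n, ...}}."""
    scores = [v["score"] for v in parsed.values()]
    return sum(scores) / len(scores)


# Simulated judge reply: preamble plus a fenced JSON payload.
fence = "`" * 3
payload = '{"relevancia": {"score": 8}, "clareza_para_voz": {"score": 6}}'
reply = "Here is my evaluation:\n" + fence + "json\n" + payload + "\n" + fence

parsed = parse_judge_json(reply)
print(mean_score(parsed))
```

The greedy pattern takes everything from the first `{` to the last `}`, so it handles nested objects but would break on a reply containing two separate JSON objects.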
---
## Commands
| Command | Action |
|---------|--------|
| /rag-setup | Sets up the complete RAG pipeline |
| /embed-docs | Indexes documents in the vector DB |
| /prompt-optimize | Optimizes a prompt for quality and cost |
| /cost-estimate | Estimates the monthly LLM cost |
| /eval-run | Runs the quality eval suite |
| /cache-setup | Sets up the semantic cache |
| /model-select | Picks the best model for the use case |
## Best Practices
- Provide clear, specific context about your project and requirements
- Review all suggestions before applying them to production code
- Combine with other complementary skills for comprehensive analysis
## Common Pitfalls
- Using this skill for tasks outside its domain expertise
- Applying recommendations without understanding your specific context
- Not providing enough project context for accurate analysis
## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.

Release date: 5/16/2026
Provider: SkillOPIC
Source type: Imported (sickn33)
Category: coding