SkillOPIC

应用简介

使用Python日志记录、警报和CLI指标检索,用Trackio跟踪ML实验。

---
source: "https://github.com/huggingface/skills/tree/main/skills/huggingface-trackio"
name: hugging-face-trackio
description: Track ML experiments with Trackio using Python logging, alerts, and CLI metric retrieval.
risk: unknown
---

# Trackio - Experiment Tracking for ML Training

Trackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.

## Three Interfaces

| Task | Interface | Reference |
|------|-----------|-----------|
| **Logging metrics** during training | Python API | [references/logging_metrics.md](references/logging_metrics.md) |
| **Firing alerts** for training diagnostics | Python API | [references/alerts.md](references/alerts.md) |
| **Retrieving metrics & alerts** after/during training | CLI | [references/retrieving_metrics.md](references/retrieving_metrics.md) |

## When to Use Each

### Python API → Logging

Use `import trackio` in your training scripts to log metrics:

- Initialize tracking with `trackio.init()`
- Log metrics with `trackio.log()` or use TRL's `report_to="trackio"`
- Finalize with `trackio.finish()`

**Key concept**: For remote/cloud training, pass `space_id` — metrics sync to a Space dashboard so they persist after the instance terminates.

→ See [references/logging_metrics.md](references/logging_metrics.md) for setup, TRL integration, and configuration options.

### Python API → Alerts

Insert `trackio.alert()` calls in training code to flag important events — like inserting print statements for debugging, but structured and queryable:

- `trackio.alert(title="...", level=trackio.AlertLevel.WARN)` — fire an alert
- Three severity levels: `INFO`, `WARN`, `ERROR`
- Alerts are printed to terminal, stored in the database, shown in the dashboard, and optionally sent to webhooks (Slack/Discord)

**Key concept for LLM agents**: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions (loss spikes, NaN gradients, low accuracy, training stalls). Since alerts are printed to the terminal, an agent that is watching the training script's output will see them automatically. For background or detached runs, the agent can poll via CLI instead.

→ See [references/alerts.md](references/alerts.md) for the full alerts API, webhook setup, and autonomous agent workflows.

### CLI → Retrieving

Use the `trackio` command to query logged metrics and alerts:

- `trackio list projects/runs/metrics` — discover what's available
- `trackio get project/run/metric` — retrieve summaries and values
- `trackio list alerts --project <name> --json` — retrieve alerts
- `trackio show` — launch the dashboard
- `trackio sync` — sync to HF Space

**Key concept**: Add `--json` for programmatic output suitable for automation and LLM agents.

→ See [references/retrieving_metrics.md](references/retrieving_metrics.md) for all commands, workflows, and JSON output formats.

## Minimal Logging Setup

```python
import trackio

trackio.init(project="my-project", space_id="username/trackio")
trackio.log({"loss": 0.1, "accuracy": 0.9})
trackio.log({"loss": 0.09, "accuracy": 0.91})
trackio.finish()
```

### Minimal Retrieval

```bash
trackio list projects --json
trackio get metric --project my-project --run my-run --metric loss --json
```

## Autonomous ML Experiment Workflow

When running experiments autonomously as an LLM agent, the recommended workflow is:

1. **Set up training with alerts** — insert `trackio.alert()` calls for diagnostic conditions
2. **Launch training** — run the script in the background
3. **Poll for alerts** — use `trackio list alerts --project <name> --json --since <timestamp>` to check for new alerts
4. **Read metrics** — use `trackio get metric ...` to inspect specific values
5. **Iterate** — based on alerts and metrics, stop the run, adjust hyperparameters, and launch a new run

```python
import trackio

trackio.init(project="my-project", config={"lr": 1e-4})

for step in range(num_steps):
    loss = train_step()
    trackio.log({"loss": loss, "step": step})

    if step > 100 and loss > 5.0:
        trackio.alert(
            title="Loss divergence",
            text=f"Loss {loss:.4f} still high after {step} steps",
            level=trackio.AlertLevel.ERROR,
        )
    if step > 0 and abs(loss) < 1e-8:
        trackio.alert(
            title="Vanishing loss",
            text="Loss near zero — possible gradient collapse",
            level=trackio.AlertLevel.WARN,
        )

trackio.finish()
```

Then poll from a separate terminal/process:

```bash
trackio list alerts --project my-project --json --since "2025-01-01T00:00:00"
```

## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
发布日期

5/16/2026

提供方

SkillOPIC

来源类型

导入

sickn33
other

数据安全

使用 Skill 时,您的对话内容将被发送至 AI 模型进行处理。我们会严格保护您的隐私数据,不会将您的对话内容用于模型训练或分享给第三方。 以下为此 Skill 的数据处理说明。

此 Skill 将处理您的对话输入

您的消息将作为 Prompt 上下文发送至 AI 模型

所有通信均通过加密通道传输
对话记录仅保存在本地

您可以随时清除本地对话历史,清除后数据不可恢复

评分和评价

已验证评分
0 / 5
0条评价
1
0
2
0
3
0
4
0
5
0

暂无评价,快来抢沙发吧!

Skill 信息

了解此 Skill 的详细信息和功能特性

其他

职场发展

文件结构
5 个文件· 24.7 KB
.claude-plugin
references
SKILL.md5.0 KB
版本历史
  • 公开
  • 来源于用户导入

如需详细了解相关要求,请访问帮助中心,或给我们提交反馈信息