[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-aba51b60-c0d6-4799-b8de-4ee251476513":3,"$fJEuI_HZQWgZcIool2Cs4SkI0xIecJ2ibq3vh4Lk54xo":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"aba51b60-c0d6-4799-b8de-4ee251476513","hugging-face-jobs","Run workloads on Hugging Face Jobs with managed CPUs, GPUs, TPUs, secrets, and Hub persistence.","cat_life_career","mod_other","sickn33,other","---\nsource: \"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fskills\u002Ftree\u002Fmain\u002Fskills\u002Fhuggingface-jobs\"\nname: hugging-face-jobs\ndescription: Run workloads on Hugging Face Jobs with managed CPUs, GPUs, TPUs, secrets, and Hub persistence.\nlicense: Complete terms in LICENSE.txt\nrisk: unknown\n---\n\n# Running Workloads on Hugging Face Jobs\n\n## Overview\n\nRun any workload on fully managed Hugging Face infrastructure. 
No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.\n\n**Common use cases:**\n- **Data Processing** - Transform, filter, or analyze large datasets\n- **Batch Inference** - Run inference on thousands of samples\n- **Experiments & Benchmarks** - Reproducible ML experiments\n- **Model Training** - Fine-tune models (see `model-trainer` skill for TRL-specific training)\n- **Synthetic Data Generation** - Generate datasets using LLMs\n- **Development & Testing** - Test code without local GPU setup\n- **Scheduled Jobs** - Automate recurring tasks\n\n**For model training specifically:** See the `model-trainer` skill for TRL-based training workflows.\n\n## When to Use This Skill\n\nUse this skill when users want to:\n- Run Python workloads on cloud infrastructure\n- Execute jobs without local GPU\u002FTPU setup\n- Process data at scale\n- Run batch inference or experiments\n- Schedule recurring tasks\n- Use GPUs\u002FTPUs for any workload\n- Persist results to the Hugging Face Hub\n\n## Key Directives\n\nWhen assisting with jobs:\n\n1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs(\"uv\", {...})` or `hf_jobs(\"run\", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.\n\n2. **Always handle authentication** - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See Token Usage section below.\n\n3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.\n\n4. 
**Set appropriate timeouts** - Default 30min may be insufficient for long-running tasks.\n\n## Prerequisites Checklist\n\nBefore starting any job, verify:\n\n### ✅ **Account & Authentication**\n- Hugging Face Account with [Pro](https:\u002F\u002Fhf.co\u002Fpro), [Team](https:\u002F\u002Fhf.co\u002Fenterprise), or [Enterprise](https:\u002F\u002Fhf.co\u002Fenterprise) plan (Jobs require paid plan)\n- Authenticated login: Check with `hf_whoami()`\n- **HF_TOKEN for Hub Access** ⚠️ CRITICAL - Required for any Hub operations (push models\u002Fdatasets, download private repos, etc.)\n- Token must have appropriate permissions (read for downloads, write for uploads)\n\n### ✅ **Token Usage** (See Token Usage section for details)\n\n**When tokens are required:**\n- Pushing models\u002Fdatasets to Hub\n- Accessing private repositories\n- Using Hub APIs in scripts\n- Any authenticated Hub operations\n\n**How to provide tokens:**\n```python\n# hf_jobs MCP tool — $HF_TOKEN is auto-replaced with real token:\n{\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}}\n\n# HfApi().run_uv_job() — MUST pass actual token:\nfrom huggingface_hub import get_token\nsecrets={\"HF_TOKEN\": get_token()}\n```\n\n**⚠️ CRITICAL:** The `$HF_TOKEN` placeholder is ONLY auto-replaced by the `hf_jobs` MCP tool. When using `HfApi().run_uv_job()`, you MUST pass the real token via `get_token()`. 
Passing the literal string `\"$HF_TOKEN\"` results in a 9-character invalid token and 401 errors.\n\n## Token Usage Guide\n\n### Understanding Tokens\n\n**What are HF Tokens?**\n- Authentication credentials for Hugging Face Hub\n- Required for authenticated operations (push, private repos, API access)\n- Stored securely on your machine after `hf auth login`\n\n**Token Types:**\n- **Read Token** - Can download models\u002Fdatasets, read private repos\n- **Write Token** - Can push models\u002Fdatasets, create repos, modify content\n- **Organization Token** - Can act on behalf of an organization\n\n### When Tokens Are Required\n\n**Always Required:**\n- Pushing models\u002Fdatasets to Hub\n- Accessing private repositories\n- Creating new repositories\n- Modifying existing repositories\n- Using Hub APIs programmatically\n\n**Not Required:**\n- Downloading public models\u002Fdatasets\n- Running jobs that don't interact with Hub\n- Reading public repository information\n\n### How to Provide Tokens to Jobs\n\n#### Method 1: Automatic Token (Recommended)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Automatic replacement\n})\n```\n\n**How it works:**\n- `$HF_TOKEN` is a placeholder that gets replaced with your actual token\n- Uses the token from your logged-in session (`hf auth login`)\n- Most secure and convenient method\n- Token is encrypted server-side when passed as a secret\n\n**Benefits:**\n- No token exposure in code\n- Uses your current login session\n- Automatically updated if you re-login\n- Works seamlessly with MCP tools\n\n#### Method 2: Explicit Token (Not Recommended)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Hardcoded token\n})\n```\n\n**When to use:**\n- Only if automatic token doesn't work\n- Testing with a specific token\n- Organization tokens (use with caution)\n\n**Security concerns:**\n- Token 
visible in code\u002Flogs\n- Must manually update if token rotates\n- Risk of token exposure\n\n#### Method 3: Environment Variable (Less Secure)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"env\": {\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Less secure than secrets\n})\n```\n\n**Difference from secrets:**\n- `env` variables are visible in job logs\n- `secrets` are encrypted server-side\n- Always prefer `secrets` for tokens\n\n### Using Tokens in Scripts\n\n**In your Python script, tokens are available as environment variables:**\n\n```python\n# \u002F\u002F\u002F script\n# dependencies = [\"huggingface-hub\"]\n# \u002F\u002F\u002F\n\nimport os\nfrom huggingface_hub import HfApi\n\n# Token is automatically available if passed via secrets\ntoken = os.environ.get(\"HF_TOKEN\")\n\n# Use with Hub API\napi = HfApi(token=token)\n\n# Or let huggingface_hub auto-detect\napi = HfApi()  # Automatically uses HF_TOKEN env var\n```\n\n**Best practices:**\n- Don't hardcode tokens in scripts\n- Use `os.environ.get(\"HF_TOKEN\")` to access\n- Let `huggingface_hub` auto-detect when possible\n- Verify token exists before Hub operations\n\n### Token Verification\n\n**Check if you're logged in:**\n```python\nfrom huggingface_hub import whoami\nuser_info = whoami()  # Returns your username if authenticated\n```\n\n**Verify token in job:**\n```python\nimport os\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN not found!\"\ntoken = os.environ[\"HF_TOKEN\"]\nprint(f\"Token starts with: {token[:7]}...\")  # Should start with \"hf_\"\n```\n\n### Common Token Issues\n\n**Error: 401 Unauthorized**\n- **Cause:** Token missing or invalid\n- **Fix:** Add `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` to job config\n- **Verify:** Check `hf_whoami()` works locally\n\n**Error: 403 Forbidden**\n- **Cause:** Token lacks required permissions\n- **Fix:** Ensure token has write permissions for push operations\n- **Check:** Token type at 
https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens\n\n**Error: Token not found in environment**\n- **Cause:** `secrets` not passed or wrong key name\n- **Fix:** Use `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` (not `env`)\n- **Verify:** Script checks `os.environ.get(\"HF_TOKEN\")`\n\n**Error: Repository access denied**\n- **Cause:** Token doesn't have access to private repo\n- **Fix:** Use token from account with access\n- **Check:** Verify repo visibility and your permissions\n\n### Token Security Best Practices\n\n1. **Never commit tokens** - Use `$HF_TOKEN` placeholder or environment variables\n2. **Use secrets, not env** - Secrets are encrypted server-side\n3. **Rotate tokens regularly** - Generate new tokens periodically\n4. **Use minimal permissions** - Create tokens with only needed permissions\n5. **Don't share tokens** - Each user should use their own token\n6. **Monitor token usage** - Check token activity in Hub settings\n\n### Complete Token Example\n\n```python\n# Example: Push results to Hub\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# \u002F\u002F\u002F script\n# dependencies = [\"huggingface-hub\", \"datasets\"]\n# \u002F\u002F\u002F\n\nimport os\nfrom huggingface_hub import HfApi\nfrom datasets import Dataset\n\n# Verify token is available\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Use token for Hub operations\napi = HfApi(token=os.environ[\"HF_TOKEN\"])\n\n# Create and push dataset\ndata = {\"text\": [\"Hello\", \"World\"]}\ndataset = Dataset.from_dict(data)\ndataset.push_to_hub(\"username\u002Fmy-dataset\", token=os.environ[\"HF_TOKEN\"])\n\nprint(\"✅ Dataset pushed successfully!\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Token provided securely\n})\n```\n\n## Quick Start: Two Approaches\n\n### Approach 1: UV Scripts (Recommended)\n\nUV scripts use PEP 723 inline dependencies for clean, self-contained workloads.\n\n**MCP 
Tool:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# \u002F\u002F\u002F script\n# dependencies = [\"transformers\", \"torch\"]\n# \u002F\u002F\u002F\n\nfrom transformers import pipeline\nimport torch\n\n# Your workload here\nclassifier = pipeline(\"sentiment-analysis\")\nresult = classifier(\"I love Hugging Face!\")\nprint(result)\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\"\n})\n```\n\n**CLI Equivalent:**\n```bash\nhf jobs uv run my_script.py --flavor cpu-basic --timeout 30m\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"my_script.py\", flavor=\"cpu-basic\", timeout=\"30m\")\n```\n\n**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline, no file saving required\n\n**When to use:** Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`\n\n#### Custom Docker Images for UV Scripts\n\nBy default, UV scripts use `ghcr.io\u002Fastral-sh\u002Fuv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"inference.py\",\n    \"image\": \"vllm\u002Fvllm-openai:latest\",  # Pre-built image with vLLM\n    \"flavor\": \"a10g-large\"\n})\n```\n\n**CLI:**\n```bash\nhf jobs uv run --image vllm\u002Fvllm-openai:latest --flavor a10g-large inference.py\n```\n\n**Benefits:** Faster startup, pre-installed dependencies, optimized for specific frameworks\n\n#### Python Version\n\nBy default, UV scripts use Python 3.12. 
Specify a different version:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"my_script.py\",\n    \"python\": \"3.11\",  # Use Python 3.11\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"my_script.py\", python=\"3.11\")\n```\n\n#### Working with Scripts\n\n⚠️ **Important:** There are *two* \"script path\" stories depending on how you run Jobs:\n\n- **Using the `hf_jobs()` MCP tool (recommended in this repo)**: the `script` value must be **inline code** (a string) or a **URL**. A local filesystem path (like `\".\u002Fscripts\u002Ffoo.py\"`) won't exist inside the remote container.\n- **Using the `hf jobs uv run` CLI**: local file paths **do work** (the CLI uploads your script).\n\n**Common mistake with `hf_jobs()` MCP tool:**\n\n```python\n# ❌ Will fail (remote container can't see your local path)\nhf_jobs(\"uv\", {\"script\": \".\u002Fscripts\u002Ffoo.py\"})\n```\n\n**Correct patterns with `hf_jobs()` MCP tool:**\n\n```python\n# ✅ Inline: read the local script file and pass its *contents*\nfrom pathlib import Path\nscript = Path(\"hf-jobs\u002Fscripts\u002Ffoo.py\").read_text()\nhf_jobs(\"uv\", {\"script\": script})\n\n# ✅ URL: host the script somewhere reachable\nhf_jobs(\"uv\", {\"script\": \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002F...\u002Fraw\u002Fmain\u002Ffoo.py\"})\n\n# ✅ URL from GitHub\nhf_jobs(\"uv\", {\"script\": \"https:\u002F\u002Fraw.githubusercontent.com\u002Fhuggingface\u002Ftrl\u002Fmain\u002Ftrl\u002Fscripts\u002Fsft.py\"})\n```\n\n**CLI equivalent (local paths supported):**\n\n```bash\nhf jobs uv run .\u002Fscripts\u002Ffoo.py -- --your --args\n```\n\n#### Adding Dependencies at Runtime\n\nAdd extra dependencies beyond what's in the PEP 723 header:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"inference.py\",\n    \"dependencies\": [\"transformers\", \"torch>=2.0\"],  # Extra deps\n    \"flavor\": \"a10g-small\"\n})\n```\n\n**Python 
API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"inference.py\", dependencies=[\"transformers\", \"torch>=2.0\"])\n```\n\n### Approach 2: Docker-Based Jobs\n\nRun jobs with custom Docker images and commands.\n\n**MCP Tool:**\n```python\nhf_jobs(\"run\", {\n    \"image\": \"python:3.12\",\n    \"command\": [\"python\", \"-c\", \"print('Hello from HF Jobs!')\"],\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\"\n})\n```\n\n**CLI Equivalent:**\n```bash\nhf jobs run python:3.12 python -c \"print('Hello from HF Jobs!')\"\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_job\nrun_job(image=\"python:3.12\", command=[\"python\", \"-c\", \"print('Hello!')\"], flavor=\"cpu-basic\")\n```\n\n**Benefits:** Full Docker control, use pre-built images, run any command\n**When to use:** Need specific Docker images, non-Python workloads, complex environments\n\n**Example with GPU:**\n```python\nhf_jobs(\"run\", {\n    \"image\": \"pytorch\u002Fpytorch:2.6.0-cuda12.4-cudnn9-devel\",\n    \"command\": [\"python\", \"-c\", \"import torch; print(torch.cuda.get_device_name())\"],\n    \"flavor\": \"a10g-small\",\n    \"timeout\": \"1h\"\n})\n```\n\n**Using Hugging Face Spaces as Images:**\n\nYou can use Docker images from HF Spaces:\n```python\nhf_jobs(\"run\", {\n    \"image\": \"hf.co\u002Fspaces\u002Flhoestq\u002Fduckdb\",  # Space as Docker image\n    \"command\": [\"duckdb\", \"-c\", \"SELECT 'Hello from DuckDB!'\"],\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**CLI:**\n```bash\nhf jobs run hf.co\u002Fspaces\u002Flhoestq\u002Fduckdb duckdb -c \"SELECT 'Hello!'\"\n```\n\n### Finding More UV Scripts on Hub\n\nThe `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:\n\n```python\n# Discover available UV script collections\ndataset_search({\"author\": \"uv-scripts\", \"sort\": \"downloads\", \"limit\": 20})\n\n# Explore a specific 
collection\nhub_repo_details([\"uv-scripts\u002Fclassification\"], repo_type=\"dataset\", include_readme=True)\n```\n\n**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation\n\n## Hardware Selection\n\n> **Reference:** [HF Jobs Hardware Docs](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fen\u002Fspaces-config-reference) (updated 07\u002F2025)\n\n| Workload Type | Recommended Hardware | Use Case |\n|---------------|---------------------|----------|\n| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |\n| Small models, demos | `t4-small` | \u003C1B models, quick tests |\n| Medium models | `t4-medium`, `l4x1` | 1-7B models |\n| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |\n| Very large models | `a100-large` | 13B+ models |\n| Batch inference | `a10g-large`, `a100-large` | High-throughput |\n| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel\u002Flarge models |\n| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX\u002FFlax, TPU-optimized |\n\n**All Available Flavors:**\n- **CPU:** `cpu-basic`, `cpu-upgrade`\n- **GPU:** `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`\n- **TPU:** `v5e-1x1`, `v5e-2x2`, `v5e-2x4`\n\n**Guidelines:**\n- Start with smaller hardware for testing\n- Scale up based on actual needs\n- Use multi-GPU for parallel workloads or large models\n- Use TPUs for JAX\u002FFlax workloads\n- See `references\u002Fhardware_guide.md` for detailed specifications\n\n## Critical: Saving Results\n\n**⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS**\n\nThe Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, **ALL WORK IS LOST**.\n\n### Persistence Options\n\n**1. 
Push to Hugging Face Hub (Recommended)**\n\n```python\n# Push models\nmodel.push_to_hub(\"username\u002Fmodel-name\", token=os.environ[\"HF_TOKEN\"])\n\n# Push datasets\ndataset.push_to_hub(\"username\u002Fdataset-name\", token=os.environ[\"HF_TOKEN\"])\n\n# Push artifacts\napi.upload_file(\n    path_or_fileobj=\"results.json\",\n    path_in_repo=\"results.json\",\n    repo_id=\"username\u002Fresults\",\n    token=os.environ[\"HF_TOKEN\"]\n)\n```\n\n**2. Use External Storage**\n\n```python\n# Upload to S3, GCS, etc.\nimport boto3\ns3 = boto3.client('s3')\ns3.upload_file('results.json', 'my-bucket', 'results.json')\n```\n\n**3. Send Results via API**\n\n```python\n# POST results to your API\nimport requests\nrequests.post(\"https:\u002F\u002Fyour-api.com\u002Fresults\", json=results)\n```\n\n### Required Configuration for Hub Push\n\n**In job submission:**\n```python\n# hf_jobs MCP tool:\n{\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}}  # auto-replaced\n\n# HfApi().run_uv_job():\nfrom huggingface_hub import get_token\nsecrets={\"HF_TOKEN\": get_token()}  # must pass real token\n```\n\n**In script:**\n```python\nimport os\nfrom huggingface_hub import HfApi\n\n# Token automatically available from secrets\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\n\n# Push your results\napi.upload_file(...)\n```\n\n### Verification Checklist\n\nBefore submitting:\n- [ ] Results persistence method chosen\n- [ ] Token in secrets if using Hub (MCP: `\"$HF_TOKEN\"`, Python API: `get_token()`)\n- [ ] Script handles missing token gracefully\n- [ ] Test persistence path works\n\n**See:** `references\u002Fhub_saving.md` for detailed Hub persistence guide\n\n## Timeout Management\n\n**⚠️ DEFAULT: 30 MINUTES**\n\nJobs automatically stop after the timeout. 
For long-running tasks like training, always set a custom timeout.\n\n### Setting Timeouts\n\n**MCP Tool:**\n```python\n{\n    \"timeout\": \"2h\"   # 2 hours\n}\n```\n\n**Supported formats:**\n- Integer\u002Ffloat: seconds (e.g., `300` = 5 minutes)\n- String with suffix: `\"5m\"` (minutes), `\"2h\"` (hours), `\"1d\"` (days)\n- Examples: `\"90m\"`, `\"2h\"`, `\"1.5h\"`, `300`, `\"1d\"`\n\n**Python API:**\n```python\nfrom huggingface_hub import run_job, run_uv_job\n\nrun_job(image=\"python:3.12\", command=[...], timeout=\"2h\")\nrun_uv_job(\"script.py\", timeout=7200)  # 2 hours in seconds\n```\n\n### Timeout Guidelines\n\n| Scenario | Recommended | Notes |\n|----------|-------------|-------|\n| Quick test | 10-30 min | Verify setup |\n| Data processing | 1-2 hours | Depends on data size |\n| Batch inference | 2-4 hours | Large batches |\n| Experiments | 4-8 hours | Multiple runs |\n| Long-running | 8-24 hours | Production workloads |\n\n**Always add 20-30% buffer** for setup, network delays, and cleanup.\n\n**On timeout:** Job killed immediately, all unsaved progress lost\n\n## Cost Estimation\n\n**General guidelines:**\n\n```\nTotal Cost = (Hours of runtime) × (Cost per hour)\n```\n\n**Example calculations:**\n\n**Quick test:**\n- Hardware: cpu-basic ($0.10\u002Fhour)\n- Time: 15 minutes (0.25 hours)\n- Cost: $0.03\n\n**Data processing:**\n- Hardware: l4x1 ($2.50\u002Fhour)\n- Time: 2 hours\n- Cost: $5.00\n\n**Batch inference:**\n- Hardware: a10g-large ($5\u002Fhour)\n- Time: 4 hours\n- Cost: $20.00\n\n**Cost optimization tips:**\n1. Start small - Test on cpu-basic or t4-small\n2. Monitor runtime - Set appropriate timeouts\n3. Use checkpoints - Resume if job fails\n4. Optimize code - Reduce unnecessary compute\n5. 
Choose right hardware - Don't over-provision\n\n## Monitoring and Tracking\n\n### Check Job Status\n\n**MCP Tool:**\n```python\n# List all jobs\nhf_jobs(\"ps\")\n\n# Inspect specific job\nhf_jobs(\"inspect\", {\"job_id\": \"your-job-id\"})\n\n# View logs\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n\n# Cancel a job\nhf_jobs(\"cancel\", {\"job_id\": \"your-job-id\"})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job\n\n# List your jobs\njobs = list_jobs()\n\n# List running jobs only\nrunning = [j for j in list_jobs() if j.status.stage == \"RUNNING\"]\n\n# Inspect specific job\njob_info = inspect_job(job_id=\"your-job-id\")\n\n# View logs\nfor log in fetch_job_logs(job_id=\"your-job-id\"):\n    print(log)\n\n# Cancel a job\ncancel_job(job_id=\"your-job-id\")\n```\n\n**CLI:**\n```bash\nhf jobs ps                    # List jobs\nhf jobs logs \u003Cjob-id>         # View logs\nhf jobs cancel \u003Cjob-id>       # Cancel job\n```\n\n**Remember:** Wait for user to request status checks. 
Avoid polling repeatedly.\n\n### Job URLs\n\nAfter submission, jobs have monitoring URLs:\n```\nhttps:\u002F\u002Fhuggingface.co\u002Fjobs\u002Fusername\u002Fjob-id\n```\n\nView logs, status, and details in the browser.\n\n### Wait for Multiple Jobs\n\n```python\nimport time\nfrom huggingface_hub import inspect_job, run_job\n\n# Run multiple jobs\njobs = [run_job(image=img, command=cmd) for img, cmd in workloads]\n\n# Wait for all to complete\nfor job in jobs:\n    while inspect_job(job_id=job.id).status.stage not in (\"COMPLETED\", \"ERROR\"):\n        time.sleep(10)\n```\n\n## Scheduled Jobs\n\nRun jobs on a schedule using CRON expressions or predefined schedules.\n\n**MCP Tool:**\n```python\n# Schedule a UV script that runs every hour\nhf_jobs(\"scheduled uv\", {\n    \"script\": \"your_script.py\",\n    \"schedule\": \"@hourly\",\n    \"flavor\": \"cpu-basic\"\n})\n\n# Schedule with CRON syntax\nhf_jobs(\"scheduled uv\", {\n    \"script\": \"your_script.py\",\n    \"schedule\": \"0 9 * * 1\",  # 9 AM every Monday\n    \"flavor\": \"cpu-basic\"\n})\n\n# Schedule a Docker-based job\nhf_jobs(\"scheduled run\", {\n    \"image\": \"python:3.12\",\n    \"command\": [\"python\", \"-c\", \"print('Scheduled!')\"],\n    \"schedule\": \"@daily\",\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import create_scheduled_job, create_scheduled_uv_job\n\n# Schedule a Docker job\ncreate_scheduled_job(\n    image=\"python:3.12\",\n    command=[\"python\", \"-c\", \"print('Running on schedule!')\"],\n    schedule=\"@hourly\"\n)\n\n# Schedule a UV script\ncreate_scheduled_uv_job(\"my_script.py\", schedule=\"@daily\", flavor=\"cpu-basic\")\n\n# Schedule with GPU\ncreate_scheduled_uv_job(\n    \"ml_inference.py\",\n    schedule=\"0 *\u002F6 * * *\",  # Every 6 hours\n    flavor=\"a10g-small\"\n)\n```\n\n**Available schedules:**\n- `@annually`, `@yearly` - Once per year\n- `@monthly` - Once per month\n- `@weekly` - Once per week\n- `@daily` 
- Once per day\n- `@hourly` - Once per hour\n- CRON expression - Custom schedule (e.g., `\"*\u002F5 * * * *\"` for every 5 minutes)\n\n**Manage scheduled jobs:**\n```python\n# MCP Tool\nhf_jobs(\"scheduled ps\")                              # List scheduled jobs\nhf_jobs(\"scheduled inspect\", {\"job_id\": \"...\"})     # Inspect details\nhf_jobs(\"scheduled suspend\", {\"job_id\": \"...\"})     # Pause\nhf_jobs(\"scheduled resume\", {\"job_id\": \"...\"})      # Resume\nhf_jobs(\"scheduled delete\", {\"job_id\": \"...\"})      # Delete\n```\n\n**Python API for management:**\n```python\nfrom huggingface_hub import (\n    list_scheduled_jobs,\n    inspect_scheduled_job,\n    suspend_scheduled_job,\n    resume_scheduled_job,\n    delete_scheduled_job\n)\n\n# List all scheduled jobs\nscheduled = list_scheduled_jobs()\n\n# Inspect a scheduled job\ninfo = inspect_scheduled_job(scheduled_job_id)\n\n# Suspend (pause) a scheduled job\nsuspend_scheduled_job(scheduled_job_id)\n\n# Resume a scheduled job\nresume_scheduled_job(scheduled_job_id)\n\n# Delete a scheduled job\ndelete_scheduled_job(scheduled_job_id)\n```\n\n## Webhooks: Trigger Jobs on Events\n\nTrigger jobs automatically when changes happen in Hugging Face repositories.\n\n**Python API:**\n```python\nfrom huggingface_hub import create_webhook\n\n# Create webhook that triggers a job when a repo changes\nwebhook = create_webhook(\n    job_id=job.id,\n    watched=[\n        {\"type\": \"user\", \"name\": \"your-username\"},\n        {\"type\": \"org\", \"name\": \"your-org-name\"}\n    ],\n    domains=[\"repo\", \"discussion\"],\n    secret=\"your-secret\"\n)\n```\n\n**How it works:**\n1. Webhook listens for changes in watched repositories\n2. When triggered, the job runs with `WEBHOOK_PAYLOAD` environment variable\n3. 
Your script can parse the payload to understand what changed\n\n**Use cases:**\n- Auto-process new datasets when uploaded\n- Trigger inference when models are updated\n- Run tests when code changes\n- Generate reports on repository activity\n\n**Access webhook payload in script:**\n```python\nimport os\nimport json\n\npayload = json.loads(os.environ.get(\"WEBHOOK_PAYLOAD\", \"{}\"))\nprint(f\"Event type: {payload.get('event', {}).get('action')}\")\n```\n\nSee [Webhooks Documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fwebhooks) for more details.\n\n## Common Workload Patterns\n\nThis repository ships ready-to-run UV scripts in `hf-jobs\u002Fscripts\u002F`. Prefer using them instead of inventing new templates.\n\n### Pattern 1: Dataset → Model Responses (vLLM) — `scripts\u002Fgenerate-responses.py`\n\n**What it does:** loads a Hub dataset (chat `messages` or a `prompt` column), applies a model chat template, generates responses with vLLM, and **pushes** the output dataset + dataset card back to the Hub.\n\n**Requires:** GPU + **write** token (it pushes a dataset).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs\u002Fscripts\u002Fgenerate-responses.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"username\u002Finput-dataset\",\n        \"username\u002Foutput-dataset\",\n        \"--messages-column\", \"messages\",\n        \"--model-id\", \"Qwen\u002FQwen3-30B-A3B-Instruct-2507\",\n        \"--temperature\", \"0.7\",\n        \"--top-p\", \"0.8\",\n        \"--max-tokens\", \"2048\",\n    ],\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n### Pattern 2: CoT Self-Instruct Synthetic Data — `scripts\u002Fcot-self-instruct.py`\n\n**What it does:** generates synthetic prompts\u002Fanswers via CoT Self-Instruct, optionally filters outputs (answer-consistency \u002F RIP), then **pushes** the 
generated dataset + dataset card to the Hub.\n\n**Requires:** GPU + **write** token (it pushes a dataset).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs\u002Fscripts\u002Fcot-self-instruct.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"--seed-dataset\", \"davanstrien\u002Fs1k-reasoning\",\n        \"--output-dataset\", \"username\u002Fsynthetic-math\",\n        \"--task-type\", \"reasoning\",\n        \"--num-samples\", \"5000\",\n        \"--filter-method\", \"answer-consistency\",\n    ],\n    \"flavor\": \"l4x4\",\n    \"timeout\": \"8h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n### Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts\u002Ffinepdfs-stats.py`\n\n**What it does:** scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.\n\n**Requires:** CPU is often enough; token needed **only** if you pass `--output-repo` (upload).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs\u002Fscripts\u002Ffinepdfs-stats.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"--limit\", \"10000\",\n        \"--show-plan\",\n        \"--output-repo\", \"username\u002Ffinepdfs-temporal-stats\",\n    ],\n    \"flavor\": \"cpu-upgrade\",\n    \"timeout\": \"2h\",\n    \"env\": {\"HF_XET_HIGH_PERFORMANCE\": \"1\"},\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n## Common Failure Modes\n\n### Out of Memory (OOM)\n\n**Fix:**\n1. Reduce batch size or data chunk size\n2. Process data in smaller batches\n3. Upgrade hardware: cpu → t4 → a10g → a100\n\n### Job Timeout\n\n**Fix:**\n1. Check logs for actual runtime\n2. Increase timeout with buffer: `\"timeout\": \"3h\"`\n3. Optimize code for faster execution\n4. Process data in chunks\n\n### Hub Push Failures\n\n**Fix:**\n1. 
Add token to secrets: MCP uses `\"$HF_TOKEN\"` (auto-replaced), Python API uses `get_token()` (must pass real token)\n2. Verify token in script: `assert \"HF_TOKEN\" in os.environ`\n3. Check token permissions\n4. Verify repo exists or can be created\n\n### Missing Dependencies\n\n**Fix:**\nAdd to PEP 723 header:\n```python\n# \u002F\u002F\u002F script\n# dependencies = [\"package1\", \"package2>=1.0.0\"]\n# \u002F\u002F\u002F\n```\n\n### Authentication Errors\n\n**Fix:**\n1. Check `hf_whoami()` works locally\n2. Verify token in secrets — MCP: `\"$HF_TOKEN\"`, Python API: `get_token()` (NOT `\"$HF_TOKEN\"`)\n3. Re-login: `hf auth login`\n4. Check token has required permissions\n\n## Troubleshooting\n\n**Common issues:**\n- Job times out → Increase timeout, optimize code\n- Results not saved → Check persistence method, verify HF_TOKEN\n- Out of Memory → Reduce batch size, upgrade hardware\n- Import errors → Add dependencies to PEP 723 header\n- Authentication errors → Check token, verify secrets parameter\n\n**See:** `references\u002Ftroubleshooting.md` for complete troubleshooting guide\n\n## Resources\n\n### References (In This Skill)\n- `references\u002Ftoken_usage.md` - Complete token usage guide\n- `references\u002Fhardware_guide.md` - Hardware specs and selection\n- `references\u002Fhub_saving.md` - Hub persistence guide\n- `references\u002Ftroubleshooting.md` - Common issues and solutions\n\n### Scripts (In This Skill)\n- `scripts\u002Fgenerate-responses.py` - vLLM batch generation: dataset → responses → push to Hub\n- `scripts\u002Fcot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub\n- `scripts\u002Ffinepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on Hub (optional push)\n\n### External Links\n\n**Official Documentation:**\n- [HF Jobs Guide](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fjobs) - Main documentation\n- [HF Jobs CLI 
Reference](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fcli#hf-jobs) - Command line interface\n- [HF Jobs API Reference](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fpackage_reference\u002Fhf_api) - Python API details\n- [Hardware Flavors Reference](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fen\u002Fspaces-config-reference) - Available hardware\n\n**Related Tools:**\n- [UV Scripts Guide](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fguides\u002Fscripts\u002F) - PEP 723 inline dependencies\n- [UV Scripts Organization](https:\u002F\u002Fhuggingface.co\u002Fuv-scripts) - Community UV script collection\n- [HF Hub Authentication](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fquick-start#authentication) - Token setup\n- [Webhooks Documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fwebhooks) - Event triggers\n\n## Key Takeaways\n\n1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless user requests\n2. **Jobs are asynchronous** - Don't wait\u002Fpoll; let user check when ready\n3. **Always set timeout** - Default 30 min may be insufficient; set appropriate timeout\n4. **Always persist results** - Environment is ephemeral; without persistence, all work is lost\n5. **Use tokens securely** - MCP: `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, Python API: `secrets={\"HF_TOKEN\": get_token()}` — `\"$HF_TOKEN\"` only works with MCP tool\n6. **Choose appropriate hardware** - Start small, scale up based on needs (see hardware guide)\n7. **Use UV scripts** - Default to `hf_jobs(\"uv\", {...})` with inline scripts for Python workloads\n8. **Handle authentication** - Verify tokens are available before Hub operations\n9. **Monitor jobs** - Provide job URLs and status check commands\n10. 
**Optimize costs** - Choose right hardware, set appropriate timeouts\n\n## Quick Reference: MCP Tool vs CLI vs Python API\n\n| Operation | MCP Tool | CLI | Python API |\n|-----------|----------|-----|------------|\n| Run UV script | `hf_jobs(\"uv\", {...})` | `hf jobs uv run script.py` | `run_uv_job(\"script.py\")` |\n| Run Docker job | `hf_jobs(\"run\", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |\n| List jobs | `hf_jobs(\"ps\")` | `hf jobs ps` | `list_jobs()` |\n| View logs | `hf_jobs(\"logs\", {...})` | `hf jobs logs \u003Cid>` | `fetch_job_logs(job_id)` |\n| Cancel job | `hf_jobs(\"cancel\", {...})` | `hf jobs cancel \u003Cid>` | `cancel_job(job_id)` |\n| Schedule UV | `hf_jobs(\"scheduled uv\", {...})` | `hf jobs scheduled uv run SCHEDULE script.py` | `create_scheduled_uv_job()` |\n| Schedule Docker | `hf_jobs(\"scheduled run\", {...})` | `hf jobs scheduled run SCHEDULE image cmd` | `create_scheduled_job()` |\n| List scheduled | `hf_jobs(\"scheduled ps\")` | `hf jobs scheduled ps` | `list_scheduled_jobs()` |\n| Delete scheduled | `hf_jobs(\"scheduled delete\", {...})` | `hf jobs scheduled delete \u003Cid>` | `delete_scheduled_job()` |\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,84,127,"2026-05-16 13:22:44",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"其他","other","mdi-page-next-outline","其他类型Skill",5,"2026-05-16 
12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"3d7ef47a-3f0a-4974-b1d7-31206df4eb45","1.0.0","hugging-face-jobs.zip",44472,"uploads\u002Fskills\u002Faba51b60-c0d6-4799-b8de-4ee251476513\u002Fhugging-face-jobs.zip","43b43b5e08af7017608eec41100bd4bc33fe520093d5c51acca2c1bc4a546281","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":31684},{\"path\":\"index.html\",\"isDirectory\":false,\"size\":7682},{\"path\":\"references\u002Fhardware_guide.md\",\"isDirectory\":false,\"size\":8279},{\"path\":\"references\u002Fhub_saving.md\",\"isDirectory\":false,\"size\":7644},{\"path\":\"references\u002Ftoken_usage.md\",\"isDirectory\":false,\"size\":13424},{\"path\":\"references\u002Ftroubleshooting.md\",\"isDirectory\":false,\"size\":10242},{\"path\":\"scripts\u002Fcot-self-instruct.py\",\"isDirectory\":false,\"size\":24883},{\"path\":\"scripts\u002Ffinepdfs-stats.py\",\"isDirectory\":false,\"size\":17115},{\"path\":\"scripts\u002Fgenerate-responses.py\",\"isDirectory\":false,\"size\":20518}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]