[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-b20d8568-2c49-4149-b310-904a7187a126":3,"$fELItn0jDShH_0qtB0GAi8dtmZZWPawKPhpryZAD3ZQc":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"b20d8568-2c49-4149-b310-904a7187a126","polars","Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but the data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, and as a faster pandas replacement. For larger-than-RAM data, use dask or vaex.","cat_prod_data","mod_productivity","sickn33,productivity","---\nname: polars\ndescription: Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.\nlicense: https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars\u002Fblob\u002Fmain\u002FLICENSE\nmetadata:\n    skill-author: K-Dense Inc.\nrisk: unknown\nsource: community\n---\n\n# Polars\n\n## When to Use\n- You need a faster in-memory DataFrame workflow than pandas for data that still fits in RAM.\n- You are building ETL, analytics, or transformation pipelines that benefit from lazy evaluation and parallel execution.\n- You want expression-based tabular operations on top of Apache Arrow semantics.\n\n## Overview\n\nPolars is a lightning-fast DataFrame library for Python and Rust built on Apache Arrow. 
Work with Polars' expression-based API, lazy evaluation framework, and high-performance data manipulation capabilities for efficient data processing, pandas migration, and data pipeline optimization.\n\n## Quick Start\n\n### Installation and Basic Usage\n\nInstall Polars:\n```bash\nuv pip install polars\n```\n\nBasic DataFrame creation and operations:\n```python\nimport polars as pl\n\n# Create DataFrame\ndf = pl.DataFrame({\n    \"name\": [\"Alice\", \"Bob\", \"Charlie\"],\n    \"age\": [25, 30, 35],\n    \"city\": [\"NY\", \"LA\", \"SF\"]\n})\n\n# Select columns\ndf.select(\"name\", \"age\")\n\n# Filter rows\ndf.filter(pl.col(\"age\") > 25)\n\n# Add computed columns\ndf.with_columns(\n    age_plus_10=pl.col(\"age\") + 10\n)\n```\n\n## Core Concepts\n\n### Expressions\n\nExpressions are the fundamental building blocks of Polars operations. They describe transformations on data and can be composed, reused, and optimized.\n\n**Key principles:**\n- Use `pl.col(\"column_name\")` to reference columns\n- Chain methods to build complex transformations\n- Expressions are lazy and only execute within contexts (select, with_columns, filter, group_by)\n\n**Example:**\n```python\n# Expression-based computation\ndf.select(\n    pl.col(\"name\"),\n    (pl.col(\"age\") * 12).alias(\"age_in_months\")\n)\n```\n\n### Lazy vs Eager Evaluation\n\n**Eager (DataFrame):** Operations execute immediately\n```python\ndf = pl.read_csv(\"file.csv\")  # Reads immediately\nresult = df.filter(pl.col(\"age\") > 25)  # Executes immediately\n```\n\n**Lazy (LazyFrame):** Operations build a query plan, optimized before execution\n```python\nlf = pl.scan_csv(\"file.csv\")  # Doesn't read yet\nresult = lf.filter(pl.col(\"age\") > 25).select(\"name\", \"age\")\ndf = result.collect()  # Now executes optimized query\n```\n\n**When to use lazy:**\n- Working with large datasets\n- Complex query pipelines\n- When only some columns\u002Frows are needed\n- Performance is critical\n\n**Benefits of lazy 
evaluation:**\n- Automatic query optimization\n- Predicate pushdown\n- Projection pushdown\n- Parallel execution\n\nFor detailed concepts, load `references\u002Fcore_concepts.md`.\n\n## Common Operations\n\n### Select\nSelect and manipulate columns:\n```python\n# Select specific columns\ndf.select(\"name\", \"age\")\n\n# Select with expressions\ndf.select(\n    pl.col(\"name\"),\n    (pl.col(\"age\") * 2).alias(\"double_age\")\n)\n\n# Select all columns matching a pattern\ndf.select(pl.col(\"^.*_id$\"))\n```\n\n### Filter\nFilter rows by conditions:\n```python\n# Single condition\ndf.filter(pl.col(\"age\") > 25)\n\n# Multiple conditions (cleaner than using &)\ndf.filter(\n    pl.col(\"age\") > 25,\n    pl.col(\"city\") == \"NY\"\n)\n\n# Complex conditions\ndf.filter(\n    (pl.col(\"age\") > 25) | (pl.col(\"city\") == \"LA\")\n)\n```\n\n### With Columns\nAdd or modify columns while preserving existing ones:\n```python\n# Add new columns\ndf.with_columns(\n    age_plus_10=pl.col(\"age\") + 10,\n    name_upper=pl.col(\"name\").str.to_uppercase()\n)\n\n# Parallel computation (independent expressions are computed in parallel).\n# Each output needs a distinct name; two unnamed expressions on \"value\"\n# would both be called \"value\" and raise a duplicate-column error.\ndf.with_columns(\n    value_x10=pl.col(\"value\") * 10,\n    value_x100=pl.col(\"value\") * 100,\n)\n```\n\n### Group By and Aggregations\nGroup data and compute aggregations:\n```python\n# Basic grouping\ndf.group_by(\"city\").agg(\n    pl.col(\"age\").mean().alias(\"avg_age\"),\n    pl.len().alias(\"count\")\n)\n\n# Multiple group keys\ndf.group_by(\"city\", \"department\").agg(\n    pl.col(\"salary\").sum()\n)\n\n# Conditional aggregations\ndf.group_by(\"city\").agg(\n    (pl.col(\"age\") > 30).sum().alias(\"over_30\")\n)\n```\n\nFor detailed operation patterns, load `references\u002Foperations.md`.\n\n## Aggregations and Window Functions\n\n### Aggregation Functions\nCommon aggregations within `group_by` context:\n- `pl.len()` - count rows\n- `pl.col(\"x\").sum()` - sum values\n- `pl.col(\"x\").mean()` - average\n- `pl.col(\"x\").min()` \u002F `pl.col(\"x\").max()` - extremes\n- 
`pl.col(\"x\").first()` \u002F `pl.col(\"x\").last()` - first\u002Flast values\n\n### Window Functions with `over()`\nApply aggregations while preserving row count:\n```python\n# Add group statistics to each row\ndf.with_columns(\n    avg_age_by_city=pl.col(\"age\").mean().over(\"city\"),\n    rank_in_city=pl.col(\"salary\").rank().over(\"city\")\n)\n\n# Multiple grouping columns\ndf.with_columns(\n    group_avg=pl.col(\"value\").mean().over(\"category\", \"region\")\n)\n```\n\n**Mapping strategies:**\n- `group_to_rows` (default): Preserves original row order\n- `explode`: Faster but groups rows together\n- `join`: Creates list columns\n\n## Data I\u002FO\n\n### Supported Formats\nPolars supports reading and writing:\n- CSV, Parquet, JSON, Excel\n- Databases (via connectors)\n- Cloud storage (S3, Azure, GCS)\n- Google BigQuery\n- Multiple\u002Fpartitioned files\n\n### Common I\u002FO Operations\n\n**CSV:**\n```python\n# Eager\ndf = pl.read_csv(\"file.csv\")\ndf.write_csv(\"output.csv\")\n\n# Lazy (preferred for large files)\nlf = pl.scan_csv(\"file.csv\")\nresult = lf.filter(...).select(...).collect()\n```\n\n**Parquet (recommended for performance):**\n```python\ndf = pl.read_parquet(\"file.parquet\")\ndf.write_parquet(\"output.parquet\")\n```\n\n**JSON:**\n```python\ndf = pl.read_json(\"file.json\")\ndf.write_json(\"output.json\")\n```\n\nFor comprehensive I\u002FO documentation, load `references\u002Fio_guide.md`.\n\n## Transformations\n\n### Joins\nCombine DataFrames:\n```python\n# Inner join\ndf1.join(df2, on=\"id\", how=\"inner\")\n\n# Left join\ndf1.join(df2, on=\"id\", how=\"left\")\n\n# Join on different column names\ndf1.join(df2, left_on=\"user_id\", right_on=\"id\")\n```\n\n### Concatenation\nStack DataFrames:\n```python\n# Vertical (stack rows)\npl.concat([df1, df2], how=\"vertical\")\n\n# Horizontal (add columns)\npl.concat([df1, df2], how=\"horizontal\")\n\n# Diagonal (union with different schemas)\npl.concat([df1, df2], how=\"diagonal\")\n```\n\n### Pivot and 
Unpivot\nReshape data:\n```python\n# Pivot (wide format); `on` names the column whose values become new columns\ndf.pivot(on=\"product\", index=\"date\", values=\"sales\")\n\n# Unpivot (long format)\ndf.unpivot(index=\"id\", on=[\"col1\", \"col2\"])\n```\n\nFor detailed transformation examples, load `references\u002Ftransformations.md`.\n\n## Pandas Migration\n\nPolars offers significant performance improvements over pandas with a cleaner API. Key differences:\n\n### Conceptual Differences\n- **No index**: Polars uses integer positions only\n- **Strict typing**: No silent type conversions\n- **Lazy evaluation**: Available via LazyFrame\n- **Parallel by default**: Operations parallelized automatically\n\n### Common Operation Mappings\n\n| Operation | Pandas | Polars |\n|-----------|--------|--------|\n| Select column | `df[\"col\"]` | `df.select(\"col\")` |\n| Filter | `df[df[\"col\"] > 10]` | `df.filter(pl.col(\"col\") > 10)` |\n| Add column | `df.assign(x=...)` | `df.with_columns(x=...)` |\n| Group by | `df.groupby(\"col\").agg(...)` | `df.group_by(\"col\").agg(...)` |\n| Window | `df.groupby(\"col\").transform(...)` | `df.with_columns(expr.over(\"col\"))` |\n\n### Key Syntax Patterns\n\n**Pandas sequential (slow):**\n```python\ndf.assign(\n    col_a=lambda df_: df_.value * 10,\n    col_b=lambda df_: df_.value * 100\n)\n```\n\n**Polars parallel (fast):**\n```python\ndf.with_columns(\n    col_a=pl.col(\"value\") * 10,\n    col_b=pl.col(\"value\") * 100,\n)\n```\n\nFor a comprehensive migration guide, load `references\u002Fpandas_migration.md`.\n\n## Best Practices\n\n### Performance Optimization\n\n1. **Use lazy evaluation for large datasets:**\n   ```python\n   lf = pl.scan_csv(\"large.csv\")  # Don't use read_csv\n   result = lf.filter(...).select(...).collect()\n   ```\n\n2. **Avoid Python functions in hot paths:**\n   - Stay within expression API for parallelization\n   - Use `.map_elements()` only when necessary\n   - Prefer native Polars operations\n\n3. 
**Use streaming for very large data:**\n   ```python\n   lf.collect(engine=\"streaming\")  # older Polars releases: lf.collect(streaming=True)\n   ```\n\n4. **Select only needed columns early:**\n   ```python\n   # Good: Select columns early\n   lf.select(\"col1\", \"col2\").filter(...)\n\n   # Bad: Filter on all columns first\n   lf.filter(...).select(\"col1\", \"col2\")\n   ```\n\n5. **Use appropriate data types:**\n   - Categorical for low-cardinality strings\n   - Appropriate integer sizes (i32 vs i64)\n   - Date types for temporal data\n\n### Expression Patterns\n\n**Conditional operations:**\n```python\npl.when(condition).then(value).otherwise(other_value)\n```\n\n**Column operations across multiple columns:**\n```python\ndf.select(pl.col(\"^.*_value$\") * 2)  # Regex pattern\n```\n\n**Null handling:**\n```python\npl.col(\"x\").fill_null(0)\npl.col(\"x\").is_null()\npl.col(\"x\").drop_nulls()\n```\n\nFor additional best practices and patterns, load `references\u002Fbest_practices.md`.\n\n## Resources\n\nThis skill includes comprehensive reference documentation:\n\n### references\u002F\n- `core_concepts.md` - Detailed explanations of expressions, lazy evaluation, and type system\n- `operations.md` - Comprehensive guide to all common operations with examples\n- `pandas_migration.md` - Complete migration guide from pandas to Polars\n- `io_guide.md` - Data I\u002FO operations for all supported formats\n- `transformations.md` - Joins, concatenation, pivots, and reshaping operations\n- `best_practices.md` - Performance optimization tips and common patterns\n\nLoad these references as needed when users require detailed information about specific topics.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are 
missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,199,373,"2026-05-16 13:34:18",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Productivity Tools","productivity","mdi-lightning-bolt-outline","Document processing, data analysis, and workflow automation",4,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Data Analysis","data-analysis","mdi-chart-bar","Data visualization and statistical analysis",2,30,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"83614248-3a7f-4831-baee-9e0fe18c2f6c","1.0.0","polars.zip",3979,"uploads\u002Fskills\u002Fb20d8568-2c49-4149-b310-904a7187a126\u002Fpolars.zip","93f0ab40fa5131d63ee3fd472214a75bc3512f519797242e41de653dbc4075b9","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":10093}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]