[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-60a48c10-f337-4b76-b991-c551bb1d1e18":3,"$fBv_IP-VpYj-Z6-7gYIXLpmw28GJXtSoSTQnpTpWLakM":43},{"id":4,"title":5,"description":6,"categoryId":7,"moduleId":8,"tags":9,"prompt":10,"icon":11,"source":12,"sourceUrl":13,"authorId":14,"authorName":15,"isPublic":16,"stars":17,"runs":18,"createdAt":19,"updatedAt":19,"module":20,"category":27,"packages":34},"60a48c10-f337-4b76-b991-c551bb1d1e18","scanpy","Scanpy is a scalable Python toolkit built on AnnData for analyzing single-cell RNA-seq data. Apply this skill for complete single-cell workflows, including quality control, normalization, dimensionality reduction, clustering, marker gene identification, visualization, and trajectory analysis.","cat_life_career","mod_other","sickn33,other","---\nname: scanpy\ndescription: \"Scanpy is a scalable Python toolkit for analyzing single-cell RNA-seq data, built on AnnData. Apply this skill for complete single-cell workflows including quality control, normalization, dimensionality reduction, clustering, marker gene identification, visualization, and trajectory analysis.\"\nlicense: BSD-3-Clause license\nmetadata:\n    skill-author: K-Dense Inc.\nrisk: unknown\nsource: community\n---\n\n# Scanpy: Single-Cell Analysis\n\n## Overview\n\nScanpy is a scalable Python toolkit for analyzing single-cell RNA-seq data, built on AnnData. 
Apply this skill for complete single-cell workflows including quality control, normalization, dimensionality reduction, clustering, marker gene identification, visualization, and trajectory analysis.\n\n## When to Use This Skill\n\nThis skill should be used when:\n- Analyzing single-cell RNA-seq data (.h5ad, 10X, CSV formats)\n- Performing quality control on scRNA-seq datasets\n- Creating UMAP, t-SNE, or PCA visualizations\n- Identifying cell clusters and finding marker genes\n- Annotating cell types based on gene expression\n- Conducting trajectory inference or pseudotime analysis\n- Generating publication-quality single-cell plots\n\n## Quick Start\n\n### Basic Import and Setup\n\n```python\nimport scanpy as sc\nimport pandas as pd\nimport numpy as np\n\n# Configure settings\nsc.settings.verbosity = 3\nsc.settings.set_figure_params(dpi=80, facecolor='white')\nsc.settings.figdir = '.\u002Ffigures\u002F'\n```\n\n### Loading Data\n\n```python\n# Use the loader that matches your input (each call below is an alternative)\n\n# From 10X Genomics (Cell Ranger matrix directory or .h5 file)\nadata = sc.read_10x_mtx('path\u002Fto\u002Fdata\u002F')\nadata = sc.read_10x_h5('path\u002Fto\u002Fdata.h5')\n\n# From h5ad (AnnData format)\nadata = sc.read_h5ad('path\u002Fto\u002Fdata.h5ad')\n\n# From CSV (rows become cells; append .T if rows are genes)\nadata = sc.read_csv('path\u002Fto\u002Fdata.csv')\n```\n\n### Understanding AnnData Structure\n\nThe AnnData object is the core data structure in scanpy:\n\n```python\nadata.X          # Expression matrix (cells × genes)\nadata.obs        # Cell metadata (DataFrame)\nadata.var        # Gene metadata (DataFrame)\nadata.uns        # Unstructured annotations (dict)\nadata.obsm       # Multi-dimensional cell data (e.g. X_pca, X_umap)\nadata.layers     # Alternative expression matrices (e.g. raw counts)\nadata.raw        # Frozen snapshot of all genes, set with adata.raw = adata\n\n# Access cell and gene names\nadata.obs_names  # Cell barcodes\nadata.var_names  # Gene names\n```\n\n## Standard Analysis Workflow\n\n### 1. 
Quality Control\n\nIdentify and filter low-quality cells and genes:\n\n```python\n# Identify mitochondrial genes (human gene symbols; use 'mt-' for mouse)\nadata.var['mt'] = adata.var_names.str.startswith('MT-')\n\n# Calculate QC metrics (adds n_genes_by_counts, total_counts, pct_counts_mt to adata.obs)\nsc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)\n\n# Visualize QC metrics\nsc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],\n             jitter=0.4, multi_panel=True)\n\n# Filter cells and genes\nsc.pp.filter_cells(adata, min_genes=200)\nsc.pp.filter_genes(adata, min_cells=3)\nadata = adata[adata.obs.pct_counts_mt \u003C 5, :].copy()  # Remove high MT% cells; .copy() avoids working on a view\n```\n\n**Use the QC script for automated analysis:**\n```bash\npython scripts\u002Fqc_analysis.py input_file.h5ad --output filtered.h5ad\n```\n\n### 2. Normalization and Preprocessing\n\n```python\n# Keep the raw counts in a layer before normalizing\nadata.layers['counts'] = adata.X.copy()\n\n# Normalize to 10,000 counts per cell\nsc.pp.normalize_total(adata, target_sum=1e4)\n\n# Log-transform\nsc.pp.log1p(adata)\n\n# Freeze the log-normalized data for all genes in .raw (used by plots and marker tests)\nadata.raw = adata\n\n# Identify highly variable genes\nsc.pp.highly_variable_genes(adata, n_top_genes=2000)\nsc.pl.highly_variable_genes(adata)\n\n# Subset to highly variable genes\nadata = adata[:, adata.var.highly_variable].copy()\n\n# Optional: regress out unwanted variation (can be slow on large datasets)\nsc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])\n\n# Scale each gene to zero mean and unit variance, clipping values above 10\nsc.pp.scale(adata, max_value=10)\n```\n\n### 3. Dimensionality Reduction\n\n```python\n# PCA\nsc.tl.pca(adata, svd_solver='arpack')\nsc.pl.pca_variance_ratio(adata, log=True)  # Check elbow plot\n\n# Compute neighborhood graph\nsc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)\n\n# UMAP for visualization\nsc.tl.umap(adata)\nsc.pl.umap(adata, color='total_counts')  # Color by 'leiden' once clustering (step 4) has run\n\n# Alternative: t-SNE\nsc.tl.tsne(adata)\n```\n\n### 4. 
Clustering\n\n```python\n# Leiden clustering (recommended)\nsc.tl.leiden(adata, resolution=0.5)\nsc.pl.umap(adata, color='leiden', legend_loc='on data')\n\n# Compare several resolutions; each result is stored in its own adata.obs column\nfor res in [0.3, 0.5, 0.8, 1.0]:\n    sc.tl.leiden(adata, resolution=res, key_added=f'leiden_{res}')\n```\n\n### 5. Marker Gene Identification\n\n```python\n# Find marker genes for each cluster\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')\n\n# Visualize results\nsc.pl.rank_genes_groups(adata, n_genes=25, sharey=False)\nsc.pl.rank_genes_groups_heatmap(adata, n_genes=10)\nsc.pl.rank_genes_groups_dotplot(adata, n_genes=5)\n\n# Get results for cluster '0' as a DataFrame\nmarkers = sc.get.rank_genes_groups_df(adata, group='0')\n```\n\n### 6. Cell Type Annotation\n\n```python\n# Define marker genes for known cell types (PBMC example)\nmarker_genes = ['CD3D', 'CD14', 'MS4A1', 'NKG7', 'FCGR3A']\n\n# Visualize markers\nsc.pl.umap(adata, color=marker_genes, use_raw=True)\nsc.pl.dotplot(adata, var_names=marker_genes, groupby='leiden')\n\n# Manual annotation (example mapping; derive yours from the marker results)\ncluster_to_celltype = {\n    '0': 'CD4 T cells',\n    '1': 'CD14+ Monocytes',\n    '2': 'B cells',\n    '3': 'CD8 T cells',\n}\n# Clusters missing from the mapping are set to NaN\nadata.obs['cell_type'] = adata.obs['leiden'].map(cluster_to_celltype)\n\n# Visualize annotated types\nsc.pl.umap(adata, color='cell_type', legend_loc='on data')\n```\n\n### 7. 
Save Results\n\n```python\nfrom pathlib import Path\n\n# Create the output directory if needed\nPath('results').mkdir(exist_ok=True)\n\n# Save processed data\nadata.write('results\u002Fprocessed_data.h5ad')\n\n# Export metadata\nadata.obs.to_csv('results\u002Fcell_metadata.csv')\nadata.var.to_csv('results\u002Fgene_metadata.csv')\n```\n\n## Common Tasks\n\n### Creating Publication-Quality Plots\n\n```python\n# Set high-quality defaults\nsc.settings.set_figure_params(dpi=300, frameon=False, figsize=(5, 5))\nsc.settings.file_format_figs = 'pdf'\n\n# UMAP with custom styling\nsc.pl.umap(adata, color='cell_type',\n           palette='Set2',\n           legend_loc='on data',\n           legend_fontsize=12,\n           legend_fontoutline=2,\n           frameon=False,\n           save='_publication.pdf')\n\n# Marker genes to display (example list; adapt to your data)\ngenes = ['CD3D', 'CD14', 'MS4A1', 'NKG7', 'FCGR3A']\n\n# Heatmap of marker genes\nsc.pl.heatmap(adata, var_names=genes, groupby='cell_type',\n              swap_axes=True, show_gene_labels=True,\n              save='_markers.pdf')\n\n# Dot plot\nsc.pl.dotplot(adata, var_names=genes, groupby='cell_type',\n              save='_dotplot.pdf')\n```\n\nRefer to `references\u002Fplotting_guide.md` for comprehensive visualization examples.\n\n### Trajectory Inference\n\n```python\n# PAGA (Partition-based graph abstraction)\nsc.tl.paga(adata, groups='leiden')\nsc.pl.paga(adata, color='leiden')\n\n# Diffusion pseudotime (root = first cell of cluster '0'; pick a biologically meaningful root)\nadata.uns['iroot'] = np.flatnonzero(adata.obs['leiden'] == '0')[0]\nsc.tl.diffmap(adata)\nsc.tl.dpt(adata)\nsc.pl.umap(adata, color='dpt_pseudotime')\n```\n\n### Differential Expression Between Conditions\n\n```python\n# Compare treated vs control within one cell type\n# (assumes adata.obs has a 'condition' column with 'treated' and 'control' values)\nadata_subset = adata[adata.obs['cell_type'] == 'CD4 T cells'].copy()\nsc.tl.rank_genes_groups(adata_subset, groupby='condition',\n                         groups=['treated'], reference='control')\nsc.pl.rank_genes_groups(adata_subset, groups=['treated'])\n```\n\n### Gene Set Scoring\n\n```python\n# Score cells for gene set expression\ngene_set = ['CD3D', 'CD3E', 'CD3G']\nsc.tl.score_genes(adata, gene_set, score_name='T_cell_score')\nsc.pl.umap(adata, color='T_cell_score')\n```\n\n### Batch 
Correction\n\n```python\n# ComBat batch correction (expects a 'batch' column in adata.obs)\nsc.pp.combat(adata, key='batch')\n\n# Alternatives: Harmony (sc.external.pp.harmony_integrate, requires harmonypy) or scVI (scvi-tools)\n```\n\n## Key Parameters to Adjust\n\n### Quality Control\n- `min_genes`: Minimum genes per cell (typically 200-500)\n- `min_cells`: Minimum cells per gene (typically 3-10)\n- `pct_counts_mt` cutoff: Maximum mitochondrial percentage per cell (typically 5-20%)\n\n### Normalization\n- `target_sum`: Target counts per cell (1e4 is common; if omitted, scanpy uses the median total count)\n\n### Feature Selection\n- `n_top_genes`: Number of HVGs (typically 2000-3000)\n- `min_mean`, `max_mean`, `min_disp`: Dispersion-based HVG cutoffs, ignored when `n_top_genes` is set\n\n### Dimensionality Reduction\n- `n_pcs`: Number of principal components (check variance ratio plot)\n- `n_neighbors`: Number of neighbors (typically 10-30)\n\n### Clustering\n- `resolution`: Clustering granularity (typically 0.4-1.2; higher values give more clusters)\n\n## Common Pitfalls and Best Practices\n\n1. **Preserve the full data before subsetting**: Set `adata.raw = adata` before restricting to highly variable genes, and keep raw counts in a layer such as `adata.layers['counts']`\n2. **Check QC plots carefully**: Adjust thresholds based on dataset quality\n3. **Prefer Leiden over Louvain**: Leiden guarantees well-connected clusters, which Louvain does not\n4. **Try multiple clustering resolutions**: Choose the granularity that separates biologically distinct populations\n5. **Validate cell type annotations**: Use multiple marker genes\n6. **Use `use_raw=True` for gene expression plots**: Shows log-normalized expression of all genes in `adata.raw`, not just the scaled HVGs\n7. **Check PCA variance ratio**: Choose the number of PCs near the elbow of the plot\n8. 
**Save intermediate results**: Long workflows can fail partway through\n\n## Bundled Resources\n\n### scripts\u002Fqc_analysis.py\nAutomated quality control script that calculates metrics, generates plots, and filters data:\n\n```bash\npython scripts\u002Fqc_analysis.py input.h5ad --output filtered.h5ad \\\n    --mt-threshold 5 --min-genes 200 --min-cells 3\n```\n\n### references\u002Fstandard_workflow.md\nComplete step-by-step workflow with detailed explanations and code examples for:\n- Data loading and setup\n- Quality control with visualization\n- Normalization and scaling\n- Feature selection\n- Dimensionality reduction (PCA, UMAP, t-SNE)\n- Clustering (Leiden, Louvain)\n- Marker gene identification\n- Cell type annotation\n- Trajectory inference\n- Differential expression\n\nRead this reference when performing a complete analysis from scratch.\n\n### references\u002Fapi_reference.md\nQuick reference guide for scanpy functions organized by module:\n- Reading\u002Fwriting data (`sc.read_*`, `adata.write_*`)\n- Preprocessing (`sc.pp.*`)\n- Tools (`sc.tl.*`)\n- Plotting (`sc.pl.*`)\n- AnnData structure and manipulation\n- Settings and utilities\n\nUse this for quick lookup of function signatures and common parameters.\n\n### references\u002Fplotting_guide.md\nComprehensive visualization guide including:\n- Quality control plots\n- Dimensionality reduction visualizations\n- Clustering visualizations\n- Marker gene plots (heatmaps, dot plots, violin plots)\n- Trajectory and pseudotime plots\n- Publication-quality customization\n- Multi-panel figures\n- Color palettes and styling\n\nConsult this when creating publication-ready figures.\n\n### assets\u002Fanalysis_template.py\nComplete analysis template providing a full workflow from data loading through cell type annotation. 
Copy and customize this template for new analyses:\n\n```bash\ncp assets\u002Fanalysis_template.py my_analysis.py\n# Edit parameters and run\npython my_analysis.py\n```\n\nThe template includes all standard steps with configurable parameters and helpful comments.\n\n## Additional Resources\n\n- **Official scanpy documentation**: https:\u002F\u002Fscanpy.readthedocs.io\u002F\n- **Scanpy tutorials**: https:\u002F\u002Fscanpy-tutorials.readthedocs.io\u002F\n- **scverse ecosystem**: https:\u002F\u002Fscverse.org\u002F (related tools: squidpy, scvi-tools, cellrank)\n- **Best practices**: Luecken & Theis (2019) \"Current best practices in single-cell RNA-seq analysis: a tutorial\", Molecular Systems Biology\n\n## Tips for Effective Analysis\n\n1. **Start with the template**: Use `assets\u002Fanalysis_template.py` as a starting point\n2. **Run QC script first**: Use `scripts\u002Fqc_analysis.py` for initial filtering\n3. **Consult references as needed**: Load workflow and API references into context\n4. **Iterate on clustering**: Try multiple resolutions and visualization methods\n5. **Validate biologically**: Check that marker genes match expected cell types\n6. **Document parameters**: Record QC thresholds and analysis settings\n7. 
**Save checkpoints**: Write intermediate results at key steps\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.\n","","imported","https:\u002F\u002Fgithub.com\u002Fsickn33\u002Fantigravity-awesome-skills","user_system_seed","SkillOPIC",true,132,1076,"2026-05-16 13:37:52",{"id":8,"name":21,"slug":22,"icon":23,"description":24,"sort":25,"createdAt":26},"Other","other","mdi-page-next-outline","Other types of skills",5,"2026-05-16 12:53:40",{"id":7,"name":28,"slug":29,"icon":30,"description":31,"moduleId":8,"sort":32,"skillCount":33,"createdAt":26},"Career Development","career","mdi-briefcase-outline","Interview preparation, resume optimization, career planning",4,575,[35],{"id":36,"skillId":4,"version":37,"fileName":38,"fileSize":39,"filePath":40,"fileHash":41,"manifest":42,"createdAt":19},"223056e7-b493-4dff-baca-749a4ad48c63","1.0.0","scanpy.zip",4532,"uploads\u002Fskills\u002F60a48c10-f337-4b76-b991-c551bb1d1e18\u002Fscanpy.zip","1a3866dc08514c12fcd226f7b78aca148bfdc356046f4a6401b01ae5dc116b27","[{\"path\":\"SKILL.md\",\"isDirectory\":false,\"size\":11647}]",{"code":44,"message":45,"data":46},200,"success",{"items":47,"stats":48,"page":51},[],{"averageRating":49,"totalRatings":49,"ratingCounts":50},0,[49,49,49,49,49],{"limit":52,"offset":49,"hasMore":53,"nextOffset":52,"ratedOnly":16},15,false]