[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-cafferychen777--mLLMCelltype":3,"tool-cafferychen777--mLLMCelltype":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":102,"forks":103,"last_commit_at":104,"license":105,"difficulty_score":32,"env_os":106,"env_gpu":107,"env_ram":108,"env_deps":109,"category_tags":117,"github_topics":118,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":129,"updated_at":130,"faqs":131,"releases":160},4949,"cafferychen777\u002FmLLMCelltype","mLLMCelltype","Cell type annotation for single-cell RNA-seq using multi-LLM consensus","mLLMCelltype 是一款专为单细胞 RNA 测序（scRNA-seq）数据设计的自动化细胞类型注释工具。在单细胞分析中，准确识别细胞类型往往依赖专家经验或参考数据集，不仅耗时且容易受单一模型偏差影响。mLLMCelltype 创新性地引入了“多大型语言模型共识”机制，通过整合 OpenAI、Claude、Gemini、DeepSeek 等十余种主流大模型的预测结果，让多个 AI“专家”对同一数据进行多轮讨论与交叉验证，从而达成更可靠的注释结论。\n\n该方法无需依赖外部参考数据集，即可实现高达 95% 的注释准确率，并能输出不确定性指标（如共识比例和香农熵），帮助研究者识别存疑结果。它无缝兼容 Scanpy 和 Seurat 等常用生物信息学流程，既提供 Python\u002FR 
代码包供开发者调用，也拥有免安装的网页版界面，极大降低了使用门槛。\n\nmLLMCelltype 特别适合生物信息学研究人员、基因组学科学家以及需要处理单细胞数据的实验室团队使用。其独特的多模型迭代讨论架构，有效克服了单一 AI 模型的局限性，为复杂生物数据的解读提供了更加稳健、透明的解决方案。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_10b9dbfecc53.png\" alt=\"mLLMCelltype logo\" width=\"300\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fcafferychen777\u002FmLLMCelltype?style=social\" alt=\"GitHub stars\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fcafferychen777\u002FmLLMCelltype?style=social\" alt=\"GitHub forks\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fpb2aZdG4\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20Chat-7289da?logo=discord&logoColor=white\" alt=\"Discord\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002FCRAN.R-project.org\u002Fpackage=mLLMCelltype\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_15339a69db17.png\" alt=\"CRAN version\">\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fcafferychen777\u002FmLLMCelltype\" alt=\"License\">\n  \u003Ca href=\"https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2025.04.10.647852v1\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FbioRxiv-2025.04.10.647852-blue\" alt=\"bioRxiv preprint\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmllmcelltype\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmllmcelltype\" alt=\"PyPI version\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcafferychen777\u002FmLLMCelltype\u002Fblob\u002Fmain\u002Fnotebooks\u002FmLLMCelltype_Tutorial.ipynb\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen%20in-Colab-F9AB00?logo=googlecolab&logoColor=white\" alt=\"Open in Colab\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n# mLLMCelltype: Multi-LLM Consensus Framework for Cell Type Annotation\n\nmLLMCelltype is a multi-LLM consensus framework for automated cell type annotation in single-cell RNA sequencing (scRNA-seq) data. The framework integrates multiple large language models including OpenAI GPT-5.2, Anthropic Claude-4.6\u002F4.5, Google Gemini-3, X.AI Grok-4, DeepSeek-V3, Alibaba Qwen3, Zhipu GLM-4, MiniMax, Stepfun, and OpenRouter to improve annotation accuracy through consensus-based predictions.\n\n## Abstract\n\nmLLMCelltype is an open-source tool for single-cell transcriptomics analysis that uses multiple large language models to identify cell types from gene expression data. The software implements a consensus approach where multiple models analyze the same data and their predictions are combined, which helps reduce errors and provides uncertainty metrics. This methodology offers advantages over single-model approaches through integration of multiple model predictions. mLLMCelltype integrates with single-cell analysis platforms such as Scanpy and Seurat, allowing researchers to incorporate it into existing workflows. 
The method does not require reference datasets for annotation.\n\nIn our benchmarks ([Yang et al., 2025](https:\u002F\u002Fdoi.org\u002F10.1101\u002F2025.04.10.647852)), the consensus approach achieved up to 95% accuracy on tested datasets.\n\n## Table of Contents\n- [Key Features](#key-features)\n- [Installation](#installation)\n- [Usage Examples](#usage-examples)\n- [Visualization Example](#visualization-example)\n- [Citation](#citation)\n- [Contributing](#contributing)\n\n**Web Application**: A browser-based interface is available at [mllmcelltype.com](https:\u002F\u002Fmllmcelltype.com) (no installation required).\n\n**See also**: [FlashDeconv](https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FFlashDeconv) — cell type deconvolution for spatial transcriptomics (Visium, Visium HD, Stereo-seq).\n\n## Key Features\n\n- **Multi-LLM Consensus**: Integrates predictions from multiple LLMs to reduce single-model limitations and biases\n- **Model Support**: Compatible with 10+ LLM providers including OpenAI, Anthropic, Google, and others\n- **Iterative Discussion**: LLMs evaluate evidence and refine annotations through multiple rounds of discussion\n- **Uncertainty Quantification**: Provides Consensus Proportion and Shannon Entropy metrics to identify uncertain annotations\n- **Cross-Model Validation**: Reduces incorrect predictions through multi-model comparison\n- **Noise Tolerance**: Maintains accuracy with imperfect marker gene lists\n- **Hierarchical Annotation**: Supports multi-resolution analysis with consistency checks\n- **Reference-Free**: Performs annotation without pre-training or reference datasets\n- **Documentation**: Records complete reasoning process for transparency\n- **Integration**: Compatible with Scanpy\u002FSeurat workflows and marker gene outputs\n- **Extensibility**: Supports addition of new LLMs as they become available\n\nFor changelog and updates, see [NEWS.md](R\u002FNEWS.md).\n\n## Installation\n\n### R Version\n\n```r\n# Install from 
CRAN (recommended)\ninstall.packages(\"mLLMCelltype\")\n\n# Or install development version from GitHub\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\")\n```\n\n### Python Version\n\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ZgmtlaORogSy0-QsaF0CHwFWOyOD26d2?usp=sharing)\n\n**Quick Start**: Try mLLMCelltype in Google Colab without any installation. Click the badge above to open an interactive notebook with examples and step-by-step guidance.\n\n```bash\n# Install from PyPI\npip install mllmcelltype\n\n# Or install from GitHub (note the subdirectory parameter)\npip install git+https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype.git#subdirectory=python\n```\n\n#### Important Note on Dependencies\n\nmLLMCelltype uses a modular design where different LLM provider libraries are optional dependencies. Depending on which models you plan to use, you'll need to install the corresponding packages:\n\n```bash\n# For using OpenAI models (GPT-5, etc.)\npip install \"mllmcelltype[openai]\"\n\n# For using Anthropic models (Claude)\npip install \"mllmcelltype[anthropic]\"\n\n# For using Google models (Gemini)\npip install \"mllmcelltype[gemini]\"\n\n# To install all optional dependencies at once\npip install \"mllmcelltype[all]\"\n```\n\nIf you encounter errors like `ImportError: cannot import name 'genai' from 'google'`, it means you need to install the corresponding provider package. 
For example:\n\n```bash\n# For Google Gemini models\npip install google-genai\n```\n\n### Supported Models\n\n- **OpenAI**: GPT-5.2\u002FGPT-5\u002FGPT-4.1 ([API Key](https:\u002F\u002Fplatform.openai.com\u002Fsettings\u002Forganization\u002Fbilling\u002Foverview))\n- **Anthropic**: Claude-4.6-Opus\u002FClaude-4.5-Sonnet\u002FClaude-4.5-Haiku ([API Key](https:\u002F\u002Fconsole.anthropic.com\u002F))\n- **Google**: Gemini-3-Pro\u002FGemini-3-Flash ([API Key](https:\u002F\u002Fai.google.dev\u002F?authuser=2))\n- **Alibaba**: Qwen3-Max ([API Key](https:\u002F\u002Fwww.alibabacloud.com\u002Fen\u002Fproduct\u002Fmodelstudio))\n- **DeepSeek**: DeepSeek-V3\u002FDeepSeek-R1 ([API Key](https:\u002F\u002Fplatform.deepseek.com\u002Fusage))\n- **MiniMax**: MiniMax-M2.1 ([API Key](https:\u002F\u002Fintl.minimaxi.com\u002Fuser-center\u002Fbasic-information\u002Finterface-key))\n- **Stepfun**: Step-3 ([API Key](https:\u002F\u002Fplatform.stepfun.com\u002Faccount-info))\n- **Zhipu**: GLM-4.7\u002FGLM-4-Plus ([API Key](https:\u002F\u002Fbigmodel.cn\u002F))\n- **X.AI**: Grok-4\u002FGrok-3 ([API Key](https:\u002F\u002Faccounts.x.ai\u002F))\n- **OpenRouter**: Access to multiple models through a single API ([API Key](https:\u002F\u002Fopenrouter.ai\u002Fkeys))\n  - Supports models from OpenAI, Anthropic, Meta, Google, Mistral, and more\n  - Format: 'provider\u002Fmodel-name' (e.g., 'openai\u002Fgpt-5.2', 'anthropic\u002Fclaude-opus-4.5')\n  - Free models available with `:free` suffix (e.g., 'deepseek\u002Fdeepseek-r1:free', 'meta-llama\u002Fllama-4-maverick:free')\n  - **Note**: Free tier limits: 50 requests\u002Fday (1000\u002Fday with $10+ credits), 20 requests\u002Fminute. 
Some models may be unavailable.\n\n## Usage Examples\n\n### Python\n\n```python\n# Example of using mLLMCelltype for single-cell RNA-seq cell type annotation with Scanpy\nimport scanpy as sc\nimport pandas as pd\nfrom mllmcelltype import annotate_clusters, interactive_consensus_annotation\nimport os\n\n# Note: Logging is automatically configured when importing mllmcelltype\n# You can customize logging if needed using the logging module\n\n# Load your single-cell RNA-seq dataset in AnnData format\nadata = sc.read_h5ad('your_data.h5ad')  # Replace with your scRNA-seq dataset path\n\n# Perform Leiden clustering for cell population identification if not already done\nif 'leiden' not in adata.obs.columns:\n    print(\"Computing leiden clustering for cell population identification...\")\n    # Preprocess single-cell data: normalize counts and log-transform for gene expression analysis\n    if 'log1p' not in adata.uns:\n        sc.pp.normalize_total(adata, target_sum=1e4)  # Normalize to 10,000 counts per cell\n        sc.pp.log1p(adata)  # Log-transform normalized counts\n\n    # Dimensionality reduction: calculate PCA for scRNA-seq data\n    if 'X_pca' not in adata.obsm:\n        sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)  # Select informative genes\n        sc.pp.pca(adata, use_highly_variable=True)  # Compute principal components\n\n    # Cell clustering: compute neighborhood graph and perform Leiden community detection\n    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=30)  # Build KNN graph for clustering\n    sc.tl.leiden(adata, resolution=0.8)  # Identify cell populations using Leiden algorithm\n    print(f\"Leiden clustering completed, identified {len(adata.obs['leiden'].cat.categories)} distinct cell populations\")\n\n# Identify marker genes for each cell cluster using differential expression analysis\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')  # Wilcoxon rank-sum test for marker detection\n\n# Extract top 
marker genes for each cell cluster to use in cell type annotation\nmarker_genes = {}\nfor i in range(len(adata.obs['leiden'].cat.categories)):\n    # Select top 10 differentially expressed genes as markers for each cluster\n    genes = [adata.uns['rank_genes_groups']['names'][str(i)][j] for j in range(10)]\n    marker_genes[str(i)] = genes\n\n# IMPORTANT: mLLMCelltype requires gene symbols (e.g., KCNJ8, PDGFRA) not Ensembl IDs (e.g., ENSG00000176771)\n# If your AnnData object uses Ensembl IDs, convert them to gene symbols for accurate annotation:\n# Example conversion code:\n# if 'Gene' in adata.var.columns:  # Check if gene symbols are available in the metadata\n#     gene_name_dict = dict(zip(adata.var_names, adata.var['Gene']))\n#     marker_genes = {cluster: [gene_name_dict.get(gene_id, gene_id) for gene_id in genes]\n#                    for cluster, genes in marker_genes.items()}\n\n# IMPORTANT: mLLMCelltype requires numeric cluster IDs\n# The 'cluster' column must contain numeric values or values that can be converted to numeric.\n# Non-numeric cluster IDs (e.g., \"cluster_1\", \"T_cells\", \"7_0\") may cause errors or unexpected behavior.\n# If your data contains non-numeric cluster IDs, create a mapping between original IDs and numeric IDs:\n# Example standardization code:\n# original_ids = list(marker_genes.keys())\n# id_mapping = {original: idx for idx, original in enumerate(original_ids)}\n# marker_genes = {str(id_mapping[cluster]): genes for cluster, genes in marker_genes.items()}\n\n# Configure API keys for the large language models used in consensus annotation\n# At least one API key is required for multi-LLM consensus annotation\nos.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\"      # For GPT-5.2\u002F5\u002F4.1 models\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-anthropic-api-key\"  # For Claude-4.6\u002F4.5 models\nos.environ[\"GEMINI_API_KEY\"] = \"your-gemini-api-key\"      # For Google Gemini-3 models\nos.environ[\"QWEN_API_KEY\"] = 
\"your-qwen-api-key\"        # For Alibaba Qwen3 models\n# Additional optional LLM providers for enhanced consensus diversity:\n# os.environ[\"DEEPSEEK_API_KEY\"] = \"your-deepseek-api-key\"   # For DeepSeek-V3 models\n# os.environ[\"ZHIPU_API_KEY\"] = \"your-zhipu-api-key\"       # For Zhipu GLM-4 models\n# os.environ[\"STEPFUN_API_KEY\"] = \"your-stepfun-api-key\"    # For Stepfun models\n# os.environ[\"MINIMAX_API_KEY\"] = \"your-minimax-api-key\"    # For MiniMax models\n# os.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"  # For accessing multiple models via OpenRouter\n\n# Execute multi-LLM consensus cell type annotation with iterative deliberation\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,  # Dictionary of marker genes for each cluster\n    species=\"human\",            # Specify organism for appropriate cell type annotation\n    tissue=\"blood\",            # Specify tissue context for more accurate annotation\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"],  # Multiple LLMs for consensus\n    consensus_threshold=1,     # Minimum proportion required for consensus agreement\n    max_discussion_rounds=3    # Number of deliberation rounds between models for refinement\n)\n\n# Alternatively, use OpenRouter for accessing multiple models through a single API\n# This is especially useful for accessing free models with the :free suffix\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# Example using free OpenRouter models (no credits required)\nfree_models_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"blood\",\n    models=[\n        {\"provider\": \"openrouter\", \"model\": \"meta-llama\u002Fllama-4-maverick:free\"},      # Meta Llama 4 Maverick (free)\n        {\"provider\": \"openrouter\", \"model\": \"venice\u002Funcensored:free\"},                # Venice Uncensored (free)\n  
      {\"provider\": \"openrouter\", \"model\": \"deepseek\u002Fdeepseek-r1:free\"},             # DeepSeek R1 (free, advanced reasoning)\n        {\"provider\": \"openrouter\", \"model\": \"meta-llama\u002Fllama-3.3-70b-instruct:free\"} # Meta Llama 3.3 70B (free)\n    ],\n    consensus_threshold=0.7,\n    max_discussion_rounds=2\n)\n\n# Retrieve final consensus cell type annotations from the multi-LLM deliberation\nfinal_annotations = consensus_results[\"consensus\"]\n\n# Integrate consensus cell type annotations into the original AnnData object\nadata.obs['consensus_cell_type'] = adata.obs['leiden'].astype(str).map(final_annotations)\n\n# Add uncertainty quantification metrics to evaluate annotation confidence\nadata.obs['consensus_proportion'] = adata.obs['leiden'].astype(str).map(consensus_results[\"consensus_proportion\"])  # Agreement level\nadata.obs['entropy'] = adata.obs['leiden'].astype(str).map(consensus_results[\"entropy\"])  # Annotation uncertainty\n\n# Prepare for visualization: compute UMAP embeddings if not already available\n# UMAP provides a 2D representation of cell populations for visualization\nif 'X_umap' not in adata.obsm:\n    print(\"Computing UMAP coordinates...\")\n    # Make sure neighbors are computed first\n    if 'neighbors' not in adata.uns:\n        sc.pp.neighbors(adata, n_neighbors=10, n_pcs=30)\n    sc.tl.umap(adata)\n    print(\"UMAP coordinates computed\")\n\n# Visualize results with enhanced aesthetics\n# Basic visualization\nsc.pl.umap(adata, color='consensus_cell_type', legend_loc='right', frameon=True, title='mLLMCelltype Consensus Annotations')\n\n# More customized visualization\nimport matplotlib.pyplot as plt\n\n# Set figure size and style\nplt.rcParams['figure.figsize'] = (10, 8)\nplt.rcParams['font.size'] = 12\n\n# Create a more publication-ready UMAP\nfig, ax = plt.subplots(1, 1, figsize=(12, 10))\nsc.pl.umap(adata, color='consensus_cell_type', legend_loc='on data',\n         frameon=True, title='mLLMCelltype 
Consensus Annotations',\n         palette='tab20', size=50, legend_fontsize=12,\n         legend_fontoutline=2, ax=ax)\n\n# Visualize uncertainty metrics\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))\nsc.pl.umap(adata, color='consensus_proportion', ax=ax1, title='Consensus Proportion',\n         cmap='viridis', vmin=0, vmax=1, size=30)\nsc.pl.umap(adata, color='entropy', ax=ax2, title='Annotation Uncertainty (Shannon Entropy)',\n         cmap='magma', vmin=0, size=30)\nplt.tight_layout()\n```\n\n### Using a Single Free OpenRouter Model\n\nFor users who prefer a simpler approach with just one model, the DeepSeek R1 free model via OpenRouter can be used without API credits:\n\n```python\nimport os\nfrom mllmcelltype import annotate_clusters\n\n# Note: Logging is automatically configured\n\n# Set your OpenRouter API key\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# Define marker genes for each cluster\nmarker_genes = {\n    \"0\": [\"CD3D\", \"CD3E\", \"CD3G\", \"CD2\", \"IL7R\", \"TCF7\"],           # T cells\n    \"1\": [\"CD19\", \"MS4A1\", \"CD79A\", \"CD79B\", \"HLA-DRA\", \"CD74\"],   # B cells\n    \"2\": [\"CD14\", \"LYZ\", \"CSF1R\", \"ITGAM\", \"CD68\", \"FCGR3A\"]      # Monocytes\n}\n\n# Annotate using DeepSeek R1 free model\nannotations = annotate_clusters(\n    marker_genes=marker_genes,\n    species='human',\n    tissue='peripheral blood',\n    provider='openrouter',\n    model='deepseek\u002Fdeepseek-r1:free'  # Free model with advanced reasoning\n)\n\n# Print annotations\nfor cluster, annotation in annotations.items():\n    print(f\"Cluster {cluster}: {annotation}\")\n```\n\nThis approach uses a free model and does not require API credits.\n\n#### Extracting Marker Genes from AnnData Objects\n\nIf you're using Scanpy with AnnData objects, you can easily extract marker genes directly from the `rank_genes_groups` results:\n\n```python\nimport os\nimport scanpy as sc\nfrom mllmcelltype import annotate_clusters\n\n# Note: 
Logging is automatically configured\n\n# Set your OpenRouter API key\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# Load and preprocess your data\nadata = sc.read_h5ad('your_data.h5ad')\n\n# Perform preprocessing and clustering if not already done\n# sc.pp.normalize_total(adata, target_sum=1e4)\n# sc.pp.log1p(adata)\n# sc.pp.highly_variable_genes(adata)\n# sc.pp.pca(adata)\n# sc.pp.neighbors(adata)\n# sc.tl.leiden(adata)\n\n# Find marker genes for each cluster\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')\n\n# Extract top marker genes for each cluster\nmarker_genes = {\n    cluster: adata.uns['rank_genes_groups']['names'][cluster][:10].tolist()\n    for cluster in adata.obs['leiden'].cat.categories\n}\n\n# Annotate using DeepSeek R1 free model\nannotations = annotate_clusters(\n    marker_genes=marker_genes,\n    species='human',\n    tissue='peripheral blood',  # adjust based on your tissue type\n    provider='openrouter',\n    model='deepseek\u002Fdeepseek-r1:free'  # Free model\n)\n\n# Add annotations to AnnData object\nadata.obs['cell_type'] = adata.obs['leiden'].astype(str).map(annotations)\n\n# Visualize results\nsc.pl.umap(adata, color='cell_type', legend_loc='on data',\n           frameon=True, title='Cell Types Annotated by DeepSeek R1')\n```\n\nThis method automatically extracts the top differentially expressed genes for each cluster from the `rank_genes_groups` results, making it easy to integrate mLLMCelltype into your Scanpy workflow.\n\n### R\n\n> **Note**: For more detailed R tutorials and documentation, please visit the [mLLMCelltype documentation website](https:\u002F\u002Fcafferyang.com\u002FmLLMCelltype\u002F).\n\n#### Using Seurat Object\n\n```r\n# Load required packages\nlibrary(mLLMCelltype)\nlibrary(Seurat)\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(cowplot) # Added for plot_grid\n\n# Load your preprocessed Seurat object\npbmc \u003C- readRDS(\"your_seurat_object.rds\")\n\n# If starting with raw data, 
perform preprocessing steps\n# pbmc \u003C- NormalizeData(pbmc)\n# pbmc \u003C- FindVariableFeatures(pbmc, selection.method = \"vst\", nfeatures = 2000)\n# pbmc \u003C- ScaleData(pbmc)\n# pbmc \u003C- RunPCA(pbmc)\n# pbmc \u003C- FindNeighbors(pbmc, dims = 1:10)\n# pbmc \u003C- FindClusters(pbmc, resolution = 0.5)\n# pbmc \u003C- RunUMAP(pbmc, dims = 1:10)\n\n# Find marker genes for each cluster\npbmc_markers \u003C- FindAllMarkers(pbmc,\n                            only.pos = TRUE,\n                            min.pct = 0.25,\n                            logfc.threshold = 0.25)\n\n# Set up cache directory to speed up processing\ncache_dir \u003C- \".\u002Fmllmcelltype_cache\"\ndir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)\n\n# Choose a model from any supported provider\n# Supported models include:\n# - OpenAI: 'gpt-5.2', 'gpt-5', 'gpt-4.1', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'\n# - Anthropic: 'claude-opus-4-6-20260205', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001'\n# - DeepSeek: 'deepseek-chat', 'deepseek-reasoner'\n# - Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.0-flash'\n# - Qwen: 'qwen3-max', 'qwen-max-2025-01-25'\n# - Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'\n# - Zhipu: 'glm-4.7', 'glm-4-plus'\n# - MiniMax: 'minimax-m2.1', 'minimax-m2'\n# - Grok: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'\n# - OpenRouter: Access to models from multiple providers through a single API. 
Format: 'provider\u002Fmodel-name'\n#   - OpenAI models: 'openai\u002Fgpt-5.2', 'openai\u002Fgpt-5', 'openai\u002Fo3-pro', 'openai\u002Fo4-mini'\n#   - Anthropic models: 'anthropic\u002Fclaude-opus-4.5', 'anthropic\u002Fclaude-sonnet-4.5', 'anthropic\u002Fclaude-haiku-4.5'\n#   - Meta models: 'meta-llama\u002Fllama-4-maverick', 'meta-llama\u002Fllama-4-scout', 'meta-llama\u002Fllama-3.3-70b-instruct'\n#   - Google models: 'google\u002Fgemini-3-pro', 'google\u002Fgemini-3-flash', 'google\u002Fgemini-2.5-pro'\n#   - Mistral models: 'mistralai\u002Fmistral-large', 'mistralai\u002Fmagistral-medium-2506'\n#   - Other models: 'deepseek\u002Fdeepseek-r1', 'deepseek\u002Fdeepseek-chat-v3.1', 'microsoft\u002Fmai-ds-r1'\n\n# Run mLLMCelltype annotation with multiple LLMs\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",  # provide tissue context\n  models = c(\n    \"claude-sonnet-4-5-20250929\",    # Anthropic\n    \"gpt-5.2\",                  # OpenAI\n    \"gemini-3-pro\",           # Google\n    \"qwen3-max\"               # Alibaba\n  ),\n  api_keys = list(\n    anthropic = \"your-anthropic-key\",\n    openai = \"your-openai-key\",\n    gemini = \"your-google-key\",\n    qwen = \"your-qwen-key\"\n  ),\n  top_gene_count = 10,\n  controversy_threshold = 1.0,\n  entropy_threshold = 1.0,\n  cache_dir = cache_dir\n)\n\n# Print structure of results to understand the data\nprint(\"Available fields in consensus_results:\")\nprint(names(consensus_results))\n\n# Add annotations to Seurat object\n# Get cell type annotations from consensus_results$final_annotations\ncluster_to_celltype_map \u003C- consensus_results$final_annotations\n\n# Create new cell type identifier column\ncell_types \u003C- as.character(Idents(pbmc))\nfor (cluster_id in names(cluster_to_celltype_map)) {\n  cell_types[cell_types == cluster_id] \u003C- cluster_to_celltype_map[[cluster_id]]\n}\n\n# Add cell type annotations to Seurat 
object\npbmc$cell_type \u003C- cell_types\n\n# Add uncertainty metrics\n# Extract detailed consensus results containing metrics\nconsensus_details \u003C- consensus_results$initial_results$consensus_results\n\n# Create a data frame with metrics for each cluster\nuncertainty_metrics \u003C- data.frame(\n  cluster_id = names(consensus_details),\n  consensus_proportion = sapply(consensus_details, function(res) res$consensus_proportion),\n  entropy = sapply(consensus_details, function(res) res$entropy)\n)\n\n# Add uncertainty metrics for each cell\n# Note: seurat_clusters is a metadata column automatically created by FindClusters() function\n# It contains the cluster ID assigned to each cell during clustering\n# Here we use it to map cluster-level metrics (consensus_proportion and entropy) to individual cells\n\n# If you don't have seurat_clusters column (e.g., if you used a different clustering method),\n# you can use the active identity (Idents) or any other cluster assignment in your metadata:\n# Option 1: Use active identity\n# current_clusters \u003C- as.character(Idents(pbmc))\n# Option 2: Use another metadata column that contains cluster IDs\n# current_clusters \u003C- pbmc$your_cluster_column\n\n# For this example, we use the standard seurat_clusters column:\ncurrent_clusters \u003C- pbmc$seurat_clusters  # Get cluster ID for each cell\n\n# Match each cell's cluster ID with the corresponding metrics in uncertainty_metrics\npbmc$consensus_proportion \u003C- uncertainty_metrics$consensus_proportion[match(current_clusters, uncertainty_metrics$cluster_id)]\npbmc$entropy \u003C- uncertainty_metrics$entropy[match(current_clusters, uncertainty_metrics$cluster_id)]\n\n# Save results for future use\nsaveRDS(consensus_results, \"pbmc_mLLMCelltype_results.rds\")\nsaveRDS(pbmc, \"pbmc_annotated.rds\")\n\n# Visualize results with SCpubr for publication-ready plots\nif (!requireNamespace(\"SCpubr\", quietly = TRUE)) {\n  
remotes::install_github(\"enblacar\u002FSCpubr\")\n}\nlibrary(SCpubr)\nlibrary(viridis)  # For color palettes\n\n# Basic UMAP visualization with default settings\npdf(\"pbmc_basic_annotations.pdf\", width=8, height=6)\nSCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  legend.position = \"right\") +\n  ggtitle(\"mLLMCelltype Consensus Annotations\")\ndev.off()\n\n# More customized visualization with enhanced styling\npdf(\"pbmc_custom_annotations.pdf\", width=8, height=6)\nSCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  label.box = TRUE,\n                  legend.position = \"right\",\n                  pt.size = 1.0,\n                  border.size = 1,\n                  font.size = 12) +\n  ggtitle(\"mLLMCelltype Consensus Annotations\") +\n  theme(plot.title = element_text(hjust = 0.5))\ndev.off()\n\n# Visualize uncertainty metrics with enhanced SCpubr plots\n# Get cell types and create a named color palette\ncell_types \u003C- unique(pbmc$cell_type)\ncolor_palette \u003C- viridis::viridis(length(cell_types))\nnames(color_palette) \u003C- cell_types\n\n# Cell type annotations with SCpubr\np1 \u003C- SCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  legend.position = \"bottom\",  # Place legend at the bottom\n                  pt.size = 1.0,\n                  label.size = 4,  # Smaller label font size\n                  label.box = TRUE,  # Add background box to labels for better readability\n                  repel = TRUE,  # Make labels repel each other to avoid overlap\n                  colors.use = color_palette,\n                  plot.title = \"Cell Type\") +\n      theme(plot.title = element_text(hjust = 0.5, margin = margin(b = 15, t = 10)),\n            legend.text = element_text(size = 8),\n            
legend.key.size = unit(0.3, \"cm\"),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# Consensus proportion feature plot with SCpubr\np2 \u003C- SCpubr::do_FeaturePlot(sample = pbmc,\n                       features = \"consensus_proportion\",\n                       order = TRUE,\n                       pt.size = 1.0,\n                       enforce_symmetry = FALSE,\n                       legend.title = \"Consensus\",\n                       plot.title = \"Consensus Proportion\",\n                       sequential.palette = \"YlGnBu\",  # Yellow-Green-Blue gradient, following Nature Methods standards\n                       sequential.direction = 1,  # Light to dark direction\n                       min.cutoff = min(pbmc$consensus_proportion),  # Set minimum value\n                       max.cutoff = max(pbmc$consensus_proportion),  # Set maximum value\n                       na.value = \"lightgrey\") +  # Color for missing values\n      theme(plot.title = element_text(hjust = 0.5, margin = margin(b = 15, t = 10)),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# Shannon entropy feature plot with SCpubr\np3 \u003C- SCpubr::do_FeaturePlot(sample = pbmc,\n                       features = \"entropy\",\n                       order = TRUE,\n                       pt.size = 1.0,\n                       enforce_symmetry = FALSE,\n                       legend.title = \"Entropy\",\n                       plot.title = \"Shannon Entropy\",\n                       sequential.palette = \"OrRd\",  # Orange-Red gradient, following Nature Methods standards\n                       sequential.direction = -1,  # Dark to light direction (reversed)\n                       min.cutoff = min(pbmc$entropy),  # Set minimum value\n                       max.cutoff = max(pbmc$entropy),  # Set maximum value\n                       na.value = \"lightgrey\") +  # Color for missing values\n      theme(plot.title = element_text(hjust = 0.5, margin = 
margin(b = 15, t = 10)),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# Combine plots with equal widths\npdf(\"pbmc_uncertainty_metrics.pdf\", width=18, height=7)\ncombined_plot \u003C- cowplot::plot_grid(p1, p2, p3, ncol = 3, rel_widths = c(1.2, 1.2, 1.2))\nprint(combined_plot)\ndev.off()\n```\n\n#### Using CSV Input\n\nYou can also run mLLMCelltype directly on a CSV file, without Seurat, which is useful when your marker genes are already available in CSV format:\n\n```r\n# Install the latest version of mLLMCelltype\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\", force = TRUE)\n\n# Load necessary packages\nlibrary(mLLMCelltype)\n\n# Configure unified logging (optional - uses defaults if not specified)\nconfigure_logger(level = \"INFO\", console_output = TRUE, json_format = TRUE)\n\n# Create cache directory\ncache_dir \u003C- \"path\u002Fto\u002Fyour\u002Fcache\"\ndir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)\n\n# Read CSV file content\nmarkers_file \u003C- \"path\u002Fto\u002Fyour\u002Fmarkers.csv\"\nfile_content \u003C- readLines(markers_file)\n\n# Skip header row\ndata_lines \u003C- file_content[-1]\n\n# Convert data to list format, keyed by the cluster ID in the first column\nmarker_genes_list \u003C- list()\ncluster_names \u003C- c()\n\n# First collect all cluster names\nfor(line in data_lines) {\n  parts \u003C- strsplit(line, \",\", fixed = TRUE)[[1]]\n  cluster_names \u003C- c(cluster_names, parts[1])\n}\n\n# Then build marker_genes_list, one entry per cluster\n# seq_along() is safe when data_lines is empty (1:length() would yield c(1, 0))\nfor(i in seq_along(data_lines)) {\n  line \u003C- data_lines[i]\n  parts \u003C- strsplit(line, \",\", fixed = TRUE)[[1]]\n\n  # First part is the cluster name\n  cluster_name \u003C- parts[1]\n\n  # Use the original cluster ID as key (preserve input IDs as-is)\n  cluster_id \u003C- as.character(cluster_name)\n\n  # Remaining parts are genes\n  genes \u003C- parts[-1]\n\n  # Filter out NA and empty strings\n  genes \u003C- 
genes[!is.na(genes) & genes != \"\"]\n\n  # Add to marker_genes_list\n  marker_genes_list[[cluster_id]] \u003C- list(genes = genes)\n}\n\n# Set API keys\napi_keys \u003C- list(\n  gemini = \"YOUR_GEMINI_API_KEY\",\n  qwen = \"YOUR_QWEN_API_KEY\",\n  grok = \"YOUR_GROK_API_KEY\",\n  openai = \"YOUR_OPENAI_API_KEY\",\n  anthropic = \"YOUR_ANTHROPIC_API_KEY\"\n)\n\n# Run consensus annotation with paid models\nconsensus_results \u003C-\n  interactive_consensus_annotation(\n    input = marker_genes_list,\n    tissue_name = \"your tissue type\", # e.g., \"human heart\"\n    models = c(\"gemini-3-pro\",\n              \"gemini-3-flash\",\n              \"qwen3-max\",\n              \"grok-4\",\n              \"claude-sonnet-4-5-20250929\",\n              \"gpt-5.2\"),\n    api_keys = api_keys,\n    controversy_threshold = 0.6,\n    entropy_threshold = 1.0,\n    max_discussion_rounds = 3,\n    cache_dir = cache_dir\n  )\n\n# Alternatively, use free OpenRouter models (no credits required)\n# Add OpenRouter API key to the api_keys list\napi_keys$openrouter \u003C- \"your-openrouter-api-key\"\n\n# Run consensus annotation with free models\nfree_consensus_results \u003C-\n  interactive_consensus_annotation(\n    input = marker_genes_list,\n    tissue_name = \"your tissue type\", # e.g., \"human heart\"\n    models = c(\n      \"meta-llama\u002Fllama-4-maverick:free\",      # Meta Llama 4 Maverick (free)\n      \"venice\u002Funcensored:free\",                # Venice Uncensored (free)\n      \"deepseek\u002Fdeepseek-r1:free\",             # DeepSeek R1 (free, advanced reasoning)\n      \"meta-llama\u002Fllama-3.3-70b-instruct:free\" # Meta Llama 3.3 70B (free)\n    ),\n    api_keys = api_keys,\n    consensus_check_model = \"deepseek\u002Fdeepseek-r1:free\",  # Free model for consensus checking\n    controversy_threshold = 0.6,\n    entropy_threshold = 1.0,\n    max_discussion_rounds = 2,\n    cache_dir = cache_dir\n  )\n\n# Save results\nsaveRDS(consensus_results, 
\"your_results.rds\")\n\n# Print results summary\ncat(\"\\nResults summary:\\n\")\ncat(\"Available fields:\", paste(names(consensus_results), collapse=\", \"), \"\\n\\n\")\n\n# Print final annotations\ncat(\"Final cell type annotations:\\n\")\nfor(cluster in names(consensus_results$final_annotations)) {\n  cat(sprintf(\"%s: %s\\n\", cluster, consensus_results$final_annotations[[cluster]]))\n}\n```\n\n**Notes on CSV format**:\n- The CSV file should have values in the first column that will be used as indices (these can be cluster names, numbers like 0,1,2,3 or 1,2,3,4, etc.)\n- The values in the first column are only used for reference and are not passed to the LLMs\n- Subsequent columns should contain marker genes for each cluster\n- An example CSV file for cat heart tissue is included in the package at `inst\u002Fextdata\u002FCat_Heart_markers.csv`\n\nExample CSV structure:\n```\ncluster,gene\n0,Negr1,Cask,Tshz2,Ston2,Fstl1,Dse,Celf2,Hmcn2,Setbp1,Cblb\n1,Palld,Grb14,Mybpc3,Ensfcag00000044939,Dcun1d2,Acacb,Slco1c1,Ppp1r3c,Sema3c,Ppp1r14c\n2,Adgrf5,Tbx1,Slco2b1,Pi15,Adam23,Bmx,Pde8b,Pkhd1l1,Dtx1,Ensfcag00000051556\n3,Clec2d,Trat1,Rasgrp1,Card11,Cytip,Sytl3,Tmem156,Bcl11b,Lcp1,Lcp2\n```\n\nYou can access the example data in your R script using:\n```r\nsystem.file(\"extdata\", \"Cat_Heart_markers.csv\", package = \"mLLMCelltype\")\n```\n\n### Using a Single LLM Model\n\nIf you only want to use a single LLM model instead of the consensus approach, use the `annotate_cell_types()` function. 
This is useful when you have access to only one API key or prefer a specific model:\n\n```r\n# Load required packages\nlibrary(mLLMCelltype)\nlibrary(Seurat)\n\n# Load your preprocessed Seurat object\npbmc \u003C- readRDS(\"your_seurat_object.rds\")\n\n# Find marker genes for each cluster\npbmc_markers \u003C- FindAllMarkers(pbmc,\n                            only.pos = TRUE,\n                            min.pct = 0.25,\n                            logfc.threshold = 0.25)\n\n# Choose a model from any supported provider\n# Supported models include:\n# - OpenAI: 'gpt-5.2', 'gpt-5', 'gpt-4.1', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'\n# - Anthropic: 'claude-opus-4-6-20260205', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001'\n# - DeepSeek: 'deepseek-chat', 'deepseek-reasoner'\n# - Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.0-flash'\n# - Qwen: 'qwen3-max', 'qwen-max-2025-01-25'\n# - Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'\n# - Zhipu: 'glm-4.7', 'glm-4-plus'\n# - MiniMax: 'minimax-m2.1', 'minimax-m2'\n# - Grok: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'\n# - OpenRouter: Access to models from multiple providers through a single API. 
Format: 'provider\u002Fmodel-name'\n#   - OpenAI models: 'openai\u002Fgpt-5.2', 'openai\u002Fgpt-5', 'openai\u002Fo3-pro', 'openai\u002Fo4-mini'\n#   - Anthropic models: 'anthropic\u002Fclaude-opus-4.5', 'anthropic\u002Fclaude-sonnet-4.5', 'anthropic\u002Fclaude-haiku-4.5'\n#   - Meta models: 'meta-llama\u002Fllama-4-maverick', 'meta-llama\u002Fllama-4-scout', 'meta-llama\u002Fllama-3.3-70b-instruct'\n#   - Google models: 'google\u002Fgemini-3-pro', 'google\u002Fgemini-3-flash', 'google\u002Fgemini-2.5-pro'\n#   - Mistral models: 'mistralai\u002Fmistral-large', 'mistralai\u002Fmagistral-medium-2506'\n#   - Other models: 'deepseek\u002Fdeepseek-r1', 'deepseek\u002Fdeepseek-chat-v3.1', 'microsoft\u002Fmai-ds-r1'\n\n# Run cell type annotation with a single LLM model\nsingle_model_results \u003C- annotate_cell_types(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",  # provide tissue context\n  model = \"claude-sonnet-4-5-20250929\",  # specify a single model (Claude Sonnet 4.5)\n  api_key = \"your-anthropic-key\",  # provide the API key directly\n  top_gene_count = 10\n)\n\n# Using a free OpenRouter model\nfree_model_results \u003C- annotate_cell_types(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",\n  model = \"meta-llama\u002Fllama-4-maverick:free\",  # free model with :free suffix\n  api_key = \"your-openrouter-key\",\n  top_gene_count = 10\n)\n\n# Print the results\nprint(single_model_results)\n\n# Add annotations to Seurat object\n# single_model_results is a character vector with one annotation per cluster\npbmc$cell_type \u003C- plyr::mapvalues(\n  x = as.character(Idents(pbmc)),\n  from = names(single_model_results),\n  to = single_model_results\n)\n\n# Visualize results\nDimPlot(pbmc, group.by = \"cell_type\", label = TRUE) +\n  ggtitle(\"Cell Types Annotated by Single LLM Model\")\n```\n\n#### Comparing Different Models\n\nYou can also compare annotations from different models by running `annotate_cell_types()` multiple times with different 
models:\n\n```r\n# Define models to test\nmodels_to_test \u003C- c(\n  \"claude-sonnet-4-5-20250929\",     # Anthropic\n  \"gpt-5.2\",                    # OpenAI\n  \"gemini-3-pro\",              # Google\n  \"qwen3-max\"                  # Alibaba\n)\n\n# API keys for different providers\napi_keys \u003C- list(\n  anthropic = \"your-anthropic-key\",\n  openai = \"your-openai-key\",\n  gemini = \"your-gemini-key\",\n  qwen = \"your-qwen-key\"\n)\n\n# Test each model and store results\nresults \u003C- list()\nfor (model in models_to_test) {\n  provider \u003C- get_provider(model)\n  api_key \u003C- api_keys[[provider]]\n\n  # Run annotation\n  results[[model]] \u003C- annotate_cell_types(\n    input = pbmc_markers,\n    tissue_name = \"human PBMC\",\n    model = model,\n    api_key = api_key,\n    top_gene_count = 10\n  )\n\n  # Add to Seurat object\n  column_name \u003C- paste0(\"cell_type_\", gsub(\"[^a-zA-Z0-9]\", \"_\", model))\n  pbmc[[column_name]] \u003C- plyr::mapvalues(\n    x = as.character(Idents(pbmc)),\n    from = names(results[[model]]),\n    to = results[[model]]\n  )\n}\n```\n\n### Advanced Consensus Configuration: Specifying the Consensus Check Model\n\nThe `consensus_check_model` parameter (R) \u002F `consensus_model` parameter (Python) allows you to specify which LLM model to use for consensus checking and discussion moderation. This parameter is important for the accuracy of consensus annotation because the consensus check model:\n\n1. Evaluates semantic similarity between different cell type annotations\n2. Calculates consensus metrics (proportion and entropy)\n3. Moderates and synthesizes discussions between models for controversial clusters\n4. 
Makes final decisions when models disagree\n\nWe recommend using a capable model for consensus checking, as this directly impacts annotation quality.\n\n#### Recommended Models for Consensus Checking\n\n- **Anthropic**: `claude-opus-4-6-20260205`, `claude-sonnet-4-5-20250929`\n- **OpenAI**: `o1`, `o1-pro`, `gpt-5.2`, `gpt-4.1`\n- **Google**: `gemini-3-pro`, `gemini-3-flash`\n- **Other**: `deepseek-r1` \u002F `deepseek-reasoner`, `qwen3-max`, `grok-4`\n\n#### R Package Usage\n\n```r\n# Example 1: Specifying a consensus check model\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  tissue_name = \"human brain\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"),\n  api_keys = api_keys,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n\n# Example 2: Using an alternative consensus check model\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  tissue_name = \"mouse liver\",\n  models = c(\"gpt-5.2\", \"gemini-3-pro\", \"qwen3-max\"),\n  api_keys = api_keys,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n\n# Example 3: Using OpenAI's reasoning model\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  tissue_name = \"human immune cells\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  consensus_check_model = \"o1\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n```\n\n#### Python Package Usage\n\n```python\n# Example 1: Specifying a consensus model\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"brain\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"],\n    
consensus_model=\"claude-sonnet-4-5-20250929\",\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# Example 2: Using dictionary format\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"mouse\",\n    tissue=\"liver\",\n    models=[\"gpt-5.2\", \"gemini-3-pro\", \"qwen3-max\"],\n    consensus_model={\"provider\": \"anthropic\", \"model\": \"claude-sonnet-4-5-20250929\"},\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# Example 3: Using Google's model for consensus\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"heart\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"qwen3-max\"],\n    consensus_model={\"provider\": \"google\", \"model\": \"gemini-3-pro\"},\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# Example 4: Default behavior (uses Qwen with fallback)\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"blood\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"],\n    # If not specified, defaults to qwen3-max with claude-sonnet-4-5-20250929 as fallback\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n```\n\n#### Notes on Consensus Model Selection\n\n1. **Model Availability**: Ensure you have API access to your chosen consensus model. The system will use fallback models if the primary choice is unavailable.\n\n2. **Consistency**: Use the same model for all consensus checks within a project to ensure consistent evaluation criteria.\n\n3. 
**Default Behavior**:\n   - R: Uses the first model in the `models` list if not specified\n   - Python: Defaults to `qwen3-max` with `claude-sonnet-4-5-20250929` as fallback\n\nThe consensus check model must accurately assess semantic similarity between different cell type names (e.g., recognizing that \"T lymphocyte\" and \"T cell\" refer to the same cell type), understand biological context, and synthesize discussions from multiple models.\n\n### Advanced Features: Cluster Selection and Cache Control (v1.3.1)\n\nmLLMCelltype v1.3.1 introduces two parameters that give you fine-grained control over the annotation process:\n\n#### 1. **clusters_to_analyze** - Selective Cluster Analysis\n\nThis parameter allows you to specify exactly which clusters to analyze without manually filtering your input data:\n\n```r\n# Example: Focus on specific clusters for T cell subtyping\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC - T cell subtypes\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(0, 1, 7),  # Only analyze T cell clusters\n  controversy_threshold = 0.7\n)\n\n# Example: Re-analyze controversial clusters with different context\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"activated immune cells\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(\"3\", \"5\"),  # Focus on specific clusters\n  cache_dir = \"consensus_cache\"\n)\n```\n\n**Benefits:**\n- No need to subset your data manually\n- Maintains original cluster numbering\n- Reduces API calls and costs by only analyzing relevant clusters\n- Useful for iterative refinement of specific cell populations\n\n#### 2. 
**force_rerun** - Bypass Cache for Fresh Analysis\n\nThis parameter forces re-analysis of controversial clusters, bypassing cached results:\n\n```r\n# Example: Initial broad analysis\ninitial_results \u003C- interactive_consensus_annotation(\n  input = markers,\n  tissue_name = \"human brain\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  use_cache = TRUE\n)\n\n# Example: Re-analyze with specific subtype context\nsubtype_results \u003C- interactive_consensus_annotation(\n  input = markers,\n  tissue_name = \"human brain - neuronal subtypes\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(2, 3, 5),  # Neuronal clusters\n  force_rerun = TRUE,  # Force fresh analysis despite cache\n  use_cache = TRUE     # Still benefit from cache for non-controversial clusters\n)\n```\n\n**Important Notes:**\n- `force_rerun` only affects controversial clusters requiring LLM discussion\n- Non-controversial clusters still use cache for performance\n- Useful when changing tissue context or focusing on subtypes\n- Combines well with `clusters_to_analyze` for targeted re-analysis\n\n#### Common Use Cases\n\n1. 
**Iterative Subtyping Workflow:**\n```r\n# Step 1: General cell type annotation\ngeneral_types \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human PBMC\",\n  models = models,\n  api_keys = api_keys\n)\n\n# Step 2: Focus on T cells with subtype context\nt_cell_subtypes \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human T lymphocytes\",\n  models = models,\n  api_keys = api_keys,\n  clusters_to_analyze = c(0, 1, 4, 7),  # T cell clusters from step 1\n  force_rerun = TRUE  # Fresh analysis with T cell context\n)\n\n# Step 3: Further refine CD8+ T cells\ncd8_subtypes \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human CD8+ T cells - activation states\",\n  models = models,\n  api_keys = api_keys,\n  clusters_to_analyze = c(1, 4),  # CD8+ clusters\n  force_rerun = TRUE\n)\n```\n\n2. **Cost-Effective Re-analysis:**\n```r\n# Only re-analyze clusters that were controversial\ncontroversial \u003C- initial_results$controversial_clusters\n\nrefined_results \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human PBMC - refined\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  clusters_to_analyze = controversial,  # Only controversial ones\n  force_rerun = TRUE,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\"\n)\n```\n\n## Visualization Examples\n\n### Cell Type Annotation Visualization\n\nBelow is an example of publication-ready visualization created with mLLMCelltype and SCpubr, showing cell type annotations alongside uncertainty metrics (Consensus Proportion and Shannon Entropy):\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_0d02668f6e53.png\" alt=\"mLLMCelltype Visualization\" width=\"900\"\u002F>\n\u003C\u002Fdiv>\n\n*Figure: Left panel shows cell type annotations on UMAP projection. 
Middle panel displays the consensus proportion using a yellow-green-blue gradient (deeper blue indicates stronger agreement among LLMs). Right panel shows Shannon entropy using an orange-red gradient (deeper red indicates lower uncertainty, lighter orange indicates higher uncertainty).*\n\n### Marker Gene Visualization\n\nmLLMCelltype includes marker gene visualization functions that integrate with the consensus annotation workflow:\n\n```r\n# Load required libraries\nlibrary(mLLMCelltype)\nlibrary(Seurat)\nlibrary(ggplot2)\n\n# After running consensus annotation\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = markers_df,\n  tissue_name = \"human PBMC\",\n  models = c(\"anthropic\u002Fclaude-sonnet-4.5\", \"openai\u002Fgpt-5.2\"),\n  api_keys = list(openrouter = \"your_api_key\")\n)\n\n# Create marker gene visualizations using Seurat\n# Add consensus annotations to Seurat object\ncluster_ids \u003C- as.character(Idents(pbmc_data))\ncell_type_annotations \u003C- consensus_results$final_annotations[cluster_ids]\n\n# Handle any missing annotations\nif (any(is.na(cell_type_annotations))) {\n  na_mask \u003C- is.na(cell_type_annotations)\n  cell_type_annotations[na_mask] \u003C- paste(\"Cluster\", cluster_ids[na_mask])\n}\n\n# Add to Seurat object\npbmc_data@meta.data$cell_type_consensus \u003C- cell_type_annotations\n\n# Create a dotplot of marker genes\nDotPlot(pbmc_data,\n        features = top_markers,\n        group.by = \"cell_type_consensus\") +\n  RotatedAxis()\n\n# Create a heatmap of marker genes\nDoHeatmap(pbmc_data,\n          features = top_markers,\n          group.by = \"cell_type_consensus\")\n```\n\n**Marker Gene Visualization Features:**\n\n- **DotPlot**: Shows both percentage of cells expressing each gene (dot size) and average expression level (color intensity)\n- **Heatmap**: Displays scaled expression values with clustering of genes and cell types\n- **Integration**: Works directly with consensus annotation results added to 
Seurat objects\n- **Standard Seurat Functions**: Uses familiar Seurat visualization functions for consistency\n\nFor detailed instructions and advanced customization options, see the [Visualization Guide](https:\u002F\u002Fcafferyang.com\u002FmLLMCelltype\u002Farticles\u002Fvisualization-guide.html).\n\n## Citation\n\nIf you use mLLMCelltype in your research, please cite:\n\n```bibtex\n@article{Yang2025.04.10.647852,\n  author = {Yang, Chen and Zhang, Xianyang and Chen, Jun},\n  title = {Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data},\n  elocation-id = {2025.04.10.647852},\n  year = {2025},\n  doi = {10.1101\u002F2025.04.10.647852},\n  publisher = {Cold Spring Harbor Laboratory},\n  URL = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2025\u002F04\u002F17\u002F2025.04.10.647852},\n  journal = {bioRxiv}\n}\n```\n\nYou can also cite this in plain text format:\n\nYang, C., Zhang, X., & Chen, J. (2025). Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. *bioRxiv*. [Read our full research paper on bioRxiv](https:\u002F\u002Fdoi.org\u002F10.1101\u002F2025.04.10.647852)\n\n## Contributing\n\nWe welcome contributions from the community. There are many ways you can contribute to mLLMCelltype:\n\n### Reporting Issues\n\nIf you encounter any bugs, have feature requests, or have questions about using mLLMCelltype, please [open an issue](https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues) on our GitHub repository. When reporting bugs, please include:\n\n- A clear description of the problem\n- Steps to reproduce the issue\n- Expected vs. actual behavior\n- Your operating system and package version information\n- Any relevant code snippets or error messages\n\n### Pull Requests\n\nWe encourage you to contribute code improvements or new features through pull requests:\n\n1. Fork the repository\n2. 
Create a new branch for your feature (`git checkout -b feature\u002Famazing-feature`)\n3. Commit your changes (`git commit -m 'Add some amazing feature'`)\n4. Push to the branch (`git push origin feature\u002Famazing-feature`)\n5. Open a Pull Request\n\n### Areas for Contribution\n\nHere are some areas where contributions would be particularly valuable:\n\n- Adding support for new LLM models\n- Improving documentation and examples\n- Optimizing performance\n- Adding new visualization options\n- Extending functionality for specialized cell types or tissues\n- Translations of documentation into different languages\n\n### Code Style\n\nPlease follow the existing code style in the repository. For R code, we generally follow the [tidyverse style guide](https:\u002F\u002Fstyle.tidyverse.org\u002F). For Python code, we follow [PEP 8](https:\u002F\u002Fwww.python.org\u002Fdev\u002Fpeps\u002Fpep-0008\u002F).\n\n### Community\n\nJoin our [Discord community](https:\u002F\u002Fdiscord.gg\u002Fpb2aZdG4) for discussion and questions about mLLMCelltype and single-cell RNA-seq analysis.\n\nThank you for helping improve mLLMCelltype!\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_10b9dbfecc53.png\" alt=\"mLLMCelltype logo\" width=\"300\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fcafferychen777\u002FmLLMCelltype?style=social\" alt=\"GitHub stars\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fnetwork\u002Fmembers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fcafferychen777\u002FmLLMCelltype?style=social\" 
alt=\"GitHub forks\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fpb2aZdG4\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20Chat-7289da?logo=discord&logoColor=white\" alt=\"Discord\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002FCRAN.R-project.org\u002Fpackage=mLLMCelltype\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_15339a69db17.png\" alt=\"CRAN version\">\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fcafferychen777\u002FmLLMCelltype\" alt=\"License\">\n  \u003Ca href=\"https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2025.04.10.647852v1\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FbioRxiv-2025.04.10.647852-blue\" alt=\"bioRxiv preprint\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmllmcelltype\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmllmcelltype\" alt=\"PyPI version\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcafferychen777\u002FmLLMCelltype\u002Fblob\u002Fmain\u002Fnotebooks\u002FmLLMCelltype_Tutorial.ipynb\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen%20in-Colab-F9AB00?logo=googlecolab&logoColor=white\" alt=\"Open in Colab\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n# mLLMCelltype: A Multi-LLM Consensus Framework for Cell Type Annotation\n\nmLLMCelltype is a framework for automated cell type annotation of single-cell RNA sequencing (scRNA-seq) data based on multi-LLM consensus. It integrates multiple large language models, including OpenAI GPT-5.2, Anthropic Claude-4.6\u002F4.5, Google Gemini-3, X.AI Grok-4, DeepSeek-V3, Alibaba Qwen3, Zhipu GLM-4, MiniMax, Stepfun, and OpenRouter, and improves annotation accuracy through consensus-based prediction.\n\n## Abstract\n\nmLLMCelltype is an open-source tool for single-cell transcriptomics analysis that uses multiple large language models to identify cell types from gene expression data. The software takes a consensus approach: several models analyze the same data and their predictions are combined, which helps reduce errors and provides uncertainty metrics. By integrating predictions from multiple models, this approach offers significant advantages over single-model methods. mLLMCelltype integrates with single-cell analysis platforms such as Scanpy and Seurat, 
so researchers can incorporate it into existing workflows. The method requires no reference dataset for annotation.\n\nIn our benchmarks (Yang et al., 2025), the consensus approach reached up to 95% accuracy on the test datasets.\n\n## Table of Contents\n- [Key Features](#key-features)\n- [Installation](#installation)\n- [Usage Examples](#usage-examples)\n- [Visualization Examples](#visualization-example)\n- [Citation](#citation)\n- [Contributing](#contributing)\n\n**Web Application**: A browser-based interface is available at [mllmcelltype.com](https:\u002F\u002Fmllmcelltype.com) (no installation required).\n\n**See also**: [FlashDeconv](https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FFlashDeconv), a cell type deconvolution tool for spatial transcriptomics (Visium, Visium HD, Stereo-seq).\n\n## Key Features\n\n- **Multi-LLM Consensus**: Integrates predictions from multiple LLMs to reduce single-model limitations and biases\n- **Model Support**: Compatible with more than 10 LLM providers, including OpenAI, Anthropic, and Google\n- **Iterative Discussion**: LLMs evaluate evidence and refine annotations through multiple rounds of discussion\n- **Uncertainty Quantification**: Provides consensus proportion and Shannon entropy metrics to identify uncertain annotations\n- **Cross-Model Validation**: Reduces incorrect predictions through multi-model comparison\n- **Robustness to Noise**: Maintains high accuracy even with imperfect marker gene lists\n- **Hierarchical Annotation**: Supports multi-resolution analysis with consistency checks\n- **Reference-Free**: Annotates without pre-training or reference datasets\n- **Documentation**: Records the full reasoning process for transparency\n- **Integration**: Compatible with Scanpy\u002FSeurat workflows and marker gene outputs\n- **Extensibility**: Support for new LLMs can be added as they become available\n\nFor the changelog and updates, see [NEWS.md](R\u002FNEWS.md).\n\n## Installation\n\n### R Version\n\n```r\n# Install from CRAN (recommended)\ninstall.packages(\"mLLMCelltype\")\n\n# Or install the development version from GitHub\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\")\n```\n\n### Python Version\n\n[![Open in Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1ZgmtlaORogSy0-QsaF0CHwFWOyOD26d2?usp=sharing)\n\n**Quick Start**: Try mLLMCelltype in Google Colab without installing anything. Click the badge above to open an interactive notebook with examples and step-by-step guidance.\n\n```bash\n# Install from PyPI\npip install mllmcelltype\n\n# Or install from GitHub (note the subdirectory parameter)\npip install git+https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype.git#subdirectory=python\n```\n\n#### Important Note on Dependencies\n\nmLLMCelltype uses a modular design in which the libraries for different LLM providers are optional dependencies. Install the packages that match the models you plan to use:\n\n```bash\n# For OpenAI models (GPT-5, etc.)\npip install \"mllmcelltype[openai]\"\n\n# For Anthropic models (Claude)\npip install \"mllmcelltype[anthropic]\"\n\n# For Google models (Gemini)\npip install \"mllmcelltype[gemini]\"\n\n# Install all optional dependencies at once\npip install \"mllmcelltype[all]\"\n```\n\nIf you encounter an error such as 
`ImportError: cannot import name 'genai' from 'google'`, you need to install the corresponding provider package. For example:\n\n```bash\n# For Google Gemini models\npip install google-genai\n```\n\n### Supported Models\n\n- **OpenAI**: GPT-5.2\u002FGPT-5\u002FGPT-4.1 ([API Key](https:\u002F\u002Fplatform.openai.com\u002Fsettings\u002Forganization\u002Fbilling\u002Foverview))\n- **Anthropic**: Claude-4.6-Opus\u002FClaude-4.5-Sonnet\u002FClaude-4.5-Haiku ([API Key](https:\u002F\u002Fconsole.anthropic.com\u002F))\n- **Google**: Gemini-3-Pro\u002FGemini-3-Flash ([API Key](https:\u002F\u002Fai.google.dev\u002F?authuser=2))\n- **Alibaba**: Qwen3-Max ([API Key](https:\u002F\u002Fwww.alibabacloud.com\u002Fen\u002Fproduct\u002Fmodelstudio))\n- **DeepSeek**: DeepSeek-V3\u002FDeepSeek-R1 ([API Key](https:\u002F\u002Fplatform.deepseek.com\u002Fusage))\n- **Minimax**: MiniMax-M2.1 ([API Key](https:\u002F\u002Fintl.minimaxi.com\u002Fuser-center\u002Fbasic-information\u002Finterface-key))\n- **Stepfun**: Step-3 ([API Key](https:\u002F\u002Fplatform.stepfun.com\u002Faccount-info))\n- **Zhipu**: GLM-4.7\u002FGLM-4-Plus ([API Key](https:\u002F\u002Fbigmodel.cn\u002F))\n- **X.AI**: Grok-4\u002FGrok-3 ([API Key](https:\u002F\u002Faccounts.x.ai\u002F))\n- **OpenRouter**: Access to multiple models through a single API ([API Key](https:\u002F\u002Fopenrouter.ai\u002Fkeys))\n  - Supports models from OpenAI, Anthropic, Meta, Google, Mistral, and others\n  - Format: 'provider\u002Fmodel-name' (e.g., 'openai\u002Fgpt-5.2', 'anthropic\u002Fclaude-opus-4.5')\n  - Free models are available with the `:free` suffix (e.g., 'deepseek\u002Fdeepseek-r1:free', 'meta-llama\u002Fllama-4-maverick:free')\n  - **Note**: The free tier is limited to 50 requests per day (raised to 1000 per day after purchasing $10 or more in credits) and 20 requests per minute. Some models may be unavailable.\n\n## Usage Examples\n\n### Python\n\n```python\n# Example of scRNA-seq cell type annotation using mLLMCelltype with Scanpy\nimport scanpy as sc\nimport pandas as pd\nfrom mllmcelltype import annotate_clusters, interactive_consensus_annotation\nimport os\n\n# Note: Logging is configured automatically when mllmcelltype is imported\n# Use the logging module to customize logging settings if needed\n\n# Load your scRNA-seq dataset in AnnData format\nadata = sc.read_h5ad('your_data.h5ad')  # Replace with the path to your scRNA-seq dataset\n\n# Run Leiden clustering to identify cell populations, if not already done\nif 'leiden' not in adata.obs.columns:\n  
  print(\"正在计算Leiden聚类以识别细胞群体...\")\n    # 预处理单细胞数据：归一化计数并进行对数转换，用于基因表达分析\n    if 'log1p' not in adata.uns:\n        sc.pp.normalize_total(adata, target_sum=1e4)  # 归一化至每细胞10,000个计数\n        sc.pp.log1p(adata)  # 对归一化后的计数取自然对数\n\n    # 降维：计算scRNA-seq数据的主成分分析\n    if 'X_pca' not in adata.obsm:\n        sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)  # 选择信息量丰富的基因\n        sc.pp.pca(adata, use_highly_variable=True)  # 计算主成分\n\n    # 细胞聚类：构建KNN图并执行Leiden社区检测\n    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=30)  # 构建用于聚类的KNN图\n    sc.tl.leiden(adata, resolution=0.8)  # 使用Leiden算法识别细胞群体\n    print(f\"Leiden聚类已完成，共识别出{len(adata.obs['leiden'].cat.categories)}个不同的细胞群体\")\n\n# 利用差异表达分析鉴定每个细胞簇的标记基因\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')  # 使用Wilcoxon秩和检验检测标记基因\n\n# 提取每个细胞簇的 top 10 标记基因，用于细胞类型注释\nmarker_genes = {}\nfor i in range(len(adata.obs['leiden'].cat.categories)):\n    # 选择每个簇中差异表达最显著的前10个基因作为标记\n    genes = [adata.uns['rank_genes_groups']['names'][str(i)][j] for j in range(10)]\n    marker_genes[str(i)] = genes\n\n# 重要提示：mLLMCelltype要求使用基因符号（如KCNJ8、PDGFRA），而非Ensembl ID（如ENSG00000176771）\n# 如果您的AnnData对象使用的是Ensembl ID，请将其转换为基因符号以便准确注释：\n# 示例转换代码：\n# if 'Gene' in adata.var.columns:  # 检查元数据中是否已有基因符号\n#     gene_name_dict = dict(zip(adata.var_names, adata.var['Gene']))\n#     marker_genes = {cluster: [gene_name_dict.get(gene_id, gene_id) for gene_id in genes]\n#                    for cluster, genes in marker_genes.items()}\n\n# 重要提示：mLLMCelltype要求使用数值型簇ID\n# “cluster”列必须包含数值或可转换为数值的值。\n# 非数值型簇ID（如“cluster_1”、“T_cells”、“7_0”）可能导致错误或异常行为。\n# 如果您的数据包含非数值型簇ID，需创建原始ID与数值ID之间的映射：\n# 示例标准化代码：\n# original_ids = list(marker_genes.keys())\n# id_mapping = {original: idx for idx, original in enumerate(original_ids)}\n# marker_genes = {str(id_mapping[cluster]): genes for cluster, genes in marker_genes.items()}\n\n# 配置用于共识注释的大型语言模型API密钥\n# 至少需要一个API密钥才能进行多LLM共识注释\nos.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\"      # 
用于GPT-5.2\u002F5\u002F4.1模型\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-anthropic-api-key\"  # 用于Claude-4.6\u002F4.5模型\nos.environ[\"GEMINI_API_KEY\"] = \"your-gemini-api-key\"      # 用于Google Gemini-3模型\nos.environ[\"QWEN_API_KEY\"] = \"your-qwen-api-key\"        # 用于阿里巴巴Qwen3模型\n# 其他可选的LLM提供商，以增强共识多样性：\n# os.environ[\"DEEPSEEK_API_KEY\"] = \"your-deepseek-api-key\"   # 用于DeepSeek-V3模型\n# os.environ[\"ZHIPU_API_KEY\"] = \"your-zhipu-api-key\"       # 用于智谱GLM-4模型\n# os.environ[\"STEPFUN_API_KEY\"] = \"your-stepfun-api-key\"    # 用于Stepfun模型\n# os.environ[\"MINIMAX_API_KEY\"] = \"your-minimax-api-key\"    # 用于MiniMax模型\n# os.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"  # 用于通过OpenRouter访问多种模型\n\n# 执行多LLM共识细胞类型注释，并进行迭代讨论\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,  # 每个簇的标记基因字典\n    species=\"human\",            # 指定生物种类，以便进行适当的细胞类型注释\n    tissue=\"blood\",            # 指定组织背景，以提高注释准确性\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"],  # 多个LLM用于共识\n    consensus_threshold=1,     # 达成共识所需的最低比例\n    max_discussion_rounds=3    # 模型之间进行细化讨论的轮次数\n)\n\n# 或者，使用OpenRouter通过单一API访问多种模型\n# 这对于访问带有`:free`后缀的免费模型尤其有用\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# 使用免费的 OpenRouter 模型示例（无需积分）\nfree_models_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"blood\",\n    models=[\n        {\"provider\": \"openrouter\", \"model\": \"meta-llama\u002Fllama-4-maverick:free\"},      # Meta Llama 4 Maverick（免费）\n        {\"provider\": \"openrouter\", \"model\": \"venice\u002Funcensored:free\"},                # Venice Uncensored（免费）\n        {\"provider\": \"openrouter\", \"model\": \"deepseek\u002Fdeepseek-r1:free\"},             # DeepSeek R1（免费，高级推理能力）\n        {\"provider\": \"openrouter\", \"model\": \"meta-llama\u002Fllama-3.3-70b-instruct:free\"} # Meta Llama 3.3 70B（免费）\n    ],\n    
consensus_threshold=0.7,\n    max_discussion_rounds=2\n)\n\n# 从多模型协商中获取最终的共识细胞类型注释\nfinal_annotations = consensus_results[\"consensus\"]\n\n# 将共识细胞类型注释整合到原始 AnnData 对象中\nadata.obs['consensus_cell_type'] = adata.obs['leiden'].astype(str).map(final_annotations)\n\n# 添加不确定性量化指标以评估注释置信度\nadata.obs['consensus_proportion'] = adata.obs['leiden'].astype(str).map(consensus_results[\"consensus_proportion\"])  # 一致性水平\nadata.obs['entropy'] = adata.obs['leiden'].astype(str).map(consensus_results[\"entropy\"])  # 注释不确定性\n\n# 准备可视化：如果尚未计算 UMAP 嵌入，则进行计算\n# UMAP 提供细胞群体的二维表示，便于可视化\nif 'X_umap' not in adata.obsm:\n    print(\"正在计算 UMAP 坐标...\")\n    # 确保先计算好近邻信息\n    if 'neighbors' not in adata.uns:\n        sc.pp.neighbors(adata, n_neighbors=10, n_pcs=30)\n    sc.tl.umap(adata)\n    print(\"UMAP 坐标已计算完成\")\n\n# 使用增强的美学效果可视化结果\n# 基本可视化\nsc.pl.umap(adata, color='consensus_cell_type', legend_loc='right', frameon=True, title='mLLMCelltype 共识注释')\n\n# 更加定制化的可视化\nimport matplotlib.pyplot as plt\n\n# 设置图形大小和样式\nplt.rcParams['figure.figsize'] = (10, 8)\nplt.rcParams['font.size'] = 12\n\n# 创建更适合发表的 UMAP 图\nfig, ax = plt.subplots(1, 1, figsize=(12, 10))\nsc.pl.umap(adata, color='consensus_cell_type', legend_loc='on data',\n         frameon=True, title='mLLMCelltype 共识注释',\n         palette='tab20', size=50, legend_fontsize=12,\n         legend_fontoutline=2, ax=ax)\n\n# 可视化不确定性指标\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))\nsc.pl.umap(adata, color='consensus_proportion', ax=ax1, title='共识比例',\n         cmap='viridis', vmin=0, vmax=1, size=30)\nsc.pl.umap(adata, color='entropy', ax=ax2, title='注释不确定性（香农熵）',\n         cmap='magma', vmin=0, size=30)\nplt.tight_layout()\n```\n\n### 使用单个免费 OpenRouter 模型\n\n对于希望采用更简单方法、仅使用一个模型的用户，可以通过 OpenRouter 使用 DeepSeek R1 免费模型，而无需 API 积分：\n\n```python\nimport os\nfrom mllmcelltype import annotate_clusters\n\n# 注意：日志记录已自动配置\n\n# 设置您的 OpenRouter API 密钥\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# 定义每个聚类的标记基因\nmarker_genes = 
{\n    \"0\": [\"CD3D\", \"CD3E\", \"CD3G\", \"CD2\", \"IL7R\", \"TCF7\"],           # T 细胞\n    \"1\": [\"CD19\", \"MS4A1\", \"CD79A\", \"CD79B\", \"HLA-DRA\", \"CD74\"],   # B 细胞\n    \"2\": [\"CD14\", \"LYZ\", \"CSF1R\", \"ITGAM\", \"CD68\", \"FCGR3A\"]      # 单核细胞\n}\n\n# 使用 DeepSeek R1 免费模型进行注释\nannotations = annotate_clusters(\n    marker_genes=marker_genes,\n    species='human',\n    tissue='peripheral blood',\n    provider='openrouter',\n    model='deepseek\u002Fdeepseek-r1:free'  # 免费且具备高级推理能力的模型\n)\n\n# 打印注释结果\nfor cluster, annotation in annotations.items():\n    print(f\"Cluster {cluster}: {annotation}\")\n```\n\n此方法使用免费模型，无需任何 API 积分。\n\n#### 从 AnnData 对象中提取标记基因\n\n如果您正在使用 Scanpy 和 AnnData 对象，可以直接从 `rank_genes_groups` 结果中轻松提取标记基因：\n\n```python\nimport os\nimport scanpy as sc\nfrom mllmcelltype import annotate_clusters\n\n# 注意：日志记录已自动配置\n\n# 设置您的 OpenRouter API 密钥\nos.environ[\"OPENROUTER_API_KEY\"] = \"your-openrouter-api-key\"\n\n# 加载并预处理数据\nadata = sc.read_h5ad('your_data.h5ad')\n\n# 如果尚未完成预处理和聚类，可执行以下步骤：\n# sc.pp.normalize_total(adata, target_sum=1e4)\n# sc.pp.log1p(adata)\n# sc.pp.highly_variable_genes(adata)\n# sc.pp.pca(adata)\n# sc.pp.neighbors(adata)\n# sc.tl.leiden(adata)\n\n# 为每个聚类寻找标记基因\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')\n\n# 提取每个聚类的前 10 个标记基因\nmarker_genes = {\n    cluster: adata.uns['rank_genes_groups']['names'][cluster][:10].tolist()\n    for cluster in adata.obs['leiden'].cat.categories\n}\n\n# 使用 DeepSeek R1 免费模型进行注释\nannotations = annotate_clusters(\n    marker_genes=marker_genes,\n    species='human',\n    tissue='peripheral blood',  # 根据您的组织类型调整\n    provider='openrouter',\n    model='deepseek\u002Fdeepseek-r1:free'  # 免费模型\n)\n\n# 将注释添加到 AnnData 对象中\nadata.obs['cell_type'] = adata.obs['leiden'].astype(str).map(annotations)\n\n# 可视化结果\nsc.pl.umap(adata, color='cell_type', legend_loc='on data',\n           frameon=True, title='由 DeepSeek R1 注释的细胞类型')\n```\n\n这种方法会自动从 `rank_genes_groups` 结果中提取每个聚类的前若干个差异表达基因，从而轻松地将 
mLLMCelltype 集成到您的 Scanpy 工作流中。\n\n### R\n\n> **注意**：有关更详细的 R 教程和文档，请访问 [mLLMCelltype 文档网站](https:\u002F\u002Fcafferyang.com\u002FmLLMCelltype\u002F)。\n\n#### 使用 Seurat 对象\n\n```r\n# 加载所需包\nlibrary(mLLMCelltype)\nlibrary(Seurat)\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(cowplot) # 用于 plot_grid\n\n# 加载您预处理好的 Seurat 对象\npbmc \u003C- readRDS(\"your_seurat_object.rds\")\n\n# 如果从原始数据开始，需执行预处理步骤\n# pbmc \u003C- NormalizeData(pbmc)\n# pbmc \u003C- FindVariableFeatures(pbmc, selection.method = \"vst\", nfeatures = 2000)\n# pbmc \u003C- ScaleData(pbmc)\n# pbmc \u003C- RunPCA(pbmc)\n# pbmc \u003C- FindNeighbors(pbmc, dims = 1:10)\n# pbmc \u003C- FindClusters(pbmc, resolution = 0.5)\n# pbmc \u003C- RunUMAP(pbmc, dims = 1:10)\n\n# 为每个聚类寻找标记基因\npbmc_markers \u003C- FindAllMarkers(pbmc,\n                            only.pos = TRUE,\n                            min.pct = 0.25,\n                            logfc.threshold = 0.25)\n\n# 设置缓存目录以加快处理速度\ncache_dir \u003C- \".\u002Fmllmcelltype_cache\"\ndir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)\n\n# 选择任意支持提供商的模型\n# 支持的模型包括：\n# - OpenAI: 'gpt-5.2', 'gpt-5', 'gpt-4.1', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'\n\n# - Anthropic: 'claude-opus-4-6-20260205', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001'\n# - DeepSeek: 'deepseek-chat', 'deepseek-reasoner'\n# - Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.0-flash'\n# - Qwen: 'qwen3-max', 'qwen-max-2025-01-25'\n# - Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'\n# - Zhipu: 'glm-4.7', 'glm-4-plus'\n# - MiniMax: 'minimax-m2.1', 'minimax-m2'\n# - Grok: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'\n# - OpenRouter: 通过单一 API 访问多家提供商的模型。格式：'provider\u002Fmodel-name'\n#   - OpenAI 模型：'openai\u002Fgpt-5.2', 'openai\u002Fgpt-5', 'openai\u002Fo3-pro', 'openai\u002Fo4-mini'\n#   - Anthropic 模型：'anthropic\u002Fclaude-opus-4.5', 'anthropic\u002Fclaude-sonnet-4.5', 'anthropic\u002Fclaude-haiku-4.5'\n#   - Meta 
模型：'meta-llama\u002Fllama-4-maverick', 'meta-llama\u002Fllama-4-scout', 'meta-llama\u002Fllama-3.3-70b-instruct'\n#   - Google 模型：'google\u002Fgemini-3-pro', 'google\u002Fgemini-3-flash', 'google\u002Fgemini-2.5-pro'\n#   - Mistral 模型：'mistralai\u002Fmistral-large', 'mistralai\u002Fmagistral-medium-2506'\n#   - 其他模型：'deepseek\u002Fdeepseek-r1', 'deepseek\u002Fdeepseek-chat-v3.1', 'microsoft\u002Fmai-ds-r1'\n\n# 使用多个 LLM 模型运行 LLMCelltype 注释\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",  # 提供组织背景\n  models = c(\n    \"claude-sonnet-4-5-20250929\",    # Anthropic\n    \"gpt-5.2\",                  # OpenAI\n    \"gemini-3-pro\",           # Google\n    \"qwen3-max\"               # Alibaba\n  ),\n  api_keys = list(\n    anthropic = \"your-anthropic-key\",\n    openai = \"your-openai-key\",\n    gemini = \"your-google-key\",\n    qwen = \"your-qwen-key\"\n  ),\n  top_gene_count = 10,\n  controversy_threshold = 1.0,\n  entropy_threshold = 1.0,\n  cache_dir = cache_dir\n)\n\n# 打印结果结构以了解数据\nprint(\"consensus_results 中可用字段：\")\nprint(names(consensus_results))\n\n# 将注释添加到 Seurat 对象\n# 从 consensus_results$final_annotations 获取细胞类型注释\ncluster_to_celltype_map \u003C- consensus_results$final_annotations\n\n# 创建新的细胞类型标识列\ncell_types \u003C- as.character(Idents(pbmc))\nfor (cluster_id in names(cluster_to_celltype_map)) {\n  cell_types[cell_types == cluster_id] \u003C- cluster_to_celltype_map[[cluster_id]]\n}\n\n# 将细胞类型注释添加到 Seurat 对象\npbmc$cell_type \u003C- cell_types\n\n# 添加不确定性指标\n# 提取包含指标的详细共识结果\nconsensus_details \u003C- consensus_results$initial_results$consensus_results\n\n# 为每个簇创建一个包含指标的数据框\nuncertainty_metrics \u003C- data.frame(\n  cluster_id = names(consensus_details),\n  consensus_proportion = sapply(consensus_details, function(res) res$consensus_proportion),\n  entropy = sapply(consensus_details, function(res) res$entropy)\n)\n\n# 为每个细胞添加不确定性指标\n# 注意：seurat_clusters 是 FindClusters() 函数自动创建的元数据列\n# 
它包含在聚类过程中分配给每个细胞的簇 ID\n# 我们在这里使用它将簇级别的指标（consensus_proportion 和 entropy）映射到单个细胞上\n\n# 如果没有 seurat_clusters 列（例如，您使用了不同的聚类方法），\n# 您可以使用活动身份（Idents）或其他元数据中的簇分配：\n# 选项 1：使用活动身份\n# current_clusters \u003C- as.character(Idents(pbmc))\n# 选项 2：使用包含簇 ID 的其他元数据列\n# current_clusters \u003C- pbmc$your_cluster_column\n\n# 对于本示例，我们使用标准的 seurat_clusters 列：\ncurrent_clusters \u003C- pbmc$seurat_clusters  # 获取每个细胞的簇 ID\n\n# 将每个细胞的簇 ID 与 uncertainty_metrics 中对应的指标匹配\npbmc$consensus_proportion \u003C- uncertainty_metrics$consensus_proportion[match(current_clusters, uncertainty_metrics$cluster_id)]\npbmc$entropy \u003C- uncertainty_metrics$entropy[match(current_clusters, uncertainty_metrics$cluster_id)]\n\n# 保存结果以备将来使用\nsaveRDS(consensus_results, \"pbmc_mLLMCelltype_results.rds\")\nsaveRDS(pbmc, \"pbmc_annotated.rds\")\n\n# 使用 SCpubr 可视化结果，生成适合发表的图表\nif (!requireNamespace(\"SCpubr\", quietly = TRUE)) {\n  remotes::install_github(\"enblacar\u002FSCpubr\")\n}\nlibrary(SCpubr)\nlibrary(viridis)  # 用于颜色方案\n\n# 基本 UMAP 可视化，采用默认设置\npdf(\"pbmc_basic_annotations.pdf\", width=8, height=6)\nSCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  legend.position = \"right\") +\n  ggtitle(\"mLLMCelltype Consensus Annotations\")\ndev.off()\n\n# 更加定制化的可视化，增强样式效果\npdf(\"pbmc_custom_annotations.pdf\", width=8, height=6)\nSCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  label.box = TRUE,\n                  legend.position = \"right\",\n                  pt.size = 1.0,\n                  border.size = 1,\n                  font.size = 12) +\n  ggtitle(\"mLLMCelltype Consensus Annotations\") +\n  theme(plot.title = element_text(hjust = 0.5))\ndev.off()\n\n# 使用增强版 SCpubr 图表可视化不确定性指标\n# 获取细胞类型并创建命名颜色 palette\ncell_types \u003C- unique(pbmc$cell_type)\ncolor_palette \u003C- viridis::viridis(length(cell_types))\nnames(color_palette) \u003C- 
cell_types\n\n# 使用 SCpubr 进行细胞类型标注\np1 \u003C- SCpubr::do_DimPlot(sample = pbmc,\n                  group.by = \"cell_type\",\n                  label = TRUE,\n                  legend.position = \"bottom\",  # 将图例放在底部\n                  pt.size = 1.0,\n                  label.size = 4,  # 较小的标签字体大小\n                  label.box = TRUE,  # 为标签添加背景框以提高可读性\n                  repel = TRUE,  # 使标签相互排斥以避免重叠\n                  colors.use = color_palette,\n                  plot.title = \"Cell Type\") +\n      theme(plot.title = element_text(hjust = 0.5, margin = margin(b = 15, t = 10)),\n            legend.text = element_text(size = 8),\n            legend.key.size = unit(0.3, \"cm\"),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# 使用SCpubr绘制共识比例特征图\np2 \u003C- SCpubr::do_FeaturePlot(sample = pbmc,\n                       features = \"consensus_proportion\",\n                       order = TRUE,\n                       pt.size = 1.0,\n                       enforce_symmetry = FALSE,\n                       legend.title = \"Consensus\",\n                       plot.title = \"Consensus Proportion\",\n                       sequential.palette = \"YlGnBu\",  # 黄-绿-蓝渐变，遵循Nature Methods标准\n                       sequential.direction = 1,  # 由浅到深的方向\n                       min.cutoff = min(pbmc$consensus_proportion),  # 设置最小值\n                       max.cutoff = max(pbmc$consensus_proportion),  # 设置最大值\n                       na.value = \"lightgrey\") +  # 缺失值的颜色\n      theme(plot.title = element_text(hjust = 0.5, margin = margin(b = 15, t = 10)),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# 使用SCpubr绘制香农熵特征图\np3 \u003C- SCpubr::do_FeaturePlot(sample = pbmc,\n                       features = \"entropy\",\n                       order = TRUE,\n                       pt.size = 1.0,\n                       enforce_symmetry = FALSE,\n                       legend.title = \"Entropy\",\n                       plot.title = \"Shannon 
Entropy\",\n                       sequential.palette = \"OrRd\",  # 橙-红渐变，遵循Nature Methods标准\n                       sequential.direction = -1,  # 由深到浅的方向（反转）\n                       min.cutoff = min(pbmc$entropy),  # 设置最小值\n                       max.cutoff = max(pbmc$entropy),  # 设置最大值\n                       na.value = \"lightgrey\") +  # 缺失值的颜色\n      theme(plot.title = element_text(hjust = 0.5, margin = margin(b = 15, t = 10)),\n            plot.margin = unit(c(0.8, 0.8, 0.8, 0.8), \"cm\"))\n\n# 将图表以相等宽度组合在一起\npdf(\"pbmc_uncertainty_metrics.pdf\", width=18, height=7)\ncombined_plot \u003C- cowplot::plot_grid(p1, p2, p3, ncol = 3, rel_widths = c(1.2, 1.2, 1.2))\nprint(combined_plot)\ndev.off()\n```\n\n#### 使用CSV输入\n\n您也可以直接使用CSV文件与mLLMCelltype配合使用，而无需Seurat，这在您已经拥有CSV格式的标记基因时非常有用：\n\n```r\n# 安装最新版本的mLLMCelltype\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\", force = TRUE)\n\n# 加载必要的包\nlibrary(mLLMCelltype)\n\n# 配置统一日志记录（可选——未指定时使用默认设置）\nconfigure_logger(level = \"INFO\", console_output = TRUE, json_format = TRUE)\n\n# 创建缓存目录\ncache_dir \u003C- \"path\u002Fto\u002Fyour\u002Fcache\"\ndir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)\n\n# 读取CSV文件内容\nmarkers_file \u003C- \"path\u002Fto\u002Fyour\u002Fmarkers.csv\"\nfile_content \u003C- readLines(markers_file)\n\n# 跳过标题行\ndata_lines \u003C- file_content[-1]\n\n# 将数据转换为列表格式，以数字索引作为键\nmarker_genes_list \u003C- list()\ncluster_names \u003C- c()\n\n# 首先收集所有簇名\nfor(line in data_lines) {\n  parts \u003C- strsplit(line, \",\", fixed = TRUE)[[1]]\n  cluster_names \u003C- c(cluster_names, parts[1])\n}\n\n# 然后创建带有数字索引的marker_genes_list\nfor(i in 1:length(data_lines)) {\n  line \u003C- data_lines[i]\n  parts \u003C- strsplit(line, \",\", fixed = TRUE)[[1]]\n\n  # 第一部分是簇名\n  cluster_name \u003C- parts[1]\n\n  # 使用原始簇ID作为键（保留输入ID不变）\n  cluster_id \u003C- as.character(cluster_name)\n\n  # 剩余部分是基因\n  genes \u003C- parts[-1]\n\n  # 过滤掉NA和空字符串\n  genes \u003C- genes[!is.na(genes) & genes 
!= \"\"]\n\n  # 添加到marker_genes_list\n  marker_genes_list[[cluster_id]] \u003C- list(genes = genes)\n}\n\n# 设置API密钥\napi_keys \u003C- list(\n  gemini = \"YOUR_GEMINI_API_KEY\",\n  qwen = \"YOUR_QWEN_API_KEY\",\n  grok = \"YOUR_GROK_API_KEY\",\n  openai = \"YOUR_OPENAI_API_KEY\",\n  anthropic = \"YOUR_ANTHROPIC_API_KEY\"\n)\n\n# 使用付费模型进行共识注释\nconsensus_results \u003C-\n  interactive_consensus_annotation(\n    input = marker_genes_list,\n    tissue_name = \"your tissue type\", # 例如，“human heart”\n    models = c(\"gemini-3-pro\",\n              \"gemini-3-flash\",\n              \"qwen3-max\",\n              \"grok-4\",\n              \"claude-sonnet-4-5-20250929\",\n              \"gpt-5.2\"),\n    api_keys = api_keys,\n    controversy_threshold = 0.6,\n    entropy_threshold = 1.0,\n    max_discussion_rounds = 3,\n    cache_dir = cache_dir\n  )\n\n# 或者，使用免费的OpenRouter模型（无需积分）\n# 将OpenRouter API密钥添加到api_keys列表中\napi_keys$openrouter \u003C- \"your-openrouter-api-key\"\n\n# 使用免费模型进行共识注释\nfree_consensus_results \u003C-\n  interactive_consensus_annotation(\n    input = marker_genes_list,\n    tissue_name = \"your tissue type\", # 例如，“human heart”\n    models = c(\n      \"meta-llama\u002Fllama-4-maverick:free\",      # Meta Llama 4 Maverick（免费）\n      \"venice\u002Funcensored:free\",                # Venice Uncensored（免费）\n      \"deepseek\u002Fdeepseek-r1:free\",             # DeepSeek R1（免费，高级推理能力）\n      \"meta-llama\u002Fllama-3.3-70b-instruct:free\" # Meta Llama 3.3 70B（免费）\n    ),\n    api_keys = api_keys,\n    consensus_check_model = \"deepseek\u002Fdeepseek-r1:free\",  # 用于共识检查的免费模型\n    controversy_threshold = 0.6,\n    entropy_threshold = 1.0,\n    max_discussion_rounds = 2,\n    cache_dir = cache_dir\n  )\n\n# 保存结果\nsaveRDS(consensus_results, \"your_results.rds\")\n\n# 打印结果摘要\ncat(\"\\nResults summary:\\n\")\ncat(\"Available fields:\", paste(names(consensus_results), collapse=\", \"), \"\\n\\n\")\n\n# 打印最终注释\ncat(\"Final cell type 
annotations:\\n\")\nfor(cluster in names(consensus_results$final_annotations)) {\n  cat(sprintf(\"%s: %s\\n\", cluster, consensus_results$final_annotations[[cluster]]))\n}\n```\n\n**关于CSV格式的注意事项**：\n- CSV文件的第一列应包含用作索引的值（这些可以是簇名称、数字如0,1,2,3或1,2,3,4等）。\n- 第一列中的值仅用于参考，并不会传递给LLM。\n- 后续列应包含每个簇的标记基因。\n- 包含猫心脏组织示例的CSV文件位于软件包的`inst\u002Fextdata\u002FCat_Heart_markers.csv`中。\n\nCSV结构示例：\n```\ncluster,gene\n0,Negr1,Cask,Tshz2,Fstl1,Dse,Celf2,Hmcn2,Setbp1,Cblb\n1,Palld,Grb14,Mybpc3,Ensfcag00000044939,Dcun1d2,Acacb,Slco1c1,Ppp1r3c,Sema3c,Ppp1r14c\n2,Adgrf5,Tbx1,Slco2b1,Pi15,Adam23,Bmx,Pde8b,Pkhd1l1,Dtx1,Ensfcag00000051556\n3,Clec2d,Trat1,Rasgrp1,Card11,Cytip,Sytl3,Tmem156,Bcl11b,Lcp1,Lcp2\n```\n\n您可以在R脚本中通过以下方式访问示例数据：\n```r\nsystem.file(\"extdata\", \"Cat_Heart_markers.csv\", package = \"mLLMCelltype\")\n```\n\n### 使用单个LLM模型\n\n如果您只想使用单个LLM模型而不是共识方法，可以使用`annotate_cell_types()`函数。这在您只有一个API密钥或更倾向于特定模型时非常有用：\n\n```r\n# 加载所需包\nlibrary(mLLMCelltype)\nlibrary(Seurat)\nlibrary(ggplot2)  # 提供下文绘图时使用的 ggtitle()\n\n# 加载预处理好的Seurat对象\npbmc \u003C- readRDS(\"your_seurat_object.rds\")\n\n# 为每个簇寻找标记基因\npbmc_markers \u003C- FindAllMarkers(pbmc,\n                            only.pos = TRUE,\n                            min.pct = 0.25,\n                            logfc.threshold = 0.25)\n\n# 从任何支持的提供商中选择一个模型\n# 支持的模型包括：\n# - OpenAI: 'gpt-5.2', 'gpt-5', 'gpt-4.1', 'o3-pro', 'o3', 'o4-mini', 'o1', 'o1-pro'\n# - Anthropic: 'claude-opus-4-6-20260205', 'claude-sonnet-4-5-20250929', 'claude-haiku-4-5-20251001'\n# - DeepSeek: 'deepseek-chat', 'deepseek-reasoner'\n# - Google: 'gemini-3-pro', 'gemini-3-flash', 'gemini-2.5-pro', 'gemini-2.0-flash'\n# - Qwen: 'qwen3-max', 'qwen-max-2025-01-25'\n# - Stepfun: 'step-3', 'step-2-16k', 'step-2-mini'\n# - Zhipu: 'glm-4.7', 'glm-4-plus'\n# - MiniMax: 'minimax-m2.1', 'minimax-m2'\n# - Grok: 'grok-4', 'grok-4.1', 'grok-4-heavy', 'grok-3', 'grok-3-fast', 'grok-3-mini'\n# - OpenRouter: 通过单一API访问多个提供商的模型。格式为：'provider\u002Fmodel-name'\n#   - OpenAI模型：'openai\u002Fgpt-5.2', 'openai\u002Fgpt-5', 
'openai\u002Fo3-pro', 'openai\u002Fo4-mini'\n#   - Anthropic模型：'anthropic\u002Fclaude-opus-4.5', 'anthropic\u002Fclaude-sonnet-4.5', 'anthropic\u002Fclaude-haiku-4.5'\n#   - Meta模型：'meta-llama\u002Fllama-4-maverick', 'meta-llama\u002Fllama-4-scout', 'meta-llama\u002Fllama-3.3-70b-instruct'\n#   - Google模型：'google\u002Fgemini-3-pro', 'google\u002Fgemini-3-flash', 'google\u002Fgemini-2.5-pro'\n#   - Mistral模型：'mistralai\u002Fmistral-large', 'mistralai\u002Fmagistral-medium-2506'\n#   - 其他模型：'deepseek\u002Fdeepseek-r1', 'deepseek\u002Fdeepseek-chat-v3.1', 'microsoft\u002Fmai-ds-r1'\n\n# 使用单个LLM模型运行细胞类型注释\nsingle_model_results \u003C- annotate_cell_types(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",  # 提供组织背景\n  model = \"claude-sonnet-4-5-20250929\",  # 指定单个模型（Claude Sonnet 4.5）\n  api_key = \"your-anthropic-key\",  # 直接提供API密钥\n  top_gene_count = 10\n)\n\n# 使用免费的OpenRouter模型\nfree_model_results \u003C- annotate_cell_types(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC\",\n  model = \"meta-llama\u002Fllama-4-maverick:free\",  # 带有:free后缀的免费模型\n  api_key = \"your-openrouter-key\",\n  top_gene_count = 10\n)\n\n# 打印结果\nprint(single_model_results)\n\n# 将注释添加到Seurat对象\n# single_model_results是一个字符向量，每个簇对应一个注释\npbmc$cell_type \u003C- plyr::mapvalues(\n  x = as.character(Idents(pbmc)),\n  from = names(single_model_results),\n  to = single_model_results\n)\n\n# 可视化结果\nDimPlot(pbmc, group.by = \"cell_type\", label = TRUE) +\n  ggtitle(\"由单个LLM模型注释的细胞类型\")\n```\n\n#### 比较不同模型\n\n你还可以通过多次运行`annotate_cell_types()`并使用不同的模型来比较注释结果：\n\n```r\n# 定义要测试的模型\nmodels_to_test \u003C- c(\n  \"claude-sonnet-4-5-20250929\",     # Anthropic\n  \"gpt-5.2\",                    # OpenAI\n  \"gemini-3-pro\",              # Google\n  \"qwen3-max\"                  # Alibaba\n)\n\n# 不同提供商的API密钥\napi_keys \u003C- list(\n  anthropic = \"your-anthropic-key\",\n  openai = \"your-openai-key\",\n  gemini = \"your-gemini-key\",\n  qwen = \"your-qwen-key\"\n)\n\n# 测试每个模型并存储结果\nresults 
\u003C- list()\nfor (model in models_to_test) {\n  provider \u003C- get_provider(model)\n  api_key \u003C- api_keys[[provider]]\n\n  # 运行注释\n  results[[model]] \u003C- annotate_cell_types(\n    input = pbmc_markers,\n    tissue_name = \"human PBMC\",\n    model = model,\n    api_key = api_key,\n    top_gene_count = 10\n  )\n\n  # 添加到Seurat对象\n  column_name \u003C- paste0(\"cell_type_\", gsub(\"[^a-zA-Z0-9]\", \"_\", model))\n  pbmc[[column_name]] \u003C- plyr::mapvalues(\n    x = as.character(Idents(pbmc)),\n    from = names(results[[model]]),\n    to = results[[model]]\n  )\n}\n```\n\n### 高级共识配置：指定共识检查模型\n\n`consensus_check_model`参数（R）\u002F `consensus_model`参数（Python）允许你指定用于共识检查和讨论调解的LLM模型。此参数对共识注释的准确性非常重要，因为共识检查模型：\n\n1. 评估不同细胞类型注释之间的语义相似性\n2. 计算共识指标（比例和熵）\n3. 调解并综合各模型对争议簇的讨论\n4. 在模型意见不一致时做出最终决定\n\n我们建议使用功能强大的模型进行共识检查，因为这会直接影响注释质量。\n\n#### 推荐用于共识检查的模型\n\n- **Anthropic**: `claude-opus-4-6-20260205`, `claude-sonnet-4-5-20250929`\n- **OpenAI**: `o1`, `o1-pro`, `gpt-5.2`, `gpt-4.1`\n- **Google**: `gemini-3-pro`, `gemini-3-flash`\n- **其他**: `deepseek-r1` \u002F `deepseek-reasoner`, `qwen3-max`, `grok-4`\n\n#### R包使用示例\n\n```r\n# 示例1：指定共识检查模型\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  tissue_name = \"human brain\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"),\n  api_keys = api_keys,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n\n# 示例2：使用替代的共识检查模型\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  tissue_name = \"mouse liver\",\n  models = c(\"gpt-5.2\", \"gemini-3-pro\", \"qwen3-max\"),\n  api_keys = api_keys,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n\n# 示例3：使用OpenAI的推理模型\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = marker_genes_list,\n  
tissue_name = \"human immune cells\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  consensus_check_model = \"o1\",\n  controversy_threshold = 0.7,\n  entropy_threshold = 1.0\n)\n```\n\n#### Python包使用示例\n\n```python\n# 示例1：指定共识模型\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"brain\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\", \"qwen3-max\"],\n    consensus_model=\"claude-sonnet-4-5-20250929\",\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# 示例2：使用字典格式\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"mouse\",\n    tissue=\"liver\",\n    models=[\"gpt-5.2\", \"gemini-3-pro\", \"qwen3-max\"],\n    consensus_model={\"provider\": \"anthropic\", \"model\": \"claude-sonnet-4-5-20250929\"},\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# 示例3：使用 Google 的模型进行共识判断\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"heart\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"qwen3-max\"],\n    consensus_model={\"provider\": \"google\", \"model\": \"gemini-3-pro\"},\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n\n# 示例4：默认行为（使用 Qwen 并回退）\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    species=\"human\",\n    tissue=\"blood\",\n    models=[\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"],\n    # 如果未指定，则默认使用 qwen3-max，并以 claude-sonnet-4-5-20250929 作为回退\n    consensus_threshold=0.7,\n    entropy_threshold=1.0\n)\n```\n\n#### 关于共识模型选择的注意事项\n\n1. **模型可用性**：请确保您拥有所选共识模型的 API 访问权限。如果首选模型不可用，系统将使用回退模型。\n\n2. **一致性**：在一个项目中，所有共识检查应使用相同的模型，以保证评估标准的一致性。\n\n3. 
**默认行为**：\n   - R：若未指定，默认使用 `models` 列表中的第一个模型。\n   - Python：默认使用 `qwen3-max`，并以 `claude-sonnet-4-5-20250929` 作为回退。\n\n共识检查模型必须能够准确评估不同细胞类型名称之间的语义相似性（例如，识别出“T 淋巴细胞”和“T 细胞”指代的是同一种细胞类型），理解生物学背景，并综合多个模型的讨论结果。\n\n### 高级功能：簇选择与缓存控制（v1.3.1）\n\nmLLMCelltype v1.3.1 引入了两个参数，使您能够对注释过程进行精细控制：\n\n#### 1. **clusters_to_analyze** - 选择性簇分析\n\n此参数允许您精确指定要分析的簇，而无需手动过滤输入数据：\n\n```r\n# 示例：专注于特定簇进行 T 细胞亚型分类\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"human PBMC - T cell subtypes\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(0, 1, 7),  # 仅分析 T 细胞簇\n  controversy_threshold = 0.7\n)\n\n# 示例：用不同上下文重新分析有争议的簇\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = pbmc_markers,\n  tissue_name = \"activated immune cells\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(\"3\", \"5\"),  # 专注于特定簇\n  cache_dir = \"consensus_cache\"\n)\n```\n\n**优点：**\n- 无需手动对数据进行子集化\n- 保持原始簇编号不变\n- 仅分析相关簇，从而减少 API 调用次数和成本\n- 适用于对特定细胞群体进行迭代优化\n\n#### 2. **force_rerun** - 强制绕过缓存进行全新分析\n\n此参数会强制重新分析有争议的簇，忽略缓存结果：\n\n```r\n# 示例：初始的广泛分析\ninitial_results \u003C- interactive_consensus_annotation(\n  input = markers,\n  tissue_name = \"human brain\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  use_cache = TRUE\n)\n\n# 示例：结合特定亚型上下文重新分析\nsubtype_results \u003C- interactive_consensus_annotation(\n  input = markers,\n  tissue_name = \"human brain - neuronal subtypes\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\"),\n  api_keys = api_keys,\n  clusters_to_analyze = c(2, 3, 5),  # 神经元簇\n  force_rerun = TRUE,  # 强制全新分析，尽管存在缓存\n  use_cache = TRUE     # 对于非争议簇仍可利用缓存以提升性能\n)\n```\n\n**重要提示：**\n- `force_rerun` 仅影响需要 LLM 讨论的争议簇\n- 非争议簇仍将使用缓存以提高效率\n- 在更改组织上下文或专注于亚型时非常有用\n- 可与 `clusters_to_analyze` 结合使用，实现有针对性的重新分析\n\n#### 常见应用场景\n\n1. 
**迭代式亚型分类工作流：**\n```r\n# 步骤 1：通用细胞类型注释\ngeneral_types \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human PBMC\",\n  models = models,\n  api_keys = api_keys\n)\n\n# 步骤 2：聚焦 T 细胞及其亚型上下文\nt_cell_subtypes \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human T lymphocytes\",\n  models = models,\n  api_keys = api_keys,\n  clusters_to_analyze = c(0, 1, 4, 7),  # 来自步骤 1 的 T 细胞簇\n  force_rerun = TRUE  # 结合 T 细胞上下文进行全新分析\n)\n\n# 步骤 3：进一步细化 CD8+ T 细胞亚型\ncd8_subtypes \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human CD8+ T cells - activation states\",\n  models = models,\n  api_keys = api_keys,\n  clusters_to_analyze = c(1, 4),  # CD8+ 簇\n  force_rerun = TRUE\n)\n```\n\n2. **经济高效的重新分析：**\n```r\n# 仅重新分析那些存在争议的簇\ncontroversial \u003C- initial_results$controversial_clusters\n\nrefined_results \u003C- interactive_consensus_annotation(\n  input = data,\n  tissue_name = \"human PBMC - refined\",\n  models = c(\"gpt-5.2\", \"claude-sonnet-4-5-20250929\", \"gemini-3-pro\"),\n  api_keys = api_keys,\n  clusters_to_analyze = controversial,  # 仅针对有争议的簇\n  force_rerun = TRUE,\n  consensus_check_model = \"claude-sonnet-4-5-20250929\"\n)\n```\n\n## 可视化示例\n\n### 细胞类型注释可视化\n\n以下是使用 mLLMCelltype 和 SCpubr 创建的可用于发表的可视化示例，展示了细胞类型注释以及不确定性指标（共识比例和香农熵）：\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_readme_0d02668f6e53.png\" alt=\"mLLMCelltype Visualization\" width=\"900\"\u002F>\n\u003C\u002Fdiv>\n\n*图：左侧面板显示 UMAP 投影上的细胞类型注释。中间面板使用黄绿蓝渐变色表示共识比例（蓝色越深表示 LLM 之间的一致性越高）。右侧面板使用橙红色渐变色表示香农熵（红色越深表示不确定性越低，橙色越浅表示不确定性越高）。*\n\n### 标记基因可视化\n\nmLLMCelltype 包含与共识注释流程集成的标记基因可视化函数：\n\n```r\n# 加载所需库\nlibrary(mLLMCelltype)\nlibrary(Seurat)\nlibrary(ggplot2)\n\n# 运行共识注释后\nconsensus_results \u003C- interactive_consensus_annotation(\n  input = markers_df,\n  tissue_name = \"human PBMC\",\n  models = c(\"anthropic\u002Fclaude-sonnet-4.5\", 
\"openai\u002Fgpt-5.2\"),\n  api_keys = list(openrouter = \"your_api_key\")\n)\n\n# Create marker gene visualizations with Seurat\n# Add the consensus annotations to the Seurat object\ncluster_ids \u003C- as.character(Idents(pbmc_data))\ncell_type_annotations \u003C- consensus_results$final_annotations[cluster_ids]\n\n# Handle missing annotations\nif (any(is.na(cell_type_annotations))) {\n  na_mask \u003C- is.na(cell_type_annotations)\n  cell_type_annotations[na_mask] \u003C- paste(\"Cluster\", cluster_ids[na_mask])\n}\n\n# Add to the Seurat object\npbmc_data@meta.data$cell_type_consensus \u003C- cell_type_annotations\n\n# Create a marker gene dot plot\nDotPlot(pbmc_data,\n        features = top_markers,\n        group.by = \"cell_type_consensus\") +\n  RotatedAxis()\n\n# Create a marker gene heatmap\nDoHeatmap(pbmc_data,\n          features = top_markers,\n          group.by = \"cell_type_consensus\")\n```\n\n**Marker gene visualization features:**\n\n- **Dot plot (DotPlot)**: shows both the percentage of cells expressing each gene (dot size) and the average expression level (color intensity)\n- **Heatmap**: shows normalized expression values, with genes and cell types clustered\n- **Integration**: works directly with consensus annotation results added to a Seurat object\n- **Standard Seurat functions**: uses the familiar Seurat visualization functions for consistency\n\nFor detailed instructions and advanced customization options, see the [Visualization Guide](https:\u002F\u002Fcafferyang.com\u002FmLLMCelltype\u002Farticles\u002Fvisualization-guide.html).\n\n## Citation\n\nIf you use mLLMCelltype in your research, please cite:\n\n```bibtex\n@article{Yang2025.04.10.647852,\n  author = {Yang, Chen and Zhang, Xianyang and Chen, Jun},\n  title = {Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data},\n  elocation-id = {2025.04.10.647852},\n  year = {2025},\n  doi = {10.1101\u002F2025.04.10.647852},\n  publisher = {Cold Spring Harbor Laboratory},\n  URL = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2025\u002F04\u002F17\u002F2025.04.10.647852},\n  journal = {bioRxiv}\n}\n```\n\nYou can also cite it in plain text:\n\nYang, C., Zhang, X., & Chen, J. (2025).
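Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. *bioRxiv*. [Read our full research paper on bioRxiv](https:\u002F\u002Fdoi.org\u002F10.1101\u002F2025.04.10.647852)\n\n## Contributing\n\nWe welcome contributions from the community. There are several ways to get involved in the development of mLLMCelltype:\n\n### Reporting Issues\n\nIf you run into a bug, have a feature request, or have a question about using mLLMCelltype, please [open an issue](https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues) in our GitHub repository. When reporting an issue, please include:\n\n- A clear description of the problem\n- Steps to reproduce it\n- Expected behavior versus actual behavior\n- Your operating system and package version\n- Relevant code snippets or error messages\n\n### Pull Requests\n\nWe also encourage you to contribute code improvements or new features through pull requests:\n\n1. Clone the repository and create a branch (`git checkout -b feature\u002Famazing-feature`)\n2. Commit your changes (`git commit -m 'Add some amazing feature'`)\n3. Push to the branch (`git push origin feature\u002Famazing-feature`)\n4. Open a pull request\n\n### Areas for Contribution\n\nHelp is especially welcome in the following areas:\n\n- Adding support for new LLM models\n- Improving documentation and examples\n- Optimizing performance\n- Adding new visualization options\n- Extending functionality for specific cell types or tissues\n- Translating the documentation into other languages\n\n### Code Style\n\nPlease follow the existing code style in the repository. For R code, we generally follow the [tidyverse style guide](https:\u002F\u002Fstyle.tidyverse.org\u002F); for Python code, we follow [PEP 8](https:\u002F\u002Fwww.python.org\u002Fdev\u002Fpeps\u002Fpep-0008\u002F).\n\n### Community\n\nJoin our [Discord community](https:\u002F\u002Fdiscord.gg\u002Fpb2aZdG4) to discuss mLLMCelltype and single-cell RNA sequencing analysis.\n\nThank you for helping improve mLLMCelltype!","# mLLMCelltype Quick Start Guide\n\nmLLMCelltype is an automated cell type annotation tool for single-cell RNA sequencing (scRNA-seq) data, built on a multi-LLM consensus framework. It combines the predictions of several mainstream large language models (such as GPT, Claude, Gemini, and Qwen) to identify cell types with high accuracy without a reference dataset, and it reports uncertainty metrics.\n\n## Prerequisites\n\nBefore you begin, make sure your environment meets the following requirements:\n\n*   **Operating system**: Linux, macOS, or Windows\n*   **Python version**: Python 3.8 or later\n*   **Dependencies**:\n    *   A single-cell analysis library: `scanpy` (Python) or `Seurat` (R)\n    *   API key: at least one API key from a supported LLM provider (such as OpenAI, Anthropic, Google, or Alibaba Cloud Bailian\u002FQwen)\n*   **Network**: the tool calls LLM APIs hosted inside or outside mainland China, so make sure your network connection is reliable. If you use models hosted in mainland China (such as Qwen, GLM, and DeepSeek), the connection is usually more stable.\n\n## Installation\n\nYou can install either the Python or the R version. The Python version is recommended for more flexible model integration.\n\n### Option 1: Python\n\n**1. Basic installation**\nInstall the core package from PyPI:\n```bash\npip install mllmcelltype\n```\n\nOr install the latest development version from GitHub:\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype.git#subdirectory=python\n```\n\n**2.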
Install model dependencies (optional but recommended)**\nmLLMCelltype has a modular design. Install the extra dependencies for the models you plan to use:\n\n```bash\n# For OpenAI models (GPT series)\npip install \"mllmcelltype[openai]\"\n\n# For Anthropic models (Claude series)\npip install \"mllmcelltype[anthropic]\"\n\n# For Google models (Gemini series)\npip install \"mllmcelltype[gemini]\"\n\n# Install all optional dependencies at once\npip install \"mllmcelltype[all]\"\n```\n> **Note**: if you see an error such as `ImportError: cannot import name 'genai' from 'google'` at runtime, install the corresponding provider SDK manually (for example, `pip install google-genai`).\n\n### Option 2: R\n\n```r\n# Install from CRAN (recommended)\ninstall.packages(\"mLLMCelltype\")\n\n# Or install the development version from GitHub\n# Requires devtools: install.packages(\"devtools\")\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\")\n```\n\n### Try It Without Installing\nYou can run the example directly in Google Colab, with no local setup:\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fcafferychen777\u002FmLLMCelltype\u002Fblob\u002Fmain\u002Fnotebooks\u002FmLLMCelltype_Tutorial.ipynb)\n\n## Basic Usage\n\nThe following is the simplest Python workflow, showing how to annotate cell types together with `scanpy`.\n\n### 1. Prepare and Cluster the Data\nFirst load the single-cell data and run standard preprocessing and clustering (if not already done).\n\n```python\nimport scanpy as sc\nfrom mllmcelltype import interactive_consensus_annotation\nimport os\n\n# Load the data (AnnData format)\nadata = sc.read_h5ad('your_data.h5ad')\n\n# Run Leiden clustering if the data is not clustered yet\nif 'leiden' not in adata.obs.columns:\n    sc.pp.normalize_total(adata, target_sum=1e4)\n    sc.pp.log1p(adata)\n    sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)\n    sc.pp.pca(adata, use_highly_variable=True)\n    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=30)\n    sc.tl.leiden(adata, resolution=0.8)\n\n# Extract marker genes for each cluster\nsc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')\n\nmarker_genes = {}\nfor i in range(len(adata.obs['leiden'].cat.categories)):\n    # Take the top 10 differentially expressed genes of each cluster\n    genes = [adata.uns['rank_genes_groups']['names'][str(i)][j] for j in range(10)]\n    marker_genes[str(i)] = genes\n```\n\n### 2.
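Configure API Keys\nSet the API keys for the LLMs you have access to. Configuring several models at once enables the consensus mechanism.\n\n```python\n# Configure at least one API key\nos.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\"\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-anthropic-api-key\"\nos.environ[\"QWEN_API_KEY\"] = \"your-qwen-api-key\"  # Alibaba Cloud Qwen\n# os.environ[\"DEEPSEEK_API_KEY\"] = \"your-deepseek-api-key\" # DeepSeek\n```\n\n### 3. Run Consensus Annotation\nCall `interactive_consensus_annotation` to have multiple models discuss the marker genes and reach a consensus.\n\n```python\n# Run multi-model consensus annotation\nconsensus_results = interactive_consensus_annotation(\n    marker_genes=marker_genes,\n    models=[\"gpt-5.2\", \"claude-4.6\", \"qwen-max\"],  # models taking part\n    max_rounds=3  # maximum number of discussion rounds\n)\n\n# Map the results back to the AnnData object\n# (see the full tutorial for result handling; this is a conceptual demo)\nprint(consensus_results)\n```\n\n### Key Notes\n*   **Gene symbol format**: inputs must be standard gene symbols (such as `KCNJ8` or `PDGFRA`), **not** Ensembl IDs (such as `ENSG00000176771`). If your data contains Ensembl IDs, convert them to gene symbols first.\n*   **Cluster ID format**: cluster labels must be numbers or strings that can be converted to numbers (such as `\"0\", \"1\"`). Avoid non-numeric labels such as `\"cluster_1\"` or `\"T_cells\"`, which may cause errors.\n*   **No reference data**: the tool needs no pre-trained reference dataset; it reasons entirely from marker genes and the biological knowledge of the LLMs.","A bioinformatics researcher is analyzing single-cell sequencing data from a rare neurodegenerative disease and urgently needs to identify unknown, rare immune cell subpopulations in the samples without a high-quality reference atlas.\n\n### Without mLLMCelltype\n- **Bias from relying on a single model**: annotating with a single LLM is prone to that model's training-data bias, so novel cells get misclassified as common types.\n- **Time-consuming manual review**: faced with ambiguous predictions, the researcher has to check gene markers against the literature one by one, which takes days.\n- **No confidence assessment**: traditional methods struggle to quantify annotation uncertainty, so the researcher cannot tell which cell type calls are reliable and which are doubtful.\n- **Cold-start problem**: without a matching reference dataset, traditional mapping-based tools fail entirely and the analysis stalls.\n\n### With mLLMCelltype\n- **Multi-model consensus counters bias**: mLLMCelltype has more than ten LLMs, such as GPT-5.2 and Claude-4.6, review each cluster jointly, and the consensus mechanism effectively corrects misjudgments made by a single model.\n- **Automated iterative discussion**: the built-in multi-round discussion mechanism evaluates the gene expression evidence and refines the annotations automatically, cutting days of manual review down to a few hours.\n- **Explicit uncertainty metrics**: the system reports Shannon entropy and consensus proportion directly, helping the researcher quickly locate low-confidence cell populations that need attention.\n- **Runs without reference data**: drawing on the models' generative capabilities, mLLMCelltype completes high-accuracy (up to 95%) cell type identification without a reference atlas.\n\nBy combining the judgment of multiple models with quantified uncertainty, mLLMCelltype turns single-cell annotation from guesswork into a dependable, automated workflow.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fcafferychen777_mLLMCelltype_0d02668f.png","cafferychen777","Caffery Yang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fcafferychen777_d82f0460.jpg","Statistics PhD @ TAMU | LLM4omics & Productivity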
Tools Researcher","Texas A&M University",null,"CafferyYang","https:\u002F\u002Fcafferyang.com\u002F","https:\u002F\u002Fgithub.com\u002Fcafferychen777",[82,86,90,94,98],{"name":83,"color":84,"percentage":85},"Python","#3572A5",59.5,{"name":87,"color":88,"percentage":89},"R","#198CE7",36.4,{"name":91,"color":92,"percentage":93},"Jupyter Notebook","#DA5B0B",3.9,{"name":95,"color":96,"percentage":97},"Shell","#89e051",0.2,{"name":99,"color":100,"percentage":101},"CSS","#663399",0.1,636,54,"2026-04-01T15:23:20","MIT","Linux, macOS, Windows","Not specified (the tool calls cloud-hosted LLMs through APIs, so no local GPU is needed)","Not specified",{"notes":110,"python":108,"dependencies":111},"1. The tool is a reference-free annotation framework; the core computation relies on external LLM APIs, and only lightweight orchestration code runs locally. 2. At least one LLM provider API key (such as OpenAI, Anthropic, or Google) must be configured to run it. 3. Both R and Python versions are available; the Python version can be installed with pip and used with Scanpy\u002FSeurat workflows. 4. Input gene names must be standard gene symbols; Ensembl IDs are not supported. Cluster labels must be numeric or convertible to numeric. 5. A web application (mllmcelltype.com) is available and requires no local installation.",[112,113,114,115,116],"scanpy","pandas","openai (optional)","anthropic (optional)","google-genai (optional)",[35,16,14],[119,120,121,122,123,112,124,125,126,127,128],"bioinformatics","cell-type-annotation","consensus-algorithm","large-language-models","llm","seurat","single-cell","scrna","computational-biology","scrnaseq-analysis","2026-03-27T02:49:30.150509","2026-04-07T17:05:50.668840",[132,137,142,147,152,156],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},22471,"How do I use the free models offered by OpenRouter (such as deepseek-chat-v3) for annotation?","The tool has been updated to support more free OpenRouter models. You can specify the model name directly in the models parameter, for example \"deepseek\u002Fdeepseek-chat-v3-0324:free\". The maintainer has extended the hard-coded model list to include these free models. Note that free models are rate-limited: 20 requests per minute; 50 requests per day if the account has fewer than 10 credits; 1000 requests per day if it has 10 or more credits. If you run into problems, you can call the process_openrouter function directly as a temporary workaround.","https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues\u002F17",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},22472,"What should I do when the web version shows a \"Processing failed\" error?","This error is usually caused by slow API responses from models such as GLM-4 (20-30 seconds), which makes the heartbeat time out so that the system wrongly marks the job as failed. The maintainer has deployed a fix: 1.
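A forced heartbeat refresh, which writes to the database every 60 seconds even when the main thread is blocked; 2. The timeout threshold was extended from 10 minutes to 15 minutes; 3. Error handling was improved. Reinstall or update to the latest version to get the fix.","https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues\u002F67",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},22473,"After switching model providers I still get errors. Does the cache keep the old configuration?","Yes, this is a known issue with cache key generation. When switching between regular models (such as gpt-4o) and OpenRouter models (such as openai\u002Fgpt-4o-mini), the provider was not normalized correctly, which caused cache key collisions. The fix adds a check to the create_cache_key function in utils.py: if the model name contains \"\u002F\", the provider is forced to \"openrouter\". Code example: `if \"\u002F\" in normalized_model: normalized_provider = \"openrouter\"`. Update to the latest version to apply the fix.","https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues\u002F65",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},22474,"How do I fix the \"local variable 'consensus_response' referenced before assignment\" error when running the code?","This error usually occurs with certain combinations of free OpenRouter models, which leave the variable uninitialized. Check that the model names in the models list are spelled correctly and that every selected model is supported in the current version. The maintainer fixed the related logic in a later release so that consensus_response is assigned on every execution path. Try updating the package or testing with fewer concurrent models.","https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fissues\u002F64",{"id":153,"question_zh":154,"answer_zh":155,"source_url":146},22475,"Why are failed API calls still retried, or old model logs still shown, even with use_cache=False?","This is caused by the order of the cache check and the model availability check in the code. Even with caching disabled, the program may still try to load earlier context or repeat failed API calls. The maintainer acknowledged that this stems from complex code logic and plans to optimize it: API availability will be checked at the start of the function, and an unavailable model will be dropped immediately to avoid pointless retries. For now, manually clearing the log directory or restarting the environment is recommended to avoid interference from stale state.",{"id":157,"question_zh":158,"answer_zh":159,"source_url":146},22476,"How do I configure models from several different providers for consensus annotation?","You can pass a list of dictionaries to the models parameter of interactive_consensus_annotation, with each dictionary specifying a provider and a model. For example: `models=[{\"provider\": \"openrouter\", \"model\": \"openai\u002Fgpt-4o-mini\"}, {\"provider\": \"openrouter\", \"model\": \"anthropic\u002Fclaude-sonnet-4\"}]`. Make sure the API key environment variable for each provider is set correctly. For OpenRouter models, a single OPENROUTER_API_KEY is enough.",[161,166,171,176,181,186],{"id":162,"version":163,"summary_zh":164,"released_at":165},136180,"v2.0.0","# mLLMCelltype v2.0.0\n\n## Breaking Changes\n- **Logging default changed**: console output from `UnifiedLogger` is now off by default. To enable it, use `configure_logger(console_output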
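= TRUE)`.\n\n## New Model Support\n- OpenAI: GPT-5.2, GPT-5.1, GPT-5, o3-pro, o4-mini\n- Anthropic: Claude Opus 4.6, Claude Opus 4.5, Claude Sonnet 4.5\n- Google: Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro\u002FFlash\n- X.AI: Grok 4, Grok 4.1, Grok 4 Heavy\n- DeepSeek: DeepSeek R1\n- Alibaba: Qwen 3 Max\n- Zhipu: GLM-4.7\n- MiniMax: MiniMax M2.1\n\n## Architecture Improvements\n- Unified logging: the previous dual debug mechanisms are consolidated into a single `log_debug()` method.\n- Complete audit trail: every `warning()` call is paired with `log_warn()`, and LLM consensus decisions are logged.\n- NAMESPACE fix: `QwenProcessor` is exported correctly, and `.qwen_endpoint_cache` has been removed from the public API.\n- Cluster IDs are preserved throughout the annotation workflow.\n\n## Bug Fixes\n- Fixed a misplaced `@export` that caused internal state to be exported instead of the `QwenProcessor` class.\n- Fixed a missing `tissue_name` argument in the API integration tests.\n- Fixed `print()` bypassing the logging system in model comparison.\n- Fixed trailing newlines that double-spaced console output.\n- Improved handling of NA values in API key validation.\n\n## Package Information\n- **R package**: mLLMCelltype_2.0.0.tar.gz\n- **Test results**: 174 passed, 0 failed, 0 warnings","2026-02-08T09:06:08",{"id":167,"version":168,"summary_zh":169,"released_at":170},136181,"v1.2.9","# mLLMCelltype v1.2.9 Release Notes\n\n## 🎉 Major Updates\n\n### Cache System Fix (#65)\n- **Fixed a critical cache isolation issue**: OpenRouter models were incorrectly reusing the cache of regular models\n- **Solution**: modified `create_cache_key()` to normalize the provider correctly for OpenRouter models whose names contain `\u002F`\n- **Impact**: prevents cache pollution and keeps cell type annotations accurate when switching providers\n\n### New Features\n- **Cache management module**: added `cache_manager.py` with tools for inspecting and managing the cache\n  - `get_cache_info()`: returns the current cache status\n  - `clear_mllmcelltype_cache()`: interactive cache clearing\n  - CLI: `python -m mllmcelltype.cache_manager`\n- **Comprehensive documentation**: added detailed cache system documentation in `docs\u002FCACHE_SYSTEM.md`\n- **Example script**: added a cache management example in `examples\u002Fcache_management_example.py`\n\n### Code Quality Improvements\n- Fixed all ruff lint errors across the Python package\n- Added a `.gitignore` file to exclude temporary test files\n- Updated import ordering and removed unused imports\n- Added a comprehensive test suite for cache system validation\n\n## 📦 Package Information\n\n### Python Package\n- Version: 1.2.9\n- Files:\n  - `mllmcelltype-1.2.9.tar.gz` - source distribution\n  - `mllmcelltype-1.2.9-py3-none-any.whl` - wheel distribution\n\n### R Package\n- Version: 1.2.9\n- File: `mLLMCelltype_1.2.9.tar.gz`\n\n## 🐛 Bug Fixes\n- Cache isolation between regular and OpenRouter models\n- Import ordering issues in the examples\n- Unused variable warnings\n- F-string formatting issues\n\n## 📚 Documentation Updates\n- Added comprehensive cache system documentation\n- Updated the examples to show correct model usage\n- Added an API key configuration example (`.env.example`)\n\n## 🧪 Testing\n- Added an extensive test suite for the cache system\n- Verified cache isolation with real API calls\n-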
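Tested edge cases, including changes in tissue context\n\n## Installation\n\n### Python\n```bash\npip install mllmcelltype==1.2.9\n```\n\n### R\n```r\ninstall.packages(\"mLLMCelltype_1.2.9.tar.gz\", repos = NULL, type = \"source\")\n```\n\n## Acknowledgments\nThanks to @eason-analytics for reporting the cache issue (#65) and providing detailed reproduction steps!","2025-07-01T12:11:06",{"id":172,"version":173,"summary_zh":174,"released_at":175},136182,"v1.2.8","mLLMCelltype v1.2.8 (R package) and v1.2.3 (Python package) released","2025-06-24T05:32:49",{"id":177,"version":178,"summary_zh":179,"released_at":180},136183,"v1.2.4","## Important Bug Fixes\n\n### Fixed the major `as.logical(from)` error\n- **Resolved a critical error that occurred when processing many clusters (60 or more)**  \n  The error was caused by non-character data being passed to `strsplit()`.  \n  Users can now process datasets with many clusters without hitting type coercion errors.\n\n### Enhanced API Response Handling\n- **Added comprehensive `tryCatch()` blocks** around every `strsplit()` operation in the API processing functions  \n  - Improved response validation to prevent function or closure types from being treated as character strings  \n  - **Enhanced error handling** in all API processing functions:  \n    `process_openrouter.R`, `process_anthropic.R`, `process_openai.R`, `process_deepseek.R`, `process_qwen.R`, `process_stepfun.R`, `process_minimax.R`, `process_zhipu.R`, `process_gemini.R`, `process_grok.R`\n\n### Improvements\n- **Better NULL handling**: improved `unlist()` operations to filter out NULL values and handle errors gracefully  \n- **Enhanced logging**: added more detailed error logs to make API response issues easier to debug  \n- **Improved consensus checking**: enhanced `check_consensus.R` to handle edge cases such as malformed responses\n\n### Technical Details\n- Fixed an issue where datasets with many clusters could cause type coercion errors during response parsing  \n- Added validation for function\u002Fclosure types in API responses to prevent downstream errors  \n- Improved error messages to make API response issues easier to diagnose\n\n## Installation\n\n```r\n# Install from GitHub\ndevtools::install_github(\"cafferychen777\u002FmLLMCelltype\", subdir = \"R\")\n```\n\n## What's Changed\n- More robust handling of large datasets  \n- Better error handling across all API integrations  \n- Better debugging through enhanced logging\n\n**Full changelog**: https:\u002F\u002Fgithub.com\u002Fcafferychen777\u002FmLLMCelltype\u002Fcompare\u002Fv1.2.3...v1.2.4","2025-05-25T21:00:43",{"id":182,"version":183,"summary_zh":184,"released_at":185},136184,"v1.2.0","## 1.2.0 (2025-04-28)\n\n### Improvements\n* First release on CRAN\n* Updated the documentation and example vignettes\n* Improved the package structure to meet CRAN requirements\n* Fixed the CITATION file format\n* Updated the URL to https:\u002F\u002Fcafferyang.com\u002FmLLMCelltype\u002F","2025-04-30T20:25:50",{"id":187,"version":188,"summary_zh":189,"released_at":190},136185,"v1.1.4","##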
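mLLMCelltype v1.1.4 (2025-04-24)\n\n### New Features and Improvements\n- Extended the OpenRouter model list to support free models\n- Resolved inconsistencies between the international and mainland China versions of the Qwen models\n\n### Bug Fixes\n- Fixed the handling of OpenRouter models in the consensus check workflow\n- Fixed the undefined `current_clusters` variable in the README by adding an explicit variable definition and comments\n\n### Other Updates\n- Updated the DESCRIPTION file\n\nThis release improves model compatibility and fixes several critical issues. We recommend that all users upgrade to this version.","2025-04-28T16:03:57"]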