[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lupantech--chameleon-llm":3,"tool-lupantech--chameleon-llm":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150720,2,"2026-04-11T11:33:10",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":72,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":32,"env_os":98,"env_gpu":99,"env_ram":99,"env_deps":100,"category_tags":110,"github_topics":111,"view_count":32,"oss_zip_url":119,"oss_zip_packed_at":119,"status":17,"created_at":120,"updated_at":121,"faqs":122,"releases":142},6643,"lupantech\u002Fchameleon-llm","chameleon-llm","Codes for \"Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models\".","Chameleon 是一个专为大型语言模型（LLM）设计的“即插即用”组合推理框架。它旨在解决大模型在处理科学问题、数学问答及表格分析等复杂任务时，难以灵活调用外部工具或进行多步逻辑推导的痛点。传统方法往往需要针对特定任务重新训练模型，而 Chameleon 让模型能够像变色龙适应环境一样，动态感知并自主选择合适的工具（如代码解释器、搜索模块等）来协同解决问题，无需修改模型底层参数。\n\n该项目特别适合 AI 研究人员、开发者以及需要构建复杂推理应用的技术团队使用。通过提供模块化的代码实现，它降低了将大模型与外部能力集成的门槛。其核心技术亮点在于“组合式推理”能力：系统能自动拆解复杂问题，规划推理步骤，并在执行过程中灵活切换不同工具，显著提升了 GPT-4 等模型在专业领域的准确性和鲁棒性。作为曾入选顶级 AI 
论文榜单的开源项目，Chameleon 为探索大模型在实际场景中的深层应用能力提供了强有力的支持。","# :lizard: Chameleon: Plug-and-Play Compositional Reasoning with GPT-4\n\n![Science Problems](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-Science_Problems-blue) \n![MathQA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-MathQA-blue) \n![TableQA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-TableQA-blue) \n![Tool Use](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-Tool_Use-green) \n![GPT-4](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-GPT--4-green) \n![LLMs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-LLMs-green)\n\nCode for the paper \"[Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842)\".\n\n:bell: If you have any questions or suggestions, please don't hesitate to let us know. You can email [Pan Lu](https:\u002F\u002Flupantech.github.io\u002F) directly at lupantech@gmail.com, comment on our [Twitter post](https:\u002F\u002Ftwitter.com\u002Flupantech\u002Fstatus\u002F1648879085115052033), or open an issue in this repository.\n\n[[Project Page](https:\u002F\u002Fchameleon-llm.github.io\u002F)] [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842)] [[Twitter](https:\u002F\u002Ftwitter.com\u002Flupantech\u002Fstatus\u002F1648879085115052033)] [[LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Ffeed\u002Fupdate\u002Furn:li:activity:7056703894063644672)] [[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI)] [[Slides](https:\u002F\u002Flupantech.github.io\u002Fdocs\u002FChameleon_LLM_Pan_Lu_Google_Brain_2023.05.05.pdf)]\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_a110afce6c25.png\" width=\"10%\"> \u003Cbr>\n  Tentative logo for \u003Cb>Chameleon\u003C\u002Fb>.\n\u003C\u002Fp>\n\n## 💥 News 💥\n\n- 
**[2023.05.06]** Thrilled to see that our Chameleon paper has been ranked **#1** out of 1,682 AI papers by [AlphaSignal](https:\u002F\u002Falphasignalai.beehiiv.com\u002Fp\u002Fweeks-top-5-ai-papers?utm_source=alphasignalai.beehiiv.com&utm_medium=newsletter&utm_campaign=this-week-s-top-5-ai-papers). \n- **[2023.05.05]** We are excited to share that Pan Lu was invited to deliver a talk to the Reasoning Team at Google Brain. View the presentation slides here: [[Slides](https:\u002F\u002Flupantech.github.io\u002Fdocs\u002FChameleon_LLM_Pan_Lu_Google_Brain_2023.05.05.pdf)]\n- **[2023.04.24]** Our work has been featured in a [MarkTechPost](https:\u002F\u002Fwww.marktechpost.com\u002F2023\u002F04\u002F24\u002Fmeet-chameleon-a-plug-and-play-compositional-reasoning-framework-that-harnesses-the-capabilities-of-large-language-models\u002F) article.\n- **[2023.04.23]** Our research has been recognized as one of the \"Top ML Papers of the Week\" by [DAIR.AI](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Ftop-ml-papers-week-dair-ai-8e\u002F?trackingId=w6D1Ow8FxKSTjgdFuwgYnQ%3D%3D).\n- **[2023.04.22]** Thrilled to announce that our work has been featured on [WorldofAI](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI)'s [YouTube channel](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI)!\n- **[2023.04.21]** Our work is the trending project on https:\u002F\u002Ftrends.vercel.app. 
[[Link](https:\u002F\u002Fraw.githubusercontent.com\u002Flupantech\u002Fchameleon-llm\u002Fmain\u002Fassets\u002Ftrend.png)]\n- **[2023.04.20]** Huge thanks to [John Nay](https:\u002F\u002Ftwitter.com\u002Fjohnjnay\u002Fstatus\u002F1649036276627132418) for sharing our work on [Twitter](https:\u002F\u002Ftwitter.com\u002Fjohnjnay\u002Fstatus\u002F1649036276627132418)!\n- **[2023.04.19]** Our research is now listed on [Papers with Code](https:\u002F\u002Fpaperswithcode.com\u002Fpaper\u002Fchameleon-plug-and-play-compositional).\n- **[2023.04.19]** We appreciate [Aran Komatsuzaki](https:\u002F\u002Ftwitter.com\u002Farankomatsuzaki\u002Fstatus\u002F1648848332977221632) for featuring our work on [Twitter](https:\u002F\u002Ftwitter.com\u002Farankomatsuzaki\u002Fstatus\u002F1648848332977221632) in a timely manner!\n- **[2023.04.19]** Special thanks to [@_akhaliq](https:\u002F\u002Ftwitter.com\u002F_akhaliq\u002Fstatus\u002F1648851856930533378) for promptly sharing our work on [Twitter](https:\u002F\u002Ftwitter.com\u002F_akhaliq\u002Fstatus\u002F1648851856930533378)!\n- **[2023.04.19]** Visit our project's homepage at [Chameleon-LLM](https:\u002F\u002Fchameleon-llm.github.io\u002F).\n- **[2023.04.19]** Our paper is now accessible at https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842.\n\n\n\n## :lizard: About Chameleon\n\n**Chameleon** is a plug-and-play compositional reasoning framework that augments LLMs with various types of tools. **Chameleon** synthesizes programs to compose various tools, including LLM models, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM as a natural language planner, **Chameleon** infers the appropriate sequence of tools to compose and execute in order to generate a final response. 
\n\n![showcase_scienceqa](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_ad31ca3c3b87.png)\n\nWe showcase the adaptability and effectiveness of **Chameleon** on two tasks: [ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F) and [TabMWP](https:\u002F\u002Fpromptpg.github.io\u002F). Notably, **Chameleon** with GPT-4 achieves 86.54% accuracy on ScienceQA, an 11.37% improvement over the best published few-shot model. On TabMWP, **Chameleon** with GPT-4 as the underlying LLM reaches 98.78% overall accuracy, a 17.0% improvement over the state-of-the-art model. Further analysis suggests that, compared to other LLMs such as ChatGPT, GPT-4 as the planner selects tools more consistently and rationally and is able to infer potential constraints from the instructions.\n\nFor more details, see our [project page](https:\u002F\u002Fchameleon-llm.github.io\u002F) and our [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.09842.pdf).\n\n## :tv: YouTube Video\n\nWe would like to express our immense gratitude to [WorldofAI](https:\u002F\u002Fwww.youtube.com\u002F@intheworldofai) for featuring and introducing our work on [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI)!\n\n[![YouTube Video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_f7407a6da6ef.jpg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs)\n\n\n\n## :star: Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_8512ef8718d4.png)](https:\u002F\u002Fstar-history.com\u002F#lupantech\u002Fchameleon-llm&Date)\n\n\n## 🐙 Requirements\n\n- [OpenAI API key](https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys)\n- [Bing Search API](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api) 
(optional; needed only if you want to enable the Bing search module)\n\nThe required Python dependencies (generated by `pipreqs`) are:\n\n```\npython==3.8.10\nhuggingface-hub\nnumpy==1.23.2\nopenai==0.23.0\npandas==1.4.3\ntransformers==4.21.1\nrequests==2.28.1\n```\n\nInstall them with the following command (you can skip this step if you have already set up these dependencies; the pinned versions are not strictly required):\n\n```\npip install -r requirements.txt\n```\n\n\n\n## ⚠️ Configuration ⚠️\n\n### OpenAI API Key\n\nObtain your OpenAI API key from https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys.\n\nTo use your OpenAI API key with **Chameleon**, you **NEED** to have billing set up (i.e., a paid account).\n\nYou can set up a paid account at https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fbilling\u002Foverview.\n\n### Bing Search API Key (Optional)\n\nObtain your Bing Search API key from https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api.\n\nThe Bing Search API key is **optional**. Without it, performance on the ScienceQA task drops slightly.\n\n\n\n## :hammer_and_wrench: Module Inventory\n\n### Different Tools in Chameleon\n\nDifferent types of tools in our module inventory:\n\n![tools](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_64ee2e08c6e7.png)\n\n### Tool Subset\n\nTools used on ScienceQA and TabMWP, respectively. Tools reused across the two tasks are highlighted in green:\n\n![tools_task](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_dec9b609aa63.png)\n\n\n\n## 🤖 Run Chameleon on ScienceQA\n\nScience Question Answering ([ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F)) is a multi-modal question-answering benchmark covering a wide range of scientific topics over diverse contexts. 
The ScienceQA dataset is provided in [`data\u002Fscienceqa`](https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm\u002Ftree\u002Fmain\u002Fdata\u002Fscienceqa). To explore the dataset in more detail, see the [Explore](https:\u002F\u002Fscienceqa.github.io\u002Fexplore.html) and [Visualize](https:\u002F\u002Fscienceqa.github.io\u002Fvisualize.html) pages.\n\nIn the current version, the outputs of the `Image Captioner` and `Text Detector` modules are precomputed and stored in `data\u002Fscienceqa\u002Fcaptions.json` and `data\u002Fscienceqa\u002Focrs.json`, respectively. Support for calling these two modules live is coming soon!\n\nTo run **Chameleon** (GPT-4):\n\n```sh\ncd run_scienceqa\n\npython run.py \\\n--model chameleon \\\n--label chameleon_gpt4 \\\n--policy_engine gpt-4 \\\n--kr_engine gpt-4 \\\n--qg_engine gpt-4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n\nThis generates the predictions and saves the results to `results\u002Fscienceqa\u002Fchameleon_gpt4_test.json`, `results\u002Fscienceqa\u002Fchameleon_gpt4_test_cache.jsonl`, and `results\u002Fscienceqa\u002Fchameleon_gpt4_test_cache.json`.\n\nTo compute the average accuracy and the accuracy for each question class, run:\n\n```sh\npython evaluate.py \\\n--data_file ..\u002Fdata\u002Fscienceqa\u002Fproblems.json \\\n--result_root ..\u002Fresults\u002Fscienceqa \\\n--result_files chameleon_gpt4_test_cache.jsonl\n```\n\nTo run **Chameleon** (ChatGPT):\n\n```sh\npython run.py \\\n--model chameleon \\\n--label chameleon_chatgpt \\\n--policy_engine gpt-3.5-turbo \\\n--kr_engine gpt-3.5-turbo \\\n--qg_engine gpt-3.5-turbo \\\n--sg_engine gpt-3.5-turbo \\\n--test_split test \\\n--test_number -1\n```\n\nOur **Chameleon** is a generalized form of the [CoT (chain-of-thought)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903) method: CoT corresponds to the special case where the generated program is always `Solution Generator` followed by `Answer Generator`. 
Passing `--model cot` sets `modules` to `[\"solution_generator\", \"answer_generator\"]`.\n\nTo run CoT (chain-of-thought prompted) GPT-4:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_gpt4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n\nTo run CoT (chain-of-thought prompted) ChatGPT:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_chatgpt \\\n--sg_engine gpt-3.5-turbo \\\n--test_split test \\\n--test_number -1\n```\n\n\n\n## 🤖 Run Chameleon on TabMWP\n\nThe TabMWP dataset contains 38,431 tabular math word problems. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. The TabMWP dataset is provided in [`data\u002Ftabmwp`](https:\u002F\u002Fgithub.com\u002Flupantech\u002FPromptPG\u002Fblob\u002Fmain\u002Fdata\u002Ftabmwp). To explore the dataset in more detail, see the [Explore](https:\u002F\u002Fpromptpg.github.io\u002Fexplore.html) and [Visualize](https:\u002F\u002Fpromptpg.github.io\u002Fvisualize.html) pages.\n\nTo run **Chameleon** (GPT-4):\n\n```sh\ncd run_tabmwp\n\npython run.py \\\n--model chameleon \\\n--label chameleon_gpt4 \\\n--test_split test \\\n--policy_engine gpt-4 \\\n--rl_engine gpt-4 \\\n--cl_engine gpt-4 \\\n--tv_engine gpt-4 \\\n--kr_engine gpt-4 \\\n--sg_engine gpt-4 \\\n--pg_engine gpt-4 \\\n--test_number -1 \\\n--rl_cell_threshold 18 \\\n--cl_cell_threshold 18\n```\n\nThis generates the predictions and saves the results to `results\u002Ftabmwp\u002Fchameleon_gpt4_test.json`, `results\u002Ftabmwp\u002Fchameleon_gpt4_test_cache.jsonl`, and `results\u002Ftabmwp\u002Fchameleon_gpt4_test_cache.json`.\n\nTo compute the average accuracy and the accuracy for each question class, run:\n\n```sh\npython evaluate.py \\\n--data_file ..\u002Fdata\u002Ftabmwp\u002Fproblems_test.json \\\n--result_root ..\u002Fresults\u002Ftabmwp \\\n--result_files 
chameleon_gpt4_test_cache.jsonl\n```\n\nTo run **Chameleon** (ChatGPT):\n\n```sh\npython run.py \\\n--model chameleon \\\n--label chameleon_chatgpt \\\n--test_split test \\\n--policy_engine gpt-3.5-turbo \\\n--rl_engine gpt-3.5-turbo \\\n--cl_engine gpt-3.5-turbo \\\n--tv_engine gpt-3.5-turbo \\\n--kr_engine gpt-3.5-turbo \\\n--sg_engine gpt-3.5-turbo \\\n--pg_engine gpt-3.5-turbo \\\n--test_number -1 \\\n--rl_cell_threshold 18 \\\n--cl_cell_threshold 18\n```\n\nTo run CoT (chain-of-thought prompted) GPT-4:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_gpt4 \\\n--test_split test \\\n--sg_engine gpt-4 \\\n--test_number -1\n```\n\nTo run CoT (chain-of-thought prompted) ChatGPT:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_chatgpt \\\n--test_split test \\\n--sg_engine gpt-3.5-turbo \\\n--test_number -1\n```\n\nOur **Chameleon** is also a generalized form of the [PoT (program-of-thought)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12588) method: PoT corresponds to the special case where the generated program is always `Program Generator`, `Program Executor`, and `Answer Generator`, in that order. Passing `--model pot` sets `modules` to `[\"program_generator\", \"program_executor\", \"answer_generator\"]`.\n\nTo run PoT (program-of-thought prompted) GPT-4:\n\n```sh\npython run.py \\\n--model pot \\\n--label pot_gpt4 \\\n--test_split test \\\n--pg_engine gpt-4 \\\n--test_number -1\n```\n\nTo run PoT (program-of-thought prompted) ChatGPT:\n\n```sh\npython run.py \\\n--model pot \\\n--label pot_chatgpt \\\n--test_split test \\\n--pg_engine gpt-3.5-turbo \\\n--test_number -1\n```\n\n## 😈 More Examples\n\n### More examples on ScienceQA\n\n![showcase_scienceqa_more](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_9a31c3168658.png)\n\n**Chameleon** (GPT-4) adapts to different input queries by generating programs that compose various tools and executing them sequentially to obtain the correct answers. 
\n\nFor instance, the query above asks, “Which animal’s skin is adapted for survival in cold places?”, which involves scientific terminology related to animal survival. Consequently, the planner decides to rely on the *Bing search* engine for domain-specific knowledge, benefiting from the numerous online resources available.\n\n### More examples on TabMWP\n\n![showcase_tabmwp_long](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_5b6603cf4fcf.png)\n\nThe adaptability and versatility of our **Chameleon** for various queries are also observed on TabMWP, as illustrated in the examples in the figure above. \n\nThe first example involves mathematical reasoning on a tax form. **Chameleon** (1) calls the knowledge retrieval model to recall basic knowledge that assists in understanding such domain-specific tables, (2) describes the table in a more readable natural language format, and (3) finally relies on program-aided tools to perform precise computations. \n\nIn the second example, the system generates Python code that closely aligns with the background knowledge provided by the knowledge retrieval model. \n\nThe third example requires the system to locate the cell in a large tabular context given the input query. **Chameleon** calls the row lookup model to help accurately locate the relevant rows and generate the language solution via an LLM model, instead of relying on program-based tools.\n\n\n\n## :chart_with_upwards_trend: How Good is Chameleon?\n\nSignificant improvements are observed for **Chameleon** over both fine-tuned models and few-shot prompted GPT-4\u002FChatGPT:\n\n![results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_09a6b5326c34.png)\n\nTo visualize the predictions made by **Chameleon**, simply execute the Jupyter Notebook corresponding to your specific task: `notebooks\u002Fresults_viewer_[TASK].ipynb`. 
This provides an interactive way to explore the model's predictions. Alternatively, visit our [project page](https:\u002F\u002Fchameleon-llm.github.io\u002F) for more information.\n\n\n\n## :slot_machine: What Plans Is Chameleon Learning?\n\n### Tool Use\n\nTools called in the programs generated by **Chameleon** (ChatGPT) and **Chameleon** (GPT-4) on ScienceQA:\n\n![tool_call_scienceqa](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_df742d9ffc37.png)\n\nTools called in the programs generated by **Chameleon** (ChatGPT) and **Chameleon** (GPT-4) on TabMWP:\n\n![tool_call_tabmwp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_315ff99a6b24.png)\n\n### Transition Graph\n\nExecute `notebooks\u002Ftransition_[TASK]_[Model]_Engine.ipynb` to visualize the module transition graph for the programs generated on the test set.\n\nTransitions between modules in programs generated by **Chameleon** (GPT-4) on ScienceQA. START is the start symbol, END is the terminal symbol, and the other nodes are non-terminal symbols.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_c85811c819a4.png\" width=45% height=45%>\n\nTransitions between modules in programs generated by **Chameleon** (GPT-4) on TabMWP. START is the start symbol, END is the terminal symbol, and the other nodes are non-terminal symbols.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_2f62fa2001ee.png\" width=55% height=55%>\n\n\n\n## :smile_cat: Want to Develop a New Task?\n\n- **Construct the module inventory**: Create prompts for LLM-based models within the `demos` directory. 
Define the input, execution, and output for each module in `model.py`.\n- **Develop the LLM planner**: Provide a comprehensive description of the module inventory and include a few examples that demonstrate how to map queries to the target program.\n- **Implement the data loader and evaluation method**: Define the data loader within `model.py`. To modify the evaluation method, update the corresponding section in `main.py`.\n- **Enjoy the process**: With the groundwork in place, it's time to have fun and dive into the task at hand!\n\n\n\n## :coffee: Stay Connected!\n\nFantastic! I'm always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch, visit [Pan Lu](https:\u002F\u002Flupantech.github.io\u002F)'s homepage for contact information.\n\n\n\n\n## :white_check_mark: Cite\n\nIf you find **Chameleon** useful for your research and applications, please kindly cite using this BibTeX:\n\n```latex\n@article{lu2023chameleon,\n  title={Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models},\n  author={Lu, Pan and Peng, Baolin and Cheng, Hao and Galley, Michel and Chang, Kai-Wei and Wu, Ying Nian and Zhu, Song-Chun and Gao, Jianfeng},\n  journal={arXiv preprint arXiv:2304.09842},\n  year={2023}\n}\n```\n","# :lizard: 变色龙：基于GPT-4的即插即用组合推理\n\n![科学问题](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-Science_Problems-blue) \n![科学问题](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-MathQA-blue) \n![科学问题](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTask-TableQA-blue) \n![思维链](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-Tool_Use-green) \n![GPT-4](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-GPT--4-green) \n![大语言模型](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-LLMs-green)\n\n论文“变色龙：基于大型语言模型的即插即用组合推理”（[arXiv链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842)）的代码。\n\n:bell: 如果您有任何问题或建议，请随时告诉我们。您可以直接通过电子邮件 lupantech@gmail.com 联系 [Pan 
Lu](https:\u002F\u002Flupantech.github.io\u002F)，在 [Twitter](https:\u002F\u002Ftwitter.com\u002Flupantech\u002Fstatus\u002F1648879085115052033) 上留言，或者在此仓库中提交一个问题。\n\n[[项目主页](https:\u002F\u002Fchameleon-llm.github.io\u002F)] [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842)] [[Twitter](https:\u002F\u002Ftwitter.com\u002Flupantech\u002Fstatus\u002F1648879085115052033)] [[LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Ffeed\u002Fupdate\u002Furn:li:activity:7056703894063644672)] [[YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI)] [[演示文稿](https:\u002F\u002Flupantech.github.io\u002Fdocs\u002FChameleon_LLM_Pan_Lu_Google_Brain_2023.05.05.pdf)]\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_a110afce6c25.png\" width=\"10%\"> \u003Cbr>\n  “变色龙”的暂定标志。\n\u003C\u002Fp>\n\n## 💥 新闻 💥\n\n- **[2023.05.06]** 我们很高兴地看到，我们的变色龙论文在 [AlphaSignal](https:\u002F\u002Falphasignalai.beehiiv.com\u002Fp\u002Fweeks-top-5-ai-papers?utm_source=alphasignalai.beehiiv.com&utm_medium=newsletter&utm_campaign=this-week-s-top-5-ai-papers) 的评选中，从1,682篇人工智能论文中脱颖而出，位列 **#1**。\n- **[2023.05.05]** 我们非常荣幸地宣布，Pan Lu受邀在Google Brain的推理团队进行演讲。演示文稿请点击这里：[[演示文稿](https:\u002F\u002Flupantech.github.io\u002Fdocs\u002FChameleon_LLM_Pan_Lu_Google_Brain_2023.05.05.pdf)]\n- **[2023.04.24]** 我们的成果被 [MarkTechPost](https:\u002F\u002Fwww.marktechpost.com\u002F2023\u002F04\u002F24\u002Fmeet-chameleon-a-plug-and-play-compositional-reasoning-framework-that-harnesses-the-capabilities-of-large-language-models\u002F) 报道。\n- **[2023.04.23]** 我们的研究所取得的进展被 [DAIR.AI](https:\u002F\u002Fwww.linkedin.com\u002Fpulse\u002Ftop-ml-papers-week-dair-ai-8e\u002F?trackingId=w6D1Ow8FxKSTjgdFuwgYnQ%3D%3D) 评为“本周最佳机器学习论文”之一。\n- **[2023.04.22]** 我们很兴奋地宣布，我们的工作已被 [WorldofAI](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI) 的 [YouTube 
频道](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI) 报道！\n- **[2023.04.21]** 我们的工作目前是 https:\u002F\u002Ftrends.vercel.app 上的热门项目。[[链接](https:\u002F\u002Fraw.githubusercontent.com\u002Flupantech\u002Fchameleon-llm\u002Fmain\u002Fassets\u002Ftrend.png)]\n- **[2023.04.20]** 非常感谢 [John Nay](https:\u002F\u002Ftwitter.com\u002Fjohnjnay\u002Fstatus\u002F1649036276627132418) 在 [Twitter](https:\u002F\u002Ftwitter.com\u002Fjohnjnay\u002Fstatus\u002F1649036276627132418) 上分享了我们的工作！\n- **[2023.04.19]** 我们的研究所发表在了 [Papers with Code](https:\u002F\u002Fpaperswithcode.com\u002Fpaper\u002Fchameleon-plug-and-play-compositional) 上。\n- **[2023.04.19]** 我们感谢 [Aran Komatsuzaki](https:\u002F\u002Ftwitter.com\u002Farankomatsuzaki\u002Fstatus\u002F1648848332977221632) 及时在 [Twitter](https:\u002F\u002Ftwitter.com\u002Farankomatsuzaki\u002Fstatus\u002F1648848332977221632) 上介绍了我们的工作！\n- **[2023.04.19]** 特别感谢 [@_akhaliq](https:\u002F\u002Ftwitter.com\u002F_akhaliq\u002Fstatus\u002F1648851856930533378) 迅速在 [Twitter](https:\u002F\u002Ftwitter.com\u002F_akhaliq\u002Fstatus\u002F1648851856930533378) 上分享了我们的工作！\n- **[2023.04.19]** 欢迎访问我们的项目主页：[Chameleon-LLM](https:\u002F\u002Fchameleon-llm.github.io\u002F)。\n- **[2023.04.19]** 我们的论文现已可在 https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09842 获取。\n\n\n\n## :lizard: 关于变色龙\n\n**变色龙**是一个即插即用的组合推理框架，能够将各类工具与大语言模型相结合。**变色龙**会合成程序来组合不同的工具，包括其他大语言模型、现成的视觉模型、网络搜索引擎、Python函数以及根据用户需求定制的规则模块。以大语言模型作为自然语言规划器的基础，**变色龙**能够推断出合适的工具组合顺序，并按此顺序执行，最终生成响应。\n\n![showcase_scienceqa](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_ad31ca3c3b87.png)\n\n我们在两个任务上展示了**变色龙**的适应性和有效性：[ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F) 和 [TabMWP](https:\u002F\u002Fpromptpg.github.io\u002F)。值得注意的是，使用GPT-4作为基础模型时，**变色龙**在ScienceQA上的准确率达到86.54%，比已发表的最佳少样本模型高出11.37%；而在TabMWP上，**变色龙**的整体准确率达到了98.78%，相比当前最先进的模型提升了17.0%。进一步的研究表明，与其他大语言模型（如ChatGPT）相比，使用GPT-4作为规划器时，工具选择更加一致且合理，并且能够根据指令推断出潜在的约束条件。\n\n更多详细信息，请访问我们的项目主页 
[here](https:\u002F\u002Fchameleon-llm.github.io\u002F) 和论文 [here](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.09842.pdf)。\n\n## :tv: YouTube 视频\n\n我们衷心感谢 [WorldofAI](https:\u002F\u002Fwww.youtube.com\u002F@intheworldofai) 在 [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs&ab_channel=WorldofAI) 上对我们的工作进行了报道和介绍！\n\n[![YouTube 视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_f7407a6da6ef.jpg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=EWFixIk4vjs)\n\n\n\n## :star: 星标历史\n\n[![星标历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_8512ef8718d4.png)](https:\u002F\u002Fstar-history.com\u002F#lupantech\u002Fchameleon-llm&Date)\n\n\n## 🐙 系统要求\n\n- [OpenAI API 密钥](https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys)\n- [必应搜索 API](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api.) （如果您想启用必应搜索模块，但该模块是可选的）\n\n安装所有必需的 Python 依赖项（由 `pipreqs` 生成）：\n\n```\npython==3.8.10\nhuggingface-hub\nnumpy==1.23.2\nopenai==0.23.0\npandas==1.4.3\ntransformers==4.21.1\nrequests==2.28.1\n```\n\n安装所有必需的 Python 依赖项（如果您之前已经设置好这些依赖项，且版本要求不严格，则可以跳过此步骤）：\n\n```\npip install -r requirements.txt\n```\n\n\n\n## ⚠️ 配置 ⚠️\n\n### OpenAI API 密钥\n\n请从 https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys 获取您的 OpenAI API 密钥。\n\n要使用 OpenAI API 密钥运行 **变色龙**，您 **必须** 设置好账单信息（即开通付费账户）。\n\n您可以在 https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fbilling\u002Foverview 设置付费账户。\n\n### Bing 搜索 API 密钥（可选）\n\n从以下网址获取您的 Bing 搜索 API 密钥：https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api。\n\nBing 搜索 API 密钥是**可选**的。如果未设置此密钥，将在 ScienceQA 任务上导致性能略有下降。\n\n\n\n## :hammer_and_wrench: 模块清单\n\n### Chameleon 中的不同工具\n\n我们模块库中的不同类型工具：\n\n![tools](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_64ee2e08c6e7.png)\n\n### 工具子集\n\n分别用于 ScienceQA 和 TabMWP 
respectively. Tools reused across both tasks are highlighted in green:\n\n![tools_task](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_dec9b609aa63.png)\n\n\n\n## 🤖 Running Chameleon on ScienceQA\n\nScience Question Answering ([ScienceQA](https:\u002F\u002Fscienceqa.github.io\u002F)) is a multimodal question-answering benchmark that covers a wide range of science topics across diverse contexts. The ScienceQA dataset is provided in [`data\u002Fscienceqa`](https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm\u002Ftree\u002Fmain\u002Fdata\u002Fscienceqa). For more details, you can explore the dataset on the [Explore](https:\u002F\u002Fscienceqa.github.io\u002Fexplore.html) and [Visualize](https:\u002F\u002Fscienceqa.github.io\u002Fvisualize.html) pages.\n\nIn the current version, the outputs of the `Image Captioner` and `Text Detector` modules are precomputed and stored in `data\u002Fscienceqa\u002Fcaptions.json` and `data\u002Fscienceqa\u002Focrs.json`, respectively. Support for calling these two modules live is coming soon!\n\nTo run **Chameleon** (GPT-4):\n\n```sh\ncd run_scienceqa\n\npython run.py \\\n--model chameleon \\\n--label chameleon_gpt4 \\\n--policy_engine gpt-4 \\\n--kr_engine gpt-4 \\\n--qg_engine gpt-4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n\nThis generates predictions and saves the results to `results\u002Fscienceqa\u002Fchameleon_gpt4_test.json`, `results\u002Fscienceqa\u002Fchameleon_gpt4_test_cache.jsonl`, and `results\u002Fscienceqa\u002Fchameleon_gpt4_test_cache.json`.\n\nTo compute the average accuracy and the accuracy for each question category, run:\n\n```sh\npython evaluate.py \\\n--data_file ..\u002Fdata\u002Fscienceqa\u002Fproblems.json \\\n--result_root ..\u002Fresults\u002Fscienceqa \\\n--result_files chameleon_gpt4_test_cache.jsonl\n```\n\nTo run **Chameleon** (ChatGPT):\n\n```sh\npython run.py \\\n--model chameleon \\\n--label chameleon_chatgpt \\\n--policy_engine gpt-3.5-turbo \\\n--kr_engine gpt-3.5-turbo \\\n--qg_engine gpt-3.5-turbo \\\n--sg_engine gpt-3.5-turbo \\\n--test_split test \\\n--test_number -1\n```\n\n**Chameleon** is a generalization of the [CoT (chain-of-thought)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903) method: CoT corresponds to a generated program consisting of a `Solution Generator` followed by an `Answer Generator`. Setting `--model` to `cot` sets `modules` to `[\"solution_generator\", \"answer_generator\"]`.\n\nTo run 
CoT (chain-of-thought prompting) with GPT-4:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_gpt4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n\nTo run CoT (chain-of-thought prompting) with ChatGPT:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_chatgpt \\\n--sg_engine gpt-3.5-turbo \\\n--test_split test \\\n--test_number -1\n```\n\n\n\n## 🤖 Running Chameleon on TabMWP\n\nThe TabMWP dataset contains 38,431 tabular math word problems. Each problem in TabMWP is paired with a tabular context, which is presented as an image, as semi-structured text, and as a structured table. The TabMWP dataset is provided in [`data\u002Ftabmwp`](https:\u002F\u002Fgithub.com\u002Flupantech\u002FPromptPG\u002Fblob\u002Fmain\u002Fdata\u002Ftabmwp). For more details, you can explore the dataset on the [Explore](https:\u002F\u002Fpromptpg.github.io\u002Fexplore.html) and [Visualize](https:\u002F\u002Fpromptpg.github.io\u002Fvisualize.html) pages.\n\nTo run **Chameleon** (GPT-4):\n\n```sh\ncd run_tabmwp\n\npython run.py \\\n--model chameleon \\\n--label chameleon_gpt4 \\\n--test_split test \\\n--policy_engine gpt-4 \\\n--rl_engine gpt-4 \\\n--cl_engine gpt-4 \\\n--tv_engine gpt-4 \\\n--kr_engine gpt-4 \\\n--sg_engine gpt-4 \\\n--pg_engine gpt-4 \\\n--test_number -1 \\\n--rl_cell_threshold 18 \\\n--cl_cell_threshold 18\n```\n\nThis generates predictions and saves the results to `results\u002Ftabmwp\u002Fchameleon_gpt4_test.json`, `results\u002Ftabmwp\u002Fchameleon_gpt4_test_cache.jsonl`, and `results\u002Ftabmwp\u002Fchameleon_gpt4_test_cache.json`.\n\nTo compute the average accuracy and the accuracy for each question category, run:\n\n```sh\npython evaluate.py \\\n--data_file ..\u002Fdata\u002Ftabmwp\u002Fproblems_test.json \\\n--result_root ..\u002Fresults\u002Ftabmwp \\\n--result_files chameleon_gpt4_test_cache.jsonl\n```\n\nTo run **Chameleon** (ChatGPT):\n\n```sh\npython run.py \\\n--model chameleon \\\n--label chameleon_chatgpt \\\n--test_split test \\\n--policy_engine gpt-3.5-turbo \\\n--rl_engine gpt-3.5-turbo \\\n--cl_engine gpt-3.5-turbo \\\n--tv_engine gpt-3.5-turbo \\\n--kr_engine gpt-3.5-turbo \\\n--sg_engine gpt-3.5-turbo \\\n--pg_engine gpt-3.5-turbo \\\n--test_number -1 \\\n--rl_cell_threshold 18 \\\n--cl_cell_threshold 18\n```\n\nTo run 
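CoT (chain-of-thought prompting) with GPT-4:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_gpt4 \\\n--test_split test \\\n--sg_engine gpt-4 \\\n--test_number -1\n```\n\nTo run CoT (chain-of-thought prompting) with ChatGPT:\n\n```sh\npython run.py \\\n--model cot \\\n--label cot_chatgpt \\\n--test_split test \\\n--sg_engine gpt-3.5-turbo \\\n--test_number -1\n```\n\n**Chameleon** is also a generalization of the [PoT (Program-of-Thought)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12588) method: PoT corresponds to a generated program consisting of a `Program Generator`, a `Program Executor`, and an `Answer Generator`, in that order. Setting `--model` to `pot` sets `modules` to `[\"program_generator\", \"program_executor\", \"answer_generator\"]`.\n\nTo run PoT (Program-of-Thought prompting) with GPT-4:\n\n```sh\npython run.py \\\n--model pot \\\n--label pot_gpt4 \\\n--test_split test \\\n--pg_engine gpt-4 \\\n--test_number -1\n```\n\nTo run PoT (Program-of-Thought prompting) with ChatGPT:\n\n```sh\npython run.py \\\n--model pot \\\n--label pot_chatgpt \\\n--test_split test \\\n--pg_engine gpt-3.5-turbo \\\n--test_number -1\n```\n\n## 😈 More Examples\n\n### More Examples on ScienceQA\n\n![showcase_scienceqa_more](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_9a31c3168658.png)\n\n**Chameleon** (GPT-4) adapts to different input queries by generating programs that compose various tools and executing them in sequence to reach the correct answer.\n\nFor example, the query above asks, \"Which animal's skin is adapted for survival in cold places?\" This question involves scientific terminology related to animal survival. The planner therefore decides to rely on the *Bing Search* engine for domain-specific knowledge, benefiting from the wealth of online resources.\n\n### More Examples on TabMWP\n\n![showcase_tabmwp_long](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_5b6603cf4fcf.png)\n\nThe examples in the figure above show that **Chameleon** adapts just as well to the diverse queries in TabMWP.\n\nThe first example involves mathematical reasoning over a tax form. **Chameleon** (1) calls the knowledge retrieval model to recall background knowledge that helps it understand this kind of domain-specific table, (2) describes the table in a more readable natural language format, and (3) finally relies on program-aided tools to perform precise computations.\n\nIn the second example, the Python code generated by the system closely follows the background knowledge provided by the knowledge retrieval model.\n\nThe third example requires the system to locate cells in a large tabular context given the input query. **Chameleon** calls the row lookup model to locate the relevant rows accurately and generates a language solution with an LLM instead of relying on program-based tools.\n\n\n\n## :chart_with_upwards_trend: How Good Is Chameleon?\n\n**Chameleon** shows significant improvements over both fine-tuned models and few-shot prompted GPT-4\u002FChatGPT:\n\n![results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_09a6b5326c34.png)\n\nTo visualize 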
**Chameleon**'s predictions, simply run the Jupyter notebook for your task: `notebooks\u002Fresults_viewer_[TASK].ipynb`. It provides an interactive, user-friendly way to explore the generated results. Alternatively, visit our [project page](https:\u002F\u002Fchameleon-llm.github.io\u002F) for more information and options.\n\n\n\n## :slot_machine: What Is Chameleon Learning?\n\n### Tool Use\n\nTools called in the programs generated by **Chameleon** (ChatGPT) and **Chameleon** (GPT-4) on ScienceQA:\n\n![tool_call_scienceqa](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_df742d9ffc37.png)\n\nTools called in the programs generated by **Chameleon** (ChatGPT) and **Chameleon** (GPT-4) on TabMWP:\n\n![tool_call_tabmwp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_315ff99a6b24.png)\n\n### Transition Graph\n\nRun `notebooks\u002Ftransition_[TASK]_[Model]_Engine.ipynb` to visualize the module transition graph of the programs generated on the test set.\n\nTransitions between modules in the programs generated by **Chameleon** (GPT-4) on ScienceQA. START is the start symbol, END is a terminal symbol, and the others are non-terminal symbols.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_c85811c819a4.png\" width=45% height=45%>\n\nTransitions between modules in the programs generated by **Chameleon** (GPT-4) on TabMWP. START is the start symbol, END is a terminal symbol, and the others are non-terminal symbols.\n\n\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_readme_2f62fa2001ee.png\" width=55% height=55%>\n\n\n\n## :smile_cat: Want to Develop a New Task?\n\n- **Build the module inventory**: Create prompts for the LLM-based models in the `demos` directory. Define the input, execution, and output of each module in `model.py`.\n- **Develop the LLM planner**: Describe the module inventory thoroughly and provide a few examples showing how queries map to target programs.\n- **Implement the data loader and evaluation**: Define the data loader in `model.py`. To change the evaluation method, update the corresponding part of `main.py`.\n- **Have fun**: Once the groundwork is in place, enjoy diving into the task at hand!\n\n\n\n## :coffee: Stay Connected!\n\nFantastic! I'm always open to discussions, collaborations, or even just a virtual coffee. For contact information, please visit [Pan Lu](https:\u002F\u002Flupantech.github.io\u002F)'s homepage.\n\n\n\n\n## :white_check_mark: Citation\n\nIf you find **Chameleon** useful for your research and applications, please cite it using this BibTeX:\n\n```latex\n@article{lu2023chameleon,\n  title={Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models},\n  author={Lu, Pan and Peng, Baolin and Cheng, Hao and Galley, Michel and Chang, Kai-Wei and Wu, Ying Nian and Zhu, Song-Chun and Gao, 
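Jianfeng},\n  journal={arXiv preprint arXiv:2304.09842},\n  year={2023}\n}\n```","# Chameleon-LLM Quick Start Guide\n\nChameleon is a plug-and-play compositional reasoning framework that extends the ability of large language models (LLMs) to call various tools (such as vision models, search engines, and Python functions). It uses an LLM as a natural language planner that automatically infers and executes a sequence of tools to solve complex problems (such as science question answering in ScienceQA and tabular math word problems in TabMWP).\n\n## Environment Setup\n\n### System Requirements\n- **Python version**: `3.8.10` recommended (3.8+ is typically compatible)\n- **Operating system**: Linux \u002F macOS \u002F Windows\n\n### Prerequisites and Keys\nBefore running, make sure you have the following API keys ready:\n\n1.  **OpenAI API Key** (required)\n    -   Requires a paid account (with billing set up).\n    -   Get it at: https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys\n2.  **Bing Search API Key** (optional)\n    -   Enables the web search module. Without it, performance on the ScienceQA task may drop slightly.\n    -   Get it at: https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm.git\n    cd chameleon-llm\n    ```\n\n2.  **Install the Python dependencies**\n    The dependencies include `openai`, `transformers`, `pandas`, and others.\n    ```bash\n    pip install -r requirements.txt\n    ```\n    > **Tip**: If downloads are slow in mainland China, you can use the Tsinghua PyPI mirror:\n    > `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n3.  **Configure the API key**\n    Set your OpenAI API key as an environment variable (Linux\u002FmacOS example):\n    ```bash\n    export OPENAI_API_KEY=\"sk-...\"\n    ```\n    *(Windows PowerShell users: `$env:OPENAI_API_KEY=\"sk-...\"`)*\n\n## Basic Usage\n\nThe following uses the **ScienceQA** task as an example of how to run Chameleon.\n\n### 1. Run Chameleon (GPT-4)\nEnter the task directory and run the inference script. This command uses GPT-4 as the engine for every module, including the policy (planner) and knowledge retrieval engines.\n\n```bash\ncd run_scienceqa\n\npython run.py \\\n--model chameleon \\\n--label chameleon_gpt4 \\\n--policy_engine gpt-4 \\\n--kr_engine gpt-4 \\\n--qg_engine gpt-4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n*Results are saved to the `results\u002Fscienceqa\u002F` directory.*\n\n### 2. Run a Baseline for Comparison (Chain-of-Thought)\nChameleon can also run in standard chain-of-thought (CoT) mode as a baseline:\n\n```bash\npython run.py \\\n--model cot \\\n--label cot_gpt4 \\\n--sg_engine gpt-4 \\\n--test_split test \\\n--test_number -1\n```\n\n### 3. 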
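Evaluate the Results\nRun the evaluation script to obtain the accuracy metrics:\n\n```bash\npython evaluate.py \\\n--data_file ..\u002Fdata\u002Fscienceqa\u002Fproblems.json \\\n--result_root ..\u002Fresults\u002Fscienceqa \\\n--result_files chameleon_gpt4_test_cache.jsonl\n```\n\n> **Note**: To run the **TabMWP** task, see the analogous commands in the `run_tabmwp` directory and adjust parameters such as `--rl_engine` and `--cl_engine` accordingly.","A data analyst on a research team needs to process mixed documents containing complex scientific charts and mathematical formulas, and extract data from them to answer multi-step reasoning questions.\n\n### Without chameleon-llm\n- **Tedious tool switching**: The analyst has to decide manually when to call OCR to read images and when to use a code interpreter to compute values, repeatedly copying and pasting results between tools.\n- **Broken reasoning chains**: For compound tasks such as \"first read the trend from the chart, then derive a conclusion using the formula in the text\", a single model often misses key steps because it lacks planning ability.\n- **Hard-to-trace errors**: When the final answer is wrong, it is hard to tell whether a visual recognition error or a calculation mistake caused it, so debugging feels like probing a black box.\n- **High development cost**: Automating this workflow requires engineers to write many hard-coded rules to chain independent modules together, which is very hard to maintain.\n\n### With chameleon-llm\n- **Intelligent automatic orchestration**: chameleon-llm identifies what the task requires and dynamically selects and chains vision models, calculators, or search tools, delivering true \"plug-and-play\" composition.\n- **Coherent compositional reasoning**: The system generates a complete reasoning chain that seamlessly connects steps such as \"read the data from the chart\" and \"substitute it into the formula\", significantly improving accuracy on complex science questions.\n- **Transparent, controllable process**: The tool called at each step and its intermediate result are clearly visible, so the analyst can quickly locate and correct a logic error in a specific step.\n- **Fast, zero-code deployment**: Instead of writing complex glue code, you only configure the list of available tools, and chameleon-llm enables the LLM to handle multimodal compound tasks.\n\nThe core value of chameleon-llm is that it gives LLMs the ability to plan autonomously and compose external tools, turning fragmented single-purpose skills into a systematic capability for solving complex real-world problems.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flupantech_chameleon-llm_2f9c72c7.png","lupantech","Pan Lu","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flupantech_93570425.jpg","Postdoc at Stanford; CS PhD at UCLA","Stanford University","Palo Alto","lupantech@gmail.com","https:\u002F\u002Flupantech.github.io","https:\u002F\u002Fgithub.com\u002Flupantech",[82,86,90],{"name":83,"color":84,"percentage":85},"Jupyter Notebook","#DA5B0B",75.3,{"name":87,"color":88,"percentage":89},"Python","#3572A5",24.6,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.1,1138,84,"2026-04-09T13:39:23","Apache-2.0","","Not specified",{"notes":101,"python":102,"dependencies":103},"The tool relies mainly on API calls (GPT-4\u002FChatGPT) and does not require deploying a large model locally, so it has no explicit local GPU requirement. You must configure an OpenAI API key and enable a paid account. To use web search, you can optionally configure a Bing Search API 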
Key. Some vision modules (such as image captioning and text detection) use precomputed results in the current version; live calling is coming soon.","3.8.10",[104,105,106,107,108,109],"huggingface-hub","numpy==1.23.2","openai==0.23.0","pandas==1.4.3","transformers==4.21.1","requests==2.28.1",[13,35,15,14],[112,113,114,115,116,117,118],"python","ai","chatgpt","gpt-4","llm","openai","tool",null,"2026-03-27T02:49:30.150509","2026-04-11T23:24:19.657726",[123,128,133,137],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},29984,"How do I fix the 'ModuleNotFoundError: No module named func_timeout' error when running the code?","The error occurs because a dependency was missing from requirements.txt. The maintainers have fixed this; make sure the `func_timeout` package is installed in your environment. You can pull the latest code and run `pip install -r requirements.txt` to install the missing dependency.","https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm\u002Fissues\u002F2",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},29985,"Does the update_modules function have a bug where eval() is executed on a list?","The maintainer checked and found that the current code does not contain the `eval()`-on-a-list bug described by the user. The `update_modules` function works as follows: the default modules are defined as the list `default_modules = [\"solution_generator\", \"answer_generator\"]`, and the function then tries to `eval` the input string `_modules`. If the input is malformed or the assertion (that the parsed list ends with the two default modules) fails, it falls back to the default module list. If you run into a specific problem, please provide a code snippet that reproduces it for further investigation.\n\nReference code:\n```python\ndef update_modules(self, _modules):\n    # default modules\n    default_modules = [\"solution_generator\", \"answer_generator\"]\n    \n    try:\n        modules = eval(_modules.lower().strip())\n        assert modules[-2:] == default_modules\n    except:\n        modules = default_modules\n\n    return modules\n```","https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm\u002Fissues\u002F3",{"id":134,"question_zh":135,"answer_zh":136,"source_url":132},29986,"Are the text_detector and image_captioner modules called live or precomputed?","Most modules are currently called live. However, the `text_detector` and `image_captioner` modules use off-the-shelf, precomputed responses. This simplifies development and speeds up inference. The maintainer said that live-calling implementations of these two modules are planned and will be added soon.",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},29987,"What should I do when program_generator produces unreasonable programs and low accuracy on TabMWP with GPT-3.5 Turbo?","This is a known issue in this usage scenario. When GPT-3.5 Turbo 
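is used as the model engine, the generated program logic may not match expectations (e.g., incorrectly defined variables or flawed calculation logic), which leads to very low accuracy. The official examples use GPT-4. When using GPT-3.5 Turbo, it is advisable to review the prompts carefully or adjust the related threshold parameters (such as `rl_cell_threshold` and `cl_cell_threshold`). If the problem persists, you may need to tune the prompts for the specific dataset or switch the model engine to GPT-4 for best results.","https:\u002F\u002Fgithub.com\u002Flupantech\u002Fchameleon-llm\u002Fissues\u002F11",[143],{"id":144,"version":145,"summary_zh":146,"released_at":147},206605,"chameleon-v1.0","Chameleon v1.0","2023-04-20T00:33:25"]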