[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--JARVIS":3,"tool-microsoft--JARVIS":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":117,"forks":118,"last_commit_at":119,"license":120,"difficulty_score":121,"env_os":122,"env_gpu":123,"env_ram":124,"env_deps":125,"category_tags":129,"github_topics":130,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":134,"updated_at":135,"faqs":136,"releases":165},1667,"microsoft\u002FJARVIS","JARVIS","JARVIS, a system to connect LLMs with ML community. Paper: https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.17580.pdf","JARVIS是一个连接大型语言模型（LLM）与Hugging Face社区模型的协作系统。它让LLM担任“智能调度员”，自动拆解用户需求、匹配Hugging Face上的专家模型、执行任务并整合结果，轻松解决复杂AI问题。例如，当需要生成带文字的图片时，JARVIS会协调文本生成和图像生成模型协同工作，无需手动配置。开发者可快速构建多模型应用，研究人员能高效验证方案，普通用户也能通过Hugging Face Spaces的在线Demo直接体验。其核心创新在于四阶段工作流（任务规划、模型选择、执行、响应生成），有效整合数百个开源模型，大幅降低技术门槛，让AI协作更简单高效。","# JARVIS\n\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-\u003CCOLOR>.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17580)\n[![Open in Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-Open%20in%20Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmicrosoft\u002FHuggingGPT)\n\nThe mission of JARVIS is to explore artificial general intelligence (AGI) and deliver cutting-edge research to the whole community.\n\n## What's New\n\n+  [2024.01.15] We release Easytool for easier tool usage.\n   + The code and datasets are available at [EasyTool](\u002Feasytool).\n   + The paper is available at [EasyTool: Enhancing LLM-based Agents with Concise Tool Instruction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.06201).\n+  [2023.11.30] We release TaskBench for evaluating task automation capability of LLMs.\n   + The code and datasets are available at [TaskBench](\u002Ftaskbench).\n   + The paper is available at [TaskBench: Benchmarking Large Language Models for Task Automation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.18760).\n+  [2023.07.28] We are now in the process of planning evaluation and project rebuilding. We will release a new version of Jarvis in the near future.\n+  [2023.07.24] We released a light langchain version of Jarvis. 
See \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain\u002Ftree\u002Fmaster\u002Flibs\u002Fexperimental\u002Flangchain_experimental\u002Fautonomous_agents\u002Fhugginggpt\">here\u003C\u002Fa>.\n+  [2023.04.16] Jarvis now supports the OpenAI service on the Azure platform and the GPT-4 model.\n+  [2023.04.06] We added the Gradio demo and built the web API for `\u002Ftasks` and `\u002Fresults` in `server` mode.\n   +  The Gradio demo is now hosted on Hugging Face Space. (Build with `inference_mode=hybrid` and `local_deployment=standard`)\n   +  The Web API `\u002Ftasks` and `\u002Fresults` access intermediate results for `Stage #1`: task planning and `Stage #1-3`: model selection with execution results. See \u003Ca href=\"#Server\">here\u003C\u002Fa>.\n+  [2023.04.03] We added the CLI mode and provided parameters for configuring the scale of local endpoints.\n   +  You can enjoy a lightweight experience with Jarvis without deploying the models locally. See \u003Ca href=\"#Configuration\">here\u003C\u002Fa>.\n   +  Just run `python awesome_chat.py --config configs\u002Fconfig.lite.yaml` to experience it.\n+  [2023.04.01] We released an updated version of the codebase.\n\n\n### Overview\n\nLanguage serves as an interface for LLMs to connect numerous AI models for solving complicated AI tasks!\n\n\u003Cp align=\"center\">\n\u003Cimg width=\"100%\" alt=\"image\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_e864256065fe.png\">    \n\u003C\u002Fp>\n\n\nSee our paper: [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace](http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17580), Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu and Yueting Zhuang (the first two authors contributed equally)\n\nWe introduce a collaborative system that consists of **an LLM as the controller** and **numerous expert models as collaborative executors** (from the HuggingFace Hub). The workflow of our system consists of four stages:\n+ **Task Planning**: ChatGPT analyzes the user's request to understand their intention and disassembles it into solvable tasks.\n+ **Model Selection**: To solve the planned tasks, ChatGPT selects expert models hosted on Hugging Face based on their descriptions.\n+ **Task Execution**: Jarvis invokes and executes each selected model and returns the results to ChatGPT.\n+ **Response Generation**: Finally, ChatGPT integrates the predictions of all models and generates the response.\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_281241ff11f0.jpg\">\u003C\u002Fp>\n\n### System Requirements\n\n#### Default (Recommended)\n\nFor `configs\u002Fconfig.default.yaml`:\n\n+ Ubuntu 16.04 LTS\n+ VRAM >= 24GB\n+ RAM > 12GB (minimal), 16GB (standard), 80GB (full)\n+ Disk > 284GB \n  + 42GB for `damo-vilab\u002Ftext-to-video-ms-1.7b`\n  + 126GB for `ControlNet`\n  + 66GB for `stable-diffusion-v1-5`\n  + 50GB for others\n  \n#### Minimum (Lite)\n\nFor `configs\u002Fconfig.lite.yaml`:\n\n+ Ubuntu 16.04 LTS\n+ Nothing else\n\nThe configuration `configs\u002Fconfig.lite.yaml` does not require any expert models to be downloaded and deployed locally. However, it means that Jarvis is restricted to models that run stably on the HuggingFace Inference Endpoints.\n\n### Quick Start\n\nFirst replace `openai.key` and `huggingface.token` in `server\u002Fconfigs\u002Fconfig.default.yaml` with **your personal OpenAI Key** and **your Hugging Face Token**, or put them in the environment variables `OPENAI_API_KEY` and `HUGGINGFACE_ACCESS_TOKEN` respectively. 
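For instance, a minimal sketch of the environment-variable route (the key values below are hypothetical placeholders; substitute your own credentials):\n\n```bash\n# hypothetical placeholder values -- export your own OpenAI key and Hugging Face token\nexport OPENAI_API_KEY=\"sk-...\"\nexport HUGGINGFACE_ACCESS_TOKEN=\"hf_...\"\n```\n\n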
Then run the following commands:\n\n\u003Cspan id=\"Server\">\u003C\u002Fspan>\n\n#### For Server:\n\n```bash\n# setup env\ncd server\nconda create -n jarvis python=3.8\nconda activate jarvis\nconda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r requirements.txt\n\n# download models. Make sure that `git-lfs` is installed.\ncd models\nbash download.sh # required when `inference_mode` is `local` or `hybrid`. \n\n# run server\ncd ..\npython models_server.py --config configs\u002Fconfig.default.yaml # required when `inference_mode` is `local` or `hybrid`\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode server # for text-davinci-003\n```\n\nNow you can access Jarvis' services via the Web API.\n\n+ `\u002Fhugginggpt` --method `POST`, access the full service.\n+ `\u002Ftasks` --method `POST`, access intermediate results for Stage #1.\n+ `\u002Fresults` --method `POST`, access intermediate results for Stage #1-3.\n\nFor example:\n\n```bash\n# request\ncurl --location 'http:\u002F\u002Flocalhost:8004\u002Ftasks' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"based on pose of \u002Fexamples\u002Fd.jpg and content of \u002Fexamples\u002Fe.jpg, please show me a new image\"\n        }\n    ]\n}'\n\n# response\n[{\"args\":{\"image\":\"\u002Fexamples\u002Fd.jpg\"},\"dep\":[-1],\"id\":0,\"task\":\"openpose-control\"},{\"args\":{\"image\":\"\u002Fexamples\u002Fe.jpg\"},\"dep\":[-1],\"id\":1,\"task\":\"image-to-text\"},{\"args\":{\"image\":\"\u003CGENERATED>-0\",\"text\":\"\u003CGENERATED>-1\"},\"dep\":[1,0],\"id\":2,\"task\":\"openpose-text-to-image\"}]\n```\n\n
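The `\u002Fhugginggpt` endpoint can be exercised the same way; a sketch, assuming it accepts the same `messages` schema as `\u002Ftasks` above (the prompt is borrowed from the CLI examples below):\n\n```bash\n# request the full service (task planning through response generation)\ncurl --location 'http:\u002F\u002Flocalhost:8004\u002Fhugginggpt' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"please generate a canny image based on \u002Fexamples\u002Ff.jpg\"\n        }\n    ]\n}'\n```\n\n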
#### For Web:\n\nWe provide a user-friendly web page. After starting `awesome_chat.py` in server mode, you can run the following commands to communicate with Jarvis in your browser:\n\n- you need to install `nodejs` and `npm` first.\n- [ IMPORTANT ] if you are running the web client on another machine, you need to set `http:\u002F\u002F{LAN_IP_of_the_server}:{port}\u002F` as the `HUGGINGGPT_BASE_URL` in `web\u002Fsrc\u002Fconfig\u002Findex.ts`.\n- if you want to use the video generation feature, you need to compile `ffmpeg` manually with H.264 support.\n- you can switch to ChatGPT by double-clicking the setting icon!\n\n```bash\ncd web\nnpm install\nnpm run dev\n```\n\n```bash\n# Optional: Install ffmpeg\n# This command needs to run without errors.\nLD_LIBRARY_PATH=\u002Fusr\u002Flocal\u002Flib \u002Fusr\u002Flocal\u002Fbin\u002Fffmpeg -i input.mp4 -vcodec libx264 output.mp4\n```\n\n\u003Cspan id=\"Gradio\">\u003C\u002Fspan>\n\n#### For Gradio\n\nThe Gradio demo is now hosted on Hugging Face Space. You can also run the following commands to start the demo locally:\n\n```bash\npython models_server.py --config configs\u002Fconfig.gradio.yaml\npython run_gradio_demo.py --config configs\u002Fconfig.gradio.yaml\n\n# or run the HF Space as a Docker image (Build with `inference_mode=hybrid` and `local_deployment=standard`)\ndocker run -it -p 7860:7860 --platform=linux\u002Famd64 registry.hf.space\u002Fmicrosoft-hugginggpt:latest python app.py\n```\n\n#### For CLI:\n\nYou can also run Jarvis more easily in CLI mode:\n\n```bash\ncd server\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode cli\n```\n\nExamples of CLI mode:\n\n```\nWelcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).\n[ User ]: Given a collection of image A: \u002Fexamples\u002Fa.jpg, B: \u002Fexamples\u002Fb.jpg, C: \u002Fexamples\u002Fc.jpg, please tell me how many zebras in these picture?\n[ Jarvis ]: Based on your request, I have used several models to process the images you provided. For the image A: \u002Fexamples\u002Fa.jpg, I used the image-to-text model nlpconnect\u002Fvit-gpt2-image-captioning to generate the text \"a cat sitting on a window sill looking out\". Then I used the object-detection model facebook\u002Fdetr-resnet-50 to generate the image with predicted box '\u002Fimages\u002Fa3dc.jpg', which contains two objects with labels 'potted plant' and 'cat'. For the image B: \u002Fexamples\u002Fb.jpg, I used the image-to-text model nlpconnect\u002Fvit-gpt2-image-captioning to generate the text \"two zebras standing in a field of dry grass\". Then I used the object-detection model facebook\u002Fdetr-resnet-50 to generate the image with predicted box '\u002Fimages\u002Ff5df.jpg', which contains three objects with labels 'zebra'. For the image C: \u002Fexamples\u002Fc.jpg, I used the image-to-text model nlpconnect\u002Fvit-gpt2-image-captioning to generate the text \"a zebra standing in a field of tall grass\". Then I used the object-detection model facebook\u002Fdetr-resnet-50 to generate the image with predicted box '\u002Fimages\u002F69c7.jpg', which contains one object with label 'zebra'. Finally, I used the canny-control model canny-control to generate the image '\u002Fimages\u002F7e18.png' based on the image \u002Fexamples\u002Ff.jpg. Therefore, there are four zebras in these pictures. Is there anything else I can help you with?\n[ User ]: Please answer all the named entities in the sentence: Iron Man is a superhero appearing in American comic books published by Marvel Comics. The character was co-created by writer and editor Stan Lee, developed by scripter Larry Lieber, and designed by artists Don Heck and Jack Kirby.\n[ Jarvis ]: Sure, I understand your request. Based on the inference results of the models, I will answer your request step by step. Firstly, I used the token-classification model dslim\u002Fbert-base-NER to recognize the named entities in the sentence. The inference result is that the named entities are Iron Man, Stan Lee, Larry Lieber, Don Heck and Jack Kirby. Then, I used the question-answering model deepset\u002Froberta-base-squad2 to answer your request. The inference result is that there is no output since the context cannot be empty. Therefore, I cannot make it. I hope this answer your request. 
Is there anything else I can help you with?\n[ User ]: Please generate a canny image based on \u002Fexamples\u002Ff.jpg\n[ Jarvis ]: Sure. I have generated a canny image based on \u002Fexamples\u002Ff.jpg. To do this, I first used the image-to-text model nlpconnect\u002Fvit-gpt2-image-captioning to generate the text description of the image, which is \"a herd of giraffes and zebras grazing in a field\". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path \"\u002Fimages\u002F1ea2.png\". Finally, I used the canny-text-to-image model lllyasviel\u002Fsd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path \"\u002Fimages\u002Fba66.png\". Do you need anything else?\n```\n\n\n### Configuration\n\nThe server-side configuration file is `server\u002Fconfigs\u002Fconfig.default.yaml`, and some parameters are presented as follows:\n\n+ `model`: LLM, currently supports `text-davinci-003`. We are working on integrating more open-source LLMs.\n+ `inference_mode`: mode of inference endpoints\n  + `local`: only use the local inference endpoints\n  + `huggingface`: only use the Hugging Face Inference Endpoints **(no local inference endpoints required)**\n  + `hybrid`: both `local` and `huggingface`\n+ `local_deployment`: scale of locally deployed models, works under the `local` or `hybrid` inference mode:\n  +  `minimal` (RAM>12GB, ControlNet only)\n  +  `standard` (RAM>16GB, ControlNet + Standard Pipelines)\n  +  `full` (RAM>42GB, All registered models)\n\nOn a personal laptop, we recommend the configuration of `inference_mode: hybrid` and `local_deployment: minimal`. But the available models under this setting may be limited due to the instability of remote Hugging Face Inference Endpoints.
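\n\nTo make these options concrete, here is a minimal sketch of a custom config (the `config.custom.yaml` file name is hypothetical, and the key layout is assumed to mirror `configs\u002Fconfig.default.yaml` as described in the Quick Start):\n\n```bash\n# write a hypothetical custom config; the keys follow the parameters listed above\ncat > configs\u002Fconfig.custom.yaml \u003C\u003C'EOF'\nopenai:\n  key: sk-...              # placeholder; may also come from OPENAI_API_KEY\nhuggingface:\n  token: hf_...            # placeholder; may also come from HUGGINGFACE_ACCESS_TOKEN\nmodel: text-davinci-003    # the only LLM listed as supported here\ninference_mode: hybrid     # local | huggingface | hybrid\nlocal_deployment: minimal  # minimal | standard | full\nEOF\n\n# then point both servers at it\npython models_server.py --config configs\u002Fconfig.custom.yaml\npython awesome_chat.py --config configs\u002Fconfig.custom.yaml --mode server\n```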
\n\n### NVIDIA Jetson Embedded Device Support\nA [Dockerfile](.\u002FDockerfile.jetson) is included that provides experimental support for [NVIDIA Jetson embedded devices](https:\u002F\u002Fdeveloper.nvidia.com\u002Fembedded-computing).  This image provides accelerated ffmpeg, pytorch, torchaudio, and torchvision dependencies.  To build the docker image, [ensure that the default docker runtime is set to 'nvidia'](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker\u002Fwiki\u002FAdvanced-topics#default-runtime).  A pre-built image is provided at https:\u002F\u002Fhub.docker.com\u002Fr\u002Ftoolboc\u002Fnv-jarvis.\n\n```bash\n# Build the docker image\ndocker build --pull --rm -f \"Dockerfile.jetson\" -t toolboc\u002Fnv-jarvis:r35.2.1 \n```\n\nDue to memory requirements, JARVIS needs to run on Jetson AGX Orin family devices (a 64GB on-board RAM device is preferred) with the config options set to:\n* `inference_mode: local` \n* `local_deployment: standard`  \n\nModels and configs are recommended to be provided through a volume mount from the host to the container as shown in the `docker run` step below.  It is possible to uncomment the `# Download local models` section of the [Dockerfile](.\u002FDockerfile.jetson) to build a container with models included.\n\n#### Start the model server, awesome_chat, and web app on Jetson Orin AGX\n\n```bash\n# run the container, which will automatically start the model server\ndocker run --name jarvis --net=host --gpus all -v ~\u002Fjarvis\u002Fconfigs:\u002Fapp\u002Fserver\u002Fconfigs -v ~\u002Fsrc\u002FJARVIS\u002Fserver\u002Fmodels:\u002Fapp\u002Fserver\u002Fmodels toolboc\u002Fnv-jarvis:r35.2.1\n\n# (wait for the model server to complete initialization)\n\n# start awesome_chat.py \ndocker exec jarvis python3 awesome_chat.py --config configs\u002Fconfig.default.yaml --mode server\n\n# start the web application (accessible at http:\u002F\u002Flocalhost:9999)\ndocker exec jarvis npm run dev --prefix=\u002Fapp\u002Fweb\n```\n\n### Screenshots\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_3a04d300ff09.jpg\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_089916141c67.jpg\">\u003C\u002Fp>\n\n\n\n\n## Citation\nIf you find this work useful in your method, you can cite the paper as below:\n\n    @inproceedings{shen2023hugginggpt,\n      author = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Li, Dongsheng and Lu, Weiming and Zhuang, Yueting},\n      booktitle = {Advances in Neural Information Processing Systems},\n      title = {HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace},\n      year = {2023}\n    }\n\n    @article{shen2023taskbench,\n      title   = {TaskBench: Benchmarking Large Language Models for Task Automation},\n      author  = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Zhang, Wenqi and Ren, Kan and Yuan, Siyu and Lu, Weiming and Li, Dongsheng and Zhuang, Yueting},\n      journal = {arXiv preprint arXiv:2311.18760},\n      year    = {2023}\n    }\n\n    @article{yuan2024easytool,\n      title   = {EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction},\n      author  = {Siyu Yuan and Kaitao Song and Jiangjie Chen and Xu Tan and Yongliang Shen and Ren Kan and Dongsheng Li and Deqing Yang},\n      journal = {arXiv preprint arXiv:2401.06201},\n      year    = {2024}\n    }\n","# JARVIS\n\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-\u003CCOLOR>.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17580)\n[![Open in Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-Open%20in%20Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmicrosoft\u002FHuggingGPT)\n\nJARVIS的使命是探索通用人工智能（AGI），并向整个社区提供前沿研究。\n\n## 最新动态\n\n+  [2024年1月15日] 我们发布了Easytool，让工具使用更加便捷。\n   + 代码和数据集可在[EasyTool](\u002Feasytool)获取。\n   + 论文可访问[EasyTool：用简洁工具指令增强基于大语言模型的智能体](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.06201)。\n+  [2023年11月30日] 我们发布了TaskBench，用于评估大语言模型的任务自动化能力。\n   + 代码和数据集可在[TaskBench](\u002Ftaskbench)获取。\n   + 论文可访问[TaskBench：面向任务自动化的大型语言模型基准测试](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.18760)。\n+  [2023年7月28日] 我们目前正处于评估与项目重构的规划阶段。近期我们将发布Jarvis的新版本。\n+  [2023年7月24日] 我们发布了基于langchain的Jarvis轻量版。详情请见\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain\u002Ftree\u002Fmaster\u002Flibs\u002Fexperimental\u002Flangchain_experimental\u002Fautonomous_agents\u002Fhugginggpt\">这里\u003C\u002Fa>。\n+  [2023年4月16日] Jarvis现已支持Azure平台上的OpenAI服务及GPT-4模型。\n+  [2023年4月6日] 
我们新增了Gradio演示，并在`server`模式下构建了`\u002Ftasks`和`\u002Fresults`的Web API。\n   + Gradio演示现托管于Hugging Face Space。（采用`inference_mode=hybrid`和`local_deployment=standard`构建）\n   + Web API `\u002Ftasks`和`\u002Fresults`可访问中间结果，涵盖第1阶段：任务规划，以及第1至3阶段：模型选择与执行结果。详情请见\u003Ca href=\"#Server\">这里\u003C\u002Fa>。\n+  [2023年4月3日] 我们增加了CLI模式，并提供了配置本地端点规模的参数。\n   + 您无需本地部署模型，即可享受Jarvis的轻量化体验。详情请见\u003Ca href=\"#Configuration\">这里\u003C\u002Fa>。\n   + 只需运行`python awesome_chat.py --config configs\u002Fconfig.lite.yaml`即可体验。\n+  [2023年4月1日] 我们更新了部分代码版本。\n\n\n### 概览\n\n语言作为大语言模型的接口，可连接众多AI模型，以解决复杂的AI任务！\n\n\u003Cp align=\"center\">\n\u003Cimg width=\"100%\" alt=\"image\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_e864256065fe.png\">    \n\u003C\u002Fp>\n\n\n查看我们的论文：[HuggingGPT：用ChatGPT及其伙伴在HuggingFace上解决AI任务](http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17580)，沈永亮、宋凯涛、谭旭、李东升、陆伟明和庄宇婷（前两位作者贡献相当）\n\n我们介绍了一种协作系统，由**一个大语言模型作为控制器**和**众多专家模型作为协同执行者**（来自HuggingFace Hub）组成。该系统的流程分为四个阶段：\n+ **任务规划**：利用ChatGPT分析用户请求，理解其意图，并将其拆解为可解决的任务。\n+ **模型选择**：为解决规划好的任务，ChatGPT根据描述从Hugging Face上选择合适的专家模型。\n+ **任务执行**：调用并执行每个选定的模型，将结果返回给ChatGPT。\n+ **响应生成**：最后，利用ChatGPT整合所有模型的预测，生成最终响应。\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_281241ff11f0.jpg\">\u003C\u002Fp>\n\n### 系统要求\n\n#### 默认（推荐）\n\n对于`configs\u002Fconfig.default.yaml`：\n\n+ Ubuntu 16.04 LTS\n+ 显存≥24GB\n+ 内存>12GB（最低）、16GB（标准）、80GB（完整）\n+ 磁盘>284GB \n  + 42GB用于`damo-vilab\u002Ftext-to-video-ms-1.7b`\n  + 126GB用于`ControlNet`\n  + 66GB用于`stable-diffusion-v1-5`\n  + 50GB用于其他\n\n#### 最低（精简）\n\n对于`configs\u002Fconfig.lite.yaml`：\n\n+ Ubuntu 16.04 LTS\n+ 无其他要求\n\n配置文件`configs\u002Fconfig.lite.yaml`无需下载和本地部署任何专家模型。不过，这意味着Jarvis仅限于在HuggingFace推理端点上稳定运行的模型。\n\n### 快速开始\n\n首先，将`server\u002Fconfigs\u002Fconfig.default.yaml`中的`openai.key`和`huggingface.token`替换为**您的个人OpenAI密钥**和**您的Hugging Face令牌**，或者分别放入环境变量`OPENAI_API_KEY`和`HUGGINGFACE_ACCESS_TOKEN`中。然后运行以下命令：\n\n\u003Cspan id=\"Server\">\u003C\u002Fspan>\n\n#### 对于服务器：\n\n```bash\n# 设置环境\ncd server\nconda create -n jarvis python=3.8\nconda activate jarvis\nconda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r requirements.txt\n\n# 下载模型。确保已安装`git-lfs`。\ncd models\nbash download.sh # 当`inference_mode`为`local`或`hybrid`时需要。\n\n# 运行服务器\ncd ..\npython models_server.py --config configs\u002Fconfig.default.yaml # 当`inference_mode`为`local`或`hybrid`时需要\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode server # 对于text-davinci-003\n```\n\n现在，您可通过Web API访问Jarvis的服务。\n\n+ `\u002Fhugginggpt` --方法`POST`，访问完整服务。\n+ `\u002Ftasks` --方法`POST`，访问第1阶段的中间结果。\n+ `\u002Fresults` --方法`POST`，访问第1至3阶段的中间结果。\n\n例如：\n\n```bash\n# 请求\ncurl --location 'http:\u002F\u002Flocalhost:8004\u002Ftasks' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"根据\u002Fexamples\u002Fd.jpg的姿态和\u002Fexamples\u002Fe.jpg的内容，请给我一张新图\"\n        }\n    ]\n}'\n\n# 响应\n[{\"args\":{\"image\":\"\u002Fexamples\u002Fd.jpg\"},\"dep\":[-1],\"id\":0,\"task\":\"openpose-control\"},{\"args\":{\"image\":\"\u002Fexamples\u002Fe.jpg\"},\"dep\":[-1],\"id\":1,\"task\":\"image-to-text\"},{\"args\":{\"image\":\"\u003CGENERATED>-0\",\"text\":\"\u003CGENERATED>-1\"},\"dep\":[1,0],\"id\":2,\"task\":\"openpose-text-to-image\"}]\n```\n\n\n#### 对于网页：\n\n我们提供了一个友好的网页界面。在以服务器模式启动`awesome_chat.py`后，您可在浏览器中运行以下命令与Jarvis通信：\n\n- 
需要先安装`nodejs`和`npm`。\n- [重要] 如果您在另一台机器上运行网页客户端，需将`http:\u002F\u002F{LAN_IP_of_the_server}:{port}\u002F`设置为`web\u002Fsrc\u002Fconfig\u002Findex.ts`中的`HUGGINGGPT_BASE_URL`。\n- 如果您想使用视频生成功能，需手动编译支持 H.264 编码的`ffmpeg`。\n- 您可通过双击设置图标切换到ChatGPT！\n\n```bash\ncd web\nnpm install\nnpm run dev\n```\n\n```bash\n# 可选：安装ffmpeg\n# 此命令需无错误执行。\nLD_LIBRARY_PATH=\u002Fusr\u002Flocal\u002Flib \u002Fusr\u002Flocal\u002Fbin\u002Fffmpeg -i input.mp4 -vcodec libx264 output.mp4\n```\n\n\u003Cspan id=\"Gradio\">\u003C\u002Fspan>\n\n#### 对于 Gradio\n\nGradio 演示现在托管在 Hugging Face Space 上。您也可以运行以下命令在本地启动演示：\n\n```bash\npython models_server.py --config configs\u002Fconfig.gradio.yaml\npython run_gradio_demo.py --config configs\u002Fconfig.gradio.yaml\n\n# 或者以 Docker 镜像形式运行 HF Space（使用 `inference_mode=hybrid` 和 `local_deployment=standard` 构建）\ndocker run -it -p 7860:7860 --platform=linux\u002Famd64 registry.hf.space\u002Fmicrosoft-hugginggpt:latest python app.py\n```\n\n#### 对于 CLI：\n\n您也可以更方便地以 CLI 模式运行 Jarvis：\n\n```bash\ncd server\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode cli\n```\n\nCLI 模式的示例：\n\n```\n欢迎来到 Jarvis！这是一个协作系统，由一个大语言模型作为控制器，以及众多专家模型作为协同执行器组成。Jarvis 可以规划任务、调度 Hugging Face 模型、根据您的请求生成友好的回复，并帮助您完成许多事情。请输入您的请求（输入 `exit` 退出）。\n[ 用户 ]：给定一组图片 A：\u002Fexamples\u002Fa.jpg，B：\u002Fexamples\u002Fb.jpg，C：\u002Fexamples\u002Fc.jpg，请告诉我这些图片中有多少只斑马？\n[ Jarvis ]：根据您的请求，我使用了多个模型处理了您提供的图片。对于图片 A：\u002Fexamples\u002Fa.jpg，我使用了图像到文本模型 nlpconnect\u002Fvit-gpt2-image-captioning，生成了文本“一只猫坐在窗台上向外张望”。然后，我使用了目标检测模型 facebook\u002Fdetr-resnet-50，生成了带有预测框的图片 '\u002Fimages\u002Fa3dc.jpg'，其中包含两个对象，标签分别为‘盆栽植物’和‘猫’。对于图片 B：\u002Fexamples\u002Fb.jpg，我使用了图像到文本模型 nlpconnect\u002Fvit-gpt2-image-captioning，生成了文本“两只斑马站在一片干草地上”。然后，我使用了目标检测模型 facebook\u002Fdetr-resnet-50，生成了带有预测框的图片 '\u002Fimages\u002Ff5df.jpg'，其中包含三个对象，标签均为‘斑马’。对于图片 C：\u002Fexamples\u002Fc.jpg，我使用了图像到文本模型 nlpconnect\u002Fvit-gpt2-image-captioning，生成了文本“一只斑马站在一片高草地上”。然后，我使用了目标检测模型 facebook\u002Fdetr-resnet-50，生成了带有预测框的图片 '\u002Fimages\u002F69c7.jpg'，其中包含一个对象，标签为‘斑马’。最后，我使用了 Canny 控制模型 canny-control，基于图片 \u002Fexamples\u002Ff.jpg 生成了图片 '\u002Fimages\u002F7e18.png'。因此，这些图片中共有四只斑马。还有其他我可以帮您的吗？\n[ 用户 ]：请回答句子中的所有命名实体：钢铁侠是美国漫画书中的超级英雄，由漫威漫画出版。这个角色由作家兼编辑斯坦·李共同创作，编剧拉里·利伯开发，艺术家唐·赫克和杰克·科比设计。\n[ Jarvis ]：好的，我明白您的请求。根据模型的推理结果，我会逐步回答您的请求。首先，我使用了词元分类模型 dslim\u002Fbert-base-NER 来识别句子中的命名实体。推理结果是：钢铁侠、斯坦·李、拉里·利伯、唐·赫克和杰克·科比。然后，我使用了问答模型 deepset\u002Froberta-base-squad2 回答您的请求。推理结果是没有输出，因为上下文不能为空。因此，我无法给出答案。希望这能解答您的疑问。还有其他我可以帮您的吗？\n[ 用户 ]：请基于 \u002Fexamples\u002Ff.jpg 生成一张 Canny 图片\n[ Jarvis ]：好的。我已经基于 \u002Fexamples\u002Ff.jpg 生成了一张 Canny 图片。为此，我先使用了图像到文本模型 nlpconnect\u002Fvit-gpt2-image-captioning，生成了图片的文本描述，即“一群长颈鹿和斑马正在草地上吃草”。然后，我使用了 Canny 控制模型，生成了原始图片的 Canny 图片，存储在路径 “\u002Fimages\u002F1ea2.png” 中。最后，我使用了 Canny 文本到图像模型 lllyasviel\u002Fsd-controlnet-canny，基于文本描述和 Canny 图片生成了最终的 Canny 图片，存储在路径 “\u002Fimages\u002Fba66.png” 中。您还需要其他帮助吗？\n```\n\n\n### 配置\n\n服务器端配置文件为 `server\u002Fconfigs\u002Fconfig.default.yaml`，部分参数如下所示：\n\n+ `model`: 大语言模型，目前支持 `text-davinci-003`。我们正在努力集成更多开源大语言模型。\n+ `inference_mode`: 推理端点模式\n  + `local`: 仅使用本地推理端点\n  + `huggingface`: 仅使用 Hugging Face 推理端点 **（无需本地推理端点）**\n  + `hybrid`: 同时使用 `local` 和 `huggingface`\n+ `local_deployment`: 本地部署模型的规模，在 `local` 或 `hybrid` 推理模式下生效：\n  + `minimal`（内存 > 12GB，仅 ControlNet）\n  + `standard`（内存 > 16GB，ControlNet + 标准流水线）\n  + `full`（内存 > 42GB，所有已注册模型）\n\n在个人笔记本电脑上，我们推荐使用 `inference_mode: hybrid` 和 `local_deployment: minimal` 的配置。但在此设置下可用的模型可能受限，因为远程 Hugging Face 推理端点不够稳定。\n\n### NVIDIA Jetson 嵌入式设备支持\n我们提供了一个 
[Dockerfile](.\u002FDockerfile.jetson)，用于实验性支持 [NVIDIA Jetson 嵌入式设备](https:\u002F\u002Fdeveloper.nvidia.com\u002Fembedded-computing)。该镜像提供了加速的 ffmpeg、PyTorch、Torchaudio 和 torchvision 依赖库。要构建 Docker 镜像，请确保默认的 Docker 运行时设置为 'nvidia'（参见 [NVIDIA\u002Fnvidia-docker Wiki](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker\u002Fwiki\u002FAdvanced-topics#default-runtime)）。预构建的镜像可从 https:\u002F\u002Fhub.docker.com\u002Fr\u002Ftoolboc\u002Fnv-jarvis 获取。\n\n```bash\n# 构建 Docker 镜像\ndocker build --pull --rm -f \"Dockerfile.jetson\" -t toolboc\u002Fnv-jarvis:r35.2.1 \n```\n\n由于内存需求，JARVIS 必须在 Jetson AGX Orin 系列设备上运行（优先选择板载内存 64GB 的设备），并设置以下配置选项：\n* `inference_mode: local`\n* `local_deployment: standard`\n\n建议通过卷挂载的方式将模型和配置从主机传递到容器中，如下面的 `docker run` 步骤所示。您也可以取消注释 [Dockerfile](.\u002FDockerfile.jetson) 中的 `# Download local models` 部分，以构建包含模型的容器。\n\n#### 在 Jetson Orin AGX 上启动模型服务器、awesome_chat 和 Web 应用程序\n\n```bash\n# 运行容器，它会自动启动模型服务器\ndocker run --name jarvis --net=host --gpus all -v ~\u002Fjarvis\u002Fconfigs:\u002Fapp\u002Fserver\u002Fconfigs -v ~\u002Fsrc\u002FJARVIS\u002Fserver\u002Fmodels:\u002Fapp\u002Fserver\u002Fmodels toolboc\u002Fnv-jarvis:r35.2.1\n\n# （等待模型服务器完成初始化）\n\n# 启动 awesome_chat.py \ndocker exec jarvis python3 awesome_chat.py --config configs\u002Fconfig.default.yaml --mode server\n\n# 启动 Web 应用（应用可通过 http:\u002F\u002Flocalhost:9999 访问）\ndocker exec jarvis npm run dev --prefix=\u002Fapp\u002Fweb\n```\n\n### 截图\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_3a04d300ff09.jpg\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_readme_089916141c67.jpg\">\u003C\u002Fp>\n\n\n\n\n## 引用\n如果您在自己的方法中使用了本工作，可参考以下论文进行引用：\n\n    @inproceedings{shen2023hugginggpt,\n      author = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Li, Dongsheng and Lu, Weiming and Zhuang, Yueting},\n      booktitle = {Advances in Neural Information Processing Systems},\n      title = {HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace},\n      year = {2023}\n    }\n\n    @article{shen2023taskbench,\n      title   = {TaskBench: Benchmarking Large Language Models for Task Automation},\n      author  = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Zhang, Wenqi and Ren, Kan and Yuan, Siyu and Lu, Weiming and Li, Dongsheng and Zhuang, Yueting},\n      journal = {arXiv preprint arXiv:2311.18760},\n      year    = {2023}\n    }\n\n    @article{yuan2024easytool,\n      title   = {EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction},\n      author  = {Siyu Yuan and Kaitao Song and Jiangjie Chen and Xu Tan and Yongliang Shen and Ren Kan and Dongsheng Li and Deqing Yang},\n      journal = {arXiv preprint arXiv:2401.06201},\n      year    = {2024}\n    }\n","# JARVIS 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- **默认配置（推荐）**：\n  - 操作系统：Ubuntu 16.04 LTS\n  - 显存：≥24GB\n  - 内存：≥12GB（最低），16GB（标准），80GB（完整）\n  - 磁盘空间：≥284GB（含多个模型文件）\n- **最小配置（Lite）**：\n  - 操作系统：Ubuntu 16.04 LTS\n  - 其他无需额外配置（依赖Hugging Face远程推理服务）\n\n### 前置依赖\n- `git-lfs`（用于下载大模型文件）\n- Conda（推荐Miniconda）\n- Python 3.8\n- Node.js 和 npm（Web界面需要）\n- ffmpeg（视频生成需要，需支持H.264编码）\n\n### 国内加速建议\n- PyTorch安装使用清华源：  \n  `conda install ... -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain`\n- pip安装时使用清华源：  \n  `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n- Hugging Face模型下载设置镜像：  \n  `export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`\n\n## 安装步骤\n\n1. 克隆仓库并进入server目录：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FJARVIS.git\n   cd JARVIS\u002Fserver\n   ```\n\n2. 
创建conda环境并安装依赖：\n   ```bash\n   conda create -n jarvis python=3.8\n   conda activate jarvis\n   conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Ffree\n   pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n3. 配置API密钥：\n   - 将OpenAI Key和Hugging Face Token填入`server\u002Fconfigs\u002Fconfig.default.yaml`，或设置环境变量：\n     ```bash\n     export OPENAI_API_KEY=your_openai_key\n     export HUGGINGFACE_ACCESS_TOKEN=your_hf_token\n     ```\n\n4. 下载模型（仅当使用默认配置且`inference_mode`为local\u002Fhybrid时）：\n   ```bash\n   cd models\n   bash download.sh\n   ```\n\n## 基本使用\n\n### Server模式\n```bash\n# 启动模型服务器（默认配置）\npython models_server.py --config configs\u002Fconfig.default.yaml\n# 启动聊天服务\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode server\n```\n\n**Web API示例**：\n```bash\ncurl --location 'http:\u002F\u002Flocalhost:8004\u002Ftasks' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"based on pose of \u002Fexamples\u002Fd.jpg and content of \u002Fexamples\u002Fe.jpg, please show me a new image\"\n        }\n    ]\n}'\n```\n\n### Web界面\n```bash\ncd web\nnpm install\nnpm run dev\n```\n> 注意：若在其他机器访问，需修改`web\u002Fsrc\u002Fconfig\u002Findex.ts`中的`HUGGINGGPT_BASE_URL`为服务器IP和端口\n\n### CLI模式（最简单体验）\n```bash\npython awesome_chat.py --config configs\u002Fconfig.default.yaml --mode cli\n```\n**示例交互**：\n```\nWelcome to Jarvis! ...\n[ User ]: 请统计图片a.jpg、b.jpg、c.jpg中斑马的数量\n[ Jarvis ]: 根据模型处理结果，图片中斑马数量为4只...\n```","某零售企业的技术团队需要为用户上传的服装瑕疵图片生成自动检测报告，包括问题分类、修复建议和视频演示。\n\n### 没有 JARVIS 时  \n- 需要手动部署CLIP、Stable Diffusion、text-to-video等多个独立模型，环境配置耗时且易出错  \n- 数据在模型间传递需手动处理格式转换，例如图片转张量、文本编码，频繁出现兼容性问题  \n- 每次新增任务类型（如识别新瑕疵类型）需重写流程代码，维护成本高  \n- 模型选择和任务分解依赖人工判断，响应速度慢，无法快速适应业务变化  \n\n### 使用 JARVIS 后  \n- 仅需输入自然语言指令，JARVIS自动调用Hugging Face Hub上的预训练模型完成任务链  \n- 系统自动处理数据格式转换和模型间通信，无需手动编码，集成时间从数天压缩至几小时  \n- 新增任务类型只需更新配置文件，JARVIS自动匹配模型，业务响应速度提升80%  \n- 生成结构化报告直接输出，客服团队可立即使用，每周节省15小时人工处理时间  \n\nJARVIS让跨模型协作变得像使用单一工具一样简单，显著提升AI应用开发效率。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_JARVIS_e8642560.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[85,89,93,97,101,105,109,113],{"name":86,"color":87,"percentage":88},"Python","#3572A5",90.2,{"name":90,"color":91,"percentage":92},"Vue","#41b883",5.3,{"name":94,"color":95,"percentage":96},"TypeScript","#3178c6",2.1,{"name":98,"color":99,"percentage":100},"Shell","#89e051",0.8,{"name":102,"color":103,"percentage":104},"JavaScript","#f1e05a",0.7,{"name":106,"color":107,"percentage":108},"PowerShell","#012456",0.6,{"name":110,"color":111,"percentage":112},"CSS","#663399",0.2,{"name":114,"color":115,"percentage":116},"HTML","#e34c26",0.1,24619,2094,"2026-04-05T10:11:19","MIT",4,"Linux","需要 NVIDIA GPU，显存 24GB+（默认配置），CUDA 11.7+；huggingface模式无需本地GPU","12GB+ (最低), 16GB+ (推荐)",{"notes":126,"python":127,"dependencies":128},"建议使用 conda 管理环境，对于默认配置需下载约 284GB 模型文件，需要 git-lfs；Web 界面需 nodejs 和 npm；视频生成需 
ffmpeg","3.8+",[],[13],[131,132,133],"deep-learning","platform","pytorch","2026-03-27T02:49:30.150509","2026-04-06T06:44:39.831289",[137,142,146,151,156,161],{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},8842,"在 Gradio demo 中出现 KeyError: 'choices' 如何解决？","通常由于 OpenAI 账户配额用尽。检查 server\u002Flogs\u002Fdebug.log 文件以获取详细错误；确认 OpenAI 账户配额是否耗尽；如果配额用尽，登录 OpenAI 官网充值或调整计划。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FJARVIS\u002Fissues\u002F50",{"id":143,"question_zh":144,"answer_zh":145,"source_url":141},8843,"如何验证 OpenAI API Key 是否有效？","使用 curl 命令测试：`curl https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions -H \"Authorization: Bearer YOUR_KEY\" -d '{\"model\":\"gpt-3.5-turbo\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}]}'`。如果返回错误，Key 无效或配额用尽。",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},8844,"在 Colab 部署时遇到 Hugging Face 推理错误如何解决？","提供 Hugging Face cookies 给 Jarvis。获取 cookies 的方法：访问 huggingface.co 并登录，使用浏览器开发者工具（F12），在 Application 标签下找到 Cookies，复制 session token 或相关 cookies。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FJARVIS\u002Fissues\u002F18",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},8845,"安装时磁盘空间不足，如何解决？","使用 move-wsl 脚本将 WSL 安装移动到其他驱动器。下载脚本 https:\u002F\u002Fgithub.com\u002Fpxlrbt\u002Fmove-wsl，运行它将 WSL 实例迁移到有足够空间的驱动器。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FJARVIS\u002Fissues\u002F111",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},8846,"部署服务器后返回 404 错误，可能原因是什么？","常见原因是 OpenAI 配额用尽。检查 OpenAI 账户的配额和计费详情；登录 OpenAI 官网查看配额使用情况并充值。确保配置文件中的 API 密钥有效。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FJARVIS\u002Fissues\u002F72",{"id":162,"question_zh":163,"answer_zh":164,"source_url":155},8847,"为什么 Jarvis 安装占用大量磁盘空间？","当前版本默认下载所有模型，占用约 284GB。建议根据部署类型选择性下载模型，但当前需要手动管理；维护者表示未来版本将支持动态加载模型以减少空间使用。",[]]