[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-bigscience-workshop--petals":3,"tool-bigscience-workshop--petals":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",157379,2,"2026-04-15T23:32:42",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":108,"github_topics":109,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":129,"updated_at":130,"faqs":131,"releases":161},7983,"bigscience-workshop\u002Fpetals","petals","🌸 Run LLMs at home, BitTorrent-style. 
Fine-tuning and inference up to 10x faster than offloading","Petals 是一款让你在家用电脑上运行超大语言模型的开源工具。它采用类似 BitTorrent 的分布式架构，将庞大的模型（如 Llama 3.1 405B、Mixtral 或 BLOOM）拆分到全球志愿者的 GPU 网络上协同计算。这意味着你无需购买昂贵的专业显卡，仅凭普通台式机甚至 Google Colab 免费额度，就能流畅地对这些巨型模型进行推理和微调，其速度比传统的本地卸载方案快达 10 倍。\n\nPetals 主要解决了个人用户和小型团队因硬件资源有限而无法部署参数量巨大模型的痛点。它特别适合开发者、AI 研究人员以及希望探索前沿大模型能力的技术爱好者使用。对于注重数据隐私的用户，Petals 还支持搭建受信任的私有网络集群。\n\n其核心技术亮点在于“去中心化协作”：用户既可以作为客户端调用全网算力，也可以贡献自己的闲置 GPU 加入公共 swarm（群簇）以提升整体容量。通过简单的 Python 代码即可连接分布式网络，像调用本地模型一样轻松生成文本。这种社区共建的模式不仅降低了大模型的使用门槛，也促进了算力的共享与高效利用。","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F7eR7Pan.png\" width=\"400\">\u003Cbr>\n    Run large language models at home, BitTorrent-style.\u003Cbr>\n    Fine-tuning and inference \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals#benchmarks\">up to 10x faster\u003C\u002Fa> than offloading\n    \u003Cbr>\u003Cbr>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fpetals\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fpetals.svg?color=green\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FtfHfe8B34k\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F865254854262652969?label=discord&logo=discord&logoColor=white\">\u003C\u002Fa>\n    \u003Cbr>\n\u003C\u002Fp>\n\nGenerate text with distributed **Llama 3.1** (up to 405B), **Mixtral** (8x22B), **Falcon** (40B+) or **BLOOM** (176B) and fine‑tune them for your own tasks &mdash; right from your desktop computer or Google Colab:\n\n```python\nfrom transformers import AutoTokenizer\nfrom petals import AutoDistributedModelForCausalLM\n\n# Choose any model available at https:\u002F\u002Fhealth.petals.dev\nmodel_name = \"meta-llama\u002FMeta-Llama-3.1-405B-Instruct\"\n\n# Connect to a distributed network hosting model layers\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = 
AutoDistributedModelForCausalLM.from_pretrained(model_name)\n\n# Run the model as if it were on your computer\ninputs = tokenizer(\"A cat sat\", return_tensors=\"pt\")[\"input_ids\"]\noutputs = model.generate(inputs, max_new_tokens=5)\nprint(tokenizer.decode(outputs[0]))  # A cat sat on a mat...\n```\n\n\u003Cp align=\"center\">\n    🚀 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing\">Try now in Colab\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n🦙 **Want to run Llama?** [Request access](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our [chatbot app](https:\u002F\u002Fchat.petals.dev).\n\n🔏 **Privacy.** Your data will be processed with the help of other people in the public swarm. Learn more about privacy [here](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety). For sensitive data, you can set up a [private swarm](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm) among people you trust.\n\n💬 **Any questions?** Ping us in [our Discord](https:\u002F\u002Fdiscord.gg\u002FKdThf2bWVU)!\n\n## Connect your GPU and increase Petals capacity\n\nPetals is a community-run system &mdash; we rely on people sharing their GPUs. 
You can help serving one of the [available models](https:\u002F\u002Fhealth.petals.dev) or host a new model from 🤗 [Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels)!\n\nAs an example, here is how to host a part of [Llama 3.1 (405B) Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) on your GPU:\n\n🦙 **Want to host Llama?** [Request access](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model.\n\n🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow [this](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU) for AMD):\n\n```bash\nconda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia\npip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🪟 **Windows + WSL.** Follow [this guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-Petals-server-on-Windows) on our Wiki.\n\n🐋 **Docker.** Run our [Docker](https:\u002F\u002Fwww.docker.com) image for NVIDIA GPUs (or follow [this](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU) for AMD):\n\n```bash\nsudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:\u002Fcache --rm \\\n    learningathome\u002Fpetals:main \\\n    python -m petals.cli.run_server --port 31330 meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🍏 **macOS + Apple M1\u002FM2 GPU.** Install [Homebrew](https:\u002F\u002Fbrew.sh\u002F), then run these commands:\n\n```bash\nbrew install python\npython3 -m pip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython3 -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n\u003Cp align=\"center\">\n    📚 
&nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#running-a-server\">Learn more\u003C\u002Fa>\u003C\u002Fb> (how to use multiple GPUs, start the server on boot, etc.)\n\u003C\u002Fp>\n\n🔒 **Security.** Hosting a server does not allow others to run custom code on your computer. Learn more [here](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety).\n\n💬 **Any questions?** Ping us in [our Discord](https:\u002F\u002Fdiscord.gg\u002FX7DgtxgMhc)!\n\n🏆 **Thank you!** Once you load and host 10+ blocks, we can show your name or link on the [swarm monitor](https:\u002F\u002Fhealth.petals.dev) as a way to say thanks. You can specify them with `--public_name YOUR_NAME`.\n\n## How does it work?\n\n- You load a small part of the model, then join a [network](https:\u002F\u002Fhealth.petals.dev) of people serving the other parts. Single‑batch inference runs at up to **6 tokens\u002Fsec** for **Llama 2** (70B) and up to **4 tokens\u002Fsec** for **Falcon** (180B) — enough for [chatbots](https:\u002F\u002Fchat.petals.dev) and interactive apps.\n- You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. 
You get the comforts of an API with the flexibility of **PyTorch** and **🤗 Transformers**.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRTYF3yW.png\" width=\"800\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    📜 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf\">Read paper\u003C\u002Fa>\u003C\u002Fb>\n    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n    📚 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions\">See FAQ\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## 📚 Tutorials, examples, and more\n\nBasic tutorials:\n\n- Getting started: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)\n- Prompt-tune Llama-65B for text semantic classification: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-sst2.ipynb)\n- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-personachat.ipynb)\n\nUseful tools:\n\n- [Chatbot web app](https:\u002F\u002Fchat.petals.dev) (connects to Petals via an HTTP\u002FWebSocket endpoint): [source code](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fchat.petals.dev)\n- [Monitor](https:\u002F\u002Fhealth.petals.dev) for the public swarm: [source code](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fhealth.petals.dev)\n\nAdvanced guides:\n\n- Launch a private swarm: [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm)\n- Run a custom model: [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-a-custom-model-with-Petals)\n\n### 
Benchmarks\n\nPlease see **Section 3.3** of our [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf).\n\n### 🛠️ Contributing\n\nPlease see our [FAQ](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#contributing) on contributing.\n\n### 📜 Citations\n\nAlexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel.\n[Petals: Collaborative Inference and Fine-tuning of Large Models.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188)\n_Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)._ 2023.\n\n```bibtex\n@inproceedings{borzunov2023petals,\n  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},\n  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},\n  pages = {558--568},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188}\n}\n```\n\nAlexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel.\n[Distributed inference and fine-tuning of large language models over the Internet.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361)\n_Advances in Neural Information Processing Systems_ 36 (2023).\n\n```bibtex\n@inproceedings{borzunov2023distributed,\n  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},\n  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Advances in Neural Information Processing 
Systems},\n  volume = {36},\n  pages = {12312--12331},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361}\n}\n```\n\n--------------------------------------------------------------------------------\n\n\u003Cp align=\"center\">\n    This project is a part of the \u003Ca href=\"https:\u002F\u002Fbigscience.huggingface.co\u002F\">BigScience\u003C\u002Fa> research workshop.\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbigscience-workshop_petals_readme_4e5bdd16a807.png\" width=\"150\">\n\u003C\u002Fp>\n","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F7eR7Pan.png\" width=\"400\">\u003Cbr>\n    以BitTorrent的方式，在家运行大型语言模型。\u003Cbr>\n    微调与推理速度比传统卸载技术\u003Ccode>最高快10倍\u003C\u002Fcode>\u003Cbr>\u003Cbr>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fpetals\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fpetals.svg?color=green\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FtfHfe8B34k\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F865254854262652969?label=discord&logo=discord&logoColor=white\">\u003C\u002Fa>\n    \u003Cbr>\n\u003C\u002Fp>\n\n使用分布式**Llama 3.1**（高达4050亿参数）、**Mixtral**（8×220亿参数）、**Falcon**（400亿+参数）或**BLOOM**（1760亿参数）生成文本，并针对您的特定任务进行微调——直接在您的台式电脑或Google Colab上即可完成：\n\n```python\nfrom transformers import AutoTokenizer\nfrom petals import AutoDistributedModelForCausalLM\n\n# 选择 https:\u002F\u002Fhealth.petals.dev 上的任意可用模型\nmodel_name = \"meta-llama\u002FMeta-Llama-3.1-405B-Instruct\"\n\n# 连接到托管模型各层的分布式网络\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoDistributedModelForCausalLM.from_pretrained(model_name)\n\n# 像在本地运行一样使用该模型\ninputs = tokenizer(\"A cat sat\", return_tensors=\"pt\")[\"input_ids\"]\noutputs = model.generate(inputs, max_new_tokens=5)\nprint(tokenizer.decode(outputs[0]))  # A cat sat on a 
mat...\n```\n\n\u003Cp align=\"center\">\n    🚀 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing\">立即在Colab中尝试\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n🦙 **想运行Llama吗？** 请先[申请访问权限](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct)获取其权重，然后在加载模型前在终端中运行`huggingface-cli login`。或者直接在我们的[聊天机器人应用](https:\u002F\u002Fchat.petals.dev)中试用。\n\n🔏 **隐私保护。** 您的数据将由公共蜂群中的其他用户协助处理。更多关于隐私的信息，请参阅[这里](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety)。对于敏感数据，您可以与信任的人建立一个[私有蜂群](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm)。\n\n💬 **有任何问题吗？** 请在我们的[Discord](https:\u002F\u002Fdiscord.gg\u002FKdThf2bWVU)中联系我们！\n\n## 连接您的GPU，提升Petals容量\n\nPetals是一个由社区运营的系统——我们依赖于大家共享自己的GPU资源。您可以帮助提供其中一种[可用模型](https:\u002F\u002Fhealth.petals.dev)的服务，也可以从🤗 [Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels)托管一个新的模型！\n\n例如，以下是如何在您的GPU上托管部分[Llama 3.1（4050亿参数）Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct)的方法：\n\n🦙 **想托管Llama吗？** 请先[申请访问权限](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct)获取其权重，然后在加载模型前在终端中运行`huggingface-cli login`。\n\n🐧 **Linux + Anaconda。** 对于NVIDIA GPU，请运行以下命令（AMD则请参考[这篇文档](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU)）：\n\n```bash\nconda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia\npip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🪟 **Windows + WSL。** 请按照我们Wiki上的[指南](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-Petals-server-on-Windows)操作。\n\n🐋 **Docker。** 对于NVIDIA 
GPU，可运行我们的[Docker](https:\u002F\u002Fwww.docker.com)镜像（AMD则请参考[这篇文档](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU)）：\n\n```bash\nsudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:\u002Fcache --rm \\\n    learningathome\u002Fpetals:main \\\n    python -m petals.cli.run_server --port 31330 meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🍏 **macOS + Apple M1\u002FM2 GPU。** 安装[Homebrew](https:\u002F\u002Fbrew.sh\u002F)后，运行以下命令：\n\n```bash\nbrew install python\npython3 -m pip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython3 -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n\u003Cp align=\"center\">\n    📚 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#running-a-server\">了解更多\u003C\u002Fa>\u003C\u002Fb>（如何使用多块GPU、开机自启动服务器等）\n\u003C\u002Fp>\n\n🔒 **安全性。** 托管服务器并不会让他人在您的计算机上运行自定义代码。更多信息请参阅[这里](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety)。\n\n💬 **有任何问题吗？** 请在我们的[Discord](https:\u002F\u002Fdiscord.gg\u002FX7DgtxgMhc)中联系我们！\n\n🏆 **感谢！** 当您加载并托管超过10个区块时，我们可以在[蜂群监控器](https:\u002F\u002Fhealth.petals.dev)上展示您的姓名或链接，以表达谢意。您可以通过`--public_name YOUR_NAME`来指定这些信息。\n\n## 它是如何工作的？\n\n- 您只需加载模型的一小部分，然后加入一个由其他人提供其余部分服务的[网络](https:\u002F\u002Fhealth.petals.dev)。单批次推理速度可达**Llama 2**（700亿参数）每秒6个token，以及**Falcon**（1800亿参数）每秒4个token——足以支持[聊天机器人](https:\u002F\u002Fchat.petals.dev)和交互式应用。\n- 您可以采用任何微调和采样方法，执行自定义的模型路径，或查看其隐藏状态。您将享受到API的便利性，同时兼具**PyTorch**和**🤗 Transformers**的灵活性。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRTYF3yW.png\" width=\"800\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    📜 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf\">阅读论文\u003C\u002Fa>\u003C\u002Fb>\n    
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n    📚 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions\">查看常见问题解答\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## 📚 教程、示例及其他资源\n\n基础教程：\n\n- 入门：[教程](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)\n- 使用Llama-65B对文本进行语义分类的提示调优：[教程](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-sst2.ipynb)\n- 使用BLOOM创建个性化聊天机器人的提示调优：[教程](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-personachat.ipynb)\n\n实用工具：\n\n- [聊天机器人Web应用](https:\u002F\u002Fchat.petals.dev)（通过HTTP\u002FWebSocket端点连接到Petals）：[源代码](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fchat.petals.dev)\n- 公共蜂群的[监控器](https:\u002F\u002Fhealth.petals.dev)：[源代码](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fhealth.petals.dev)\n\n进阶指南：\n\n- 启动私有蜂群：[指南](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm)\n- 运行自定义模型：[指南](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-a-custom-model-with-Petals)\n\n### 基准测试\n\n请参阅我们的[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf)中的**第3.3节**。\n\n### 🛠️ 贡献\n\n请参阅我们的[常见问题解答](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#contributing)，了解如何贡献。\n\n### 📜 引用\n\nAlexander Borzunov、Dmitry Baranchuk、Tim Dettmers、Max Ryabinin、Younes Belkada、Artem Chumachenko、Pavel Samygin 和 Colin Raffel。\n[Petals：大型模型的协作推理与微调。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188)\n《第61届计算语言学协会年会论文集（第3卷：系统演示）》。2023年。\n\n```bibtex\n@inproceedings{borzunov2023petals,\n  title = {Petals: Collaborative Inference and Fine-tuning of Large 
Models},\n  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},\n  pages = {558--568},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188}\n}\n```\n\nAlexander Borzunov、Max Ryabinin、Artem Chumachenko、Dmitry Baranchuk、Tim Dettmers、Younes Belkada、Pavel Samygin 和 Colin Raffel。\n[互联网上的大型语言模型分布式推理与微调。](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361)\n《神经信息处理系统进展》第36卷（2023年）。\n\n```bibtex\n@inproceedings{borzunov2023distributed,\n  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},\n  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Advances in Neural Information Processing Systems},\n  volume = {36},\n  pages = {12312--12331},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361}\n}\n```\n\n--------------------------------------------------------------------------------\n\n\u003Cp align=\"center\">\n    本项目是 \u003Ca href=\"https:\u002F\u002Fbigscience.huggingface.co\u002F\">BigScience\u003C\u002Fa> 研究研讨会的一部分。\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbigscience-workshop_petals_readme_4e5bdd16a807.png\" width=\"150\">\n\u003C\u002Fp>","# Petals 快速上手指南\n\nPetals 是一个去分布式的开源工具，允许你在本地电脑或 Google Colab 上以 BitTorrent 风格运行超大语言模型（如 Llama 3.1 405B、Mixtral、Falcon 等）。它通过连接公共网络中的其他用户共享的 GPU 资源，实现模型的分布式推理和微调。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Linux (推荐), Windows (需 WSL), macOS (Apple M1\u002FM2)\n- **GPU**：NVIDIA GPU (推荐), AMD GPU, 或 Apple Silicon\n- **Python**：3.8 及以上版本\n- **网络**：稳定的互联网连接（用于连接分布式节点）\n\n### 
前置依赖\n- **NVIDIA 用户**：需安装 CUDA 驱动及 Toolkit (推荐 CUDA 11.7+)\n- **Hugging Face 账号**：若需运行 Llama 等受限模型，需先申请权限并登录。\n\n## 安装步骤\n\n根据你的操作系统选择以下一种方式进行安装：\n\n### 方案 A：Linux + Conda (推荐 NVIDIA 用户)\n```bash\n# 安装 PyTorch (CUDA 11.7)\nconda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia\n\n# 安装 Petals\npip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\n```\n\n### 方案 B：macOS (Apple M1\u002FM2)\n```bash\n# 通过 Homebrew 安装 Python (需先安装 Homebrew)\nbrew install python\n\n# 安装 Petals\npython3 -m pip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\n```\n\n### 方案 C：Docker (通用，支持 NVIDIA GPU)\n```bash\nsudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:\u002Fcache --rm \\\n    learningathome\u002Fpetals:main \\\n    python -m petals.cli.run_server --port 31330 meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n*(注：Windows 用户建议使用 WSL 参考 Linux 方案，或查阅官方 Wiki 使用 Docker)*\n\n> **注意**：如果你计划运行 **Llama 3.1** 等受限模型，请在终端执行以下命令登录 Hugging Face：\n> ```bash\n> huggingface-cli login\n> ```\n\n## 基本使用\n\n以下是最简单的 Python 代码示例，展示如何加载分布式模型并生成文本。该示例将自动连接公共网络中托管 `Llama-3.1-405B` 模型层的节点。\n\n```python\nfrom transformers import AutoTokenizer\nfrom petals import AutoDistributedModelForCausalLM\n\n# 选择模型名称 (可在 https:\u002F\u002Fhealth.petals.dev 查看可用模型)\nmodel_name = \"meta-llama\u002FMeta-Llama-3.1-405B-Instruct\"\n\n# 初始化分词器并连接分布式网络加载模型\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoDistributedModelForCausalLM.from_pretrained(model_name)\n\n# 像使用本地模型一样进行推理\ninputs = tokenizer(\"A cat sat\", return_tensors=\"pt\")[\"input_ids\"]\noutputs = model.generate(inputs, max_new_tokens=5)\n\n# 输出结果\nprint(tokenizer.decode(outputs[0])) \n# 预期输出类似：A cat sat on a mat...\n```\n\n### 贡献算力（可选）\n如果你想为社区贡献自己的 GPU 算力来托管模型的一部分，可以运行以下命令启动服务器：\n\n```bash\n# Linux\u002FMac 示例：托管 Llama 3.1 405B 的部分层级\npython -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```","一位独立开发者希望在仅配备单张消费级显卡的笔记本电脑上，对超大规模的 Llama 3.1-405B 
模型进行本地推理测试，以验证其在新业务场景中的表现。\n\n### 没有 petals 时\n- **硬件门槛极高**：4050 亿参数的模型需要数十张 A100\u002FH100 显卡才能运行，个人设备显存完全不足，根本无法加载模型。\n- **云端成本昂贵**：若租用云厂商的多卡集群进行临时测试，每小时费用高达数百元，对于频繁的实验迭代来说预算难以承受。\n- **数据隐私风险**：将敏感的业务测试数据上传至第三方云平台存在泄露隐患，无法满足内部合规要求。\n- **部署流程繁琐**：配置分布式推理环境涉及复杂的网络通信和显存优化代码，开发周期长且容易出错。\n\n### 使用 petals 后\n- **打破硬件限制**：利用 BitTorrent 式的分布式网络，直接调用全球志愿者共享的算力，在本地笔记本上即可流畅运行 405B 超大模型。\n- **大幅降低成本**：无需租赁昂贵的专用集群，免费或低成本接入公共 swarm，使个人开发者的试错成本几乎降为零。\n- **灵活保障隐私**：对于敏感数据，可快速搭建仅限可信节点加入的私有 swarm，确保数据不出局域网即可完成推理。\n- **极简代码集成**：仅需几行 Python 代码替换 transformers 库的加载逻辑，即可像使用本地小模型一样调用分布式大模型，无缝衔接现有工作流。\n\npetals 通过众包算力共享模式，让超大规模语言模型的推理与微调真正走下神坛，成为个人开发者触手可及的日常工具。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbigscience-workshop_petals_0c189823.png","bigscience-workshop","BigScience Workshop","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fbigscience-workshop_1ddc064e.png","Research workshop on large language models - The Summer of Language Models 21",null,"bigscience-contact@googlegroups.com","BigScienceW","https:\u002F\u002Fbigscience.huggingface.co","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",99.7,{"name":87,"color":88,"percentage":89},"Dockerfile","#384d54",0.2,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.1,10070,602,"2026-04-15T13:33:45","MIT","Linux, macOS, Windows (via WSL)","运行客户端非必需（利用公共网络）；托管服务器节点必需。支持 NVIDIA GPU (需 CUDA 11.7+)、AMD GPU 或 Apple M1\u002FM2 GPU。具体显存需求取决于托管的模型块大小，未明确最低数值。","未说明",{"notes":102,"python":103,"dependencies":104},"该工具采用分布式架构，用户可在本地仅加载部分模型层，其余层通过公共网络由其他用户提供，从而在普通电脑或 Google Colab 上运行超大模型（如 Llama 3.1 405B）。若选择托管服务器贡献算力，Linux 推荐使用 Conda 安装 PyTorch 和 CUDA 11.7；macOS 需安装 Homebrew；Windows 需使用 WSL。托管 Llama 系列模型需先在 Hugging Face 申请权限并登录。隐私方面，公共集群中数据会经过他人节点，敏感数据建议搭建私有集群。","未说明 (通过 Homebrew 或 conda 
安装)",[105,106,107,64],"pytorch","pytorch-cuda=11.7","transformers",[14,16,35,13],[110,111,112,113,114,115,116,105,117,118,119,120,121,122,123,124,125,126,127,128],"bloom","deep-learning","distributed-systems","language-models","large-language-models","machine-learning","neural-networks","volunteer-computing","pipeline-parallelism","tensor-parallelism","guanaco","llama","chatbot","gpt","transformer","nlp","pretrained-models","falcon","mixtral","2026-03-27T02:49:30.150509","2026-04-16T10:46:48.484638",[132,137,142,147,152,157],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},35748,"How do I run Petals on macOS (including M1\u002FM2 chips)?","Petals now supports macOS natively, for both the client and the server (including Apple M1\u002FM2 GPUs). Make sure you are using Python 3.10+ (installable via Homebrew: `brew install python`), then run the following command to upgrade to the latest version:\n\n```bash\npip install --upgrade git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\n```\n\nIf the installation fails, check that you are using the correct Python version.","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fissues\u002F147",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},35749,"What should I do about the 'missing 1 required positional argument: layer_idx' error when starting a Mixtral server in a private swarm?","This is a known issue, and the fix has been merged into the main branch (master). Upgrade to the latest version of Petals to resolve the error:\n\n```bash\npip install --upgrade git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\n```\n\nAfter upgrading, starting a Mixtral model server with `python3 -m petals.cli.run_server` should work.","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fissues\u002F569",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},35750,"How do I handle the 'NoneType object is not callable' error when a script exits?","This is a known issue that occurs when the P2P daemon terminates, and it has been fixed in the `hivemind` dependency. Because the Petals main branch depends on the latest `hivemind` main branch, upgrading Petals picks up the fix automatically:\n\n```bash\npip install --upgrade git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\n```\n\nThe error should no longer appear after upgrading.","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fissues\u002F237",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},35751,"Can I install and run Petals on 
Windows?","Official native installation currently targets Linux and macOS. Installing directly with pip on Windows may fail because `uvloop` does not support Windows (`RuntimeError: uvloop does not support Windows at the moment`).\n\nWorkarounds:\n1. Use WSL (Windows Subsystem for Linux) to run a Linux environment on Windows and install Petals there (recommended).\n2. Alternatively, look at community integrations (such as the Windows installer published by the Lollms project); these third-party tools may already handle the compatibility issues.","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fissues\u002F488",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},35752,"Why does the chat.petals service frequently disconnect or fail to generate?","If you see failed requests, tracebacks, or dropped connections when using chat.petals.ml, the usual cause is unstable or overloaded nodes in the public network.\n\nThings to try:\n1. Refresh the page and retry.\n2. If you run your own client, check your network connection and firewall settings.\n3. Consider hosting your own private nodes or joining a more stable private swarm to get more reliable inference. The public demo service may fluctuate because of resource limits.","https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fissues\u002F382",{"id":158,"question_zh":159,"answer_zh":160,"source_url":136},35753,"How do I fix working-directory or port-exposure problems when deploying in Docker?","If you run into problems running Petals-related services (such as a web server) in a Docker container, make sure the Dockerfile sets the working directory and exposes the port correctly. A typical configuration looks like this:\n\n```dockerfile\nEXPOSE 5000\nWORKDIR \u002Fapp\nCMD [\"gunicorn\", \"app:app\", \"--bind\", \"0.0.0.0:5000\", \"--threads\", \"100\", \"--timeout\", \"1000\"]\n```\n\nMake sure `WORKDIR` is switched back to the application directory before the start command; otherwise files may not be found or the service may fail to start.",[162,167,172,177,182,187,192,197,202,207,212],{"id":163,"version":164,"summary_zh":165,"released_at":166},280944,"v2.2.0","## Highlights\n\n🦅 **Falcon support.** Petals now supports all models based on [Falcon](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Ffalcon), including [Falcon 180B](https:\u002F\u002Fhuggingface.co\u002Ftiiuae\u002Ffalcon-180B), released today. We optimized the 🤗 Transformers `FalconModel` implementation to be up to 40% faster on recent GPUs. Our [chatbot app](http:\u002F\u002Fchat.petals.dev) runs Falcon 180B-Chat at about 2 tokens\u002Fsec.\n\nFalcon-40B is licensed under Apache 2.0, so you can load it by specifying `tiiuae\u002Ffalcon-40b` or `tiiuae\u002Ffalcon-40b-instruct` as the model name. Falcon-180B uses a custom license, and it is not yet clear whether we can provide a Python interface for inference and fine-tuning of this model. For now, it is available only in the chatbot app, and we are waiting for further clarification from TII on this issue.\n\n🍏 **Native macOS support.** You can now run Petals clients and servers natively on macOS: just install 
[Homebrew](https:\u002F\u002Fbrew.sh\u002F) and run these commands:\n\n```bash\nbrew install python\npython3 -m pip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython3 -m petals.cli.run_server petals-team\u002FStableBeluga2\n```\n\nIf your computer has an Apple M1\u002FM2 chip, the Petals server will use the integrated GPU automatically. We recommend hosting only Llama-based models, since other supported architectures do not yet run efficiently on M1\u002FM2 chips. We also recommend using Python 3.10 or later on macOS (installed automatically by Homebrew).\n\n🔌 **Serving custom models.** Custom models now show up automatically at https:\u002F\u002Fhealth.petals.dev, marked as \"not officially supported\" models. As a reminder, you are not limited to the models listed at https:\u002F\u002Fhealth.petals.dev: you can run a server hosting any model based on the BLOOM, Llama, or Falcon architecture (provided the model license allows it), or even add support for a [new architecture](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-a-custom-model-with-Petals) yourself. This release also improves Petals compatibility with some popular Llama-based models (e.g., models from [NousResearch](https:\u002F\u002Fhuggingface.co\u002FNousResearch)).\n\n🐞 **Bug fixes.** This release also fixes inference of prefix-tuned models, which was broken in Petals 2.1.0.\n\n## What's Changed\n* Require transformers>=4.32.0 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F479\n* Fix requirements for transformers>=4.32.0 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F480\n* Rewrite MemoryCache alloc_timeout logic by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F434\n* Refactor the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F482\n* Support macOS natively by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F477\n* Remove no-op process in PrioritizedTaskPool by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F484\n* Fix `.generate(input_ids=...)` by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F485\n* Wait for DHT ","2023-09-06T17:29:56",{"id":168,"version":169,"summary_zh":170,"released_at":171},280945,"v2.1.0","## Highlights\n\n🔌 **Compatibility with 🤗 Transformers 
generation utilities.** Petals models now use the 🤗 Transformers **[.generate()](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmain_classes\u002Ftext_generation#transformers.GenerationMixin.generate)** implementation directly instead of custom generation code. This means you can use the variety of generation methods and constraints implemented in 🤗 Transformers (e.g., `repetition_penalty`, beam search, etc.) and expect an exact match between Petals and a model running locally.\n\nMost common methods support reusing inference sessions, so you can call `.generate()` multiple times without reprocessing the dialogue history from scratch:\n\n```python\nwith model.inference_session(max_length=100):\n    outputs1 = model.generate(user_prompt1, repetition_penalty=1.2)\n    outputs2 = model.generate(user_prompt2, repetition_penalty=1.2)\n```\n\n⚡ **Faster loading of Stable Beluga 2.** We repacked [Stable Beluga 2](https:\u002F\u002Fhuggingface.co\u002Fpetals-team\u002FStableBeluga2), currently the most popular model, to increase its loading speed and minimize RAM and disk space requirements. The repacked version can be loaded from the `petals-team\u002FStableBeluga2` repository and is fully compatible with clients and servers using the standard repository (`stabilityai\u002FStableBeluga2`).\n\nClients now need to download only **1.05 GB of data** to run Stable Beluga 2 (instead of about 20 GB before) and require only **4 GB of RAM** (instead of about 20 GB before). Servers need to download and store **half as much data** and load the model from disk significantly faster. If you are switching from the old repository, don't forget to remove the old cache in the `~\u002F.cache\u002Fpetals\u002Fmodels--stabilityai--StableBeluga2` directory to save disk space.\n\n⏱️ **More responsive inference.** In older versions, servers could become unresponsive for a few seconds while processing large prefixes (thousands of tokens). This release allows small inference requests (a few tokens) to be interleaved into the processing of a large request, so token-by-token inference no longer freezes because someone else is processing a large prefix.\n\n🔒 **Minor improvements.** This release adds support for loading weights in the [safetensors](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fsafetensors) format on servers and adds the `blocked_servers` client option to avoid a given set of servers:\n\n```python\nfrom petals import AutoDistributedModelForCausalLM\n\nblocked_servers = [\"12D3KooWA6g...\", \"12D3KooWGyD...\"]  # Full peer IDs from https:\u002F\u002Fhealth.petals.dev\nmodel = AutoDistributedModelForCausalLM.from_pretrained(model_name, blocked_servers=blocked_servers)\n```\n\n🐞 **Bug fixes.** This release also includes a variety of bug fixes that speed up the [chatbot app](https:\u002F\u002Fchat.petals.dev) and fine-tuning, better bypass recently disconnected servers, improve the load-balancing algorithm and the usability of benchmarks, and fix throughput measurements and installation issues.","2023-08-24T16:42:00",{"id":173,"version":174,"summary_zh":175,"released_at":176},280946,"v2.0.1","## Highlights\n\n🛣️ **Inference of longer sequences.** We extended the maximum sequence length for Llama 2 to **8192 
tokens** and added chunking to avoid server out-of-memory errors when processing long prefixes. This became possible thanks to the multi-query attention used in Llama 2, which needs 8x less GPU memory for attention caches. You can now process longer sequences with a Petals client and have dialogues of up to 8192 tokens at https:\u002F\u002Fchat.petals.dev.\n\n🐍 **Python 3.11 support.** Petals clients and servers now work with Python 3.11.\n\n🐞 **Bug fixes.** We fixed the server's `--token` argument (used to provide your 🤗 Model Hub [access token](https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens) for loading Llama 2), possible deadlocks in the server, issues with fine-tuning speed (servers available via relays are deprioritized), and other minor load-balancing issues.\n\n🪟 **Running a server on Windows.** We wrote a better [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-Petals-server-on-Windows) for running a server in WSL (Windows Subsystem for Linux).\n\n📦 **Running a server on Runpod.** We added a [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#cloud-providers) on using the Petals template on Runpod.\n\n## What's Changed\n* Update to petals.dev by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F390\n* Bump version to 2.0.0.post3 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F391\n* Fix the `--attn_cache_tokens` default by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F392\n* Fix deadlocks in MemoryCache by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F396\n* Support Python 3.11 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F393\n* Fix routing through relays, default network RPS, the `--token` argument, logging, and the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F399\n* If the speed test fails, assume a network speed of 100 Mbit\u002Fs by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F404\n* Split long sequences into chunks by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F403\n* Add Llama 2 and WSL 
instructions to the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F406\n* Update README.md by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F407\n* Update the commands for hosting Llama 2 in the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F409\n* Update the `--update_period` and `--expiration` defaults by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F410\n* Bump version to 2.0.1 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F411\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fcompare\u002Fv2.0.0.post1...v2.0.1","2023-07-23T14:54:09",{"id":178,"version":179,"summary_zh":180,"released_at":181},280947,"v2.0.0.post1","We are excited to announce Petals 2.0.0, the largest Petals release to date!\n\n## Highlights\n\n🦙 **Support for LLaMA and LLaMA 2.** We added support for **inference and fine-tuning** of any model based on 🤗 Transformers [`LlamaModel`](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fmodel_doc\u002Fllama), including all variants of [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama\u002Fblob\u002Fllama_v1\u002FMODEL_CARD.md) and [LLaMA 2](https:\u002F\u002Fai.meta.com\u002Fllama\u002F), among the strongest open-source models available today. The public swarm hosts the largest variants of these models, LLaMA-65B and LLaMA 2 (70B and 70B-Chat), and serves inference at **5-6 tokens\u002Fsec**.\n\n- You can try them in the 💬 **chatbot web app** or in 🚀 **our Colab tutorial**.\n\n🗜️ **4-bit quantization.** We integrated the efficient 4-bit (NF4) quantization from the recent paper \"QLoRA: Efficient Finetuning of Quantized LLMs\" ([arXiv:2305.14314](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14314)). Compared to the 8-bit quantization we used previously, it uses about 40% less GPU memory (and therefore needs about 40% fewer servers) and makes token-by-token inference about 2x faster, with a relatively small quality loss.\n\n🔌 **Pre-loading LoRA adapters, such as Guanaco.** We added the ability to pre-load LoRA adapters compatible with the 🤗 [PEFT](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft) library, which can add extra functionality to the model you host. You can do this with the `--adapters` argument on the server (e.g., `--adapters repo1\u002Fadapter1 repo2\u002Fadapter2`). These adapters are activated at the client's request: the client can specify `.from_pretrained(..., active_adapter=\"repo1\u002Fadapter1\")` when loading a distributed model. One example is 
[Guanaco](https:\u002F\u002Fhuggingface.co\u002Ftimdettmers\u002Fguanaco-65b), an **instruction-finetuned adapter** for LLaMA that turns it into a helpful chatbot that carefully follows the user's instructions. You can try LLaMA with this adapter in our **chatbot app**.\n\n➡️ **Direct server-to-server communication.** Previously, servers could not send tensors to each other directly because of the specifics of our fault-tolerant inference algorithm. This update changes that, which saves round-trip time between servers and the client and substantially speeds up inference for clients located far from the servers they use.\n\n🛣️ **Shortest-path routing for inference.** Previously, a client could not reliably choose geographically close and fast servers, so it might pick a slow inference chain, especially when the swarm had many servers located far from it. Now the client builds a full graph of client-server and server-server latencies, together with server inference speeds, to find **the fastest chain of servers** for inference among all possible ones. It also considers the amount of GPU memory left for attention caches on each server, so that…","2023-07-19T18:29:48",{"id":183,"version":184,"summary_zh":185,"released_at":186},280948,"v1.1.5","## Highlights\n\n**⏱ Faster fine-tuning.** Fine-tuning uses about 2x less traffic (tensors are now sent in bfloat16 by default) and builds routes using a heuristic that maximizes the swarm's throughput. This should address the timeout errors that could occur during fine-tuning.\n\n**🐞 Bug fixes.** On the server side, this release fixes out-of-memory errors and network throughput evaluations that hung. On the client side, it fixes issues with slicing `RemoteSequential` and with unsupported `.generate()` keyword arguments being silently ignored. It also fixes warnings originating from `hivemind.p2p` and `hivemind.compression`.\n\n**🛣️ Updated throughput formula.** We updated the throughput formula to reflect that servers hosting many blocks still run forward and backward passes through only one block at a time. Don't be surprised if your throughput is lower than in 1.1.4: these numbers are not directly comparable!\n\n**🖼️ Improved lower-level interfaces.** We refactored lower-level interfaces such as `RemoteSequential` and `RemoteSequenceManager` to be more reliable (e.g., when retrying) and easier to use. Some rarely used low-level functions in `petals.dht_utils` were removed.\n\n## What's Changed\n* Fix OOMs occurring with accelerate ≥ 0.16.0 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F310\n* Refactor RemoteSequenceManager by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F309\n* Update hivemind to 1.1.8 and enable efficient bfloat16 encoding by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F311\n* Replace `.make_sequence(..., mode=\"random\")` with `mode=\"max_throughput\"` by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F313\n* Divide compute throughput by the average number of used blocks by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F314\n* Raise an error for unexpected `.generate()` keyword arguments by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F315\n* 
Abort the speed test if it runs too long by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F316\n* Bump version to 1.1.5 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F312\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fcompare\u002Fv1.1.4...v1.1.5","2023-05-09T23:03:13",{"id":188,"version":189,"summary_zh":190,"released_at":191},280949,"v1.1.4","## Highlights\n\n🗝️ **8-bit servers support more GPUs.** A [bitsandbytes](https:\u002F\u002Fgithub.com\u002FTimDettmers\u002Fbitsandbytes\u002Freleases\u002Ftag\u002F0.38.0) update brings 8-bit support to older generations of NVIDIA GPUs, as well as the [GeForce 16](https:\u002F\u002Fru.wikipedia.org\u002Fwiki\u002FGeForce_16) GPU series (e.g., 1660 Ti). Please try Petals 1.1.4 if you previously saw errors like `Your GPU does not support Int8 Matmul!` and `cublasLt ran into an error!` on some GPUs. This version also loads weights in 8-bit by default when [tensor parallelism](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#managing-gpus) is enabled.\n\n⏱️ **Servers start faster.** Servers take about 2x less time to load block weights from the disk cache into GPU memory. The next release will also reduce the time it takes to download the weights from the Internet, since they will be downloaded in 8-bit instead of 16-bit.\n\n🧵 **Multi-threaded clients work faster.** Previously, multi-threaded clients actually performed only one network request at a time because of a bug in hivemind. This bug was recently fixed in [hivemind](https:\u002F\u002Fgithub.com\u002Flearning-at-home\u002Fhivemind\u002Freleases\u002Ftag\u002F1.1.7). This significantly speeds up the [chat.petals.ml](https:\u002F\u002Fgithub.com\u002Fborzunov\u002Fchat.petals.ml) app when multiple users chat concurrently.\n\n⏱️ **Clients start faster.** Clients take about 10% less time to load the model, since they build a route through remote servers in parallel with loading the local part of the model (input\u002Foutput embeddings).\n\n🌳 **Relaxed dependency requirements.** We relaxed the version requirements for [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) and other [huggingface](https:\u002F\u002Fgithub.com\u002Fhuggingface) libraries, so you can update them independently of Petals. In particular, Petals now works with PyTorch 2.0 and the latest `transformers` release. We also fixed a bug where the client loaded a model in float32 by default (instead of bfloat16\u002Ffloat16) with some `transformers` releases. Please try Petals 1.1.4 if you previously had out-of-memory errors when running the client.\n\n## What's Changed\n\n* Speed up loading blocks by initializing with meta weights by @mryab in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F285\n* Add benchmarks to the README by @borzunov in 
https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F284\n* Fix invalid author email in setup.cfg by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F287\n* Hotfix: increase the daemon startup timeout by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F292\n* Update bitsandbytes, hivemind, and transformers by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F290\n* Fix dependencies and enable 8-bit by default for tensor parallelism by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F298\n* Add Python 3.10 to CI by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F299\n* Remove CustomLinear8bitLt by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002F","2023-04-21T02:26:19",{"id":193,"version":194,"summary_zh":195,"released_at":196},280950,"v1.1.3","## Highlights\r\n\r\n🐞 **Bug fixes.** We fixed a variety of minor issues related to timeout errors in the client, prompt tuning, and tensor parallelism.\r\n\r\n⚙️ **New client options.** Added the `allowed_servers` and `max_retries` options:\r\n\r\n- `allowed_servers` lets you restrict the set of servers a client can use for its requests (e.g., to use only servers you trust to process your data).\r\n- `max_retries` lets you limit the number of retries a client makes before raising an exception (previously, clients retried indefinitely).\r\n\r\n📚 **FAQ.** We published the [FAQ page](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions), which covers common questions about running clients and servers, as well as troubleshooting for common problems.\r\n\r\n## What's Changed\r\n* Fix typo in prompt-tuning-sst2.ipynb by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F245\r\n* Minor changes to the examples\u002Fprompt-tuning notebooks by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F247\r\n* Fix the sst example and add cls_model embeddings by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F248\r\n* Fix TP crashing when hypo_ids are used by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F249\r\n* Add 
`allowed_servers` and `max_retries` options to the client and improve logging by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F235\r\n* Lower the payload size threshold for stream handlers by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F251\r\n* Improve reachability logging by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F253\r\n* Link the FAQ in the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F260\r\n* Show visible addresses for the public swarm too by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F263\r\n* Limit the maximum delay between retries to 15 minutes by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F264\r\n* Use get_logger(__name__) instead of get_logger(__file__) by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F265\r\n* Improve the \"connect your GPU\" message by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F266\r\n* Fix use_chunked_forward=\"auto\" on non-x86_64 machines by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F267\r\n* Use inference mode in _MergedInferenceStep by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F275\r\n* Increase the default request timeout by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F276\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fcompare\u002Fv1.1.2...v1.1.3","2023-03-01T09:15:25",{"id":198,"version":199,"summary_zh":200,"released_at":201},280951,"v1.1.2","## Highlights\n\n🏃‍♀️ **Faster inference.** We shipped server-side changes that improve inference speed by up to 30%. This is the result of profiling the server's inference performance (see #224 and #225 for details). The public swarm will become faster once everyone upgrades to the latest Petals version and restarts their servers.\n\n🐞 **Prompt-tuning bug fixes.** We shipped bug fixes for the prompt-tuning notebooks (see #231 for details).\n\n🧑‍🏫 **New pretrained model.** We added the BigScience team's 
[BLOOMZ-176B](https:\u002F\u002Fhuggingface.co\u002Fbigscience\u002Fbloomz) model to the public swarm. You can run it (or host its blocks) by specifying `bigscience\u002Fbloomz-petals` as the model name.\n\n- BLOOMZ is a version of BLOOM fine-tuned to **follow human instructions** in the zero-shot regime. See its [model card](https:\u002F\u002Fhuggingface.co\u002Fbigscience\u002Fbloomz) and [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.01786) for details.\n- The [chatbot app](http:\u002F\u002Fchat.petals.ml\u002F) now uses BLOOMZ by default. You can ask it to generate text or code, or to perform various tasks. It responds better than the regular BLOOM, which often drifts off-topic instead of actually doing the task you asked for.\n\n## What's Changed\n* Choose --num_blocks automatically for all models by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F217\n* Add one more link to the \"Getting started\" tutorial by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F218\n* Mention BLOOMZ in the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F221\n* Fix a typo in an error message by @zsc in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F227\n* Merge inference pools into one to increase inference speed by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F225\n* Add a citation to the README by @Muhtasham in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F219\n* Fix a dtype error in the fine-tuning notebooks by @artek0chumak in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F231\n* Suggest using a smaller model for faster prototyping in prompt tuning by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F234\n* Bump version to 1.1.2 by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F244\n\n## New Contributors\n* @zsc made their first contribution in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F227\n* @Muhtasham made their first contribution in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F219\n\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fcompare\u002Fv1.1.1...v1.1.2","2023-01-30T20:38:50",{"id":203,"version":204,"summary_zh":205,"released_at":206},280952,"v1.1.1","## Highlights\n\n⛰️ **Stability.** This release improves the **stability and performance** of the Petals [DHT](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDistributed_hash_table) when many servers have joined via NAT traversal and relays. The DHT now prefers to store keys on directly reachable peers, so all peers can access them faster and with fewer failures. This release also contains a minor fix to the block reassignment algorithm that reduces the excess reassignments that led to swarm downtime in the past.\n\n🌎 **Basic routing.** We improved the routing algorithm for inference so that clients prefer servers holding more blocks, which minimizes latency and **increases inference speed**. This is still a basic algorithm, and we are working on smarter routing (taking latency, throughput, and other factors into account) for both inference and fine-tuning. This release also makes servers share more technical information about themselves (version, free cache, etc.), so that it can be used by smarter routing algorithms in the future and shown at http:\u002F\u002Fhealth.petals.ml for debugging.\n\n## What's Changed\n* Fix fine-tuning notebook intros by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F194\n* Ignore network RPS if it could not be measured by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F198\n* Make the client ignore the blacklist if all servers holding a block are blacklisted by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F197\n* Increase the tolerance in the test_tp_block test by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F196\n* Fix the --no_auto_relay help message by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F199\n* Use length-weighted sampling in routing for inference by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F204\n* Return the available cache size in rpc_info() by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F191\n* Add a check for whether the service is directly reachable from peers by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F195\n* Report the server version and dht.client_mode in rpc_info(), and check for updates on startup by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F209\n* Don't switch blocks if doing so would split the swarm by @borzunov in 
https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F210\n* Fix the output shape when resuming generation by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F211\n* Improve errors for missing blocks and suggest joining with your own server by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F212\n* CI: convert the model only when convert_model.py or setup.cfg changes by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F213\n* CI: update deprecated actions and stop measuring network RPS by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F215\n* Bump version to 1.1.1 by @borzunov in https:\u002F\u002Fgithub.co","2023-01-13T21:41:32",{"id":208,"version":209,"summary_zh":210,"released_at":211},280953,"v1.1.0","## Highlights\n\n🏠  **NAT traversal & relays.** Servers can now join the swarm automatically even if your machine is behind a NAT or a firewall, or has a dynamic IP address. You don't have to set up port forwarding manually or provide any arguments to make this work.\n\n- Please upgrade the Petals package and restart all your servers and clients to use this feature or to access servers that joined via relays:\n\n    `pip install --upgrade petals`\n\n- __How does it work?__ If a server detects that it cannot accept incoming connections because of a NAT or firewall, it opens a long-lived outbound connection to one of the **relay nodes**, and the relay node then forwards all requests to the server through this connection. In turn, any server with a public IP may act as a relay node if necessary. We use libp2p circuit relays under the hood: https:\u002F\u002Fdocs.libp2p.io\u002Fconcepts\u002Fnat\u002Fcircuit-relay\u002F\n\n💬 **Chatbot app.** We released a chatbot app running on top of Petals: http:\u002F\u002Fchat.petals.ml ([source code](https:\u002F\u002Fgithub.com\u002Fborzunov\u002Fchat.petals.ml)).\n\n- __Disclaimer:__ This chatbot uses the regular BLOOM model, which is not fine-tuned for question answering. Please do not expect it to behave like ChatGPT.\n\n- __How does it work?__ Under the hood, this web app uses our HTTP endpoint to run inference on the public Petals swarm. You can use this endpoint for your own projects or set up another endpoint yourself (no GPU needed). See the API docs here: https:\u002F\u002Fgithub.com\u002Fborzunov\u002Fchat.petals.ml#http-api-methods\n\n🏃‍♀️ **Faster CPU-only clients.** If your CPU supports the AVX512 instruction set, a CPU-only client now runs almost as fast as a GPU-enabled one. This way, you can rent inexpensive CPU instances to run the client or an HTTP endpoint, as we do for the chatbot app.\n\n- __How to use it?__ AVX512 is mostly present on recent Intel Xeon CPUs. You can rent one by choosing a \"dedicated CPU\" instance with 16+ GB of RAM on [DigitalOcean](https:\u002F\u002Fm.do.co\u002Fc\u002F4fc38037f84c).\n\n🏥 **Swarm health monitor.** We updated the swarm 
health monitor: http:\u002F\u002Fhealth.petals.ml ([source code](https:\u002F\u002Fgithub.com\u002Fborzunov\u002Fhealth.petals.ml)). It gives an overview of the servers that have joined the public swarm and reports any connection issues.\n\n## What's Changed\n* Add a PyPI badge and update instructions and links in the README by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F172\n* Add a link to PyPI by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F173\n* Add local tensor-parallel forward\u002Fbackward passes by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F143\n* Make the Docker command more visible by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F175\n* Allow disabling chunked forward by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F176\n* Disable chunked_forward() on CPUs with AVX512 support by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F179\n* Use less memory in .generate() by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscie","2023-01-10T11:53:49",{"id":213,"version":214,"summary_zh":215,"released_at":216},280954,"v1.0.0","## General\r\n\r\nThis release contains the core functionality of the Petals platform described in [our paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf).\r\n\r\n## What's Changed\r\n* Rudimentary decentralization by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F9\r\n* Update model by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F17\r\n* Chained rpc_forward & rpc_backward by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F18\r\n* Implement block selection on servers by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F20\r\n* LM head module by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F19\r\n* Measure and cache network & compute 
throughput by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F21\r\n* Shallow prompt tuning with run example on SST-2 by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F22\r\n* minimalistic automated tests by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F23\r\n* Clean up readme by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F24\r\n* [Test CI] add instructions to test the full model by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F25\r\n* Fix default branch in CI by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F26\r\n* Fix CI runs in master by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F27\r\n* CI: use GIT_REF_NAME instead of GIT_HEAD_REF by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F28\r\n* Add GenerationMixin class by @artek0chumak in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F29\r\n* Decouple make_sequence and move to RemoteSequenceManager by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F30\r\n* fix is_subsequence by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F32\r\n* Miscellaneous fixes to automatic tests by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F35\r\n* Efficient forward & backward by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F36\r\n* Pack of Inference Changes by @artek0chumak in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F37\r\n* Support various backend dtypes & 
async serialization by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F38\r\n* Use \"PETALS\" as the readme title by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F40\r\n* integrate mixed-8bit model by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F39\r\n* Rename 350m -> 560m by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F43\r\n* make pytest outputs more verbose by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F44\r\n* Distributed prompt tuning by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F42\r\n* Reduce vocabulary size in test model, fix bug in routing when overlapped by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F45\r\n* Convert actual model weights by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F46\r\n* [quickfix 1\u002Fn] remove expensive assertions in inference code by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F48\r\n* [Fix] make distributed seq cls to not create the full bloom model by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F49\r\n* Fix recovering for sequential_backward by @dbaranchuk in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F50\r\n* Inference: require max sequence length instead of assuming 2048 by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F52\r\n* Add shallow prefix-tuned inference by @artek0chumak in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F55\r\n* remove transformer block, implement as sequence size 1 by 
@GreenFatGuy in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F54\r\n* Update readme for the 1st public release by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F57\r\n* Use latest version of Petals scheme, shrink Petals logo by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F59\r\n* Update bullet points with feedback from Tim and other people by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F61\r\n* Update readme with arxiv link and more discussions by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F62\r\n* Warn that current instructions involve 6B model but we will replace them soon by @borzunov in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F63\r\n* Add deep prompt inference by @artek0chumak in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F66\r\n* Fix calling rpc_info multiple times by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F60\r\n* Make attention cache wait until memory is freed by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fpull\u002F53\r\n* Build cpuonly from bitsandbytes main by @justheuristic in https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpet","2022-12-30T21:57:23"]