[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-databrickslabs--dolly":3,"tool-databrickslabs--dolly":61},[4,18,26,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",144730,2,"2026-04-07T23:26:32",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":10,"last_commit_at":58,"category_tags":59,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,60],"视频",{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":10,"env_os":92,"env_gpu":93,"env_ram":94,"env_deps":95,"category_tags":103,"github_topics":104,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":140},5277,"databrickslabs\u002Fdolly","dolly","Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform","Dolly 是 Databricks 推出的一款开源大型语言模型，专为理解并执行用户指令而设计。它基于 EleutherAI 的 Pythia-12b 架构，利用约 1.5 万条由员工精心编写的高质量指令数据进行微调，涵盖了头脑风暴、分类、问答、信息提取及摘要等多种任务场景。\n\nDolly 主要解决了开源社区中缺乏可商用、且具备良好指令遵循能力模型的痛点。与许多仅限研究的模型不同，Dolly 
获得了宽松的商业使用许可，让企业和开发者能够放心地将其集成到实际产品中。虽然它在数学计算、复杂编程或事实准确性上并非当前最顶尖的水平，但在处理日常自然语言指令时，展现出了远超其基础模型的惊喜表现。\n\n这款模型特别适合开发者、研究人员以及希望探索大模型应用的企业团队使用。对于想要快速构建原型、测试指令微调效果或寻找合规商用基座的技术人员来说，Dolly 是一个极佳的起点。其独特的技术亮点在于证明了使用少量高质量、人工生成的指令数据，也能显著提升模型的实际交互能力，为资源有限的团队提供了可行的技术路径。尽管存在已知局限，Dolly 仍代表了迈向普惠人工智能的重要一步。","# Dolly\n\nDatabricks’ [Dolly](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b) is an instruction-following large language model trained on the Databricks machine learning platform\nthat is licensed for commercial use. Based on `pythia-12b`, Dolly is trained on ~15k instruction\u002Fresponse fine tuning records\n[`databricks-dolly-15k`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdatabricks\u002Fdatabricks-dolly-15k) generated\nby Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation,\ninformation extraction, open QA and summarization. `dolly-v2-12b` is not a state-of-the-art model, but does exhibit surprisingly\nhigh quality instruction following behavior not characteristic of the foundation model on which it is based.\n\nDatabricks is committed to ensuring that every organization and individual benefits from the transformative power of artificial intelligence. 
The Dolly model family represents our first steps along this journey, and we’re excited to share this technology with the world.\n\nThe model is available on Hugging Face as [databricks\u002Fdolly-v2-12b](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b).\n\n## Model Overview\n\n`dolly-v2-12b` is a 12 billion parameter causal language model created by [Databricks](https:\u002F\u002Fdatabricks.com\u002F) that is derived from\n[EleutherAI’s](https:\u002F\u002Fwww.eleuther.ai\u002F) [Pythia-12b](https:\u002F\u002Fhuggingface.co\u002FEleutherAI\u002Fpythia-12b) and fine-tuned\non a [~15K record instruction corpus](https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Ftree\u002Fmaster\u002Fdata) generated by Databricks employees and released under a permissive license (CC-BY-SA)\n\n\n## Known Limitations\n\n### Performance Limitations\n**`dolly-v2-12b` is not a state-of-the-art generative language model** and, though quantitative benchmarking is ongoing, is not designed to perform\ncompetitively with more modern model architectures or models subject to larger pretraining corpuses.\n\nThe Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community.\nIn particular, `dolly-v2-12b` struggles with: syntactically complex prompts, programming problems, mathematical operations, factual errors,\ndates and times, open-ended question answering, hallucination, enumerating lists of specific length, stylistic mimicry, having a sense of humor, etc.\nMoreover, we find that `dolly-v2-12b` does not have some capabilities, such as well-formatted letter writing, present in the original model.\n\n### Dataset Limitations\nLike all language models, `dolly-v2-12b` reflects the content and limitations of its training corpuses.\n\n- **The Pile**: GPT-J’s pre-training corpus contains content 
mostly collected from the public internet, and like most web-scale datasets,\nit contains content many users would find objectionable. As such, the model is likely to reflect these shortcomings, potentially overtly\nin the case it is explicitly asked to produce objectionable content, and sometimes subtly, as in the case of biased or harmful implicit\nassociations.\n\n- **`databricks-dolly-15k`**: The training data on which `dolly-v2-12b` is instruction tuned represents natural language instructions generated\nby Databricks employees during a period spanning March and April 2023 and includes passages from Wikipedia as reference passages\nfor instruction categories like closed QA and summarization. To our knowledge it does not contain obscenity, intellectual property or\npersonally identifying information about non-public figures, but it may contain typos and factual errors.\nThe dataset may also reflect biases found in Wikipedia. Finally, the dataset likely reflects\nthe interests and semantic choices of Databricks employees, a demographic which is not representative of the global population at large.\n\nDatabricks is committed to ongoing research and development efforts to develop helpful, honest and harmless AI technologies that\nmaximize the potential of all individuals and organizations.\n\n## Getting Started with Response Generation\n\nIf you'd like to simply test the model without training, the model is available on Hugging Face as [databricks\u002Fdolly-v2-12b](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b).\n\nTo use the model with the `transformers` library on a machine with A100 GPUs:\n\n```\nfrom transformers import pipeline\nimport torch\n\ninstruct_pipeline = pipeline(model=\"databricks\u002Fdolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n```\n\nYou can then use the pipeline to answer instructions:\n\n```\ninstruct_pipeline(\"Explain to me the difference between nuclear fission and 
fusion.\")\n```\n\n### Generating on Other Instances\n\nA100 instance types are not available in all cloud regions, or can be hard to provision. Inference is possible on other GPU instance types.\n\n#### A10 GPUs\n\nThe 6.9B and 2.8B param models should work as-is.\n\nTo generate using the 12B param model on A10s (ex: `g5.4xlarge`, 1 x A10 24GB), it's necessary to load the model and run generation with 8-bit weights, which impacts the results slightly:\n\n- Also install `bitsandbytes`\n- Add `model_kwargs={'load_in_8bit': True}` to the `pipeline()` command shown above\n\n#### V100 GPUs\n\nWhen using V100s (ex: `p3.2xlarge`, 1 x V100 16GB, `NC6s_v3`), in all cases, set `torch_dtype=torch.float16` in `pipeline()` instead.\n\nOtherwise, follow the steps above. The 12B param model may not function well in 8-bit on V100s.\n\n## Getting Started with Training\n\n- Add the `dolly` repo to Databricks (under Repos click Add Repo, enter `https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly.git`, then click Create Repo).\n- Start a `13.x ML (includes Apache Spark 3.4.0, GPU, Scala 2.12)` or later single-node cluster with node type having 8 A100 GPUs (e.g. `Standard_ND96asr_v4` or `p4d.24xlarge`). Note that these instance types may not be available in all regions, or may be difficult to provision. In Databricks, note that you must select the GPU runtime first, and unselect \"Use Photon\", for these instance types to appear (where supported).\n- Open the `train_dolly` notebook in the Repo (which is the `train_dolly.py` file in the GitHub `dolly` repo), attach to your GPU cluster, and run all cells.  When training finishes, the notebook will save the model under `\u002Fdbfs\u002Fdolly_training`.\n\n### Training on Other Instances\n\nA100 instance types are not available in all cloud regions, or can be hard to provision. Training is possible on other GPU instance types, \nfor smaller Dolly model sizes, and with small modifications to reduce memory usage. 
These modifications are not optimal, but are simple to make. \n\nSelect your GPU family type from the `gpu_family` widget, enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code. \nA number of different options will be set for you to train the model for one of the following GPU types:\n- A100 (default)\n- A10 \n- V100\n\nDetails of the different configurations are below.\n\n#### A100 GPUs\n\nA100 GPUs are preferred for training all model sizes, and are the only GPUs that can train the 12B param model in a reasonable amount of time.\nAs such, this is the default configuration, as set in the `a100_config.json` deepspeed config file.\n\n#### A10 GPUs\n\nTraining the 12B param model is not recommended on A10s.\n\nTo train the 6.9B param model on A10 instances (ex: `g5.24xlarge`, 4 x A10 24GB; `Standard_NV72ads_A10_v5`, 2 x A10),\nsimply select `a10` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, then run the rest of the code. \nThis will use the `a10_config.json` deepspeed config file, which makes the following changes:\n\n- `per-device-train-batch-size` and `per-device-eval-batch-size` are set to 3 in the `train_dolly.py` invocation of `deepspeed`\n- Within the `\"zero_optimization\"` section of the deepspeed config, we have added:\n  ```\n  \"offload_optimizer\": {\n    \"device\": \"cpu\",\n    \"pin_memory\": true\n  },\n  ```\n\n#### V100 GPUs\n\nTo run on V100 instances with 32GB of GPU memory (ex: `p3dn.24xlarge` or `Standard_ND40rs_v2`), \nsimply select `v100` from the `gpu_family` widget and enter the number of GPUs available in the `num_gpus` widget, and then run the rest of the code. 
\nThis will use the `v100_config.json` deepspeed config file, which makes the following changes:\n\n- It makes the changes described above for A10s\n- It enables fp16 floating point format\n- It sets the `per-device-train-batch-size` and `per-device-eval-batch-size` to 3\n  \nYou may be able to slightly increase the batch size with 32GB instances, compared to what works above for 24GB A10s.\n\n## Running Unit Tests Locally\n\n```\npyenv local 3.8.13\npython -m venv .venv\n. .venv\u002Fbin\u002Factivate\npip install -r requirements_dev.txt\n.\u002Frun_pytest.sh\n```\n\n## Citation\n\n```\n@online{DatabricksBlog2023DollyV2,\n    author    = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},\n    title     = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},\n    year      = {2023},\n    url       = {https:\u002F\u002Fwww.databricks.com\u002Fblog\u002F2023\u002F04\u002F12\u002Fdolly-first-open-commercially-viable-instruction-tuned-llm},\n    urldate   = {2023-06-30}\n}\n```\n","# Dolly\n\nDatabricks 的 [Dolly](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b) 是一款在 Databricks 机器学习平台上训练的指令遵循型大型语言模型，允许商业使用。该模型基于 `pythia-12b`，并在由 Databricks 员工生成的约 1.5 万条指令-响应微调数据上进行训练，这些数据来自 InstructGPT 论文中的能力领域，包括头脑风暴、分类、封闭式问答、生成、信息抽取、开放式问答和摘要等。尽管 `dolly-v2-12b` 并非最先进模型，但它展现出与其基础模型不相符的高质量指令遵循能力。\n\nDatabricks 致力于让每个组织和个人都能受益于人工智能的变革力量。Dolly 模型家族是我们在这条道路上迈出的第一步，我们很高兴能与全世界分享这项技术。\n\n该模型已在 Hugging Face 上发布，地址为 [databricks\u002Fdolly-v2-12b](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b)。\n\n## 模型概述\n\n`dolly-v2-12b` 是由 Databricks 开发的一款拥有 120 亿参数的因果语言模型，其基础源自 EleutherAI 的 Pythia-12b，并在由 Databricks 员工生成、采用宽松许可协议（CC-BY-SA）发布的约 1.5 万条指令语料库上进行了微调。\n\n## 已知局限性\n\n### 性能局限性\n**`dolly-v2-12b` 并非最先进的生成式语言模型**，尽管量化基准测试仍在进行中，但它并未设计成能够与更现代的模型架构或经过更大规模预训练语料库训练的模型相媲美。\n\nDolly 
模型家族目前仍在积极开发中，因此任何不足之处的列表都不可能详尽无遗。我们在此列出已知的局限性和失误，以记录并向社区分享我们的初步发现。具体而言，`dolly-v2-12b` 在以下方面存在困难：语法复杂的提示、编程问题、数学运算、事实性错误、日期和时间相关问题、开放式问答、幻觉现象、列举特定长度的列表、风格模仿、幽默感等。此外，我们还发现 `dolly-v2-12b` 缺乏一些原始模型所具备的能力，例如格式良好的书信写作。\n\n### 数据集局限性\n与所有语言模型一样，`dolly-v2-12b` 反映了其训练语料的内容和局限性。\n\n- **The Pile**：GPT-J 的预训练语料主要来源于公开互联网，如同大多数网络规模的数据集一样，其中包含许多用户认为令人反感的内容。因此，该模型很可能反映出这些缺陷，尤其是在被明确要求生成令人反感的内容时会更加明显；而在某些情况下，例如带有偏见或有害的隐含关联，则表现得更为微妙。\n  \n- **`databricks-dolly-15k`**：用于 `dolly-v2-12b` 指令微调的训练数据是由 Databricks 员工在 2023 年 3 月至 4 月期间生成的自然语言指令，其中还包括维基百科的相关段落，用作封闭式问答和摘要等指令类别的参考材料。据我们所知，该数据集中不包含淫秽内容、知识产权或关于非公众人物的个人身份信息，但可能存在错别字和事实性错误。此外，该数据集也可能反映维基百科中存在的偏见。最后，该数据集很可能反映了 Databricks 员工的兴趣和语义选择，而这一群体并不具有全球人口的代表性。\n\nDatabricks 致力于持续开展研究与开发工作，以打造有益、诚实且无害的人工智能技术，从而最大限度地发挥每个人和每个组织的潜力。\n\n## 开始生成响应\n\n如果您只想简单测试模型而无需进行训练，该模型已在 Hugging Face 上发布，地址为 [databricks\u002Fdolly-v2-12b](https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdolly-v2-12b)。\n\n要在配备 A100 GPU 的机器上使用 `transformers` 库运行该模型：\n\n```\nfrom transformers import pipeline\nimport torch\n\ninstruct_pipeline = pipeline(model=\"databricks\u002Fdolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n```\n\n随后您可以使用该管道来回答指令：\n\n```\ninstruct_pipeline(\"请向我解释核裂变和核聚变的区别。\")\n```\n\n### 在其他实例上生成\n\n并非所有云区域都提供 A100 实例类型，或者可能难以部署。您也可以在其他 GPU 实例上进行推理。\n\n#### A10 GPU\n\n69 亿和 28 亿参数的模型可以直接使用。\n\n若要在 A10（如 `g5.4xlarge`，配备 1 块 24GB 的 A10）上运行 120 亿参数的模型，则需要加载并以 8 位权重运行，这会对结果产生轻微影响：\n\n- 需要安装 `bitsandbytes`\n- 在上述 `pipeline()` 命令中添加 `model_kwargs={'load_in_8bit': True}`\n\n#### V100 GPU\n\n使用 V100（如 `p3.2xlarge`，配备 1 块 16GB 的 V100，或 `NC6s_v3`）时，在所有情况下都应将 `pipeline()` 中的 `torch_dtype` 设置为 `torch.float16`。\n\n其余步骤与之前相同。不过，120 亿参数的模型在 V100 上可能无法很好地以 8 位模式运行。\n\n## 开始训练\n\n- 将 `dolly` 仓库添加到 Databricks（在“仓库”选项中点击“添加仓库”，输入 `https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly.git`，然后点击“创建仓库”）。\n- 启动一个包含 8 块 A100 GPU 的单节点集群，节点类型可选 `Standard_ND96asr_v4` 或 `p4d.24xlarge`，版本为 `13.x ML`（包含 Apache Spark 3.4.0、GPU 和 Scala 
2.12）或更高。请注意，这些实例类型并非在所有地区都可用，或者可能难以部署。在 Databricks 中，对于支持这些实例类型的区域，必须先选择 GPU 运行时环境，并取消勾选“使用 Photon”，才能显示这些实例类型。\n- 打开仓库中的 `train_dolly` 笔记本（即 GitHub `dolly` 仓库中的 `train_dolly.py` 文件），将其连接到您的 GPU 集群，并运行所有单元格。训练完成后，笔记本会将模型保存在 `\u002Fdbfs\u002Fdolly_training` 目录下。\n\n### 其他实例上的训练\n\n并非所有云区域都提供 A100 实例类型，或者可能难以预置。对于较小的 Dolly 模型规模，并通过少量修改以降低内存使用量，可以在其他 GPU 实例类型上进行训练。这些修改虽然不是最优方案，但实现起来较为简单。\n\n请从 `gpu_family` 小部件中选择您的 GPU 系列类型，在 `num_gpus` 小部件中输入可用的 GPU 数量，然后运行其余代码。系统将为您设置若干不同的选项，以便在以下 GPU 类型之一上训练模型：\n- A100（默认）\n- A10\n- V100\n\n不同配置的详细信息如下。\n\n#### A100 GPU\n\nA100 GPU 是训练所有模型规模的首选，也是唯一能够在合理时间内训练 12B 参数模型的 GPU。因此，这是默认配置，由 `a100_config.json` DeepSpeed 配置文件设定。\n\n#### A10 GPU\n\n不建议在 A10 上训练 12B 参数模型。\n\n要在 A10 实例（例如 `g5.24xlarge`，配备 4 个 24GB 的 A10；或 `Standard_NV72ads_A10_v5`，配备 2 个 A10）上训练 6.9B 参数模型，只需在 `gpu_family` 小部件中选择 `a10`，并在 `num_gpus` 小部件中输入可用的 GPU 数量，然后运行其余代码即可。这将使用 `a10_config.json` DeepSpeed 配置文件，该文件会进行以下更改：\n\n- 在 `train_dolly.py` 中调用 `deepspeed` 时，将 `per-device-train-batch-size` 和 `per-device-eval-batch-size` 设置为 3。\n- 在 DeepSpeed 配置文件的 `\"zero_optimization\"` 部分中，添加了以下内容：\n  ```\n  \"offload_optimizer\": {\n    \"device\": \"cpu\",\n    \"pin_memory\": true\n  },\n  ```\n\n#### V100 GPU\n\n要在配备 32GB 显存的 V100 实例（例如 `p3dn.24xlarge` 或 `Standard_ND40rs_v2`）上运行，只需在 `gpu_family` 小部件中选择 `v100`，并在 `num_gpus` 小部件中输入可用的 GPU 数量，然后运行其余代码即可。这将使用 `v100_config.json` DeepSpeed 配置文件，该文件会进行以下更改：\n\n- 应用上述针对 A10 的修改。\n- 启用 fp16 浮点格式。\n- 将 `per-device-train-batch-size` 和 `per-device-eval-batch-size` 均设置为 3。\n\n与 24GB 的 A10 相比，您或许可以在 32GB 的实例上略微提高批次大小。\n\n## 在本地运行单元测试\n\n```\npyenv local 3.8.13\npython -m venv .venv\n. 
.venv\u002Fbin\u002Factivate\npip install -r requirements_dev.txt\n.\u002Frun_pytest.sh\n```\n\n## 引用\n\n```\n@online{DatabricksBlog2023DollyV2,\n    author    = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},\n    title     = {免费 Dolly：推出全球首个真正开放的指令微调大型语言模型},\n    year      = {2023},\n    url       = {https:\u002F\u002Fwww.databricks.com\u002Fblog\u002F2023\u002F04\u002F12\u002Fdolly-first-open-commercially-viable-instruction-tuned-llm},\n    urldate   = {2023-06-30}\n}\n```","# Dolly 快速上手指南\n\nDolly 是 Databricks 推出的开源指令遵循大语言模型（基于 Pythia-12b），拥有约 120 亿参数，支持商业用途。它经过约 1.5 万条人工生成的指令数据微调，具备出色的指令跟随能力。\n\n## 环境准备\n\n### 系统要求\n- **GPU 推荐**：\n  - **推理**：推荐使用 A100 GPU。若使用 A10 (24GB) 或 V100 (16GB\u002F32GB)，需调整量化或精度设置（详见下文）。\n  - **训练**：全量训练 12B 模型强烈建议使用 8 卡 A100 实例。较小模型或显存优化后可在 A10\u002FV100 上运行。\n- **Python 版本**：建议 Python 3.8+\n- **框架依赖**：PyTorch, Transformers, Accelerate, DeepSpeed (训练必需)\n\n### 前置依赖安装\n确保已安装 CUDA 驱动及对应的 PyTorch 版本。若需使用 8-bit 量化加载大模型，还需安装 `bitsandbytes`。\n\n```bash\npip install transformers accelerate torch bitsandbytes\n```\n\n> **国内加速提示**：如遇下载缓慢，可配置 Hugging Face 镜像源：\n> ```bash\n> export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n> ```\n\n## 安装步骤\n\nDolly 无需复杂的源码编译，主要通过 Hugging Face `transformers` 库直接加载。\n\n1. **创建虚拟环境（可选但推荐）**\n   ```bash\n   python -m venv .venv\n   source .venv\u002Fbin\u002Factivate\n   ```\n\n2. **安装核心依赖**\n   ```bash\n   pip install -r requirements_dev.txt\n   # 或者手动安装关键包\n   pip install transformers accelerate torch bitsandbytes\n   ```\n\n3. 
**验证安装**\n   确保 `torch.cuda.is_available()` 返回 `True`。\n\n## 基本使用\n\n以下示例展示如何使用 `transformers` pipeline 快速加载模型并进行推理。\n\n### 场景一：标准环境（推荐 A100 GPU）\n\n在配备 A100 GPU 的机器上，可直接加载 `bfloat16` 精度模型：\n\n```python\nfrom transformers import pipeline\nimport torch\n\ninstruct_pipeline = pipeline(model=\"databricks\u002Fdolly-v2-12b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\n```\n\n执行指令：\n\n```python\ninstruct_pipeline(\"Explain to me the difference between nuclear fission and fusion.\")\n```\n\n### 场景二：A10 GPU (24GB 显存)\n\n若在单张 A10 (24GB) 上运行 12B 模型，需启用 8-bit 量化以节省显存：\n\n1. 确保已安装 `bitsandbytes`。\n2. 修改加载代码，添加 `load_in_8bit` 参数：\n\n```python\nfrom transformers import pipeline\nimport torch\n\ninstruct_pipeline = pipeline(\n    model=\"databricks\u002Fdolly-v2-12b\", \n    torch_dtype=torch.bfloat16, \n    trust_remote_code=True, \n    device_map=\"auto\",\n    model_kwargs={'load_in_8bit': True}\n)\n```\n\n### 场景三：V100 GPU\n\n在使用 V100 (16GB 或 32GB) 时，需将数据类型改为 `float16`。注意 12B 模型在 V100 上使用 8-bit 量化可能效果不佳。\n\n```python\nfrom transformers import pipeline\nimport torch\n\ninstruct_pipeline = pipeline(\n    model=\"databricks\u002Fdolly-v2-12b\", \n    torch_dtype=torch.float16, \n    trust_remote_code=True, \n    device_map=\"auto\"\n)\n```\n\n> **注意**：Dolly 并非最顶尖的生成式模型，在处理复杂编程、数学运算、事实性问答或长列表枚举时可能存在局限。","某电商初创公司的数据团队需要快速从每日累积的客户评论中提取关键反馈并生成摘要报告，以指导产品迭代。\n\n### 没有 dolly 时\n- 数据分析师需手动阅读数千条评论，耗时数小时才能整理出零散的改进建议，效率极低。\n- 缺乏统一的指令遵循模型，通用大模型往往忽略特定的提取格式要求，导致输出结果杂乱无章，难以直接入库。\n- 由于预算有限无法调用昂贵的商业 API，团队只能依赖基础模型，其生成的摘要常遗漏重要细节或产生幻觉，可信度存疑。\n- 每次调整提取维度（如从“物流”改为“包装”）都需要重新编写复杂的规则代码，维护成本高昂且灵活性差。\n\n### 使用 dolly 后\n- 利用 dolly 强大的指令遵循能力，只需输入自然语言指令，即可在几分钟内自动完成万条评论的分类、关键信息提取及摘要生成。\n- dolly 能严格遵照预设的 JSON 格式输出结构化数据，无缝对接内部数据库，彻底消除了人工清洗格式的时间成本。\n- 基于开源免费且允许商用的特性，团队在零 API 调用成本下，获得了远超基础模型的高质量回复，显著降低了运营支出。\n- 面对多变的分析需求，仅需微调提示词即可让 dolly 即时切换任务场景（如从情感分析转为竞品对比），响应速度提升十倍。\n\ndolly 
让中小团队也能以低成本拥有高质量、可定制的指令跟随能力，将繁琐的数据处理工作转化为即时的业务洞察。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fdatabrickslabs_dolly_f700a9b0.png","databrickslabs","Databricks Labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fdatabrickslabs_a62f5a96.png","Labs projects to accelerate use cases on the Databricks Unified Analytics Platform",null,"https:\u002F\u002Fdatabricks.com\u002Flearn\u002Flabs","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs",[80,84],{"name":81,"color":82,"percentage":83},"Python","#3572A5",99.7,{"name":85,"color":86,"percentage":87},"Shell","#89e051",0.3,10793,1145,"2026-04-07T08:56:28","Apache-2.0","未说明","必需 NVIDIA GPU。推理：推荐 A100；A10 (24GB) 需使用 8-bit 量化运行 12B 模型；V100 (16GB\u002F32GB) 需使用 float16，12B 模型在 V100 上 8-bit 效果不佳。训练：12B 模型推荐多卡 A100 (如 8x A100)；6.9B 模型可在多卡 A10 (24GB) 上训练；32GB V100 可尝试训练但需调整配置。","未说明 (取决于 GPU 显存及是否启用 CPU 卸载)",{"notes":96,"python":97,"dependencies":98},"1. 推理时若使用 A10 或显存受限，需安装 bitsandbytes 并启用 8-bit 加载 (`load_in_8bit=True`)；V100 需指定 `torch_dtype=torch.float16`。2. 训练 12B 模型强烈建议使用 8 卡 A100 环境；较小模型在 A10\u002FV100 上训练需修改 DeepSpeed 配置以启用优化器卸载 (offload_optimizer) 和调整批次大小。3. 在 Databricks 平台训练时需选择 GPU 运行时并取消勾选 'Use Photon'。4. 本地开发测试建议创建虚拟环境并安装 requirements_dev.txt。","3.8.13",[99,100,101,102],"torch","transformers","bitsandbytes","deepspeed",[35,13,16],[105,106,107,64],"databricks","gpt","chatbot","2026-03-27T02:49:30.150509","2026-04-08T10:48:07.167461",[111,116,121,126,131,136],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},24167,"加载模型时出现 'Could not load model... with any of the following classes' 错误怎么办？","该错误通常由环境配置问题引起。常见解决方案包括：\n1. 检查 NVIDIA 驱动是否正常安装和运行，如有问题可参考 AWS 文档重新安装驱动。\n2. 如果在 Mac (M1\u002FM2) 上运行，确保使用 Python 3.10+、macOS Ventura 13.3+ 以及 PyTorch 和 Transformers 的 nightly 版本。\n3. 
在 Mac 上运行时，建议设置环境变量 `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.5` 以防止 MPS 后端内存溢出。","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Fissues\u002F60",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},24168,"遇到 'fused_adam.so: cannot open shared object file' 导入错误如何解决？","这是因为 DeepSpeed 未正确编译包含 fused adam 优化器的扩展。请尝试以下步骤重新安装 DeepSpeed：\n1. 克隆仓库：`git clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed.git`\n2. 进入目录并安装：`cd DeepSpeed` 然后运行 `DS_BUILD_UTILS=1 DS_BUILD_FUSED_ADAM=1 pip install .`\n注意：如果遇到 cuda 和 gcc 版本不匹配的错误，可能需要降低 gcc 版本后重试。如果直接使用 `pip install deepspeed` 失败，务必使用上述源码编译方式。","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Fissues\u002F119",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},24169,"训练时报错 'HFValidationError: Repo id must be in the form repo_name or namespace\u002Frepo_name' 怎么处理？","当加载本地模型路径时，Hugging Face 可能会误将其识别为仓库 ID。解决方法如下：\n1. 确保传入的是模型的**绝对完整路径**（例如 `\u002Fhome\u002Fmodels\u002Fumt5-xxl`），相对路径（如 `~\u002Fmodels`）通常会失败。\n2. 在加载 tokenizer 或模型时，添加参数 `local_files_only=True`。\n3. 设置环境变量以启用离线模式：`import os; os.environ[\"TRANSFORMERS_OFFLINE\"] = \"1\"`。\n此外，检查是否错误地使用了包含 'base' 字样的变量名，有时去掉 'base' 前缀也能解决问题。","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Fissues\u002F118",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},24170,"在使用多 GPU 训练时遇到 'CUDA out of memory' 错误该如何调整？","显存不足通常可以通过调整 DeepSpeed 配置和输入数据来解决：\n1. 在 DeepSpeed 配置文件 (`ds_config.json`) 中启用 ZeRO Stage 3 优化，并开启 `offload_optimizer` 将优化器状态卸载到 CPU。\n2. 减小 `per-device-train-batch-size`（每个设备的批次大小），甚至设为 1。\n3. 检查输入数据长度，如果进行了截断（truncating），过长的输入仍可能导致显存碎片化，建议直接丢弃过长的样本而不是截断。","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Fissues\u002F140",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},24171,"在特定集群（如 NCasT4_v3）上运行时出现 CUDA 内存分配错误怎么办？","即使集群总内存充足，也可能因 PyTorch 内存管理策略导致分配失败。建议措施：\n1. 确认是否遗漏了 `--deepspeed` 启动参数，缺少该参数会导致无法利用分布式显存优化。\n2. 
检查 PyTorch 版本兼容性，某些旧版本（如 1.13）可能存在内存管理问题，尝试升级或匹配推荐的 PyTorch 版本。\n3. 根据报错提示，可以尝试设置 `max_split_size_mb` 环境变量来避免内存碎片化。","https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly\u002Fissues\u002F35",{"id":137,"question_zh":138,"answer_zh":139,"source_url":115},24172,"如何在 MacBook Air M2 上成功运行 Dolly 模型？","在 Apple Silicon (M1\u002FM2) 上运行需要特定的环境配置：\n1. 系统要求：macOS Ventura 13.3 或更高版本，Python 3.10。\n2. 必须安装 PyTorch 和 Transformers 的 nightly 构建版本（非稳定版）。\n3. 运行前务必设置环境变量：`export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.5`，这能防止 MPS 后端因内存水位线过高而崩溃。\n4. 可选：安装 `xformers` 库可略微提升推理速度（约节省 10 秒）。",[]]