[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-arielnlee--Platypus":3,"similar-arielnlee--Platypus":93},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":19,"owner_email":20,"owner_twitter":21,"owner_website":22,"owner_url":23,"languages":24,"stars":37,"forks":38,"last_commit_at":39,"license":20,"difficulty_score":40,"env_os":41,"env_gpu":42,"env_ram":43,"env_deps":44,"category_tags":55,"github_topics":20,"view_count":58,"oss_zip_url":20,"oss_zip_packed_at":20,"status":59,"created_at":60,"updated_at":61,"faqs":62,"releases":92},2152,"arielnlee\u002FPlatypus","Platypus","Code for fine-tuning Platypus fam LLMs using LoRA","Platypus 是一套基于 LLaMA 和 LLaMA-2 架构的高效大语言模型微调方案，旨在通过低成本、快速的方式提升模型性能。它核心解决了传统大模型微调过程中资源消耗巨大、训练周期长以及数据利用效率低等痛点，让开发者无需昂贵算力即可定制专属模型。\n\n该项目主要面向 AI 研究人员、算法工程师及希望深入探索大模型微调的开发者。Platypus 的独特技术亮点在于巧妙结合了 LoRA（低秩适应）和 PEFT（参数高效微调）技术，仅更新极少量参数即可实现显著的效果提升。项目不仅开源了完整的微调代码和数据清洗流程，还提供了经过验证的最佳超参数配置（如学习率、LoRA 秩等），并支持多 GPU 数据并行与模型并行策略，灵活适配不同规模的计算资源。此外，Platypus 社区活跃，已推出多个融合变体模型（如 OpenOrca-Platypus2），并提供了便捷的命令行工具与 FastChat 集成方案，帮助用户轻松部署本地聊天机器人或进行二次开发。无论是学术研究还是工程落地，Platypus 都为大模型的精细化打磨提供了一条务实且高效的路径。","# Platypus: Quick, Cheap, and Powerful Refinement of LLMs (https:\u002F\u002Fplatypus-llm.github.io)\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Farielnlee_Platypus_readme_9cf79195ed76.png\" alt=\"Platypus\" width=\"300\"\u002F>\n\u003C\u002Fp>\n\nThe Platypus models are a series of fine-tuned and merged variants based on the LLaMA and LLaMa-2 transformer architectures. Platypus takes advantage of [LoRA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.09685.pdf) and [PEFT](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft). 
\n\nAll models and datasets are available via Hugging Face: [`garage-bAInd`](https:\u002F\u002Fhuggingface.co\u002Fgarage-bAInd)\n\n## Updates\n\n**8\u002F21\u002F23**: If you're fine-tuning LLaMa-2 7B, please set `bf16=True` and `fp16=False` in the HF trainer. LLaMa-1 7B works as is. **This only applies to LLaMa-2 7B.** Additionally, if you are using 1 GPU, please set `ddp_find_unused_parameters=False` in the HF trainer. We will be updating the fine-tuning script to handle these changes automatically.\n\n**8\u002F14\u002F23**: We have cleaned up our pipeline and added data refinement and similarity code. Within the next few days, we'll have a script to reproduce our exact dataset from 11 open-source datasets.\n\n**8\u002F13\u002F23**: An unquantized GPU chatbot of OpenOrca-Platypus2-13B, our most recent collaboration, is available via Hugging Face Spaces, courtesy of OpenOrca: [Chat now!](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FOpen-Orca\u002FOpenOrca-Platypus2-13B)\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Farielnlee_Platypus_readme_20fa22a8c60f.jpeg\" alt=\"Platypus\" width=\"120\"\u002F>\n\u003C\u002Fp>\n\n**8\u002F11\u002F23**: Our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07317) and [project website](https:\u002F\u002Fplatypus-llm.github.io) have been released!\n\n## CLI\n\n[FastChat](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat) provides a simple setup for those interested in running the model. 
After downloading the model through Hugging Face, clone the FastChat repository:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat.git\ncd FastChat\n```\n\nDownload the required packages:\n\n```\npip3 install --upgrade pip  # enable PEP 660 support\npip3 install -e .\n```\n\nFinally, run the following:\n\n```\npython3 -m fastchat.serve.cli --model-path garage-bAInd\u002FPlatypus-30B --conv_template alpaca\n```\n\n## Local Setup\n\nThis repository is multi-GPU friendly and provides code for either model or data parallelism, depending on your computational resources.\n\n1. Install dependencies\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n2. Be sure to use the exact versions pinned in `requirements.txt`, or you may run into model-saving or out-of-memory (OOM) issues.\n\n## Fine-tuning (`finetune.py`)\n\nRun `fine-tuning.sh`.\n\nNote: The script above uses `torchrun` for data parallelism. PyTorch is not in `requirements.txt` since technically you can run fine-tuning without it (after a few minor changes to the `.py` file). To use `fine-tuning.sh`, please install [PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F). We recommend using `torchrun` and PyTorch 2.0+ for speed and `torch.compile` support. If you do not install PyTorch, or if you use an alternative launcher such as `accelerate launch`, please comment out any torch-related lines in the scripts.\n\nHyperparameters used to fine-tune Platypus:\n\n| Hyperparameter      | Value (13B \u002F 70B) |\n|---------------------|--------|\n| learning rate       | 4e-4 \u002F 3e-4   |\n| batch size          | 16     |\n| micro batch size    | 1      |\n| warmup steps        | 100    |\n| epochs              | 1      |\n| weight decay        | 0.     
|\n| lr scheduler        | cosine |\n| lora alpha          | 16     |\n| lora rank           | 16     |\n| lora dropout        | 0.05   |\n| lora target modules | gate_proj, up_proj, down_proj|\n| cutoff length       | 4096   |\n| train on inputs     | False  |\n| group by length     | False  |\n| add eos token       | False  |\n\nExample of how to calculate gradient accumulation steps with 2 GPUs: gradient_accumulation_steps = global_batch_size \u002F micro_batch_size \u002F num_gpus = 16 \u002F 1 \u002F 2 = 8.\n\nIf your model **cannot** fit in the memory of a single GPU, please use the alternative fine-tuning option below (or use accelerate, FSDP, etc.) to take advantage of model parallelism. A good alternative to torchrun is accelerate.\n\n```bash\npython finetune.py \\\n    --base_model meta-llama\u002FLlama-2-70b-hf \\\n    --data_path .\u002Ffinal_data.json \\\n    --output_dir .\u002Fllama2-platypus-70b \\\n    --batch_size 16 \\\n    --micro_batch_size 1 \\\n    --num_epochs 1 \\\n    --learning_rate 0.0003 \\\n    --cutoff_len 4096 \\\n    --val_set_size 0 \\\n    --lora_r 16 \\\n    --lora_alpha 16 \\\n    --lora_dropout 0.05 \\\n    --lora_target_modules '[gate_proj, down_proj, up_proj]' \\\n    --train_on_inputs False \\\n    --add_eos_token False \\\n    --group_by_length False \\\n    --prompt_template_name alpaca \\\n    --lr_scheduler 'cosine' \\\n    --warmup_steps 100\n```\n\n## Merging\n\nOnce fine-tuning is complete, use `merge.sh` to merge the LoRA weights back into the base LLaMa model (or the base model of your choice) for export to Hugging Face format.\n\nWhile we are experimenting with better and alternative ways to merge (stay tuned!), our current merging process relies on the basic linear merge provided by PEFT. Before we fine-tune, we search for possible models to merge with and the datasets used to create them (to the best of our ability). The success of our LoRA merges stems from using the right data. 
Our most successful merges have little to no overlap in fine-tuning data. For example, GPlatty-30B is a merge of Platypus-30B and gpt4-alpaca-lora-30b. We saw a 2% jump in accuracy for GPlatty, and the datasets used to fine-tune these two LoRA-based models had very low similarity scores. Please see [our paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07317) for additional information.\n\n**NOTE:** If you encounter any errors while merging, please try uninstalling bitsandbytes and peft, then reinstalling the newest versions (peft should always be installed from source).\n\n## Dataset Refinement\n\nWe used keyword search to find STEM and logic questions in the 11 open-source datasets that make up [Open-Platypus](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgarage-bAInd\u002FOpen-Platypus). Then, to remove duplicates and redundancy, we performed a cosine similarity check of the questions using SentenceTransformers embeddings. Lastly, we performed a second similarity check to remove any training questions that are too similar to the test set.\n\nYou can access all of the related code in the `data_pipeline` folder of this repo.\n\n## Reproducing Benchmark Eval Results\n\nInstall LM Evaluation Harness:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness\ncd lm-evaluation-harness\ngit checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463 # The commit used by the Open LLM Leaderboard\npip install -e .\n```\n\nEach task was evaluated on a single A100 80GB GPU for the 13B model and on two A100s for the 70B model.\n\nARC:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks arc_challenge --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Farc_challenge_25shot.json --device cuda --num_fewshot 25\n```\n\nHellaSwag:\n```\npython main.py --model hf-causal-experimental --model_args 
pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks hellaswag --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Fhellaswag_10shot.json --device cuda --num_fewshot 10\n```\n\nMMLU:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks hendrycksTest-* --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Fmmlu_5shot.json --device cuda --num_fewshot 5\n```\n\nTruthfulQA:\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks truthfulqa_mc --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Ftruthfulqa_0shot.json --device cuda\n```\n\n## Inference for Adapters (`inference.py`)\n\nThis is a basic example script for running inference directly with fine-tuned adapters and\u002For local data. The current version reads data from a CSV file; you can easily edit it to pull data from Hugging Face or to use a JSON file. Please make any necessary edits before using this script (it assumes Alpaca formatting).\n\n## BibTeX\n\n```\n@article{platypus2023,\n    title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs}, \n    author={Ariel N. Lee and Cole J. 
Hunter and Nataniel Ruiz},\n    journal={arXiv preprint arXiv:2308.07317},\n    year={2023}\n}\n```\n","# 鸭嘴兽：快速、经济且强大的大语言模型微调工具（https:\u002F\u002Fplatypus-llm.github.io）\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Farielnlee_Platypus_readme_9cf79195ed76.png\" alt=\"Platypus\" width=\"300\"\u002F>\n\u003C\u002Fp>\n\n鸭嘴兽模型系列是基于LLaMA和LLaMa-2 Transformer架构进行微调与合并的变体。鸭嘴兽利用了[LoRA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.09685.pdf)和[PEFT](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft)技术。\n\n所有模型及数据集均可通过HuggingFace获取：[`garage-bAInd`](https:\u002F\u002Fhuggingface.co\u002Fgarage-bAInd)\n\n## 更新\n\n**2023年8月21日**：如果您正在微调LLaMa-2 7B，请在HF训练器中设置`bf16=True`和`fp16=False`。LLaMa-1 7B则无需更改。**此调整仅适用于LLaMa-2 7B。** 此外，若您仅使用1张GPU，请在HF训练器中设置`ddp_find_unused_parameters=False`。我们将会更新微调脚本以自动处理这些变更。\n\n**2023年8月14日**：我们已清理并优化了数据流水线，新增了数据精炼与相似度计算功能。未来几天内，我们将发布一个脚本，用于从11个开源数据集中复现我们的完整数据集。\n\n**2023年8月13日**：由OpenOrca合作推出的最新模型OpenOrca-Platypus2-13B的未量化GPU聊天机器人已在Hugging Face Spaces上上线，由OpenOrca提供支持：[立即聊天！](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FOpen-Orca\u002FOpenOrca-Platypus2-13B)\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Farielnlee_Platypus_readme_20fa22a8c60f.jpeg\" alt=\"Platypus\" width=\"120\"\u002F>\n\u003C\u002Fp>\n\n**2023年8月11日**：我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07317)和[项目官网](https:\u002F\u002Fplatypus-llm.github.io)已正式发布！\n\n## 命令行界面\n\n[FastChat](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat)为希望运行该模型的用户提供了一个简单的部署方案。首先通过HuggingFace下载模型，然后克隆FastChat仓库：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat.git\ncd FastChat\n```\n\n接着安装所需依赖：\n\n```\npip3 install --upgrade pip  # 启用PEP 660支持\npip3 install -e .\n```\n\n最后运行以下命令：\n\n```\npython3 -m fastchat.serve.cli --model-path garage-bAInd\u002FPlatypus-30B --conv_template alpaca\n```\n\n## 
本地部署\n\n本仓库支持多GPU环境，并提供了根据计算资源选择模型并行或数据并行的代码。\n\n1. 安装依赖项\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n2. 请务必使用精确的依赖版本，否则可能会遇到模型保存或内存不足的问题。\n\n## 微调（`finetune.py`）\n\n运行`fine-tuning.sh`脚本。\n\n注意：上述脚本使用`torchrun`进行数据并行。由于在对`.py`文件进行少量修改后，理论上无需PyTorch即可完成微调，因此PyTorch并未包含在`requirements.txt`中。若要使用`fine-tuning.sh`脚本，请安装[PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)。我们建议使用`torchrun`以及PyTorch 2.0及以上版本，以获得更快的速度和`torch.compile`的支持。如果您不安装PyTorch，或采用其他方法如`accelerate launch`，请务必注释掉脚本中所有与PyTorch相关的代码行。\n\n用于微调鸭嘴兽模型的超参数如下：\n\n| 超参数              | 13B \u002F 70B 值  |\n|---------------------|--------|\n| 学习率              | 4e-4 \u002F 3e-4   |\n| 批量大小            | 16     |\n| 微批次大小          | 1      |\n| 热身步数            | 100    |\n| 轮次                | 1      |\n| 权重衰减            | 0.     |\n| 学习率调度器        | 余弦   |\n| LoRA alpha          | 16     |\n| LoRA rank           | 16     |\n| LoRA dropout        | 0.05   |\n| LoRA目标模块        | gate_proj, up_proj, down_proj|\n| 截断长度            | 4096   |\n| 在输入上训练        | False  |\n| 按长度分组          | False  |\n| 添加EOS标记         | False  |\n\n使用2张GPU计算梯度累积步数的示例：梯度累积步数 = 全局批量大小 \u002F 微批次大小 \u002F GPU数量 = 16 \u002F 1 \u002F 2 = 8。\n\n如果您的模型**无法**容纳在单张GPU的显存中，请使用下方的替代微调方案（或借助accelerate、FSDP等工具）来利用模型并行性。accelerate是torchrun的一个不错的替代方案。\n\n```bash\npython finetune.py \\\n    --base_model meta-llama\u002FLlama-2-70b-hf \\\n    --data_path .\u002Ffinal_data.json \\\n    --output_dir .\u002Fllama2-platypus-70b \\\n    --batch_size 16 \\\n    --micro_batch_size 1 \\\n    --num_epochs 1 \\\n    --learning_rate 0.0003 \\\n    --cutoff_len 4096 \\\n    --val_set_size 0 \\\n    --lora_r 16 \\\n    --lora_alpha 16 \\\n    --lora_dropout 0.05 \\\n    --lora_target_modules '[gate_proj, down_proj, up_proj]' \\\n    --train_on_inputs False \\\n    --add_eos_token False \\\n    --group_by_length False \\\n    --prompt_template_name alpaca \\\n    --lr_scheduler 'cosine' \\\n    --warmup_steps 100\n```\n\n## 
合并\n\n完成微调后，使用`merge.sh`脚本将LoRA权重合并回基础LLaMa模型（或您选择的基础模型），以便导出为HuggingFace格式。\n\n尽管我们仍在探索更优的合并方式（敬请期待！），但目前的合并流程仍依赖于PEFT提供的基础线性合并方法。在开始微调之前，我们会尽可能地寻找可合并的基础模型及其对应的训练数据集。LoRA合并的成功与否很大程度上取决于所使用的数据质量。我们最成功的合并案例中，两个微调数据集之间的相似度极低。例如，GPlatty-30B就是Platypus-30B与gpt4-alpaca-lora-30b的合并结果。GPlatty的准确率提升了2%，而这两个LoRA模型的训练数据集相似度评分非常低。更多信息请参阅我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07317)。\n\n**注意**：若在合并过程中遇到任何错误，请尝试卸载bitsandbytes和peft，然后重新安装最新版本（peft应始终从源码安装）。\n\n## 数据集精炼\n\n我们通过关键词搜索，在构成[Open-Platypus](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgarage-bAInd\u002FOpen-Platypus)的11个开源数据集中找到了STEM和逻辑相关问题。随后，为了去除重复内容，我们使用SentenceTransformers嵌入技术对问题进行了余弦相似度检查。最后，我们再次进行相似度检查，移除训练集中与测试集过于相似的问题。\n\n与此相关的所有代码均可在本仓库的`data_pipeline`文件夹中找到。\n\n## 复现基准评测结果\n安装 LM 评估框架：\n```\ngit clone https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness\ncd lm-evaluation-harness\ngit checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463 # Open LLM Leaderboard 使用的提交版本\npip install -e .\n```\n每个任务在单块 A100 80GB GPU 上对 13B 参数模型进行评估，而对于 70B 参数模型则使用两块 A100 GPU。\n\nARC：\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks arc_challenge --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Farc_challenge_25shot.json --device cuda --num_fewshot 25\n```\n\nHellaSwag：\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks hellaswag --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Fhellaswag_10shot.json --device cuda --num_fewshot 10\n```\n\nMMLU：\n```\npython main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks hendrycksTest-* --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Fmmlu_5shot.json --device cuda --num_fewshot 5\n```\n\nTruthfulQA：\n```\npython main.py 
--model hf-causal-experimental --model_args pretrained=garage-bAInd\u002FPlatypus-13B,use_accelerate=True --tasks truthfulqa_mc --batch_size 2 --no_cache --write_out --output_path results\u002FPlatypus-13B\u002Ftruthfulqa_0shot.json --device cuda\n```\n\n## 适配器推理（`inference.py`）\n\n这是一个基本的示例脚本，用于直接使用微调后的适配器和\u002F或本地数据进行推理。当前版本从 CSV 文件中读取数据。您可以轻松修改此脚本以从 Hugging Face 拉取数据或使用 JSON 文件。请在使用此脚本之前进行必要的编辑（它假设采用 Alpaca 格式）。\n\n## BibTeX\n\n```\n@article{platypus2023,\n    title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs}, \n    author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},\n    journal={arXiv preprint arXiv:2308.07317},\n    year={2023}\n}\n```","# Platypus 快速上手指南\n\nPlatypus 是一系列基于 LLaMA 和 LLaMA-2 架构的微调与合并模型，利用 LoRA 和 PEFT 技术实现高效、低成本的大语言模型优化。所有模型和数据集均可在 Hugging Face (`garage-bAInd`) 获取。\n\n## 环境准备\n\n### 系统要求\n- **GPU**: 推荐多 GPU 环境以支持数据并行或模型并行。\n- **显存**: \n  - 微调 7B\u002F13B 模型：建议单卡或多卡总显存充足。\n  - 微调 70B 模型：需多卡支持模型并行，或使用 `accelerate` 等方案。\n- **Python**: 建议 Python 3.8+。\n- **PyTorch**: 推荐使用 PyTorch 2.0+ 以启用 `torch.compile` 加速（需单独安装）。\n\n### 前置依赖\n确保已安装 Git 和 pip。若使用国内网络，建议在 pip 命令中指定清华或阿里镜像源加速下载：\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 安装步骤\n\n### 1. 克隆仓库并安装基础依赖\n```bash\ngit clone \u003Crepository_url>\ncd \u003Crepository_directory>\npip install -r requirements.txt\n```\n*注：若需进行微调训练，请额外安装 PyTorch（官方文档：https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F），推荐使用 `torchrun` 配合 PyTorch 2.0+。*\n\n### 2. (可选) 安装 FastChat 用于 CLI 推理\n若希望通过命令行快速体验模型，可安装 FastChat：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat.git\ncd FastChat\npip3 install --upgrade pip\npip3 install -e . 
-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\n### 方式一：通过 FastChat 运行推理（最简单）\n下载模型后，使用以下命令启动交互式对话（以 Platypus-30B 为例）：\n```bash\npython3 -m fastchat.serve.cli --model-path garage-bAInd\u002FPlatypus-30B --conv_template alpaca\n```\n\n### 方式二：本地微调（Fine-tuning）\n使用提供的脚本进行 LoRA 微调。以下为使用 `torchrun` 进行数据并行的示例（需确保已安装 PyTorch）：\n```bash\n.\u002Ffine-tuning.sh\n```\n若显存不足无法容纳整个模型，可使用 `accelerate` 进行模型并行微调，示例命令如下：\n```bash\npython finetune.py \\\n    --base_model meta-llama\u002FLlama-2-70b-hf \\\n    --data-path .\u002Ffinal_data.json \\\n    --output_dir .\u002Fllama2-platypus-70b \\\n    --batch_size 16 \\\n    --micro_batch_size 1 \\\n    --num_epochs 1 \\\n    --learning_rate 0.0003 \\\n    --cutoff_len 4096 \\\n    --val_set_size 0 \\\n    --lora_r 16 \\\n    --lora_alpha 16 \\\n    --lora_dropout 0.05 \\\n    --lora_target_modules '[gate_proj, down_proj, up_proj]' \\\n    --train_on_inputs False \\\n    --add_eos_token False \\\n    --group_by_length False \\\n    --prompt_template_name alpaca \\\n    --lr_scheduler 'cosine' \\\n    --warmup_steps 100\n```\n\n### 方式三：合并 LoRA 权重\n微调完成后，使用 `merge.sh` 将 LoRA 权重合并回基座模型以便导出：\n```bash\n.\u002Fmerge.sh\n```\n*注意：若合并报错，请尝试卸载并重新安装最新版的 `bitsandbytes` 和 `peft`（peft 建议从源码安装）。*","某初创教育科技公司希望基于开源 LLaMA-2 模型，快速构建一个能精准解答高中数学题的专属辅导助手，但面临算力有限且缺乏大规模标注数据的困境。\n\n### 没有 Platypus 时\n- **训练成本高昂**：全量微调大模型需要昂贵的多卡集群，公司现有的单张或少量 GPU 资源根本无法加载模型，导致项目无法启动。\n- **开发周期漫长**：缺乏高效的参数微调方案，团队需花费数周时间调试分布式训练环境，且容易遭遇显存溢出（OOM）错误。\n- **领域适配性差**：通用模型在面对复杂的数学推导和特定解题格式时表现生硬，经常产生幻觉或逻辑断裂，无法满足教学严谨性要求。\n- **数据利用低效**：难以将分散的开源数学数据集（如 OpenOrca 等）高效整合并转化为模型可理解的高质量指令数据。\n\n### 使用 Platypus 后\n- **低成本快速落地**：借助 Platypus 集成的 LoRA 技术，团队仅需少量显存即可在消费级显卡上完成 7B 或 13B 模型的微调，大幅降低硬件门槛。\n- **流程标准化与加速**：利用其提供的 `finetune.py` 脚本和预设超参数（如 rank=16, alpha=16），一天内即可完成从数据清洗到模型产出的全流程。\n- **专业能力显著提升**：经过特定数学语料微调后的模型，在解题步骤的逻辑连贯性和公式准确性上大幅提升，能像真人老师一样逐步推导。\n- **灵活的数据融合**：直接复用其数据精炼管道，轻松合并多个开源数据集，快速构建出高质量的领域专用训练集。\n\nPlatypus 
让资源有限的团队也能以极低的成本和极快的速度，将通用大模型“变身”为垂直领域的专家助手。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Farielnlee_Platypus_9cf79195.png","arielnlee","Ariel N. Lee","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Farielnlee_40da3575.jpg","Senior Research Engineer @ Code Metal","@CodeMetalAI ","San Diego",null,"ArielNLee","arielnlee.ai","https:\u002F\u002Fgithub.com\u002Farielnlee",[25,29,33],{"name":26,"color":27,"percentage":28},"Python","#3572A5",55,{"name":30,"color":31,"percentage":32},"Jupyter Notebook","#DA5B0B",44.3,{"name":34,"color":35,"percentage":36},"Shell","#89e051",0.8,628,56,"2026-03-27T17:35:22",3,"Linux","必需 NVIDIA GPU。推理\u002F评估：13B 模型需单张 A100 80GB，70B 模型需两张 A100 80GB。微调：支持多卡数据并行，若显存不足需使用模型并行（如 accelerate）。","未说明",{"notes":45,"python":46,"dependencies":47},"1. 微调 LLaMA-2 7B 时必须在 HF trainer 中设置 bf16=True 且 fp16=False；单卡微调需设置 ddp_find_unused_parameters=False。2. 合并模型报错时，建议卸载并重新从源码安装最新版的 peft 和 bitsandbytes。3. 复现基准测试结果需使用特定版本的 lm-evaluation-harness (commit b281b09)。4. 项目主要针对多 GPU 环境优化，支持 torchrun 数据并行。","3.x (通过 pip3 安装)",[48,49,50,51,52,53,54],"torch>=2.0","peft","transformers","accelerate","bitsandbytes","sentence-transformers","fastchat",[56,57],"语言模型","开发框架",2,"ready","2026-03-27T02:49:30.150509","2026-04-06T07:12:46.551258",[63,68,73,78,83,88],{"id":64,"question_zh":65,"answer_zh":66,"source_url":67},9922,"分布式模式下训练时出现 NCCL 通信超时（Timeout）错误怎么办？","这通常是环境配置问题而非代码本身的问题。建议检查并调整以下版本组合以确保兼容性：\n1. 使用 PyTorch 2.0.1。\n2. 将 `pytorch-cuda` 升级为 11.8 版本（原报错环境为 11.7）。\n3. 确保 `bitsandbytes` 版本为 0.41.1。\n4. 
`accelerate` 版本建议为 0.21.0。\n\n成功运行的环境示例：\n- GPU: Nvidia A100\n- Driver Version: 510.39.01\n- 关键包版本：torch 2.0.1, datasets 2.13.1, transformers 4.31.0, accelerate 0.21.0, bitsandbytes 0.41.1, pytorch-cuda 11.8。","https:\u002F\u002Fgithub.com\u002Farielnlee\u002FPlatypus\u002Fissues\u002F19",{"id":69,"question_zh":70,"answer_zh":71,"source_url":72},9923,"微调 LLaMA-2-70B 模型时遇到显存溢出（OOM）错误如何解决？","LLaMA-2-70B 模型过大，无法仅通过数据并行（torchrun）在单卡或多卡上直接加载。必须使用支持模型并行（Model Parallelism）的方案。\n解决方案：\n1. 不要直接使用 `torchrun` 脚本，因为它主要处理数据并行。\n2. 改用 `accelerate` 库或其他支持模型并行的工具来运行 `finetune.py`。\n3. 如果必须尝试当前脚本，可设置 `world_size=1` 看看是否有帮助，但强烈建议使用 `accelerate`。\n4. 对于资源受限的情况，可以尝试降低 `lora_r` (如设为 8) 和 `cutoff_len` (如设为 512)，但这可能会影响性能。","https:\u002F\u002Fgithub.com\u002Farielnlee\u002FPlatypus\u002Fissues\u002F10",{"id":74,"question_zh":75,"answer_zh":76,"source_url":77},9924,"在单张 GPU（如 A100）上运行 fine-tuning.sh 时报错 \"CUDA error: invalid device ordinal\" 怎么办？","当只在单张 GPU 上运行时，分布式训练参数配置会导致此错误。\n解决方法有两种：\n1. **修改代码参数**：在代码中找到 `ddp_find_unused_parameters` 的设置，将其强制设为 `False`。即把 `ddp_find_unused_parameters=False if ddp else None` 改为 `ddp_find_unused_parameters=False`。\n2. **绕过 torchrun**：对于能放入单张显卡的较小模型（如 7B 或 13B），可以直接运行 python 脚本而不使用 `torchrun`。命令示例：`python finetune.py [你的参数...]`，而不是 `torchrun --nproc_per_node=2 ... finetune.py`。","https:\u002F\u002Fgithub.com\u002Farielnlee\u002FPlatypus\u002Fissues\u002F12",{"id":79,"question_zh":80,"answer_zh":81,"source_url":82},9925,"微调时可以移除 Prompt 模板（templates）吗？这对最终性能有影响吗？","是否移除模板取决于你的基础模型：\n1. **如果使用合并后的 Platypus 模型作为基础**：建议保留 Alpaca 模板进行任何额外的微调，以保持一致性。\n2. 
**如果使用原始 Llama 2 作为基础**：可以根据任务修改模板。例如，针对多项选择题，可以自定义如下模板：\n   - `prompt_input`: \"Following is a multiple choice question...\\n\\n### Instruction:\\n{instruction}\\n\\n### Input:\\n{input}\\n\\n### Response:\\n\"\n   - `prompt_no_input`: \"Following is a multiple choice question...\\n\\n### Instruction:\\n{instruction}\\n\\n### Response:\\n\"\n如果是单一目的的微调（如固定指令的多选题），定义自己的专用模板通常比强行套用 Alpaca 模板效果更好。","https:\u002F\u002Fgithub.com\u002Farielnlee\u002FPlatypus\u002Fissues\u002F15",{"id":84,"question_zh":85,"answer_zh":86,"source_url":87},9926,"训练过程中评估损失（Eval loss）显示为 'nan' 是什么原因？","虽然具体讨论在评论区被截断，但此类问题通常与学习率过高、数据包含异常值、或混合精度训练不稳定有关。在该项目的上下文中，维护者对用户反馈表示感谢并计划尝试修复，暗示这可能是特定配置下的已知问题。建议检查：\n1. 降低学习率（learning_rate）。\n2. 检查数据预处理是否正确，确保没有空值或极端长度的数据。\n3. 尝试关闭混合精度训练（如果启用）。\n*(注：由于原始评论信息不完整，建议参考项目最新 README 或尝试上述常规调试步骤)*","https:\u002F\u002Fgithub.com\u002Farielnlee\u002FPlatypus\u002Fissues\u002F14",{"id":89,"question_zh":90,"answer_zh":91,"source_url":77},9927,"如何在单卡环境下正确配置微调脚本以避免分布式相关错误？","在单卡环境下，应避免使用 `torchrun` 启动器，因为它会尝试初始化分布式环境从而导致 \"invalid device ordinal\" 等错误。\n推荐做法：\n直接通过 `python` 命令运行微调脚本：\n`python finetune.py --base_model ... --data_path ... 
[其他参数]`\n不要去执行 `fine-tuning.sh` 或使用 `torchrun --nproc_per_node=1`。这样可以跳过不必要的分布式初始化逻辑，使训练在单卡上稳定运行。",[],[94,104,112,120,128,141],{"id":95,"name":96,"github_repo":97,"description_zh":98,"stars":99,"difficulty_score":40,"last_commit_at":100,"category_tags":101,"status":59},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[57,102,103],"图像","Agent",{"id":105,"name":106,"github_repo":107,"description_zh":108,"stars":109,"difficulty_score":58,"last_commit_at":110,"category_tags":111,"status":59},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,"2026-04-05T11:33:21",[57,103,56],{"id":113,"name":114,"github_repo":115,"description_zh":116,"stars":117,"difficulty_score":58,"last_commit_at":118,"category_tags":119,"status":59},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[57,102,103],{"id":121,"name":122,"github_repo":123,"description_zh":124,"stars":125,"difficulty_score":58,"last_commit_at":126,"category_tags":127,"status":59},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[57,56],{"id":129,"name":130,"github_repo":131,"description_zh":132,"stars":133,"difficulty_score":58,"last_commit_at":134,"category_tags":135,"status":59},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 
将是理想的起点。",84991,"2026-04-05T10:45:23",[102,136,137,138,103,139,56,57,140],"数据工具","视频","插件","其他","音频",{"id":142,"name":143,"github_repo":144,"description_zh":145,"stars":146,"difficulty_score":40,"last_commit_at":147,"category_tags":148,"status":59},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[103,102,57,56,139]]