[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rhymes-ai--Aria":3,"tool-rhymes-ai--Aria":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 
适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 
轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":100,"env_os":101,"env_gpu":102,"env_ram":101,"env_deps":103,"category_tags":113,"github_topics":114,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":22,"created_at":118,"updated_at":119,"faqs":120,"releases":155},3344,"rhymes-ai\u002FAria","Aria","Codebase for Aria - an Open Multimodal Native MoE","Aria 是一款开源的多模态原生混合专家（MoE）模型，旨在为用户提供卓越的视频理解、文档分析及多轮对话能力。它有效解决了传统模型在处理长上下文多模态任务时效率低下或显存占用过高的问题，让复杂的视觉与文本联合分析变得更加流畅高效。\n\n这款工具特别适合开发者、研究人员以及需要处理大量图文视频数据的企业用户。无论是构建智能客服、开发文档自动化工具，还是进行前沿的 AI 研究，Aria 都能提供强大的底层支持。普通用户也可通过其网页演示体验先进的多模态交互。\n\nAria 的技术亮点十分突出：它拥有高达 64K 的超长多模态上下文窗口，能够轻松处理长篇文档或长视频；同时采用稀疏激活机制，每令牌仅激活 39 亿参数，在保持 253 亿总参数规模带来的高性能同时，显著提升了推理速度并降低了微调成本。此外，Aria 已适配 Hugging Face Transformers 和 vLLM 等主流框架，并提供了详细的代码示例与微调指南，方便用户快速上手集成到现有项目中。","# Aria\n\n😊 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria) | \n📄 [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05993) | \n📚 [Blog](https:\u002F\u002Frhymes.ai\u002Fblog-details\u002Faria-first-open-multimodal-native-moe-model) | \n🌐 [WebDemo](https:\u002F\u002Frhymes.ai\u002F)  | \n🟣 [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fu8HxU23myj)\n\n\n## Introduction\nAria is a multimodal native MoE model. 
It features:\n- State-of-the-art performance on various multimodal and language tasks, superior in video and document understanding;\n- Long multimodal context window of 64K tokens;\n- 3.9B activated parameters per token, enabling fast inference speed and low fine-tuning cost.\n  \n\n## News\n- [Jan 20, 2025] 🚀🚀🚀 Aria is supported in [PaddleMIX](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleMIX\u002Ftree\u002Fdevelop\u002Fpaddlemix\u002Fexamples\u002Faria) by Paddle Team.\n\n- [Dec 15, 2024] We release [Aria-Chat](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Chat)! It is optimized for open-ended and multi-round dialogs, with enhanced reliability and multi-lingual support.\n\n- [Dec 1, 2024] We release the base models for Aria ([Aria-Base-8K](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Base-8K) and [Aria-Base-64K](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Base-64K))! They are fully compatible with this inference \\& fine-tuning codebase. 
\n\n- [Oct 10, 2024] We release Aria!\n\n## Quick Start\n\n### Installation\n\n```bash\npip install -e .\n# or install with dev dependencies if you want to contribute to the project\npip install -e .[dev] \n\npip install grouped_gemm\npip install flash-attn --no-build-isolation\n```\n\n### Inference\n\nAria has 25.3B total parameters and can be loaded on a single A100 (80GB) GPU in bfloat16 precision.\n\nHere is a code snippet showing how to use Aria with Hugging Face Transformers.\n\n```python\nimport requests\nimport torch\nfrom PIL import Image\nfrom transformers import AutoModelForCausalLM, AutoProcessor\n\nmodel_id_or_path = \"rhymes-ai\u002FAria\"\n\nmodel = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map=\"auto\", torch_dtype=torch.bfloat16, trust_remote_code=True)\n\nprocessor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)\n\nimage_path = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fcat.png\"\n\nimage = Image.open(requests.get(image_path, stream=True).raw)\n\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\"text\": None, \"type\": \"image\"},\n            {\"text\": \"what is the image?\", \"type\": \"text\"},\n        ],\n    }\n]\n\ntext = processor.apply_chat_template(messages, add_generation_prompt=True)\ninputs = processor(text=text, images=image, return_tensors=\"pt\")\ninputs[\"pixel_values\"] = inputs[\"pixel_values\"].to(model.dtype)\ninputs = {k: v.to(model.device) for k, v in inputs.items()}\n\nwith torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):\n    output = model.generate(\n        **inputs,\n        max_new_tokens=500,\n        stop_strings=[\"\u003C|im_end|>\"],\n        tokenizer=processor.tokenizer,\n        do_sample=True,\n        temperature=0.9,\n    )\n    output_ids = output[0][inputs[\"input_ids\"].shape[1]:]\n    result = 
processor.decode(output_ids, skip_special_tokens=True)\n\nprint(result)\n```\n\nWe also support other inference backends, such as [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) for better performance. For details, see [docs\u002Finference.md](docs\u002Finference.md).\n\n### Cookbook\nCheck out these [inference examples](https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Ftree\u002Fmain\u002Finference\u002Fnotebooks), which demonstrate how to use Aria for applications such as chart understanding, PDF reading, and video understanding, with both the Hugging Face Transformers and [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) backends.\n\n## Fine-tuning\n> ⚠️ **Important Note on Fine-tuning**: Due to changes in the weight mapping after Aria's integration into transformers, the training code requires specific versions to work properly:\n> - Use transformers version 4.45.0\n> - Use model revision \"4844f0b5ff678e768236889df5accbe4967ec845\"\n\n\n> **Note:** For optimal fine-tuning performance, install the optional `grouped_gemm` dependency:\n> ```bash\n> pip install grouped_gemm\n> ```\n\nBoth LoRA fine-tuning and full-parameter fine-tuning are supported, on the following dataset types:\n- Single-image datasets\n- Multi-image datasets\n- Video datasets\n- Code datasets\n\nTo get started quickly, visit the [examples](.\u002Fexamples) folder and choose one of the fine-tuning examples. 
If you would like to fine-tune from the base models (recommended when you have a large dataset), change the following model paths in the configs ([full](recipes\u002Fconfig_full.yaml) or [lora](recipes\u002Fconfig_lora.yaml))\n\n```yaml\nmodel_name_or_path: rhymes-ai\u002FAria\ntokenizer_path: rhymes-ai\u002FAria\n```\n\nto the ones corresponding to one of the base models:\n\n```yaml\nmodel_name_or_path: rhymes-ai\u002FAria-Base-64K # rhymes-ai\u002FAria-Base-8K\ntokenizer_path: rhymes-ai\u002FAria-Base-64K # rhymes-ai\u002FAria-Base-8K\n```\n\n### Prepare dataset\nSee [custom_dataset.md](docs\u002Fcustom_dataset.md) for how to prepare your dataset.\n\n### Fine-tune with LoRA\n\nAfter preparing your dataset, follow these steps to fine-tune Aria using LoRA:\n\n1. Open the configuration file `recipes\u002Fconfig_lora.yaml`. Locate the `dataset_mixer` section and update it with your dataset paths:\n\n```yaml\ndataset_mixer:\n  \"path\u002Fto\u002Fdataset1\": 1\n  \"path\u002Fto\u002Fdataset2\": 0.5\n  \"path\u002Fto\u002Fdataset3\": 2\n```\n\n> **Note on dataset mixing:** Aria supports combining multiple datasets with different sampling rates. In the example above:\n> - `dataset1` will be used entirely (weight 1)\n> - `dataset2` will use 50% of its data (weight 0.5)\n> - `dataset3` will be used twice (weight 2)\n\n2. Start the fine-tuning process by running the following command on one A100 (80GB) or H100 (80GB) GPU:\n\n```bash\npython aria\u002Ftrain.py --config recipes\u002Fconfig_lora.yaml\n```\n\n3. 
For multi-GPU training, use the [`accelerate`](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Faccelerate\u002Findex) library:\n\n```bash\naccelerate launch --config_file recipes\u002Faccelerate_configs\u002Fzero2.yaml --num_processes [number_of_gpus] aria\u002Ftrain.py --config recipes\u002Fconfig_lora.yaml\n```\n\n   - Choose from pre-configured accelerate settings in `recipes\u002Faccelerate_configs\u002F`\n   - Adjust the `--num_processes` argument to match your available GPUs; it must come before the script path, since arguments after the script are passed to the script itself\n   - For custom configurations, refer to the [accelerate documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Faccelerate\u002Fusage_guides\u002Fdeepspeed)\n  \n4. Inference with the fine-tuned model:\n\n   See [inference with LoRA support](docs\u002Finference.md#2-inference-with-lora-support) for how to run inference with the fine-tuned model.\n\n### Full parameter fine-tuning\n\nThe process is the same as for LoRA fine-tuning, except that it uses the configuration file `recipes\u002Fconfig_full.yaml`.\n\nFull-parameter tuning consumes more GPU memory, so multiple GPUs are required. The following command has been tested on 8 A100 (80GB) GPUs.\n\n```bash\naccelerate launch --config_file recipes\u002Faccelerate_configs\u002Fzero2.yaml aria\u002Ftrain.py --config recipes\u002Fconfig_full.yaml\n```\n\nIf you encounter out-of-memory errors, try reducing `per_device_train_batch_size` in the config file, and increase `gradient_accumulation_steps` accordingly to keep the effective training batch size unchanged.\n\n```yaml\nper_device_train_batch_size: 8\ngradient_accumulation_steps: 2\n```\n\nMemory consumption varies across datasets. Generally, more memory is required for multi-image and video datasets. 
Adjust the `deepspeed_config` parameters to optimize memory consumption, such as using `zero_stage` 3 and offloading parameters and optimizer to the CPU.\n\n```yaml\ndeepspeed_config:\n  gradient_accumulation_steps: auto\n  gradient_clipping: auto\n  offload_optimizer_device: cpu\n  offload_param_device: cpu\n  zero3_init_flag: true\n  zero_stage: 3\n```\n\n#### Inference with Your Trained Model\n\nFirst, you need to extract the FP32 consolidated weights from ZeRO 1, 2, or 3 DeepSpeed checkpoints:\n```bash\ncd \u002Fpath\u002Fto\u002Fyour\u002Foutput\u002Fdir\npython zero_to_fp32.py . pytorch_model.bin\n```\n\nSee [inference.md](docs\u002Finference.md) for instructions on how to perform inference with the fine-tuned model.\n\n## Citation\nIf you find our work helpful, please consider citing.\n```\n@article{aria,\n  title={Aria: An Open Multimodal Native Mixture-of-Experts Model}, \n  author={Dongxu Li and Yudong Liu and Haoning Wu and Yue Wang and Zhiqi Shen and Bowen Qu and Xinyao Niu and Guoyin Wang and Bei Chen and Junnan Li},\n  year={2024},\n  journal={arXiv preprint arXiv:2410.05993},\n}\n```\n\n\n","# Aria\n\n😊 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria) | \n📄 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05993) | \n📚 [博客](https:\u002F\u002Frhymes.ai\u002Fblog-details\u002Faria-first-open-multimodal-native-moe-model) | \n🌐 [WebDemo](https:\u002F\u002Frhymes.ai\u002F)  | \n🟣 [Discord](https:\u002F\u002Fdiscord.com\u002Finvite\u002Fu8HxU23myj)\n\n\n## 简介\nAria 是一款多模态原生 MoE 模型。其特点包括：\n- 在多种多模态和语言任务上表现处于行业领先水平，尤其在视频和文档理解方面表现出色；\n- 具有长达 64K 个 token 的多模态上下文窗口；\n- 每个 token 的激活参数量为 3.9B，从而实现快速推理速度和较低的微调成本。\n  \n\n## 最新消息\n- [2025年1月20日] 🚀🚀🚀 Paddle 团队已在 [PaddleMIX](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleMIX\u002Ftree\u002Fdevelop\u002Fpaddlemix\u002Fexamples\u002Faria) 中支持 Aria。\n\n- [2024年12月15日] 我们发布了 [Aria-Chat](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Chat)! 
它针对开放式和多轮对话进行了优化，具有更高的可靠性及多语言支持。\n\n- [2024年12月1日] 我们发布了 Aria 的基础模型（[Aria-Base-8K](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Base-8K) 和 [Aria-Base-64K](https:\u002F\u002Fhuggingface.co\u002Frhymes-ai\u002FAria-Base-64K)）! 它们与本推理和微调代码库完全兼容。\n\n- [2024年10月10日] 我们正式发布了 Aria!\n\n## 快速入门\n\n### 安装\n\n```bash\npip install -e .\n# 或者如果你打算参与项目贡献，可以安装包含开发依赖的版本\npip install -e .[dev] \n\npip install grouped_gemm\npip install flash-attn --no-build-isolation\n```\n\n### 推理\n\nAria 总共有 25.3B 个参数，在单张 A100（80GB）GPU 上以 bfloat16 精度即可加载运行。\n\n以下是一个使用 Hugging Face Transformers 调用 Aria 的代码示例。\n\n```python\nimport requests\nimport torch\nfrom PIL import Image\nfrom transformers import AutoModelForCausalLM, AutoProcessor\n\nmodel_id_or_path = \"rhymes-ai\u002FAria\"\n\nmodel = AutoModelForCausalLM.from_pretrained(model_id_or_path, device_map=\"auto\", torch_dtype=torch.bfloat16, trust_remote_code=True)\n\nprocessor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)\n\nimage_path = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fcat.png\"\n\nimage = Image.open(requests.get(image_path, stream=True).raw)\n\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\"text\": None, \"type\": \"image\"},\n            {\"text\": \"这是一张什么图片？\", \"type\": \"text\"},\n        ],\n    }\n]\n\ntext = processor.apply_chat_template(messages, add_generation_prompt=True)\ninputs = processor(text=text, images=image, return_tensors=\"pt\")\ninputs[\"pixel_values\"] = inputs[\"pixel_values\"].to(model.dtype)\ninputs = {k: v.to(model.device) for k, v in inputs.items()}\n\nwith torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):\n    output = model.generate(\n        **inputs,\n        max_new_tokens=500,\n        stop_strings=[\"\u003C|im_end|>\"],\n        tokenizer=processor.tokenizer,\n        do_sample=True,\n        
temperature=0.9,\n    )\n    output_ids = output[0][inputs[\"input_ids\"].shape[1]:]\n    result = processor.decode(output_ids, skip_special_tokens=True)\n\nprint(result)\n```\n\n我们还提供了其他推理方法，例如利用 [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 来提升性能。更多详细信息请参阅 [docs\u002Finference.md](docs\u002Finference.md)。\n\n### 食谱\n请查看这些 [推理示例](https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Ftree\u002Fmain\u002Finference\u002Fnotebooks)，它们展示了如何在不同应用场景中使用 Aria，比如图表理解、PDF 阅读、视频理解等，同时支持 Hugging Face Transformers 和 [vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 后端。\n\n## 微调\n> ⚠️ **关于微调的重要提示**：由于 Aria 集成到 transformers 后权重映射发生了变化，训练代码需要特定版本才能正常工作：\n> - 请使用 transformers 版本 4.45.0\n> - 使用模型修订版 \"4844f0b5ff678e768236889df5accbe4967ec845\"\n\n\n> **注意**：为了获得最佳的微调效果，建议安装可选的 `grouped_gemm` 依赖：\n> ```bash\n> pip install grouped_gemm\n> ```\n\n我们提供 LoRA 微调和全参数微调两种方式，并支持多种数据集类型：\n- 单张图像数据集\n- 多张图像数据集\n- 视频数据集\n- 代码数据集\n\n若想快速尝试，请访问 [examples](.\u002Fexamples) 文件夹，选择其中一个微调示例。如果希望从基础模型开始微调（推荐用于大型数据集），请在配置文件中将以下模型路径（[全参数](recipes\u002Fconfig_full.yaml) 或 [LoRA](recipes\u002Fconfig_lora.yaml)）\n\n```yaml\nmodel_name_or_path: rhymes-ai\u002FAria\ntokenizer_path: rhymes-ai\u002FAria\n```\n\n修改为对应的基础模型路径：\n\n```yaml\nmodel_name_or_path: rhymes-ai\u002FAria-Base-64K # rhymes-ai\u002FAria-Base-8K\ntokenizer_path: rhymes-ai\u002FAria-Base-64K # rhymes-ai\u002FAria-Base-8K\n```\n\n### 准备数据集\n请参考 [custom_dataset.md](docs\u002Fcustom_dataset.md)，了解如何准备您的数据集。\n\n### 使用 LoRA 进行微调\n\n准备好数据集后，按照以下步骤使用 LoRA 对 Aria 进行微调：\n\n1. 打开配置文件 `recipes\u002Fconfig_lora.yaml`。找到 `dataset_mixer` 部分，并更新为您的数据集路径：\n\n```yaml\ndataset_mixer:\n  \"path\u002Fto\u002Fdataset1\": 1\n  \"path\u002Fto\u002Fdataset2\": 0.5\n  \"path\u002Fto\u002Fdataset3\": 2\n```\n\n> **关于数据集混合的说明**：Aria 支持以不同的采样率组合多个数据集。在上面的例子中：\n> - `dataset1` 将被完全使用（权重 1）\n> - `dataset2` 将使用其数据的 50%（权重 0.5）\n> - `dataset3` 将被使用两次（权重 2）\n\n2. 
在一台 A100（80GB）或 H100（80GB）GPU 上运行以下命令开始微调过程：\n\n```bash\npython aria\u002Ftrain.py --config recipes\u002Fconfig_lora.yaml\n```\n\n3. 如果进行多 GPU 训练，请使用 [`accelerate`](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Faccelerate\u002Findex) 库：\n\n```bash\naccelerate launch --config_file recipes\u002Faccelerate_configs\u002Fzero2.yaml --num_processes [number_of_gpus] aria\u002Ftrain.py --config recipes\u002Fconfig_lora.yaml\n```\n\n   - 可以从 `recipes\u002Faccelerate_configs\u002F` 中选择预配置的 accelerate 设置\n   - 根据您可用的 GPU 数量调整 `--num_processes` 参数\n   - 如需自定义配置，请参考 [accelerate 文档](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Faccelerate\u002Fusage_guides\u002Fdeepspeed)\n  \n4. 使用微调后的模型进行推理：\n\n   请参阅 [支持 LoRA 的推理](docs\u002Finference.md#2-inference-with-lora-support) 了解如何使用微调后的模型进行推理。\n\n### 全参数微调\n\n除配置文件 `recipes\u002Fconfig_full.yaml` 外，其余步骤与 LoRA 微调流程相同。\n\n全参数微调会占用更多显存，因此需要使用多张 GPU。以下命令已在 8 张 A100（80GB）GPU 上测试通过：\n\n```bash\naccelerate launch --config_file recipes\u002Faccelerate_configs\u002Fzero2.yaml aria\u002Ftrain.py --config recipes\u002Fconfig_full.yaml\n```\n\n如果遇到显存不足的错误，请尝试在配置文件中降低 `per_device_train_batch_size`，并相应调整 `gradient_accumulation_steps` 以保持有效的训练批次大小。\n\n```yaml\nper_device_train_batch_size: 8\ngradient_accumulation_steps: 2\n```\n\n不同数据集的显存消耗差异较大。通常，多图像和视频数据集需要更多的显存。可以通过调整 `deepspeed_config` 参数来优化显存使用，例如使用 `zero_stage` 3，并将参数和优化器卸载到 CPU 上。\n\n```yaml\ndeepspeed_config:\n  gradient_accumulation_steps: auto\n  gradient_clipping: auto\n  offload_optimizer_device: cpu\n  offload_param_device: cpu\n  zero3_init_flag: true\n  zero_stage: 3\n```\n\n#### 使用您训练好的模型进行推理\n\n首先，您需要从 ZeRO 1、2 或 3 的 DeepSpeed 检查点中提取 FP32 整合权重：\n\n```bash\ncd \u002Fpath\u002Fto\u002Fyour\u002Foutput\u002Fdir\npython zero_to_fp32.py . 
pytorch_model.bin\n```\n\n有关如何使用微调后的模型进行推理的说明，请参阅 [inference.md](docs\u002Finference.md)。\n\n## 引用\n如果您觉得我们的工作对您有所帮助，请考虑引用：\n```\n@article{aria,\n  title={Aria: An Open Multimodal Native Mixture-of-Experts Model}, \n  author={Dongxu Li and Yudong Liu and Haoning Wu and Yue Wang and Zhiqi Shen and Bowen Qu and Xinyao Niu and Guoyin Wang and Bei Chen and Junnan Li},\n  year={2024},\n  journal={arXiv preprint arXiv:2410.05993},\n}\n```","# Aria 快速上手指南\n\nAria 是一款原生多模态混合专家（MoE）模型，在视频和文档理解任务上表现卓越。它支持 64K 长上下文窗口，每 token 仅激活 39 亿参数，兼具高性能与低推理成本。\n\n## 环境准备\n\n### 系统要求\n- **GPU**: 推理至少需要 1 张 A100 (80GB) 或同等显存容量的 GPU（使用 bfloat16 精度）。\n- **显存**: \n  - LoRA 微调：推荐 1 张 A100\u002FH100 (80GB)。\n  - 全量参数微调：推荐 8 张 A100 (80GB)，需配合 DeepSpeed ZeRO 优化。\n- **Python**: 建议 Python 3.10+。\n\n### 前置依赖\n为确保微调功能正常运作，请严格遵循以下版本要求：\n- `transformers` == 4.45.0\n- 模型版本需指定 revision: `4844f0b5ff678e768236889df5accbe4967ec845`\n\n> **提示**：国内开发者若遇到网络连接问题，建议在 `pip` 命令后添加 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple` 使用清华镜像源。\n\n## 安装步骤\n\n1. **克隆项目并安装基础依赖**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria.git\n   cd Aria\n   pip install -e .\n   # 如需开发贡献，可安装开发依赖\n   # pip install -e .[dev]\n   ```\n\n2. 
**安装关键加速库**\n   必须安装 `grouped_gemm` 和 `flash-attn` 以获得最佳性能：\n   ```bash\n   pip install grouped_gemm\n   pip install flash-attn --no-build-isolation\n   ```\n\n## 基本使用\n\n以下示例展示如何使用 Hugging Face Transformers 加载 Aria 模型并进行简单的图文对话推理。\n\n### 代码示例\n\n```python\nimport requests\nimport torch\nfrom PIL import Image\nfrom transformers import AutoModelForCausalLM, AutoProcessor\n\n# 指定模型路径\nmodel_id_or_path = \"rhymes-ai\u002FAria\"\n\n# 加载模型 (自动映射设备，使用 bfloat16)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_id_or_path, \n    device_map=\"auto\", \n    torch_dtype=torch.bfloat16, \n    trust_remote_code=True\n)\n\n# 加载处理器\nprocessor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)\n\n# 准备图片 (此处使用在线示例图片，也可替换为本地路径)\nimage_path = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fdiffusers\u002Fcat.png\"\nimage = Image.open(requests.get(image_path, stream=True).raw)\n\n# 构建对话消息\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\"text\": None, \"type\": \"image\"},\n            {\"text\": \"what is the image?\", \"type\": \"text\"},\n        ],\n    }\n]\n\n# 处理输入\ntext = processor.apply_chat_template(messages, add_generation_prompt=True)\ninputs = processor(text=text, images=image, return_tensors=\"pt\")\ninputs[\"pixel_values\"] = inputs[\"pixel_values\"].to(model.dtype)\ninputs = {k: v.to(model.device) for k, v in inputs.items()}\n\n# 生成回复\nwith torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):\n    output = model.generate(\n        **inputs,\n        max_new_tokens=500,\n        stop_strings=[\"\u003C|im_end|>\"],\n        tokenizer=processor.tokenizer,\n        do_sample=True,\n        temperature=0.9,\n    )\n    output_ids = output[0][inputs[\"input_ids\"].shape[1]:]\n    result = processor.decode(output_ids, skip_special_tokens=True)\n\nprint(result)\n```\n\n### 进阶提示\n- **更多后端**: 如需更高吞吐量的推理，可参考官方文档使用 
[vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 后端。\n- **应用场景**: 项目 `inference\u002Fnotebooks` 目录下提供了图表理解、PDF 阅读、视频理解等更多示例代码。","某金融分析团队需要每日处理数百份包含复杂图表、长篇幅文字及市场视频简报的混合格式研报，以提取关键投资信号。\n\n### 没有 Aria 时\n- **多模态割裂**：必须分别使用 OCR 工具提取文字、独立模型分析图表、另起一套流程处理视频，数据流转繁琐且容易丢失上下文关联。\n- **长文档失忆**：面对超过几十页的 PDF 研报或长视频，传统模型受限于短上下文窗口，往往遗漏后半部分的关键风险披露或趋势总结。\n- **推理成本高昂**：为了维持高精度，不得不部署参数量巨大的稠密模型，导致单张显卡无法承载，推理延迟高且微调训练费用惊人。\n- **细粒度理解不足**：对于研报中复杂的动态走势图和嵌套表格，通用模型常出现“看图说话”式的表面描述，无法深入解读数据背后的逻辑。\n\n### 使用 Aria 后\n- **原生多模态融合**：Aria 直接原生支持图像、文本和视频输入，团队只需一次调用即可让模型同时“阅读”研报全文、“看懂”K 线图并“观看”分析师视频解说。\n- **64K 超长上下文掌控**：凭借 64K token 的上下文窗口，Aria 能完整消化整份百页研报或长达数十分钟的视频会议记录，精准定位首尾呼应的关键信息。\n- **高效低耗推理**：得益于 MoE 架构，Aria 每令牌仅激活 39 亿参数，在单张 A100 显卡上即可实现快速推理，大幅降低了日常批量处理的算力成本和微调门槛。\n- **深度逻辑洞察**：在图表理解任务中，Aria 不仅能识别数据点，还能结合文中语境分析波动原因，输出具备专业深度的投资摘要而非简单的画面描述。\n\nAria 通过原生多模态与超长上下文的结合，将原本碎片化、高成本的复杂文档分析流程转化为单一、高效且深度的智能决策辅助系统。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frhymes-ai_Aria_ee08baa4.png","rhymes-ai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frhymes-ai_68765816.png","",null,"rhymes_ai_","https:\u002F\u002Frhymes.ai","https:\u002F\u002Fgithub.com\u002Frhymes-ai",[84,88,92],{"name":85,"color":86,"percentage":87},"Jupyter Notebook","#DA5B0B",99.4,{"name":89,"color":90,"percentage":91},"Python","#3572A5",0.6,{"name":93,"color":94,"percentage":95},"Makefile","#427819",0,1085,89,"2026-03-31T05:40:20","Apache-2.0",4,"未说明","必需 NVIDIA GPU。推理：单卡 A100 (80GB) 可运行 bfloat16 精度模型；LoRA 微调：单卡 A100 (80GB) 或 H100 (80GB)；全参数微调：需多卡（测试环境为 8x A100 80GB）。需安装 flash-attn 和 grouped_gemm，暗示需要支持 CUDA 的 NVIDIA 显卡。",{"notes":104,"python":101,"dependencies":105},"1. 微调时必须使用 transformers 4.45.0 版本及特定的模型 revision ('4844f0b...')，否则权重映射会出错。2. 模型总参数量 25.3B，但每 token 仅激活 3.9B 参数 (MoE 架构)。3. 全参数微调显存消耗巨大，若遇 OOM 需调整 batch_size 或使用 DeepSpeed ZeRO-3 并将优化器\u002F参数卸载至 CPU。4. 支持 PaddleMIX 框架。5. 
基础模型有 8K 和 64K 上下文版本可选。",[106,107,108,109,110,111,112],"torch","transformers==4.45.0","accelerate","flash-attn","grouped_gemm","PIL","vLLM (可选)",[18],[115,116,117],"mixture-of-experts","multimodal","vision-and-language","2026-03-27T02:49:30.150509","2026-04-06T05:37:37.998077",[121,126,130,135,140,145,150],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},15353,"Aria 模型是否支持多图像交错输入？在基准测试（如 MMMU）中应如何构建提示词？","支持。您可以将图像特殊标记与文本提示词交错排列，无需强制将所有图像标记移至提示词开头。对于包含嵌入图像的复杂场景（如 MMMU 基准测试），直接保留图像在文本中的原始位置即可。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F90",{"id":127,"question_zh":128,"answer_zh":129,"source_url":125},15354,"Aria 在推理时应该使用什么分辨率？论文中的性能指标是基于哪种分辨率得出的？","论文中大多数基准测试（视频基准除外）均使用了超高分辨率（split_image=True）。在实际应用中，如果图像包含文本、图表等细节丰富的内容，强烈建议使用超高分辨率以获得最佳效果。",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},15355,"如何使用 vLLM 运行经过 LoRA 微调的 Aria 模型？","目前官方暂无计划在近期支持 vLLM 与 LoRA 的集成。如果您需要此功能，欢迎社区贡献代码。当前变通方法包括尝试在本地环境中升级 vLLM 版本自行开发，或者等待官方后续更新（参考 PR #77 对 vLLM 版本的升级进展）。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F57",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},15356,"是否有可用于继续预训练或扩展模态（如音频）的 Aria 基础模型？","是的，官方已发布带有原生多模态预训练的基础模型，可用于进一步微调或扩展模态。您可以在 Hugging Face 上获取以下模型：Aria-Base-8K 和 Aria-Base-64K。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F48",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},15357,"运行视频理解示例笔记（04_video_understanding_vllm.ipynb）时报错\"Model architectures ... are not supported\"，如何解决？","这是因为官方已将 Aria 模型合并至 vLLM 主仓库，移除了本地的 vLLM 实现但未及时更新笔记。解决方法：1. 拉取最新代码（PR #82 已修复）；2. 
确保安装了最新版的 vLLM，可使用命令：pip install https:\u002F\u002Fvllm-wheels.s3.us-west-2.amazonaws.com\u002Fnightly\u002Fvllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F81",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},15358,"如何通过配置启用 torch.compile 加速或使用 SDPA 注意力机制？","可以通过在加载模型时指定 attn_implementation 参数来切换注意力实现。例如，对文本部分使用 SDPA，视觉部分保留 Flash Attention，代码如下：\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_id_or_path, \n    device_map=\"auto\", \n    torch_dtype=torch.bfloat16, \n    trust_remote_code=True, \n    attn_implementation={\n        \"text_config\": \"sdpa\", \n        \"vision_config\": \"flash_attention_2\"\n    }\n)\n注意：需确保 transformers 库为最新版本，且当前版本可能尚不完全支持视觉模型的 SDPA。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F54",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},15359,"项目计划支持最新版本的 vLLM 吗？","是的，维护者计划升级 vLLM 版本以支持更多功能（如 LoRA 与多模态的同时支持）。您可以先在本地环境中尝试升级 vLLM 及相关依赖（如 PyTorch, transformers）进行开发，但需注意全面测试以确保不破坏现有功能。相关升级工作已在 PR #77 中推进。","https:\u002F\u002Fgithub.com\u002Frhymes-ai\u002FAria\u002Fissues\u002F76",[]]