[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-taokz--BiomedGPT":3,"tool-taokz--BiomedGPT":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":77,"owner_twitter":78,"owner_website":77,"owner_url":79,"languages":80,"stars":112,"forks":113,"last_commit_at":114,"license":115,"difficulty_score":10,"env_os":116,"env_gpu":117,"env_ram":118,"env_deps":119,"category_tags":128,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":130,"updated_at":131,"faqs":132,"releases":162},8804,"taokz\u002FBiomedGPT","BiomedGPT","BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks","BiomedGPT 是一款专为生物医学领域打造的通用视觉 - 语言基础模型，旨在通过统一架构高效处理多样化的医疗任务。它不仅能解读医学影像（如 X 光片、病理切片），还能理解相关的专业文本，从而胜任医疗视觉问答、影像报告生成、文本摘要及自然语言推理等复杂工作。\n\n该工具主要解决了传统医疗 AI 模型功能单一、难以跨模态协同的痛点。以往针对特定任务往往需要训练独立的模型，而 BiomedGPT 通过在大规模多模态生物医学数据集上进行预训练和微调，实现了“一个模型搞定多种任务”，显著提升了在各类基准测试中的表现，为医疗数据分析提供了更统一的解决方案。\n\nBiomedGPT 特别适合生物医学领域的研究人员、AI 
开发者以及高校学者使用，用于辅助科研探索、算法验证或构建原型系统。需要注意的是，受限于授权协议及安全考量，目前它严格仅限于学术研究用途，严禁直接应用于商业场景或临床诊疗。\n\n其技术亮点在于强大的多任务学习能力，并持续迭代升级。最新发布的版本参数量高达 9.3 亿，相比早期模型规模扩大了五倍，进一步增强了模型对复杂医学信息的理解与推理精度，展现了作为通用基座模型的巨大潜力。","\u003C!---\nCopyright 2022 The OFA-Sys Team. \nCopyright 2023 Kai Zhang @ Lehigh. \nAll rights reserved.\nThis source code is licensed under the Apache 2.0 license found in the LICENSE file in the root directory.\n-->\n\n# [Nature Medicine'24] \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaokz_BiomedGPT_readme_c106a5fa5eff.jpg\" alt=\"logo\" width=\"35\">iomedGPT\n*A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks.* [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2305.17100-B21A1B)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17100\n)\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fmedical-visual-question-answering-on-vqa)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fmedical-visual-question-answering-on-vqa?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fmedical-visual-question-answering-on-pathvqa)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fmedical-visual-question-answering-on-pathvqa?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fimage-captioning-on-iu-x-ray)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fimage-captioning-on-iu-x-ray?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-
biomedical\u002Ftext-summarization-on-meqsum)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Ftext-summarization-on-meqsum?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fnatural-language-inference-on-mednli)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fnatural-language-inference-on-mednli?p=biomedgpt-a-unified-and-generalist-biomedical)\n\n**BiomedGPT** is pre-trained and fine-tuned with multi-modal & multi-task biomedical datasets. Details of the datasets used are given in [datasets.md](datasets.md). If you have any questions, feel free to contact us or open an issue. \n\n- **[2025\u002F07\u002F07]** Released larger-scale checkpoints—up to 5× larger (930M parameters)—including stronger *large* and *xlarge* pre-trained models. [[ckpt](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002Fcu2r5zkj2r0e6zu\u002FAADZ-KHn-emsICawm9CM4MqVa?dl=0)] [[technical report](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17436)]\n\n## Installation (Linux)\n\n1. Clone this repository and navigate to the BiomedGPT folder\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\ncd BiomedGPT\u002F\n```\n\n2. Install required packages\n```Shell\nconda create --name biomedgpt python=3.7.4\nconda activate biomedgpt\npython -m pip install pip==21.2.4\npip install -r requirements.txt\n```\n\n### Quick Start with Huggingface's transformers\n\nPlease check out this [Colab notebook](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1AMG-OwmDpnu24a9ZvCNvZi3BZwb3nSfS?usp=sharing) for Fairseq-free inference. 
\n\n**Warning:** Extensive experiments using transformers have not been conducted, so we cannot confirm whether the results from transformers and fairseq are fully aligned.\n\n## Checkpoints\nWe provide pretrained checkpoints of BiomedGPT (\u003Ca href=\"https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002Fcu2r5zkj2r0e6zu\u002FAADZ-KHn-emsICawm9CM4MqVa?dl=0\">Dropbox\u003C\u002Fa>), which can be put in the `scripts\u002F` folder for further development. For finetuned checkpoints, please refer to [checkpoints.md](checkpoints.md). \n\ntransformers-compatible weights are accessible through the [collection](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPanaceaAI\u002Fbiomedgpt-v1-66ca7c51e378662e15178be3).\n\n### Note:\nWe emphasize that BiomedGPT, including its files, code, and checkpoints, is strictly for academic research purposes. Commercial and clinical uses are strictly prohibited for three key reasons: First, BiomedGPT is based on the OFA framework, which carries a non-commercial license that we have inherited. Second, our model is not licensed for use in healthcare settings. Finally, we have not implemented sufficient security measures, and the current model cannot guarantee the accuracy required for medical diagnoses.\n\n\n## Implementation\nWe provide the preprocessing, pretraining, finetuning and inference scripts in the `scripts\u002F` folder. 
You can follow the directory setting below:\n\n```\nBiomedGPT\u002F\n├── checkpoints\u002F\n├── datasets\u002F\n│   ├── pretraining\u002F\n│   ├── finetuning\u002F\n│   └── ...\n├── scripts\u002F\n│   ├── preprocess\u002F\n│   │   ├── pretraining\u002F\n│   │   └── finetuning\u002F\n│   ├── pretrain\u002F\n│   ├── vqa\u002F\n│   └── ...\n└── ...\n```\n\n## Pretraining\nPlease follow [datasets.md](datasets.md) to prepare pretraining datasets, which includes 4 TSV files: \u003Ccode>vision_language.tsv\u003C\u002Fcode>, \u003Ccode>text.tsv\u003C\u002Fcode>, \u003Ccode>image.tsv\u003C\u002Fcode> and \u003Ccode>detection.tsv\u003C\u002Fcode> in the directory of `.\u002Fdatasets\u002Fpretraining\u002F`.\n\n\u003Cpre>\ncd scripts\u002Fpretrain\nbash pretrain_tiny.sh\n\u003C\u002Fpre>\nFeel free to modify the hyperparameters in the bash script for your requirements or ablation study.\n\n### Zero-shot VQA inference using pre-trained checkpoints\nAdd ```--zero-shot``` argument in the script. Example script: ```\u002Fscripts\u002Fvqa\u002Fevaluate_vqa_rad_zero_shot.sh```.\n\n**Warning:** The current implementation is not yet designed for chatbot or copilot applications, as its primary focus is on learning general representations in medicine that can be transferred to downstream tasks, as outlined in our paper. Large-scale training and instruction tuning for improving robust conversational abilities are still in progress.\n\n## Downstreams\nWe provide the run scripts of fine-tuning and inference. There will be log files during execution. 
Before fine-tuning or inference, please refer to \n\u003Cdetails>\n    \u003Csummary>\u003Cb>Visual Question Answering\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fvqa\n# for fine-tuning\nbash train_vqa_rad_beam.sh\n# for inference using fine-tuned weights\nbash evaluate_vqa_rad_beam.sh\n# for zero-shot inference using instruction-tuned weights\nbash evaluate_vqa_rad_unconstrained.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Image Captioning\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fcaption\n# for fine-tuning\nbash train_peir_gross.sh\n# for inference\nbash evaluate_peir_gross.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Text Summarization\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Ftext_sum\n# for fine-tuning\nbash train_meqsum.sh\n# for inference\nbash evaluate_meqsum.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Natural Language Inference\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fmednli\n# for fine-tuning\nbash train_mednli.sh\n# for inference\nbash evaluate_mednli.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Image Classification\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fimage_cls\n# for fine-tuning: I provide a template; please set different hyperparameters for each dataset in MedMNIST if required.\nbash train_medmnist.sh \n# for inference: a template\nbash evaluate_medmnist.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\n\u003Cbr>\u003C\u002Fbr>\n\n# Related Codebase\n* [OFA](https:\u002F\u002Fgithub.com\u002FOFA-Sys\u002FOFA)\n* [Fairseq](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq)\n* [taming-transformers](https:\u002F\u002Fgithub.com\u002FCompVis\u002Ftaming-transformers)\n* 
[self-critical.pytorch](https:\u002F\u002Fgithub.com\u002Fruotianluo\u002Fself-critical.pytorch)\n\u003Cbr>\u003C\u002Fbr>\n\n\n# Citation\nIf you use BiomedGPT model or our code for publications, please cite 🤗: \n```\n@article{zhang2024generalist,\n  title={A generalist vision--language foundation model for diverse biomedical tasks},\n  author={Zhang, Kai and Zhou, Rong and Adhikarla, Eashan and Yan, Zhiling and Liu, Yixin and Yu, Jun and Liu, Zhengliang and Chen, Xun and Davison, Brian D and Ren, Hui and others},\n  journal={Nature Medicine},\n  pages={1--13},\n  year={2024},\n  publisher={Nature Publishing Group US New York}\n}\n\n@article{peng2025scaling,\n  title={Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning},\n  author={Peng, Cheng and Zhang, Kai and Lyu, Mengxian and Liu, Hongfang and Sun, Lichao and Wu, Yonghui},\n  journal={arXiv preprint arXiv:2505.17436},\n  year={2025}\n}\n```\n\u003Cbr>\u003C\u002Fbr>\n","\u003C!---\n版权所有 2022 OFA-Sys 团队。\n版权所有 2023 Kai Zhang @ Lehigh。\n保留所有权利。\n本源代码根据根目录下的 LICENSE 文件中所载的 Apache 2.0 许可证进行授权。\n-->\n\n# [Nature Medicine'24] \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaokz_BiomedGPT_readme_c106a5fa5eff.jpg\" alt=\"logo\" width=\"35\">iomedGPT\n*一种用于多样化生物医学任务的通用视觉-语言基础模型。* 
[![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2305.17100-B21A1B)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17100\n)\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fmedical-visual-question-answering-on-vqa)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fmedical-visual-question-answering-on-vqa?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fmedical-visual-question-answering-on-pathvqa)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fmedical-visual-question-answering-on-pathvqa?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fimage-captioning-on-iu-x-ray)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fimage-captioning-on-iu-x-ray?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Ftext-summarization-on-meqsum)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Ftext-summarization-on-meqsum?p=biomedgpt-a-unified-and-generalist-biomedical)\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fbiomedgpt-a-unified-and-generalist-biomedical\u002Fnatural-language-inference-on-mednli)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fnatural-language-inference-on-mednli?p=biomedgpt-a-unified-and-generalist-biomedical)\n\n**BiomedGPT** 使用多模态和多任务生物医学数据集进行预训练和微调。所用数据集的详细信息见 [datasets.md](datasets.md)。如有任何问题，请随时联系我们或提交问题。\n\n- **[2025\u002F07\u002F07]** 
发布了更大规模的检查点——最大可达 5 倍（9.3 亿参数）——包括更强大的 *large* 和 *xlarge* 预训练模型。[[ckpt](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002Fcu2r5zkj2r0e6zu\u002FAADZ-KHn-emsICawm9CM4MqVa?dl=0)] [[技术报告](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.17436)]\n\n## 安装（Linux）\n\n1. 克隆此仓库并进入 BiomedGPT 文件夹\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\ncd BiomedGPT\u002F\n```\n\n2. 安装所需软件包\n```Shell\nconda create --name biomedgpt python=3.7.4\npython -m pip install pip==21.2.4\npip install -r requirements.txt\n```\n\n### 使用 Huggingface 的 transformers 快速入门\n\n请查看此 [Colab 笔记本](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1AMG-OwmDpnu24a9ZvCNvZi3BZwb3nSfS?usp=sharing) 以进行无需 Fairseq 的推理。\n\n**警告：** 尚未对 transformers 进行大量实验，因此我们无法确认 transformers 和 fairseq 的结果是否完全一致。\n\n## 检查点\n我们提供了 BiomedGPT 的预训练检查点（\u003Ca href=\"https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002Fcu2r5zkj2r0e6zu\u002FAADZ-KHn-emsICawm9CM4MqVa?dl=0\">Dropbox\u003C\u002Fa>），可将其放入 `scripts\u002F` 文件夹中以供进一步开发。有关微调后的检查点，请参阅 [checkpoints.md](checkpoints.md)。\n\n与 transformers 兼容的权重可通过 [集合](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPanaceaAI\u002Fbiomedgpt-v1-66ca7c51e378662e15178be3) 获取。\n\n### 注意：\n我们强调，BiomedGPT 及其文件、代码和检查点仅用于学术研究目的。出于三个主要原因，严格禁止商业和临床用途：首先，BiomedGPT 基于 OFA 框架，该框架带有我们继承的非商业许可；其次，我们的模型并未获得在医疗环境中使用的许可；最后，我们尚未实施足够的安全措施，当前模型无法保证医学诊断所需的准确性。\n\n## 实现\n我们在 `scripts\u002F` 文件夹中提供了预处理、预训练、微调和推理脚本。您可以按照以下目录结构操作：\n\n```\nBiomedGPT\u002F\n├── checkpoints\u002F\n├── datasets\u002F\n│   ├── pretraining\u002F\n│   ├── finetuning\u002F\n│   └── ...\n├── scripts\u002F\n│   ├── preprocess\u002F\n│   │   ├── pretraining\u002F\n│   │   └── finetuning\u002F\n│   ├── pretrain\u002F\n│   ├── vqa\u002F\n│   └── ...\n└── ...\n```\n\n## 预训练\n请按照 [datasets.md](datasets.md) 准备预训练数据集，其中包括 4 个 TSV 文件：`vision_language.tsv`、`text.tsv`、`image.tsv` 和 `detection.tsv`，位于 `.\u002Fdatasets\u002Fpretraining\u002F` 目录下。\n\n\u003Cpre>\ncd scripts\u002Fpretrain\nbash 
pretrain_tiny.sh\n\u003C\u002Fpre>\n您可以根据需求或消融实验修改 bash 脚本中的超参数。\n\n### 使用预训练检查点进行零样本 VQA 推理\n在脚本中添加 ```--zero-shot``` 参数。示例脚本：```\u002Fscripts\u002Fvqa\u002Fevaluate_vqa_rad_zero_shot.sh```。\n\n**警告：** 当前实现尚未针对聊天机器人或助手应用设计，因为其主要重点是学习可在下游任务中迁移的通用医学表征，正如我们在论文中所述。为提升稳健对话能力而进行的大规模训练和指令调优仍在进行中。\n\n## 下游任务\n我们提供了微调和推理的运行脚本。执行过程中会生成日志文件。在进行微调或推理之前，请参考以下内容：\n\u003Cdetails>\n    \u003Csummary>\u003Cb>视觉问答\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fvqa\n# 用于微调\nbash train_vqa_rad_beam.sh\n# 使用微调后的权重进行推理\nbash evaluate_vqa_rad_beam.sh\n# 使用指令调优后的权重进行零样本推理\nbash evaluate_vqa_rad_unconstrained.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>图像字幕生成\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fcaption\n# 用于微调\nbash train_peir_gross.sh\n# 用于推理\nbash evaluate_peir_gross.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>文本摘要\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Ftext_sum\n# 用于微调\nbash train_meqsum.sh\n# 用于推理\nbash evaluate_meqsum.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>自然语言推理\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fmednli\n# 用于微调\nbash train_mednli.sh\n# 用于推理\nbash evaluate_mednli.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cb>图像分类\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cpre>\ncd scripts\u002Fimage_cls\n# 对于微调：我提供了一个模板，请根据需要为 MedMNIST 中的每个数据集设置不同的超参数。\nbash train_medmnist.sh \n# 对于推理：一个模板\nbash evaluate_medmnist.sh\n\u003C\u002Fpre>\n\u003C\u002Fdetails>\n\n\u003Cbr>\u003C\u002Fbr>\n\n# 相关代码库\n* [OFA](https:\u002F\u002Fgithub.com\u002FOFA-Sys\u002FOFA)\n* [Fairseq](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq)\n* [taming-transformers](https:\u002F\u002Fgithub.com\u002FCompVis\u002Ftaming-transformers)\n* 
[self-critical.pytorch](https:\u002F\u002Fgithub.com\u002Fruotianluo\u002Fself-critical.pytorch)\n\u003Cbr>\u003C\u002Fbr>\n\n\n# 引用\n如果您在发表论文时使用了 BiomedGPT 模型或我们的代码，请引用 🤗：\n```\n@article{zhang2024generalist,\n  title={面向多样化生物医学任务的通用视觉-语言基础模型},\n  author={Zhang, Kai 和 Zhou, Rong 和 Adhikarla, Eashan 和 Yan, Zhiling 和 Liu, Yixin 和 Yu, Jun 和 Liu, Zhengliang 和 Chen, Xun 和 Davison, Brian D 和 Ren, Hui 等},\n  journal={Nature Medicine},\n  pages={1--13},\n  year={2024},\n  publisher={Nature Publishing Group US New York}\n}\n\n@article{peng2025scaling,\n  title={生物医学视觉-语言模型的扩展：微调、指令微调与多模态学习},\n  author={Peng, Cheng 和 Zhang, Kai 和 Lyu, Mengxian 和 Liu, Hongfang 和 Sun, Lichao 和 Wu, Yonghui},\n  journal={arXiv 预印本 arXiv:2505.17436},\n  year={2025}\n}\n```\n\u003Cbr>\u003C\u002Fbr>","# BiomedGPT 快速上手指南\n\nBiomedGPT 是一个面向多样化生物医学任务的通用视觉 - 语言基础模型，支持医学视觉问答、图像描述、文本摘要等多种任务。\n\n## 环境准备\n\n- **操作系统**：Linux\n- **Python 版本**：3.7.4（必须严格匹配）\n- **依赖管理**：推荐使用 Conda 进行环境隔离\n- **网络要求**：需访问 GitHub、Dropbox 及 Hugging Face（国内用户建议配置代理或使用镜像加速）\n\n## 安装步骤\n\n1. 克隆项目仓库并进入目录：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\ncd BiomedGPT\u002F\n```\n\n2. 创建 Conda 环境并安装依赖：\n```bash\nconda create --name biomedgpt python=3.7.4\nconda activate biomedgpt\npython -m pip install pip==21.2.4\npip install -r requirements.txt\n```\n\n> 💡 国内用户可使用清华或中科大镜像加速 pip 安装：\n> ```bash\n> pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n### 方式一：通过 Hugging Face Transformers 快速推理（推荐初学者）\n\n项目提供了 Colab 示例，无需本地部署 Fairseq 即可进行推理：\n- [Colab 笔记本链接](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1AMG-OwmDpnu24a9ZvCNvZi3BZwb3nSfS?usp=sharing)\n\n> ⚠️ 注意：基于 Transformers 的推理结果尚未经过全面验证，可能与原版 Fairseq 实现存在细微差异。\n\n### 方式二：本地运行零样本医学视觉问答（Zero-shot VQA）\n\n1. 从 [Dropbox](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002Fcu2r5zkj2r0e6zu\u002FAADZ-KHn-emsICawm9CM4MqVa?dl=0) 下载预训练检查点，放入 `scripts\u002F` 目录。\n2. 
执行零样本推理脚本：\n```bash\ncd scripts\u002Fvqa\nbash evaluate_vqa_rad_zero_shot.sh\n```\n\n该脚本已内置 `--zero-shot` 参数，可直接对医学图像进行问答推理。\n\n> ⚠️ 重要提示：本模型仅限学术研究使用，严禁用于临床诊断或商业场景。当前版本尚未针对对话式应用优化，不具备医疗级准确性保障。","某三甲医院病理科医生正面临海量组织病理切片图像与对应诊断报告的分析压力，急需从多模态数据中快速提取关键信息以辅助疑难病例会诊。\n\n### 没有 BiomedGPT 时\n- **多模型切换繁琐**：医生需分别使用独立的图像分类模型识别病变区域、另用 NLP 模型总结文本报告，工作流割裂且耗时。\n- **跨模态理解困难**：传统工具无法直接回答“这张切片中是否存在淋巴结转移？”这类结合图像细节与医学知识的问题，只能依赖人工肉眼比对。\n- **标注成本高昂**：针对特定罕见病任务训练专用模型，需要专家耗费数周时间进行精细的数据标注和模型微调。\n- **泛化能力不足**：在 X 光、病理图等不同模态间迁移时，原有模型表现大幅下降，难以适应多样化的临床场景。\n\n### 使用 BiomedGPT 后\n- **统一平台处理**：BiomedGPT 作为一个通用的视觉 - 语言基础模型，能在一个框架内同时完成图像分类、报告生成及视觉问答，大幅简化操作流程。\n- **智能交互诊断**：医生可直接输入自然语言提问，BiomedGPT 能精准定位图像特征并结合医学知识库给出推理依据，实现“看图说话”式的辅助诊断。\n- **低资源快速适配**：凭借强大的预训练基础，BiomedGPT 仅需少量样本即可微调适配罕见病任务，将模型部署周期从数周缩短至数天。\n- **跨任务无缝迁移**：BiomedGPT 在病理切片、X 光片及医学文本摘要等多种任务间展现出卓越的泛化性，无需重复训练即可胜任多样化需求。\n\nBiomedGPT 通过打破视觉与语言的壁垒，将分散的医疗 AI 任务整合为统一的智能交互流程，显著提升了生物医学研究的效率与深度。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaokz_BiomedGPT_7319898d.png","taokz","Kai Zhang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftaokz_0cd72f59.jpg","Incoming Postdoc @ Stanford · Ph.D. @ Lehigh · Trustworthy Medical AI","Lehigh University",null,"kai_z99","https:\u002F\u002Fgithub.com\u002Ftaokz",[81,85,89,93,97,101,105,109],{"name":82,"color":83,"percentage":84},"Python","#3572A5",95,{"name":86,"color":87,"percentage":88},"Shell","#89e051",3.6,{"name":90,"color":91,"percentage":92},"Cuda","#3A4E3A",0.7,{"name":94,"color":95,"percentage":96},"C++","#f34b7d",0.4,{"name":98,"color":99,"percentage":100},"Cython","#fedf5b",0.2,{"name":102,"color":103,"percentage":104},"Lua","#000080",0.1,{"name":106,"color":107,"percentage":108},"Batchfile","#C1F12E",0,{"name":110,"color":111,"percentage":108},"Makefile","#427819",708,81,"2026-04-13T10:44:42","Apache-2.0","Linux","未说明（基于多模态大模型特性，通常需 NVIDIA GPU 及 CUDA 支持，具体显存需求取决于模型规模，最大模型参数量达 930M）","未说明",{"notes":120,"python":121,"dependencies":122},"1. 官方安装指南仅明确支持 Linux 系统。2. 
必须使用 Conda 创建名为 'biomedgpt' 的环境并固定 Python 版本为 3.7.4。3. 需先安装 pip==21.2.4 版本。4. 模型严格限于学术研究，禁止商业及临床使用。5. 提供基于 Fairseq 的训练\u002F微调脚本及实验性 Transformers 推理支持。6. 需手动下载预训练权重文件至 scripts\u002F 或 checkpoints\u002F 目录。","3.7.4",[123,124,125,126,127],"fairseq","torch","transformers","Pillow","numpy",[35,15,129],"其他","2026-03-27T02:49:30.150509","2026-04-18T09:20:09.828232",[133,138,143,148,153,158],{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},39480,"为什么我在 VQA-RAD 数据集上评估模型时，准确率（约 30%）远低于论文中报告的 71.6%？","这是因为之前提供的检查点文件上传错误。维护者已确认该问题并提供了新的检查点文件。请使用以下链接下载修复后的权重文件以获得论文中报告的性能：https:\u002F\u002Fwww.dropbox.com\u002Fscl\u002Ffi\u002F4ywpsoehfotqg5rvxo9n1\u002Fvqa_rad_fixed.pt?rlkey=0lmwuafggz2ianz57d80lhk67&dl=0 或 https:\u002F\u002Fwww.dropbox.com\u002Ft\u002FoRb9OCNlSdhFxSf6。此外，计算准确率时需注意将标准答案中的大写字母转换为小写后进行匹配。","https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\u002Fissues\u002F6",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},39481,"在运行 evaluate_medmnist.sh 脚本时，找不到路径中包含 '25000_1000_5e-5_256' 的 checkpoint_best.pt 文件，这些数字代表什么？","路径中的 '25000' 和 '1000' 分别代表训练步数（25,000）和预热步数（1,000）。维护者说明这些值是遗留的实验设置，与当前论文或实现不一致，是忘记更新脚本导致的。用户应参考论文中的补充表 5（Supplementary Table 5）获取正确的超参数设置。注意，实际运行只需 5 个 epoch，而非表中列出的 50 个。","https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\u002Fissues\u002F54",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},39482,"评估预训练模型时报错 FileNotFoundError: No such file or directory: '...\u002Ftype2ans.json'，这个文件是什么以及如何生成？","该错误通常发生在尝试评估预训练模型（零样本设置）时。解决方法是在评估命令中添加 '--zero-shot' 参数。例如，将脚本改为使用 '--zero-shot' 标志即可绕过对该文件的依赖。注意，零样本评估下的得分可能会较低（低于 0.1），这是正常现象。","https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\u002Fissues\u002F19",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},39483,"如何在 PEIR Gross 等没有明确问题字段的数据集上进行图像描述（Image Captioning）任务的评估？","图像描述任务可以构建为 VQA 格式进行处理。根据论文描述，可以使用系统提示语 \"What does the image describe?\" 作为问题输入。如果数据集本身没有问题字段，用户可以手动设计提示语或利用 ChatGPT 生成多样化的提示语来构建输入数据，从而适配模型的 VQA 
评估流程。","https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\u002Fissues\u002F23",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},39484,"在使用 biomedgpt_tiny.pt 进行微调时遇到 RuntimeError: result type Float can't be cast to the desired output type Long 错误，如何解决？","该错误通常与 PyTorch 版本兼容性或 EMA（指数移动平均）更新时的数据类型转换有关。虽然具体解决方案在截断的评论中未完全显示，但此类问题通常建议检查 PyTorch 和 fairseq 的版本兼容性，或者尝试在较稳定的 PyTorch 版本（如 1.x 系列）上运行，亦或在代码中显式处理类型转换。若问题持续，建议查看项目是否有关于 PyTorch 2.x 兼容性的最新补丁。","https:\u002F\u002Fgithub.com\u002Ftaokz\u002FBiomedGPT\u002Fissues\u002F16",{"id":159,"question_zh":160,"answer_zh":161,"source_url":147},39485,"VQA 任务评估得分偏低，是否与生成结果的大小写敏感有关？","是的，评估得分低可能与生成结果的大小写不匹配有关。维护者指出，为了复现论文中的结果，在计算准确率时，需要将标准答案（gold answers）中的所有大写字母转换为小写，然后再与模型的生成结果进行匹配。建议在评估脚本后处理阶段加入大小写统一逻辑。",[]]