[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-AI-in-Health--MedLLMsPracticalGuide":3,"tool-AI-in-Health--MedLLMsPracticalGuide":62},[4,19,28,37,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":18},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[13,14,15,16,17],"数据工具","开发框架","Agent","图像","插件","ready",{"id":20,"name":21,"github_repo":22,"description_zh":23,"stars":24,"difficulty_score":10,"last_commit_at":25,"category_tags":26,"status":18},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,15,27],"语言模型",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":34,"last_commit_at":35,"category_tags":36,"status":18},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 
都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[15,17],{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":43,"last_commit_at":44,"category_tags":45,"status":18},10072,"DeepSeek-V3","deepseek-ai\u002FDeepSeek-V3","DeepSeek-V3 是一款由深度求索推出的开源混合专家（MoE）大语言模型，旨在以极高的效率提供媲美顶尖闭源模型的智能服务。它拥有 6710 亿总参数，但在处理每个 token 时仅激活 370 亿参数，这种设计巧妙解决了大规模模型推理成本高、速度慢的难题，让高性能 AI 更易于部署和应用。\n\n这款模型特别适合开发者、研究人员以及需要构建复杂 AI 应用的企业团队使用。无论是进行代码生成、逻辑推理还是多轮对话开发，DeepSeek-V3 都能提供强大的支持。其独特之处在于采用了无辅助损失的负载均衡策略和多令牌预测训练目标，前者在提升计算效率的同时避免了性能损耗，后者则显著增强了模型表现并加速了推理过程。此外，模型在 14.8 万亿高质量令牌上完成预训练，且整个训练过程异常稳定，未出现不可恢复的损失尖峰。凭借仅需 278.8 万 H800 GPU 小时即可完成训练的高效特性，DeepSeek-V3 为开源社区树立了一个兼顾性能与成本效益的新标杆。",102693,5,"2026-04-20T03:58:04",[27],{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":10,"last_commit_at":52,"category_tags":53,"status":18},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[17,15,16,14],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":10,"last_commit_at":60,"category_tags":61,"status":18},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 
Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[17,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":79,"stars":81,"forks":82,"last_commit_at":83,"license":84,"difficulty_score":34,"env_os":85,"env_gpu":86,"env_ram":86,"env_deps":87,"category_tags":90,"github_topics":91,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":18,"created_at":97,"updated_at":98,"faqs":99,"releases":100},9997,"AI-in-Health\u002FMedLLMsPracticalGuide","MedLLMsPracticalGuide"," [Nature Reviews Bioengineering🔥] Application of Large Language Models in Medicine.  
A curated list of practical guide resources of Medical LLMs (Medical LLMs Tree, Tables, and Papers)","MedLLMsPracticalGuide 是一个专注于医疗领域大语言模型（Medical LLMs）的精选资源库，旨在为研究人员和开发者提供一份实用的应用指南。随着人工智能在医疗行业的快速渗透，如何安全、有效地将大模型应用于临床诊断、患者管理及医学研究成为一大挑战。该项目通过系统梳理海量的学术论文、开源模型及实践案例，构建了清晰的“医疗大模型知识树”和数据表格，帮助用户快速把握该领域的最新进展、核心应用场景及潜在风险。\n\n其独特亮点在于依托发表于顶级期刊《Nature Reviews Bioengineering》的深度综述论文，内容由来自牛津大学、麻省理工学院等全球顶尖机构的学者共同维护，确保了信息的权威性与前沿性。资源库保持高频更新，不仅涵盖了从基础理论到落地实践的全链路资源，还提供了详细的分类索引，极大地降低了信息检索门槛。无论是希望深入了解医疗 AI 趋势的学术研究人员，还是致力于开发医疗辅助系统的工程师，都能从中获得极具价值的参考指引，是推动医疗大模型技术规范化发展的重要工具。","\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_e54a88f23569.png\" width=\"180px\">\n\u003C\u002Fdiv>\n\u003Ch2 align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05112.pdf\"> [Nature Reviews Bioengineering] A Practical Guide for Medical Large Language Models \u003C\u002Fa>\u003C\u002Fh2>\n\u003Ch5 align=\"center\"> If you like our project, please give us a star ⭐ on GitHub for the latest update.\u003C\u002Fh5>\n\n\u003Ch5 align=\"center\">\n\n\n   [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n   [![arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2311.05112-red)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05112.pdf)\n   [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter%40elvis%20-black?logo=twitter&logoColor=1D9BF0&color=black&link=https%3A%2F%2Ftwitter.com%2Fomarsar0%2Fstatus%2F1734599425568231513%3Fs%3D61%26t%3D8Li3X-wK0wxSRkHjNK7Pfw)](https:\u002F\u002Fx.com\u002Fomarsar0\u002Fstatus\u002F1734599425568231513?s=20)\n   [![TechBeat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F将门创投%20-black)](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FgV3HHkVQXgR-Cego1P0ZBQ)\n   
[![YouTube](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-YouTube-000000?logo=youtube&logoColor=FF0000)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mSlKPzmW3Ac&t=23s)\n   ![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAI-in-Health\u002FMedLLMsPracticalGuide?logoColor=%23C8A2C8&color=%23DCC6E0)\n\n\n\u003C\u002Fh5>\n\nThis is an actively updated list of practical guide resources for Medical Large Language Models (Medical LLMs).\nIt's based on our survey paper: \n\n> [Nature Reviews Bioengineering🔥]\n> [Application of Large Language Models in Medicine](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs44222-025-00279-5)\n\n> [arXiv Preprint]\n> [A Survey of Large Language Models in Medicine: Progress, Application, and Challenge](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112)\n> \n> *Hongjian Zhou\u003Csup>1,\\*\u003C\u002Fsup>, Fenglin Liu\u003Csup>1,\\*\u003C\u002Fsup>, Boyang Gu\u003Csup>2,\\*\u003C\u002Fsup>, Xinyu Zou\u003Csup>3,\\*\u003C\u002Fsup>, Jinfa Huang\u003Csup>4,\\*\u003C\u002Fsup>, Jinge Wu\u003Csup>5\u003C\u002Fsup>, Yiru Li\u003Csup>6\u003C\u002Fsup>, Sam S. Chen\u003Csup>7\u003C\u002Fsup>, Peilin Zhou\u003Csup>8\u003C\u002Fsup>, Junling Liu\u003Csup>9\u003C\u002Fsup>, Yining Hua\u003Csup>10\u003C\u002Fsup>,\nChengfeng Mao\u003Csup>11\u003C\u002Fsup>, Chenyu You\u003Csup>12\u003C\u002Fsup>, Xian Wu\u003Csup>13\u003C\u002Fsup>, Yefeng Zheng\u003Csup>13\u003C\u002Fsup>, Lei Clifton\u003Csup>1\u003C\u002Fsup>,\nZheng Li\u003Csup>14,†\u003C\u002Fsup>, Jiebo Luo\u003Csup>4,†\u003C\u002Fsup>, \nDavid A. 
Clifton\u003Csup>1,†\u003C\u002Fsup>.* (\\*Core Contributors, †Corresponding Authors)\n\n> *\u003Csup>1\u003C\u002Fsup>University of Oxford, \u003Csup>2\u003C\u002Fsup>Imperial College London, \u003Csup>3\u003C\u002Fsup>University of Waterloo,\n\u003Csup>4\u003C\u002Fsup>University of Rochester, \u003Csup>5\u003C\u002Fsup>University College London, \u003Csup>6\u003C\u002Fsup>Western University,\n\u003Csup>7\u003C\u002Fsup>University of Georgia, \u003Csup>8\u003C\u002Fsup>Hong Kong University of Science and Technology (Guangzhou),\n\u003Csup>9\u003C\u002Fsup>Alibaba, \u003Csup>10\u003C\u002Fsup>Harvard T.H. Chan School of Public Health, \u003Csup>11\u003C\u002Fsup>MIT, \u003Csup>12\u003C\u002Fsup>Yale University, \u003Csup>13\u003C\u002Fsup>Tencent, \u003Csup>14\u003C\u002Fsup>Amazon*\n\n##  📣 Update News\n[2025-04-08] 🎉🎉🎉 Our paper has officially been published at Nature Reviews Bioengineering, and the GitHub Repo has reached 1,500 🌟!\n\n\u003C!--\n[2024-10-11] 🎉🎉🎉 Big News! Our repository has reached 1,000 🌟. Thank you to everyone who contributed.\n\n[2024-07-10] We have updated our [Version 6](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Thank you all for your support!\n\n[2024-05-05] We have updated our [Version 5](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Please check it out!\n\n[2024-03-03] We have updated our [Version 4](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Please check it out!\n\n[2024-02-04] 🍻🍻🍻 Cheers! Happy Chinese New Year! We have updated our [Version 3](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Please check it out!\n\n[2023-12-11] We have updated our survey [Version 2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). 
Please check it out!\n-->\n\n[2023-11-09] We have released the repository and [survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112).\n\n## ⚡ Contributing\n\nIf you want to add your work or model to this list, please do not hesitate to email fenglin.liu@eng.ox.ac.uk and jhuang90@ur.rochester.edu or open a [pull request](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FMedLLMsPracticalGuide\u002Fpulls).\nMarkdown format:\n\n```markdown\n* [**Name of Conference or Journal + Year**] Paper Name. [[paper]](link) [[code]](link)\n```\n## 🤔 What are the Goals of the Medical LLM?\n\n**Goal 1: Surpassing Human-Level Expertise**.\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_ad4bc945cf49.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n**Goal 2: Emergent Properties of Medical LLM with the Model Size Scaling Up**.\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_6657bf9b6091.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n\n## 🤗 What is This Survey About?\nThis survey provides a comprehensive overview of the principles, applications, and challenges faced by LLMs in medicine. We address the following specific questions: \n1.  How should medical LLMs be built? \n2.  What are the measures for the downstream performance of medical LLMs? \n3.  How should medical LLMs be utilized in real-world clinical practice? \n4.  What challenges arise from the use of medical LLMs? \n5.  How should we better construct and utilize medical LLMs? \n\nThis survey aims to provide insights into the opportunities and challenges of LLMs in medicine, and serve as a practical resource for constructing effective medical LLMs. 
\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_6fa2df387eed.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n## Table of Contents\n- [📣 Update News](#-update-news)\n- [⚡ Contributing](#-contributing)\n- [🤔 What are the Goals of the Medical LLM?](#-what-are-the-goals-of-the-medical-llm)\n- [🤗 What is This Survey About?](#-what-is-this-survey-about)\n- [Table of Contents](#table-of-contents)\n- [🔥 Practical Guide for Building Pipeline](#-practical-guide-for-building-pipeline)\n  - [Pre-training from Scratch](#pre-training-from-scratch)\n  - [Fine-tuning General LLMs](#fine-tuning-general-llms)\n  - [Prompting General LLMs](#prompting-general-llms)\n- [📊 Practical Guide for Medical Data](#-practical-guide-for-medical-data)\n  - [Clinical Knowledge Bases](#clinical-knowledge-bases)\n  - [Pre-training Data](#pre-training-data)\n  - [Fine-tuning Data](#fine-tuning-data)\n- [🗂️ Downstream Biomedical Tasks](#️-downstream-biomedical-tasks)\n  - [Huggingface Leaderboard](#huggingface-leaderboard)\n  - [Generative Tasks](#generative-tasks)\n    - [Text Summarization](#text-summarization)\n    - [Text Simplification](#text-simplification)\n    - [Question Answering](#question-answering)\n  - [Discriminative Tasks](#discriminative-tasks)\n    - [Entity Extraction](#entity-extraction)\n    - [Relation Extraction](#relation-extraction)\n    - [Text Classification](#text-classification)\n    - [Natural Language Inference](#natural-language-inference)\n    - [Semantic Textual Similarity](#semantic-textual-similarity)\n    - [Information Retrieval](#information-retrieval)\n- [✨ Practical Guide for Clinical Applications](#-practical-guide-for-clinical-applications)\n  - [Retrieval-augmented Generation](#retrieval-augmented-generation)\n  - [Medical Decision-Making](#medical-decision-making)\n  - [Clinical Coding](#clinical-coding)\n  - [Clinical Report Generation](#clinical-report-generation)\n  - 
[Medical Education](#medical-education)\n  - [Medical Robotics](#medical-robotics)\n  - [Medical Language Translation](#medical-language-translation)\n  - [Mental Health Support](#mental-health-support)\n- [⚔️ Practical Guide for Challenges](#️-practical-guide-for-challenges)\n  - [Hallucination](#hallucination)\n  - [Lack of Evaluation Benchmarks and Metrics](#lack-of-evaluation-benchmarks-and-metrics)\n  - [Domain Data Limitations](#domain-data-limitations)\n  - [New Knowledge Adaptation](#new-knowledge-adaptation)\n  - [Behavior Alignment](#behavior-alignment)\n  - [Ethical, Legal, and Safety Concerns](#ethical-legal-and-safety-concerns)\n- [🚀 Practical Guide for Future Directions](#-practical-guide-for-future-directions)\n  - [Introduction of New Benchmarks](#introduction-of-new-benchmarks)\n  - [Interdisciplinary Collaborations](#interdisciplinary-collaborations)\n  - [Multi-modal LLM](#multi-modal-llm)\n  - [Medical Agents](#medical-agents)\n- [👍 Acknowledgement](#-acknowledgement)\n- [📑 Citation](#-citation)\n- [♥️ Contributors](#️-contributors)\n\n## 🔥 Practical Guide for Building Pipeline\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_abc3f4e9fddd.png\" width=\"1000px\">\n\u003C\u002Fdiv>\n\n### Pre-training from Scratch\n* [**Nature Medicine, 2024**] **BiomedGPT** A generalist vision–language foundation model for diverse biomedical tasks [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-024-03185-2)\n* [**Nature, 2023**] **NYUTron** Health system-scale language models are all-purpose prediction engines [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06160-y)\n* [**Arxiv, 2023**] **OphGLM**: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12174)\n* [**npj Digital Medicine, 2023**] **GatorTronGPT**: A Study of Generative Large Language Model for Medical Research and Healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13523)\n* [**Bioinformatics, 2023**] **MedCPT**: Contrastive Pre-trained Transformers with Large-scale Pubmed Search Logs for Zero-shot Biomedical Information Retrieval. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.00589)\n* [**Briefings in Bioinformatics, 2022**] **BioGPT**: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. [paper](https:\u002F\u002Facademic.oup.com\u002Fbib\u002Farticle-abstract\u002F23\u002F6\u002Fbbac409\u002F6713511)\n* [**NeurIPS, 2022**] **DRAGON**: Deep Bidirectional Language-Knowledge Graph Pretraining. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.09338) [code](https:\u002F\u002Fgithub.com\u002Fmichiyasunaga\u002Fdragon)\n* [**ACL, 2022**] **BioLinkBERT\u002FLinkBERT**: Pretraining Language Models with Document Links. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.15827) [code](https:\u002F\u002Fgithub.com\u002Fmichiyasunaga\u002FLinkBERT)\n* [**npj Digital Medicine, 2022**] **GatorTron**: A Large Language Model for Electronic Health Records. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n* [**HEALTH, 2021**] **PubMedBERT**: Domain-specific Language Model Pretraining for Biomedical Natural Language Processing. [paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3458754)\n* [**Bioinformatics, 2020**] **BioBERT**: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining. [paper](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle-abstract\u002F36\u002F4\u002F1234\u002F5566506)\n* [**EMNLP, 2019**] **SciBERT**: A Pretrained Language Model for Scientific Text. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.10676)\n* [**NAACL Workshop, 2019**] **ClinicalBERT**: Publicly Available Clinical BERT Embeddings. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03323)\n* [**BioNLP Workshop, 2019**] **BlueBERT**: Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.05474)\n\n### Fine-tuning General LLMs  \n* [**Nature Communications, 2024.9**] **MMed-Llama3**: Towards building multilingual language model for medicine. [[paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-024-52417-z) [[code]](https:\u002F\u002Fgithub.com\u002FMAGIC-AI4Med\u002FMMedLM)\n* [**Arxiv, 2024.8**] **Med42-v2**: A Suite of Clinical LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06142) [Model](https:\u002F\u002Fhuggingface.co\u002Fm42-health)\n* [**JAMIA, 2024.5**] **Internist.ai 7b**: Impact of high-quality, mixed-domain data on the performance of medical language models. [paper](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F31\u002F9\u002F1875\u002F7680487?redirectedFrom=fulltext) [Model](https:\u002F\u002Fhuggingface.co\u002Finternistai\u002Fbase-7b-v0.2)\n* [**Huggingface, 2024.5**] **OpenBioLLM-70b**: Advancing Open-source Large Language Models in Medical Domain [Model](https:\u002F\u002Fhuggingface.co\u002Faaditya\u002FLlama3-OpenBioLLM-70B)\n* [**Huggingface, 2024.5**] **MedLlama3** [model](https:\u002F\u002Fhuggingface.co\u002FProbeMedicalYonseiMAILab\u002Fmedllama3-v20)\n* [**Arxiv, 2024.5**] **Aloe**: A Family of Fine-tuned Open Healthcare LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.01886) [Model](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FHPAI-BSC\u002Fhealthcare-llms-aloe-family-6701b6a777f7e874a2123363)\n* [**Arxiv, 2024.4**] **Med-Gemini**: Capabilities of Gemini Models in Medicine. 
[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.18416)\n* [**npj Digital Medicine, 2024**] **Meerkat**: Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.00376) \n* [**Arxiv, 2024.2**] **BioMistral** A Collection of Open-Source Pretrained Large Language Models for Medical Domains. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10373)\n* [**Arxiv, 2023.12**] **From Beginner to Expert**: Modeling Medical Knowledge into General LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01040)\n* [**Arxiv, 2023.11**] **Taiyi**: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.11608) [code](https:\u002F\u002Fgithub.com\u002FDUTIR-BioNLP\u002FTaiyi-LLM)\n* [**Arxiv, 2023.10**] **AlpaCare**: Instruction-tuned Large Language Models for Medical Application. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.14558) [code](https:\u002F\u002Fgithub.com\u002FXZhang97666\u002FAlpaCare)\n* [**Arxiv, 2023.10**] **BianQue**: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.15896)\n* [**Arxiv, 2023.10**] **Qilin-Med**: Multi-stage Knowledge Injection Advanced Medical Large Language Model. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09089)\n* [**Arxiv, 2023.10**] **Qilin-Med-VL**: Towards Chinese Large Vision-Language Model for General Healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.17956)\n* [**Arxiv, 2023.10**] **MEDITRON-70B**: Scaling Medical Pretraining for Large Language Models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16079)\n* [**AAAI, 2024\u002F2023.10**] **Med42**:  Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.14779) [Model](https:\u002F\u002Fhuggingface.co\u002Fm42-health\u002Fmed42-70b)\n* [**Arxiv, 2023.9**] **CPLLM**: Clinical Prediction with Large Language Models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295)\n* [**Arxiv, 2023.8**] **BioMedGPT\u002FOpenBioMed**: Open Multimodal Generative Pre-trained Transformer for BioMedicine. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.09442) [code](https:\u002F\u002Fgithub.com\u002FPharMolix\u002FOpenBioMed)\n* [**npj Digital Medicine, 2023.8**] Large Language Models to Identify Social Determinants of Health in Electronic Health Records. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.06354) [[code]](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FSDoH)\n* [**Arxiv, 2023.8**] **Zhongjing**: Enhancing the Chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03549)\n* [**Arxiv, 2023.7**] **Med-Flamingo**: A Multimodal Medical Few-shot Learner. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15189) [code](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fmed-flamingo)\n* [**Arxiv, 2023.6**] **ClinicalGPT**: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.09968)\n* [**Cureus, 2023.6**] **ChatDoctor**: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. [paper](https:\u002F\u002Fwww.cureus.com\u002Farticles\u002F152858-chatdoctor-a-medical-chat-model-fine-tuned-on-a-large-language-model-meta-ai-llama-using-medical-domain-knowledge.pdf)\n* [**NeurIPS Datasets\u002FBenchmarks Track, 2023.6**] **LLaVA-Med**: Training a large language-and-vision assistant for biomedicine in one day. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.00890)\n* [**Arxiv, 2023.6**] **MedPaLM 2**: Towards expert-level medical question answering with large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09617)\n* [**Arxiv, 2023.5**] **Clinical Camel**: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12031)\n* [**Arxiv, 2023.5**] **BiomedGPT**: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17100)\n* [**Arxiv, 2023.5**] **HuatuoGPT**: Towards Taming Language Model to Be a Doctor. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15075)\n* [**Arxiv, 2023.4**] **Baize-healthcare**: An open-source chat model with parameter-efficient tuning on self-chat data. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01196)\n* [**Arxiv, 2023.4**] **Visual Med-Alpaca**: A parameter-efficient biomedical llm with visual capabilities. [github](https:\u002F\u002Fgithub.com\u002Fcambridgeltl\u002Fvisual-med-alpaca)\n* [**Arxiv, 2023.4**] **PMC-LLaMA**: Further finetuning llama on medical papers. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.14454)\n* [**Arxiv, 2023.4**] **MedPaLM M**: Towards Generalist Biomedical AI. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14334) [code](https:\u002F\u002Fgithub.com\u002Fkyegomez\u002FMed-PaLM)\n* [**Arxiv, 2023.4**] **BenTsao\u002FHuatuo**: Tuning llama model with chinese medical knowledge. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06975)\n* [**Github, 2023.4**] **ChatGLM-Med**: Fine-tuning ChatGLM based on Chinese medical knowledge. [github](https:\u002F\u002Fgithub.com\u002FSCIR-HI\u002FMed-ChatGLM)\n* [**Arxiv, 2023.4**] **DoctorGLM**: Fine-tuning your chinese doctor is not a herculean task. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01097)\n\n### Prompting General LLMs\n* [**NEJM AI, 2024**] GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines. [paper](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Fabs\u002F10.1056\u002FAIcs2300235)\n* [**Arxiv, 2023.11**] **MedPrompt**: Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16452)\n* [**Arxiv, 2023.8**] **Dr. Knows**: Leveraging a medical knowledge graph into large language models for diagnosis prediction. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14321)\n* [**Arxiv, 2023.3**] **DeID-GPT**: Zero-shot medical text de-identification by GPT-4. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11032) [code](https:\u002F\u002Fgithub.com\u002Fyhydhx\u002FChatGPT-API)\n* [**Arxiv, 2023.2\u002F5**] **ChatCAD\u002FChatCAD+**: Interactive computer-aided diagnosis on medical image using large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.07257) [code](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n* [**Nature, 2022.12**] **MedPaLM**: Large language models encode clinical knowledge. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.13138)\n* [**Arxiv, 2022.7\u002F2023.12**] Can large language models reason about medical questions? 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.08143) \n\n\n## 📊 Practical Guide for Medical Data\n\n### Clinical Knowledge Bases\n* **[Drugs.com](https:\u002F\u002Fwww.drugs.com\u002F)**\n* **[DrugBank](https:\u002F\u002Fgo.drugbank.com\u002F)**\n* **[NHS Health](https:\u002F\u002Fwww.nhs.uk\u002Fconditions\u002F)**\n* **[NHS Medicine](https:\u002F\u002Fwww.nhs.uk\u002Fmedicines\u002F)**\n* **[Unified Medical Language System (UMLS)](https:\u002F\u002Fwww.nlm.nih.gov\u002Fresearch\u002Fumls\u002Findex.html)**\n* **[The Human Phenotype Ontology](https:\u002F\u002Fhpo.jax.org\u002Fapp\u002F)**\n* **[Center for Disease Control and Prevention](https:\u002F\u002Fwww.cdc.gov\u002F)**\n* **[National Institute for Health and Care Excellence](https:\u002F\u002Fwww.nice.org.uk\u002Fguidance)**\n* **[World Health Organization](https:\u002F\u002Fwww.who.int\u002Fpublications\u002Fwho-guidelines)**\n\n### Pre-training Data\n* [**NEJM AI, 2024**] Clinical Text Datasets for Medical Artificial Intelligence and Large Language Models — A Systematic Review [paper](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIra2400012)\n* [**npj Digital Medicine, 2023**] **EHRs**: A Study of Generative Large Language Model for Medical Research and Healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13523)\n* [**Arxiv, 2023**] **Guidelines**: A high-quality collection of clinical practice guidelines (CPGs) for the medical training of LLMs. [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fepfl-llm\u002Fguidelines)\n* [**Arxiv, 2023**] **GAP-REPLAY**: Scaling Medical Pretraining for Large Language Models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16079)\n* [**npj Digital Medicine, 2022**] **EHRs**: A large language model for electronic health records. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n* [**National Library of Medicine, 2022**] **PubMed**: National Institutes of Health. PubMed Data. 
[database](https://pubmed.ncbi.nlm.nih.gov/download/)
* [**Arxiv, 2020**] **PubMed**: The Pile: An 800GB dataset of diverse text for language modeling. [paper](https://arxiv.org/abs/2101.00027) [code](https://github.com/EleutherAI/the-pile)
* [**EMNLP, 2020**] **MedDialog**: MedDialog: Two large-scale medical dialogue datasets. [paper](https://arxiv.org/abs/2004.03329) [code](https://github.com/UCSD-AI4H/Medical-Dialogue-System)
* [**NAACL, 2018**] **Literature**: Construction of the literature graph in Semantic Scholar. [paper](https://arxiv.org/abs/1805.02262)
* [**Scientific Data, 2016**] **MIMIC-III**: MIMIC-III, a freely accessible critical care database. [paper](https://www.nature.com/articles/sdata201635)

### Fine-tuning Data
* **MedBook-18-CoT**: Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks. [paper](https://arxiv.org/abs/2404.00376) [huggingface](https://huggingface.co/datasets/dmis-lab/meerkat-instructions)
* **MMedC**: Towards building multilingual language model for medicine. [paper](https://www.nature.com/articles/s41467-024-52417-z) [code](https://github.com/MAGIC-AI4Med/MMedLM) [huggingface](https://huggingface.co/datasets/Henrychur/MMedC)
* **MedTrinity-25M**: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine. 2024. [paper](https://arxiv.org/abs/2408.02900) [github](https://github.com/UCSC-VLAA/MedTrinity-25M)
* **cMeKG**: Chinese Medical Knowledge Graph. 2023. [github](https://github.com/king-yyf/CMeKG_tools)
* **CMD.**: Chinese medical dialogue data. 2023. [repo](https://github.com/Toyhom/Chinese-medical-dialogue-data)
* **BianQueCorpus**: BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT. 2023. [paper](https://arxiv.org/abs/2310.15896)
* **MD-EHR**: ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation. 2023. [paper](https://arxiv.org/abs/2306.09968)
* **VariousMedQA**: Multi-scale attentive interaction networks for Chinese medical question answer selection. 2018. [paper](https://ieeexplore.ieee.org/abstract/document/8548603/)
* **VariousMedQA**: What disease does this patient have? A large-scale open domain question answering dataset from medical exams. 2021. [paper](https://www.mdpi.com/2076-3417/11/14/6421)
* **MedDialog**: MedDialog: Two large-scale medical dialogue datasets. 2020. [paper](https://arxiv.org/abs/2004.03329)
* **ChiMed**: Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model. 2023. [paper](https://arxiv.org/abs/2310.09089)
* **ChiMed-VL**: Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare. 2023. [paper](https://arxiv.org/abs/2310.17956)
* **Healthcare Magic**: Healthcare Magic. [link](https://www.healthcaremagic.com/) [huggingface](https://huggingface.co/datasets/wangrongsheng/HealthCareMagic-100k-en)
* **ICliniq**: ICliniq. [platform](https://www.icliniq.com/)
* **Hybrid SFT**: HuatuoGPT, towards Taming Language Model to Be a Doctor. 2023. [paper](https://arxiv.org/abs/2305.15075)
* **PMC-15M**: Large-scale domain-specific pretraining for biomedical vision-language processing. 2023. [paper](https://arxiv.org/abs/2303.00915)
* **MedQuAD**: A question-entailment approach to question answering. 2019. [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4)
* **VariousMedQA**: Visual Med-Alpaca: A parameter-efficient biomedical LLM with visual capabilities. 2023. [repo](https://github.com/cambridgeltl/visual-med-alpaca)
* **CMtMedQA**: Zhongjing: Enhancing the Chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue. 2023. [paper](https://arxiv.org/abs/2308.03549)
* **MTB**: Med-Flamingo: a multimodal medical few-shot learner. 2023. [paper](https://arxiv.org/abs/2307.15189)
* **PMC-OA**: PMC-CLIP: Contrastive language-image pre-training using biomedical documents. 2023. [paper](https://arxiv.org/abs/2303.07240)
* **Medical Meadow**: MedAlpaca--An Open-Source Collection of Medical Conversational AI Models and Training Data. 2023. [paper](https://arxiv.org/abs/2304.08247)
* **Literature**: S2ORC: The Semantic Scholar Open Research Corpus. 2019. [paper](https://arxiv.org/abs/1911.02782)
* **MedC-I**: PMC-LLaMA: Further finetuning LLaMA on medical papers. 2023. [paper](https://arxiv.org/abs/2304.14454)
* **ShareGPT**: ShareGPT. 2023. [platform](https://sharegpt.com/)
* **PubMed**: National Institutes of Health. PubMed Data. In National Library of Medicine. 2022. [database](https://pubmed.ncbi.nlm.nih.gov/download/)
* **MedQA**: What disease does this patient have? A large-scale open domain question answering dataset from medical exams. 2021. [paper](https://www.mdpi.com/2076-3417/11/14/6421)
* **MultiMedQA**: Towards expert-level medical question answering with large language models. 2023. [paper](https://arxiv.org/abs/2305.09617)
* **MultiMedBench**: Towards generalist biomedical AI. 2023. [paper](https://arxiv.org/abs/2307.14334)
* **MedInstruct-52**: Instruction-tuned Large Language Models for Medical Application. 2023. [paper](https://arxiv.org/abs/2310.14558)
* **eICU-CRD**: The eICU Collaborative Research Database, a freely available multi-center database for critical care research. 2018. [paper](https://www.nature.com/articles/sdata2018178)
* **MIMIC-IV**: MIMIC-IV, a freely accessible electronic health record dataset. 2023. [paper](https://www.nature.com/articles/s41597-022-01899-x) [database](https://physionet.org/content/mimiciv/2.2/)
* **PMC-Patients**: 167k open patient summaries. 2023. [paper](https://arxiv.org/abs/2202.13876) [database](https://huggingface.co/datasets/zhengyun21/PMC-Patients)

## 🗂️ Downstream Biomedical Tasks

<div align=center>
<img src="https://oss.gittoolsai.com/images/AI-in-Health_MedLLMsPracticalGuide_readme_88518667ecfb.png" width="800px">
</div>

### Huggingface Leaderboard
* **Open Medical-LLM Leaderboard**: MedQA (USMLE), PubMedQA, MedMCQA, and subsets of MMLU related to medicine and biology.
[Leaderboard](https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard)
* **ReXrank**: A Public Leaderboard for AI-Powered Radiology Report Generation. [paper](https://arxiv.org/abs/2411.15122) [code](https://github.com/rajpurkarlab/ReXrank)

### Generative Tasks

#### Text Summarization
* **PubMed**: National Institutes of Health. PubMed Data. In National Library of Medicine. [database](https://pubmed.ncbi.nlm.nih.gov/download/)
* **PMC**: National Institutes of Health. PubMed Central Data. In National Library of Medicine. [database](https://www.ncbi.nlm.nih.gov/pmc/)
* **CORD-19**: CORD-19: The COVID-19 Open Research Dataset. 2020. [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251955/)
* **MentSum**: MentSum: A resource for exploring summarization of mental health online posts. 2022. [paper](https://arxiv.org/abs/2206.00856)
* **MeQSum**: On the summarization of consumer health questions. 2019. [paper](https://aclanthology.org/P19-1215/)
* **MedQSum**: Enhancing Large Language Models' Utility for Medical Question-Answering: A Patient Health Question Summarization Approach. [paper](https://doi.org/10.1109/SITA60746.2023.10373720) [code](https://github.com/zekaouinoureddine/MedQSum)

#### Text Simplification
* **MultiCochrane**: Multilingual Simplification of Medical Texts. 2023. [paper](https://arxiv.org/abs/2305.12532)
* **AutoMeTS**: AutoMeTS: the autocomplete for medical text simplification. 2020. [paper](https://arxiv.org/abs/2010.10573)

#### Question Answering
* **CareQA**: CareQA: A multichoice question answering dataset based on the access exam for Spanish Specialised Healthcare Training (FSE). [paper](https://arxiv.org/abs/2405.01886) [dataset](https://huggingface.co/datasets/HPAI-BSC/CareQA)
* **BioASQ-QA**: BioASQ-QA: A manually curated corpus for Biomedical Question Answering. 2023. [paper](https://www.nature.com/articles/s41597-023-02068-4)
* **emrQA**: emrQA: A large corpus for question answering on electronic medical records. 2018. [paper](https://arxiv.org/abs/1809.00732)
* **CliCR**: CliCR: a dataset of clinical case reports for machine reading comprehension. 2018. [paper](https://arxiv.org/abs/1803.09720)
* **PubMedQA**: PubMedQA: A dataset for biomedical research question answering. 2019. [paper](https://arxiv.org/abs/1909.06146)
* **COVID-QA**: COVID-QA: A question answering dataset for COVID-19. 2020. [paper](https://aclanthology.org/2020.nlpcovid19-acl.18/)
* **MASH-QA**: Question answering with long multiple-span answers. 2020. [paper](https://aclanthology.org/2020.findings-emnlp.342/)
* **Health-QA**: A hierarchical attention retrieval model for healthcare question answering. 2019. [paper](https://dl.acm.org/doi/abs/10.1145/3308558.3313699)
* **MedQA**: What disease does this patient have? A large-scale open domain question answering dataset from medical exams. 2021. [paper](https://www.mdpi.com/2076-3417/11/14/6421)
* **MedMCQA**: MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering. 2022. [paper](https://proceedings.mlr.press/v174/pal22a.html)
* **MMLU (Clinical Knowledge)**: Measuring massive multitask language understanding. 2020. [paper](https://arxiv.org/abs/2009.03300)
* **MMLU (College Medicine)**: Measuring massive multitask language understanding. 2020. [paper](https://arxiv.org/abs/2009.03300)
* **MMLU (Professional Medicine)**: Measuring massive multitask language understanding. 2020. [paper](https://arxiv.org/abs/2009.03300)
* [**Arxiv, 2024**] **MediQ**: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning. [paper](https://arxiv.org/abs/2406.00922) [code](https://github.com/stellalisy/MediQ)
* **EWS_v5_USONLY_final**: Emergency War Surgery QA Dataset (v5). 2025. [data](https://huggingface.co/datasets/Paxrad/EWS_v5_USONLY_final)

### Discriminative Tasks

#### Entity Extraction
* [**Arxiv, 2024.10**] Named Clinical Entity Recognition Benchmark. [paper](https://arxiv.org/abs/2410.05046) [Leaderboard](https://huggingface.co/spaces/m42-health/clinical_ner_leaderboard)
* **NCBI Disease**: NCBI disease corpus: a resource for disease name recognition and concept normalization. 2014. [paper](https://www.sciencedirect.com/science/article/pii/S1532046413001974)
* **JNLPBA**: Introduction to the bio-entity recognition task at JNLPBA. 2004. [paper](https://aclanthology.org/W04-1213.pdf)
* **GENIA**: GENIA corpus--a semantically annotated corpus for bio-textmining. 2003. [paper](https://www.researchgate.net/profile/Jin-Dong-Kim-2/publication/10667350_GENIA_corpus-A_semantically_annotated_corpus_for_bio-textmining/links/00b49520d9a33ae419000000/GENIA-corpus-A-semantically-annotated-corpus-for-bio-textmining.pdf)
* **BC5CDR**: BioCreative V CDR task corpus: a resource for chemical disease relation extraction. 2016. [paper](https://academic.oup.com/database/article/doi/10.1093/database/baw068/2630414)
* **BC4CHEMD**: The CHEMDNER corpus of chemicals and drugs and its annotation principles. 2015. [paper](https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-7-S1-S2)
* **BioRED**: BioRED: a rich biomedical relation extraction dataset. 2022. [paper](https://academic.oup.com/bib/article-abstract/23/5/bbac282/6645993)
* **CMeEE**: CBLUE: A Chinese biomedical language understanding evaluation benchmark. 2021. [paper](https://arxiv.org/abs/2106.08087)
* **NLM-Chem-BC7**: NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles. 2022. [paper](https://academic.oup.com/database/article-abstract/doi/10.1093/database/baac102/6858529)
* **ADE**: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. 2012. [paper](https://www.sciencedirect.com/science/article/pii/S1532046412000615)
* **2012 i2b2**: Evaluating temporal relations in clinical text: 2012 i2b2 challenge. 2013.
[paper](https://academic.oup.com/jamia/article-abstract/20/5/806/726374)
* **2014 i2b2/UTHealth (Track 1)**: Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. 2015. [paper](https://www.sciencedirect.com/science/article/pii/S1532046415001823)
* **2018 n2c2 (Track 2)**: 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. 2020. [paper](https://academic.oup.com/jamia/article-abstract/27/1/3/5581277)
* **Cadec**: Cadec: A corpus of adverse drug event annotations. 2015. [paper](https://www.sciencedirect.com/science/article/pii/S1532046415000532)
* **DDI**: SemEval-2013 Task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). 2013. [paper](https://e-archivo.uc3m.es/handle/10016/20455)
* **PGR**: A silver standard corpus of human phenotype-gene relations. 2019. [paper](https://arxiv.org/abs/1903.10728)
* **EU-ADR**: The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. 2012. [paper](https://www.sciencedirect.com/science/article/pii/S1532046412000573)
* [**BioCreative VII Challenge, 2021**] Medications detection in tweets using transformer networks and multi-task learning. [paper](https://arxiv.org/abs/2111.13726) [code](https://github.com/Machine-Learning-for-Medical-Language/SMMH-Medication-Detection)

#### Relation Extraction
* **BC5CDR**: BioCreative V CDR task corpus: a resource for chemical disease relation extraction. 2016. [paper](https://academic.oup.com/database/article/doi/10.1093/database/baw068/2630414)
* **BioRED**: BioRED: a rich biomedical relation extraction dataset. 2022. [paper](https://academic.oup.com/bib/article-abstract/23/5/bbac282/6645993)
* **ADE**: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. 2012. [paper](https://www.sciencedirect.com/science/article/pii/S1532046412000615)
* **2018 n2c2 (Track 2)**: 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. 2020. [paper](https://academic.oup.com/jamia/article-abstract/27/1/3/5581277)
* **2010 i2b2/VA**: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. 2011. [paper](https://academic.oup.com/jamia/article-abstract/18/5/552/830538)
* **ChemProt**: Overview of the BioCreative VI chemical-protein interaction track. 2017. [database](https://biocreative.bioinformatics.udel.edu/news/corpora/chemprot-corpus-biocreative-vi/)
* **GDA**: RENET: A deep learning approach for extracting gene-disease associations from literature. 2019. [paper](https://link.springer.com/chapter/10.1007/978-3-030-17083-7_17)
* **DDI**: SemEval-2013 Task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). 2013. [paper](https://e-archivo.uc3m.es/handle/10016/20455)
* **GAD**: The genetic association database. 2004. [paper](https://www.nature.com/articles/ng0504-431)
* **2012 i2b2**: Evaluating temporal relations in clinical text: 2012 i2b2 challenge. 2013. [paper](https://academic.oup.com/jamia/article-abstract/20/5/806/726374)
* **PGR**: A silver standard corpus of human phenotype-gene relations. 2019. [paper](https://arxiv.org/abs/1903.10728)
* **EU-ADR**: The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. 2012. [paper](https://www.sciencedirect.com/science/article/pii/S1532046412000573)

#### Text Classification
* **OpiateID**: Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts. [paper](https://arxiv.org/abs/2311.09066) [code](https://github.com/yangalan123/OpioidID)
* **ADE**: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. 2012. [paper](https://www.sciencedirect.com/science/article/pii/S1532046412000615)
* **2014 i2b2/UTHealth (Track 2)**: Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. 2015. [paper](https://www.sciencedirect.com/science/article/pii/S1532046415001823)
* **HoC**: Automatic semantic classification of scientific literature according to the hallmarks of cancer. 2016. [paper](https://academic.oup.com/bioinformatics/article-abstract/32/3/432/1743783)
* **OHSUMED**: OHSUMED: An interactive retrieval evaluation and new large test collection for research. 1994. [paper](https://link.springer.com/chapter/10.1007/978-1-4471-2099-5_20)
* **WNUT-2020 Task 2**: WNUT-2020 Task 2: identification of informative COVID-19 English tweets. 2020. [paper](https://arxiv.org/abs/2010.08232)
* **Medical Abstracts**: Evaluating unsupervised text classification: zero-shot and similarity-based approaches. 2022. [paper](https://arxiv.org/abs/2211.16285)
* **MIMIC-III**: MIMIC-III, a freely accessible critical care database. 2016. [paper](https://www.nature.com/articles/sdata201635)

#### Natural Language Inference
* **MedNLI**: Lessons from natural language inference in the clinical domain. 2018. [paper](https://arxiv.org/abs/1808.06752)
* **BioNLI**: BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples. 2022. [paper](https://arxiv.org/abs/2210.14814)

#### Semantic Textual Similarity
* **MedSTS**: MedSTS: a resource for clinical semantic textual similarity. 2020. [paper](https://link.springer.com/article/10.1007/s10579-018-9431-1)
* **2019 n2c2/OHNLP**: The 2019 n2c2/OHNLP track on clinical semantic textual similarity: overview. 2020. [paper](https://medinform.jmir.org/2020/11/e23375)
* **BIOSSES**: BIOSSES: a semantic sentence similarity estimation system for the biomedical domain. 2017. [paper](https://academic.oup.com/bioinformatics/article-abstract/33/14/i49/3953954)

#### Information Retrieval
* **TREC-COVID**: TREC-COVID: constructing a pandemic information retrieval test collection. 2021. [paper](https://dl.acm.org/doi/abs/10.1145/3451964.3451965)
* **NFCorpus**: A full-text learning to rank dataset for medical information retrieval. 2016. [paper](https://link.springer.com/chapter/10.1007/978-3-319-30671-1_58)
* **BioASQ (BEIR)**: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. 2021.
[paper](https://arxiv.org/abs/2104.08663)

## ✨ Practical Guide for Clinical Applications

<div align=center>
<img src="https://oss.gittoolsai.com/images/AI-in-Health_MedLLMsPracticalGuide_readme_def02b00fd14.png" width="800px">
</div>

### Retrieval-augmented Generation
* [**Arxiv, 2024**] Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation. [paper](https://arxiv.org/pdf/2408.04187v1)
* [**NEJM AI, 2024**] GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines. [paper](https://ai.nejm.org/doi/abs/10.1056/AIcs2300235)
* [**Arxiv, 2023**] Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models. [paper](https://arxiv.org/pdf/2312.15883.pdf)
* [**JASN, 2023**] Retrieve, Summarize, and Verify: How Will ChatGPT Affect Information Seeking from the Medical Literature? [paper](https://journals.lww.com/jasn/fulltext/2023/08000/retrieve,_summarize,_and_verify__how_will_chatgpt.4.aspx)

### Medical Decision-Making
* [**NAACL Findings, 2024**] Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts. [paper](https://arxiv.org/abs/2311.09066) [code](https://github.com/yangalan123/OpioidID)
* [**Nature, 2023**] **NYUTron**: Health system-scale language models are all-purpose prediction engines. [paper](https://www.nature.com/articles/s41586-023-06160-y)
* [**Arxiv, 2023**] Leveraging a medical knowledge graph into large language models for diagnosis prediction. [paper](https://arxiv.org/abs/2308.14321)
* [**Arxiv, 2023**] **ChatCAD/ChatCAD+**: Interactive computer-aided diagnosis on medical image using large language models. [paper](https://arxiv.org/abs/2302.07257) [code](https://github.com/zhaozh10/ChatCAD)
* [**Cancer Inform, 2023**] Designing a Deep Learning-Driven Resource-Efficient Diagnostic System for Metastatic Breast Cancer: Reducing Long Delays of Clinical Diagnosis and Improving Patient Survival in Developing Countries. [paper](https://arxiv.org/abs/2308.02597)
* [**Nature Medicine, 2023**] Large language models in medicine. [paper](https://www.nature.com/articles/s41591-023-02448-8)
* [**Nature Medicine, 2022**] AI in health and medicine. [paper](https://www.nature.com/articles/s41591-021-01614-0)

### Clinical Coding
* [**NEJM AI, 2024**] Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying. [paper](https://ai.nejm.org/doi/full/10.1056/AIdbp2300040)
* [**JMAI, 2023**] Applying large language model artificial intelligence for retina International Classification of Diseases (ICD) coding. [paper](https://jmai.amegroups.org/article/view/8198/html)
* [**ClinicalNLP Workshop, 2022**] PLM-ICD: automatic ICD coding with pretrained language models. [paper](https://arxiv.org/abs/2207.05289) [code](https://github.com/MiuLab/PLM-ICD)

### Clinical Report Generation
* [**Nature Medicine, 2024**] Adapted large language models can outperform medical experts in clinical text summarization. [paper](https://www.nature.com/articles/s41591-024-02855-5)
* [**Arxiv, 2023**] Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis. [paper](https://arxiv.org/abs/2310.09909)
* [**Arxiv, 2023**] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare. [paper](https://arxiv.org/abs/2310.17956)
* [**Arxiv, 2023**] Customizing General-Purpose Foundation Models for Medical Report Generation. [paper](https://arxiv.org/abs/2306.05642)
* [**Arxiv, 2023**] Towards generalist foundation model for radiology. [paper](https://arxiv.org/abs/2308.02463) [code](https://github.com/chaoyi-wu/RadFM)
* [**Arxiv, 2023**] Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts. [paper](https://arxiv.org/abs/2309.07430) [project](https://stanfordmimi.github.io/clin-summ/) [code](https://github.com/StanfordMIMI/clin-summ)
* [**Arxiv, 2023**] MAIRA-1: A specialised large multimodal model for radiology report generation. [paper](https://arxiv.org/abs/2311.13668) [project](https://www.microsoft.com/en-us/research/project/project-maira/)
* [**Arxiv, 2023**] Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation. [paper](https://arxiv.org/pdf/2311.18260.pdf)
* [**Lancet Digit Health, 2023**] Using ChatGPT to write patient clinic letters. [paper](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00048-1/fulltext)
* [**Lancet Digit Health, 2023**] ChatGPT: the future of discharge summaries? [paper](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00021-3/fulltext)
* [**Arxiv, 2023.2/5**] **ChatCAD/ChatCAD+**: Interactive computer-aided diagnosis on medical image using large language models. [paper](https://arxiv.org/abs/2302.07257) [code](https://github.com/zhaozh10/ChatCAD)

### Medical Education
* [**JMIR, 2023**] Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions. [paper](https://mededu.jmir.org/2023/1/e48291/)
* [**JMIR, 2023**] The Advent of Generative Language Models in Medical Education. [paper](https://mededu.jmir.org/2023/1/e48163)
* [**Korean J Med Educ., 2023**] The impending impacts of large language models on medical education. [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10020064/)
* [**Healthcare, 2023**] Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10606429/)

### Medical Robotics
* [**ICARM, 2023**] A Nested U-Structure for Instrument Segmentation in Robotic Surgery. [paper](https://ieeexplore.ieee.org/abstract/document/10218893/)
* [**Appl. Sci., 2023**] The multi-trip autonomous mobile robot scheduling problem with time windows in a stochastic environment at smart hospitals. [paper](https://www.mdpi.com/2076-3417/13/17/9879)
* [**Arxiv, 2023**] GRID: Scene-Graph-based Instruction-driven Robotic Task Planning. [paper](https://arxiv.org/abs/2309.07726)
* [**I3CE, 2023**] Trust in Construction AI-Powered Collaborative Robots: A Qualitative Empirical Analysis. [paper](https://arxiv.org/abs/2308.14846)
* [**STAR, 2016**] Advanced robotics for medical rehabilitation. [paper](https://link.springer.com/content/pdf/10.1007/978-3-319-19896-5.pdf)

### Medical Language Translation
* [**New Biotechnology, 2023**] Machine translation of standardised medical terminology using natural language processing: A Scoping Review. [paper](https://www.sciencedirect.com/science/article/pii/S1871678423000432)
* [**JMIR, 2023**] The Advent of Generative Language Models in Medical Education. [paper](https://mededu.jmir.org/2023/1/e48163)

### Mental Health Support
* [**Arxiv, 2024**] Large Language Models in Mental Health Care: a Scoping Review. [paper](https://arxiv.org/abs/2401.02984)
* [**Arxiv, 2023**] PsyChat: A Client-Centric Dialogue System for Mental Health Support. [paper](https://arxiv.org/abs/2312.04262) [code](https://github.com/qiuhuachuan/PsyChat)
* [**Arxiv, 2023**] Benefits and Harms of Large Language Models in Digital Mental Health. [paper](https://arxiv.org/abs/2311.14693)
* [**CIKM, 2023**] ChatCounselor: A Large Language Models for Mental Health Support. [paper](https://arxiv.org/abs/2309.15461) [code](https://github.com/EmoCareAI/ChatPsychiatrist)
* [**HCII, 2023**] Tell me, what are you most afraid of? Exploring the Effects of Agent Representation on Information Disclosure in Human-Chatbot Interaction. [paper](https://link.springer.com/chapter/10.1007/978-3-031-35894-4_13)
* [**IJSR, 2023**] A Brief Wellbeing Training Session Delivered by a Humanoid Social Robot: A Pilot Randomized Controlled Trial. [paper](https://link.springer.com/article/10.1007/s12369-023-01054-5)
* [**CHB, 2015**] Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations.
[paper](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0747563215001247)\n\n## ⚔️ Practical Guide for Challenges\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_4aa6a2699bae.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### Hallucination\n* [**Arxiv, 2025**] Medical Hallucination in Foundation Models and Their Impact on Healthcare. [paper](https:\u002F\u002Fwww.medrxiv.org\u002Fcontent\u002Fmedrxiv\u002Fearly\u002F2025\u002F03\u002F03\u002F2025.02.28.25323115.full.pdf) [Code](https:\u002F\u002Fgithub.com\u002Fmitmedialab\u002Fmedical_hallucination)\n* [**Arxiv, 2024**] Chain-of-verification reduces hallucination in large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11495)\n* [**ACM Computing Surveys, 2023**] Survey of hallucination in natural language generation. [paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3571730)\n* [**EMNLP, 2023**] Med-halt: Medical domain hallucination test for large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15343)\n* [**Arxiv, 2023**] A survey of hallucination in large foundation models. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.05922) [code](https:\u002F\u002Fgithub.com\u002Fvr25\u002Fhallucination-foundation-model-survey)\n* [**EMNLP, 2023**] Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08896)\n* [**EMNLP Findings, 2021**] Retrieval augmentation reduces hallucination in conversation. 2021. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.07567)\n\n\n### Lack of Evaluation Benchmarks and Metrics\n* [**Arxiv, 2025.06**] LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.04078) [Code](https:\u002F\u002Fgithub.com\u002Fllmeval\u002FLLMEval-Med)\n* [**Arxiv, 2025.05**] HealthBench: Evaluating Large Language Models Towards Improved Human Health [Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.08775) [Code](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsimple-evals)\n* [**Blog, 2024.11**] SymptomCheck Bench. [blog](https:\u002F\u002Fmedask.tech\u002Fblogs\u002Fintroducing-symptomcheck-bench\u002F) [code](https:\u002F\u002Fgithub.com\u002Fmedaks\u002Fsymptomcheck-bench)\n* [**EMNLP, 2024**] A Metric for Radiology Report Generation. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.16845)\n* [**Arxiv, 2024**] GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.03361)\n* [**Arxiv, 2024**] Large Language Models in the Clinic: A Comprehensive Benchmark. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.00716) [code](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FClinicBench)\n* [**Nature Reviews Bioengineering, 2023**] Benchmarking medical large language models. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs44222-023-00097-7)\n* [**Bioinformatics, 2023**] An extensive benchmark study on biomedical text generation and mining with ChatGPT. [paper](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F39\u002F9\u002Fbtad557\u002F7264174)\n* [**Arxiv, 2023**] Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16326)\n* [**ACL, 2023**] HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. [paper](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2023arXiv230511747L\u002Fabstract) [code](https:\u002F\u002Fgithub.com\u002FRUCAIBox\u002FHaluEval)\n* [**ACL, 2022**] Truthfulqa: Measuring how models mimic human falsehoods. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.07958)\n* [**Appl. Sci, 2021**] What disease does this patient have? a large-scale open domain question answering dataset from medical exams. [paper](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F11\u002F14\u002F6421)\n\n### Domain Data Limitations\n* [**Arxiv, 2023**] Textbooks Are All You Need. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.11644)\n* [**Arxiv, 2023**] Model Dementia: Generated Data Makes Models Forget. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17493)\n\n### New Knowledge Adaptation\n* [**ACL Findings, 2023**] Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17553)\n* [**EMNLP, 2023**] Editing Large Language Models: Problems, Methods, and Opportunities. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13172)\n* [**NeurIPS, 2020**] Retrieval-augmented generation for knowledge-intensive nlp tasks. [paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2020\u002Fhash\u002F6b493230205f780e1bc26945df7481e5-Abstract.html)\n\n### Behavior Alignment\n* [**JMIR Medical Education, 2023**] Differentiate ChatGPT-generated and Human-written Medical Texts. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.11567)\n* [**Arxiv, 2023**] Languages are rewards: Hindsight finetuning using human feedback. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.02676) [code](https:\u002F\u002Fgithub.com\u002Flhao499\u002Fchain-of-hindsight)\n* [**Arxiv, 2022**] Training a helpful and harmless assistant with reinforcement learning from human feedback. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.05862) [code](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fhh-rlhf)\n* [**Arxiv, 2022**] Improving alignment of dialogue agents via targeted human judgements. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14375)\n* [**ICLR, 2021**] Aligning AI with shared human values. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02275) [code](https:\u002F\u002Fgithub.com\u002Fhendrycks\u002Fethics\u002F)\n* [**Arxiv, 2021.12**] WebGPT: Browser-assisted question-answering with human feedback. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332)\n\n### Ethical, Legal, and Safety Concerns\n* [**Arxiv, 2023.10**] A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05694)\n* [**Arxiv, 2023.8**] \"Do Anything Now\": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03825) [code](https:\u002F\u002Fgithub.com\u002Fverazuo\u002Fjailbreak_llms)\n* [**NeurIPS, 2023.7**] Jailbroken: How Does LLM Safety Training Fail? [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02483)\n* [**EMNLP, 2023.4**] Multi-step jailbreaking privacy attacks on ChatGPT. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.05197)\n* [**Healthcare, 2023.3**] ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. [paper](https:\u002F\u002Fwww.mdpi.com\u002F2227-9032\u002F11\u002F6\u002F887)\n* [**Nature News, 2023.1**] ChatGPT listed as author on research papers: many scientists disapprove. 
[paper](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2023Natur.613..620S\u002Fabstract)\n\n## 🚀 Practical Guide for Future Directions\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_3f4b89237820.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### Introduction of New Benchmarks\n* [**EMNLP, 2024.11**] Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark. [paper](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.759.pdf) [code](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FClinicBench)\n* [**Blog, 2024.11**] SymptomCheck Bench. [blog](https:\u002F\u002Fmedask.tech\u002Fblogs\u002Fintroducing-symptomcheck-bench\u002F) [code](https:\u002F\u002Fgithub.com\u002Fmedaks\u002Fsymptomcheck-bench)\n* [**Nature Communications, 2024.9**] **MMed-Llama3**: Towards building multilingual language model for medicine. [[paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-024-52417-z) [[code]](https:\u002F\u002Fgithub.com\u002FMAGIC-AI4Med\u002FMMedLM) [[huggingface]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHenrychur\u002FMMedBench)\n* [**Arxiv, 2023.12**] Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11803)\n* [**JCO CCI, 2023**] Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy. [[paper]]( https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F37506330\u002F) [[code]]( https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FEso_alpha)\n* [**JAMA ONC, 2023**] Use of Artificial Intelligence Chatbots for Cancer Treatment Information. 
[[paper]](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjamaoncology\u002Ffullarticle\u002F2808731) [[code]](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FChatGPT_NCCN)\n* [**BioRxiv, 2023**] A comprehensive benchmark study on biomedical text generation and mining with ChatGPT. [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.04.19.537463.abstract)\n* [**JAMA, 2023**] Creation and adoption of large language models in medicine. [paper](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Farticle-abstract\u002F2808296)\n* [**Arxiv, 2023**] Large Language Models in Sport Science & Medicine: Opportunities, Risks and Considerations. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03851)\n\n### Interdisciplinary Collaborations\n* [**JAMA, 2023**] Creation and adoption of large language models in medicine. [paper](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Farticle-abstract\u002F2808296)\n* [**JAMA Forum, 2023**] ChatGPT and Physicians' Malpractice Risk. 
[paper](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama-health-forum\u002Ffullarticle\u002F2805334)\n\n### Multi-modal LLM\n* [**TPAMI, 2025**] Aligning, Autoencoding and Prompting Large Language Models for Novel Disease Reporting. [paper](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Q8y9iQA3aw_4u98YkgaTas7I9IGfkEJM\u002Fview) [code](https:\u002F\u002Fgithub.com\u002Fai-in-health\u002FPromptLLM)\n* [**npj Digital Medicine, 2025**] A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-024-01339-7) [code](https:\u002F\u002Fgithub.com\u002Fai-in-health\u002FM3FM)\n* [**Nature Medicine, 2024**] **BiomedGPT**: A generalist vision–language foundation model for diverse biomedical tasks. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-024-03185-2)\n* [**Arxiv, 2023**] VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.04992)\n* [**Arxiv, 2023**] A Survey on Multimodal Large Language Models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.13549)\n* [**Arxiv, 2023**] MM-REACT: Prompting ChatGPT for multimodal reasoning and action. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11381)\n* [**Int J Oral Sci, 2023**] ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41368-023-00239-y)\n* [**MIDL, 2023**] Frozen Language Model Helps ECG Zero-Shot Learning. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12311)\n* [**Arxiv, 2023**] Exploring and Characterizing Large Language Models For Embedded System Development and Debugging. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.03817)\n\u003C!-- * Holistic Evaluation of GPT-4V for Biomedical Imaging. 2023. 
[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05256v1.pdf) -->\n\n### Medical Agents\n* [**Arxiv, 2025**] A Co-evolving Agentic AI System for Medical Imaging Analysis [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.20279) [Code](https:\u002F\u002Fgithub.com\u002Fzhihuanglab\u002FTissueLab)\n* [**Arxiv, 2024**] MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14654) [code](https:\u002F\u002Fgithub.com\u002Fstanfordmlgroup\u002FMedAgentBench)\n* [**Arxiv, 2023**] The Rise and Potential of Large Language Model Based Agents: A Survey. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07864)\n* [**Arxiv, 2023**] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10537) [code](https:\u002F\u002Fgithub.com\u002Fgersteinlab\u002FMedAgents)\n* [**Arxiv, 2023**] GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information.  [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09667) [code](https:\u002F\u002Fgithub.com\u002Fncbi\u002FGeneGPT)\n* [**MedRxiv, 2023**] OpenMedCalc: Augmentation of ChatGPT with Clinician-Informed Tools Improves Performance on Medical Calculation Tasks. [paper](https:\u002F\u002Fwww.medrxiv.org\u002Fcontent\u002F10.1101\u002F2023.12.13.23299881v1)\n* [**NEJM AI, 2024**] Almanac — Retrieval-Augmented Language Models for Clinical Medicine. [paper](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300068)\n* [**Arxiv, 2024**] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning. 
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.14777)\n* [**Arxiv, 2024**] AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.07960)\n* [**Arxiv, 2024**] MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.15155) [code](https:\u002F\u002Fgithub.com\u002Fmitmedialab\u002FMDAgents)\n* [**Arxiv, 2024**] MediQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning. [[paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.00922) [[code]](https:\u002F\u002Fgithub.com\u002Fstellalisy\u002FMediQ)\n\n## 👍 Acknowledgement\n* [LLMs Practical Guide](https:\u002F\u002Fgithub.com\u002FMooler0410\u002FLLMsPracticalGuide). The codebase we built upon; a comprehensive LLM survey.\n* [Large AI Survey](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?arnumber=10261199&tag=1). Large AI Models in Health Informatics: Applications, Challenges, and the Future.\n* [Nature Medicine](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02448-8). A survey of large language models in medicine.\n* [Healthcare LLMs Survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05694). A Survey of Large Language Models for Healthcare.\n\n\n## 📑 Citation\n\nPlease consider citing 📑 our papers if our repository is helpful to your work, thanks sincerely!\n\n```bibtex\n@article{liu2025application,\n  title={Application of large language models in medicine},\n  author={Fenglin Liu and Hongjian Zhou and Boyang Gu and Xinyu Zou and Jinfa Huang and Jinge Wu and Yiru Li and Sam S. Chen and Yining Hua and Peilin Zhou and Junling Liu and Chengfeng Mao and Chenyu You and Xian Wu and Yefeng Zheng and Lei Clifton and Zheng Li and Jiebo Luo and David A. 
Clifton},\n  journal={Nature Reviews Bioengineering},\n  year={2025}\n}\n\n@article{zhou2023survey,\n  title={A Survey of Large Language Models in Medicine: Progress, Application, and Challenge},\n  author={Hongjian Zhou and Fenglin Liu and Boyang Gu and Xinyu Zou and Jinfa Huang and Jinge Wu and Yiru Li and Sam S. Chen and Peilin Zhou and Junling Liu and Yining Hua and Chengfeng Mao and Xian Wu and Yefeng Zheng and Lei Clifton and Zheng Li and Jiebo Luo and David A. Clifton},\n  journal={arXiv preprint arXiv:2311.05112},\n  year={2023}\n}\n```\n\n## ♥️ Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FMedLLMsPracticalGuide\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_b3898e5232ef.png\" \u002F>\n\u003C\u002Fa>\n\n","\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_e54a88f23569.png\" width=\"180px\">\n\u003C\u002Fdiv>\n\u003Ch2 align=\"center\">\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05112.pdf\"> [Nature Reviews Bioengineering] A Practical Guide for Medical Large Language Models \u003C\u002Fa>\u003C\u002Fh2>\n\u003Ch5 align=\"center\"> If you like our project, please give us a star ⭐ on GitHub for the latest updates.\u003C\u002Fh5>\n\n\u003Ch5 align=\"center\">\n\n\n   [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n   [![arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2311.05112-red)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05112.pdf)\n   [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter%40elvis%20-black?logo=twitter&logoColor=1D9BF0&color=black&link=https%3A%2F%2Ftwitter.com%2Fomarsar0%2Fstatus%2F1734599425568231513%3Fs%3D61%26t%3D8Li3X-wK0wxSRkHjNK7Pfw)](https:\u002F\u002Fx.com\u002Fomarsar0\u002Fstatus\u002F1734599425568231513?s=20)\n   
[![TechBeat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F将门创投%20-black)](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FgV3HHkVQXgR-Cego1P0ZBQ)\n   [![YouTube](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-YouTube-000000?logo=youtube&logoColor=FF0000)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mSlKPzmW3Ac&t=23s)\n   ![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAI-in-Health\u002FMedLLMsPracticalGuide?logoColor=%23C8A2C8&color=%23DCC6E0)\n\n\n\u003C\u002Fh5>\n\nThis is an actively updated list of practical guide resources for medical large language models (Medical LLMs).\nIt is organized based on our survey papers:\n\n> [Nature Reviews Bioengineering🔥]\n> [Application of large language models in medicine](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs44222-025-00279-5)\n\n> [arXiv preprint]\n> [A Survey of Large Language Models in Medicine: Progress, Application, and Challenge](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112)\n> \n> *Hongjian Zhou\u003Csup>1,\\*\u003C\u002Fsup>, Fenglin Liu\u003Csup>1,\\*\u003C\u002Fsup>, Boyang Gu\u003Csup>2,\\*\u003C\u002Fsup>, Xinyu Zou\u003Csup>3,\\*\u003C\u002Fsup>, Jinfa Huang\u003Csup>4,\\*\u003C\u002Fsup>, Jinge Wu\u003Csup>5\u003C\u002Fsup>, Yiru Li\u003Csup>6\u003C\u002Fsup>, Sam S. Chen\u003Csup>7\u003C\u002Fsup>, Peilin Zhou\u003Csup>8\u003C\u002Fsup>, Junling Liu\u003Csup>9\u003C\u002Fsup>, Yining Hua\u003Csup>10\u003C\u002Fsup>,\nChengfeng Mao\u003Csup>11\u003C\u002Fsup>, Chenyu You\u003Csup>12\u003C\u002Fsup>, Xian Wu\u003Csup>13\u003C\u002Fsup>, Yefeng Zheng\u003Csup>13\u003C\u002Fsup>, Lei Clifton\u003Csup>1\u003C\u002Fsup>,\nZheng Li\u003Csup>14,†\u003C\u002Fsup>, Jiebo Luo\u003Csup>4,†\u003C\u002Fsup>, \nDavid A. Clifton\u003Csup>1,†\u003C\u002Fsup>.* (\*Core Contributors, †Corresponding Authors)\n\n> *\u003Csup>1\u003C\u002Fsup>University of Oxford, \u003Csup>2\u003C\u002Fsup>Imperial College London, \u003Csup>3\u003C\u002Fsup>University of Waterloo,\n\u003Csup>4\u003C\u002Fsup>University of Rochester, \u003Csup>5\u003C\u002Fsup>University College London, \u003Csup>6\u003C\u002Fsup>Western University,\n\u003Csup>7\u003C\u002Fsup>University of Georgia, \u003Csup>8\u003C\u002Fsup>Hong Kong University of Science and Technology (Guangzhou),\n\u003Csup>9\u003C\u002Fsup>Alibaba, \u003Csup>10\u003C\u002Fsup>Harvard T.H. Chan School of Public Health, \u003Csup>11\u003C\u002Fsup>Massachusetts Institute of Technology, \u003Csup>12\u003C\u002Fsup>Yale University, \u003Csup>13\u003C\u002Fsup>Tencent, \u003Csup>14\u003C\u002Fsup>Amazon*\n\n## 📣 Update News\n[2025-04-08] 🎉🎉🎉 Our paper has been officially published in Nature Reviews Bioengineering, and the GitHub repository has reached 1,500 stars!\n\n\u003C!--\n[2024-10-11] 🎉🎉🎉 Great news! Our repository has received 1,000 stars. Thanks to every contributor.\n\n[2024-07-10] We have updated to [Version 6](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Thank you all for your support!\n\n[2024-05-05] We have updated to [Version 5](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Welcome to check it out!\n\n[2024-03-03] We have updated to [Version 4](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Please stay tuned!\n\n[2024-02-04] 🍻🍻🍻 Congratulations! Happy Chinese New Year! We have updated to [Version 3](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Welcome to check it out!\n\n[2023-12-11] We have updated the survey to [Version 2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112). Please check it out!\n-->\n\n[2023-11-09] We released this repository and the [survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05112).\n\n## ⚡ Contributing\n\nIf you would like to add your work or model to this list, please feel free to email fenglin.liu@eng.ox.ac.uk and jhuang90@ur.rochester.edu, or open a [pull request](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FMedLLMsPracticalGuide\u002Fpulls).\nThe Markdown format is as follows:\n\n```markdown\n* [**Conference or Journal Name + Year**] Paper Title. [[paper]](link) [[code]](link)\n```\n## 🤔 What are the Goals of the Medical LLM?\n\n**Goal 1: Surpassing the level of human experts**.\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_ad4bc945cf49.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n**Goal 2: Emergent abilities of medical LLMs as the model scale grows**.\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_6657bf9b6091.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n\n## 🤗 What is This Survey About?\nThis survey provides a comprehensive overview of the principles, applications, and challenges of large language models in medicine. We focus on the following specific questions:\n1. How should medical LLMs be built?\n2. How should the downstream performance of medical LLMs be evaluated?\n3. How should medical LLMs be applied in real-world clinical practice?\n4. What challenges arise from the use of medical LLMs?\n5. 
How can we better build and utilize medical LLMs?\n\nThis survey aims to provide readers with insight into the opportunities and challenges of large language models in medicine, and to serve as a practical resource for building effective medical LLMs.\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_6fa2df387eed.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n## Table of Contents\n- [📣 Update News](#-update-news)\n- [⚡ Contributing](#-contributing)\n- [🤔 What are the Goals of the Medical LLM?](#-what-are-the-goals-of-the-medical-llm)\n- [🤗 What is This Survey About?](#-what-is-this-survey-about)\n- [Table of Contents](#table-of-contents)\n- [🔥 Practical Guide for Building Pipeline](#-practical-guide-for-building-pipeline)\n  - [Pre-training from Scratch](#pre-training-from-scratch)\n  - [Fine-tuning General LLMs](#fine-tuning-general-llms)\n  - [Prompting General LLMs](#prompting-general-llms)\n- [📊 Practical Guide for Medical Data](#-practical-guide-for-medical-data)\n  - [Clinical Knowledge Bases](#clinical-knowledge-bases)\n  - [Pre-training Data](#pre-training-data)\n  - [Fine-tuning Data](#fine-tuning-data)\n- [🗂️ Downstream Biomedical Tasks](#️-downstream-biomedical-tasks)\n  - [Huggingface Leaderboard](#huggingface-leadboard)\n  - [Generative Tasks](#generative-tasks)\n    - [Text Summarization](#text-summarization)\n    - [Text Simplification](#text-simplification)\n    - [Question Answering](#question-answering)\n  - [Discriminative Tasks](#discriminative-tasks)\n    - [Entity Extraction](#entity-extraction)\n    - [Relation Extraction](#relation-extraction)\n    - [Text Classification](#text-classification)\n    - [Natural Language Inference](#natural-language-inference)\n    - [Semantic Textual Similarity](#semantic-textual-similarity)\n    - [Information Retrieval](#information-retrieval)\n- [✨ Practical Guide for Clinical Applications](#-practical-guide-for-clinical-applications)\n  - [Retrieval-Augmented Generation](#retrieval-augmented-generation)\n  - [Medical Decision-Making](#medical-decision-making)\n  - [Clinical Coding](#clinical-coding)\n  - [Clinical Report Generation](#clinical-report-generation)\n  - [Medical Education](#medical-education)\n  - [Medical Robotics](#medical-robotics)\n  - [Medical Language Translation](#medical-language-translation)\n  - [Mental Health Support](#mental-health-support)\n- [⚔️ Practical Guide for Challenges](#️-practical-guide-for-challenges)\n  - [Hallucination](#hallucination)\n  - [Lack of Evaluation Benchmarks and Metrics](#lack-of-evaluation-benchmarks-and-metrics)\n  - [Domain Data Limitations](#domain-data-limitations)\n  - [New Knowledge Adaptation](#new-knowledge-adaptation)\n  - [Behavior Alignment](#behavior-alignment)\n  - [Ethical, Legal, and Safety Concerns](#ethical-legal-and-safety-concerns)\n- [🚀 Practical Guide for Future Directions](#-practical-guide-for-future-directions)\n  - [Introduction of New Benchmarks](#introduction-of-new-benchmarks)\n  - [Interdisciplinary Collaborations](#interdisciplinary-collaborations)\n  - [Multi-modal LLM](#multi-modal-llm)\n  - [Medical Agents](#medical-agents)\n- [👍 Acknowledgement](#-acknowledgement)\n- [📑 Citation](#-citation)\n- [♥️ Contributors](#️-contributors)\n\n## 🔥 Practical Guide for Building Pipeline\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_abc3f4e9fddd.png\" width=\"1000px\">\n\u003C\u002Fdiv>\n\n### Pre-training from Scratch\n* [**Nature Medicine, 2024**] **BiomedGPT**: A generalist vision–language foundation model for diverse biomedical tasks. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-024-03185-2)\n* [**Nature, 2023**] **NYUTron**: Health system-scale language models are all-purpose prediction engines. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06160-y)\n* [**Arxiv, 2023**] **OphGLM**: An ophthalmology large language-and-vision assistant trained with instructions and dialogue. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12174)\n* [**npj Digital Medicine, 2023**] **GatorTronGPT**: A study of generative large language models for medical research and healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13523)\n* [**Bioinformatics, 2023**] **MedCPT**: Contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.00589)\n* [**Bioinformatics, 2022**] **BioGPT**: Generative pre-trained transformer for biomedical text generation and mining. [paper](https:\u002F\u002Facademic.oup.com\u002Fbib\u002Farticle-abstract\u002F23\u002F6\u002Fbbac409\u002F6713511)\n* [**NeurIPS, 2022**] **DRAGON**: Deep bidirectional language-knowledge graph pretraining. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.09338) [code](https:\u002F\u002Fgithub.com\u002Fmichiyasunaga\u002Fdragon)\n* [**ACL, 2022**] **BioLinkBERT\u002FLinkBERT**: Pretraining language models with document links. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.15827) [code](https:\u002F\u002Fgithub.com\u002Fmichiyasunaga\u002FLinkBERT)\n* [**npj Digital Medicine, 2022**] **GatorTron**: A large language model for electronic health records. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n* [**HEALTH, 2021**] **PubMedBERT**: 
Domain-specific language model pretraining for biomedical natural language processing. [paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3458754)\n* [**Bioinformatics, 2020**] **BioBERT**: A pre-trained biomedical language representation model for biomedical text mining. [paper](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle-abstract\u002F36\u002F4\u002F1234\u002F5566506)\n* [**EMNLP, 2019**] **SciBERT**: A pretrained language model for scientific text. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.10676)\n* [**NAACL Workshop, 2019**] **ClinicalBERT**: Publicly available clinical BERT embeddings. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.03323)\n* [**BioNLP Workshop, 2019**] **BlueBERT**: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.05474)\n\n### Fine-tuning General LLMs\n* [**Nature Communications, 2024.9**] **MMed-Llama3**: Towards building multilingual language model for medicine. [[paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-024-52417-z) [[code]](https:\u002F\u002Fgithub.com\u002FMAGIC-AI4Med\u002FMMedLM)\n* [**Arxiv, 2024.8**] **Med42-v2**: A suite of clinical LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06142) [model](https:\u002F\u002Fhuggingface.co\u002Fm42-health)\n* [**JAMIA, 2024.5**] **Internist.ai 7b**: Impact of high-quality, mixed-domain data on the performance of medical language models. [paper](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F31\u002F9\u002F1875\u002F7680487?redirectedFrom=fulltext) [model](https:\u002F\u002Fhuggingface.co\u002Finternistai\u002Fbase-7b-v0.2)\n* [**Huggingface, 2024.5**] **OpenBioLLM-70b**: Advancing open-source large language models in the medical domain. [model](https:\u002F\u002Fhuggingface.co\u002Faaditya\u002FLlama3-OpenBioLLM-70B)\n* [**Huggingface, 2024.5**] **MedLlama3**. [model](https:\u002F\u002Fhuggingface.co\u002FProbeMedicalYonseiMAILab\u002Fmedllama3-v20)\n* [**Arxiv, 2024.5**] **Aloe**: A family of fine-tuned open healthcare LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.01886) [model](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FHPAI-BSC\u002Fhealthcare-llms-aloe-family-6701b6a777f7e874a2123363)\n* [**Arxiv, 2024.4**] **Med-Gemini**: Capabilities of Gemini models in medicine. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.18416)\n* [**npj Digital Medicine, 2024**] **Meerkat**: Small language models learn enhanced reasoning skills from medical textbooks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.00376)\n* [**Arxiv, 2024.2**] **BioMistral**: A collection of open-source pretrained large language models for medical domains. [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10373)\n* [**Arxiv, 2023.12**] **From Beginner to Expert**: Modeling medical knowledge into general LLMs. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01040)\n* [**Arxiv, 2023.11**] **Taiyi**: A bilingual fine-tuned large language model for diverse biomedical tasks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.11608) [code](https:\u002F\u002Fgithub.com\u002FDUTIR-BioNLP\u002FTaiyi-LLM)\n* [**Arxiv, 2023.10**] **AlpaCare**: Instruction-tuned large language models for medical application. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.14558) [code](https:\u002F\u002Fgithub.com\u002FXZhang97666\u002FAlpaCare)\n* [**Arxiv, 2023.10**] **BianQue**: Balancing the questioning and suggestion ability of health LLMs with multi-turn health conversations polished by ChatGPT. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.15896)\n* [**Arxiv, 2023.10**] **Qilin-Med**: Multi-stage knowledge injection advanced medical large language model. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09089)\n* [**Arxiv, 2023.10**] **Qilin-Med-VL**: Towards Chinese large vision-language model for general healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.17956)\n* [**Arxiv, 2023.10**] **MEDITRON-70B**: Scaling medical pretraining for large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16079)\n* [**AAAI, 2024\u002F2023.10**] **Med42**: Evaluating fine-tuning strategies for medical LLMs: full-parameter vs. parameter-efficient approaches. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.14779) [model](https:\u002F\u002Fhuggingface.co\u002Fm42-health\u002Fmed42-70b)\n* [**Arxiv, 2023.9**] **CPLLM**: Clinical prediction with large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295)\n* [**Arxiv, 2023.8**] **BioMedGPT\u002FOpenBioMed**: An open multimodal generative pre-trained transformer for biomedicine. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.09442) [code](https:\u002F\u002Fgithub.com\u002FPharMolix\u002FOpenBioMed)\n* [**Nature Digital Medicine, 2023.8**] Large language models to identify social determinants of health in electronic health records. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.06354) [[code]](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FSDoH)\n* [**Arxiv, 2023.8**] **Zhongjing**: 
Enhancing the Chinese medical capabilities of large language models through expert feedback and real-world multi-turn dialogue. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03549)\n* [**Arxiv, 2023.7**] **Med-Flamingo**: A multimodal medical few-shot learner. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15189) [code](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fmed-flamingo)\n* [**Arxiv, 2023.6**] **ClinicalGPT**: Large language models finetuned with diverse medical data and comprehensive evaluation. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.09968)\n* [**Cureus, 2023.6**] **ChatDoctor**: A medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. [paper](https:\u002F\u002Fwww.cureus.com\u002Farticles\u002F152858-chatdoctor-a-medical-chat-model-fine-tuned-on-a-large-language-model-meta-ai-llama-using-medical-domain-knowledge.pdf)\n* [**NeurIPS Datasets and Benchmarks Track, 2023.6**] **LLaVA-Med**: Training a large language-and-vision assistant for biomedicine in one day. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.00890)\n* [**Arxiv, 2023.6**] **MedPaLM 2**: Towards expert-level medical question answering with large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09617)\n* [**Arxiv, 2023.5**] **Clinical Camel**: An open-source expert-level medical language model with dialogue-based knowledge encoding. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12031)\n* [**Arxiv, 2023.5**] **BiomedGPT**: A generalist vision-language foundation model for diverse biomedical tasks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17100)\n* [**Arxiv, 2023.5**] **HuatuoGPT**: HuatuoGPT, towards taming language models to be doctors. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15075)\n* [**Arxiv, 2023.4**] **Baize-healthcare**: An open-source chat model with parameter-efficient tuning on self-chat data. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01196)\n* [**Arxiv, 2023.4**] **Visual Med-Alpaca**: A parameter-efficient biomedical LLM with visual capabilities. [GitHub](https:\u002F\u002Fgithub.com\u002Fcambridgeltl\u002Fvisual-med-alpaca)\n* [**Arxiv, 2023.4**] **PMC-LLaMA**: Further finetuning LLaMA on medical papers. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.14454)\n* [**Arxiv, 2023.4**] **MedPaLM M**: Towards generalist biomedical AI. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14334) [code](https:\u002F\u002Fgithub.com\u002Fkyegomez\u002FMed-PaLM)\n* [**Arxiv, 2023.4**] **BenTsao\u002FHuatuo**: Tuning LLaMA model with Chinese medical knowledge. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06975)\n* [**Github, 2023.4**] **ChatGLM-Med**: ChatGLM-Med: Fine-tuning ChatGLM on Chinese medical knowledge. [GitHub](https:\u002F\u002Fgithub.com\u002FSCIR-HI\u002FMed-ChatGLM)\n* [**Arxiv, 2023.4**] **DoctorGLM**: Fine-tuning your Chinese doctor is not a herculean task. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01097)\n\n### Prompting General LLMs\n* [**NEJM AI, 2024**] GPT-4 for information retrieval and comparison of oncology guidelines. [paper](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Fabs\u002F10.1056\u002FAIcs2300235)\n* [**Arxiv, 2023.11**] **MedPrompt**: Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16452)\n* [**Arxiv, 2023.8**] **Dr. Knows**: Leveraging a medical knowledge graph into large language models for diagnosis prediction. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14321)\n* [**Arxiv, 2023.3**] **DeID-GPT**: Zero-shot medical text de-identification by GPT-4. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11032) [code](https:\u002F\u002Fgithub.com\u002Fyhydhx\u002FChatGPT-API)\n* [**Arxiv, 2023.2\u002F5**] **ChatCAD\u002FChatCAD+**: Interactive computer-aided diagnosis on medical images using large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.07257) [code](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n* [**Nature, 2022.12**] **MedPaLM**: Large language models encode clinical knowledge. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.13138)\n* [**Arxiv, 2022.7\u002F2023.12**] Can large language models reason about medical questions? [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.08143)\n\n\n## 📊 Practical Guide for Medical Data\n\n### Clinical Knowledge Bases\n* **[Drugs.com](https:\u002F\u002Fwww.drugs.com\u002F)**\n* **[DrugBank](https:\u002F\u002Fgo.drugbank.com\u002F)**\n* **[NHS Health](https:\u002F\u002Fwww.nhs.uk\u002Fconditions\u002F)**\n* **[NHS Medicine](https:\u002F\u002Fwww.nhs.uk\u002Fmedicines\u002F)**\n* **[Unified Medical Language System (UMLS)](https:\u002F\u002Fwww.nlm.nih.gov\u002Fresearch\u002Fumls\u002Findex.html)**\n* **[The Human Phenotype Ontology](https:\u002F\u002Fhpo.jax.org\u002Fapp\u002F)**\n* **[Centers for Disease Control and Prevention](https:\u002F\u002Fwww.cdc.gov\u002F)**\n* **[National Institute for Health and Care Excellence](https:\u002F\u002Fwww.nice.org.uk\u002Fguidance)**\n* 
**[World Health Organization](https:\u002F\u002Fwww.who.int\u002Fpublications\u002Fwho-guidelines)**\n\n### Pre-training Data\n* [**NEJM AI, 2024**] Clinical text datasets for medical artificial intelligence and large language models: a systematic review. [paper](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIra2400012)\n* [**npj Digital Medicine, 2023**] **EHRs**: A study of generative large language models for medical research and healthcare. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13523)\n* [**Arxiv, 2023**] **Guidelines**: A high-quality collection of clinical practice guidelines (CPGs) for medical LLM training. [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fepfl-llm\u002Fguidelines)\n* [**Arxiv, 2023**] **GAP-REPLAY**: Scaling medical pretraining for large language models. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16079)\n* [**npj Digital Medicine, 2022**] **EHRs**: A large language model for electronic health records. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n* [**National Library of Medicine, 2022**] **PubMed**: PubMed data of the National Institutes of Health. [database](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002Fdownload\u002F)\n* [**Arxiv, 2020**] **PubMed**: The Pile: An 800GB dataset of diverse text for language modeling. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.00027) [code](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fthe-pile)\n* [**EMNLP, 2020**] **MedDialog**: MedDialog: Two large-scale medical dialogue datasets. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.03329) [code](https:\u002F\u002Fgithub.com\u002FUCSD-AI4H\u002FMedical-Dialogue-System)\n* [**NAACL, 2018**] **Literature**: Construction of the literature graph in Semantic Scholar. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1805.02262)\n* [**Scientific Data, 2016**] **MIMIC-III**: MIMIC-III, a freely accessible critical care database. [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsdata201635)\n\n### Fine-tuning Data\n* **MedBook-18-CoT**: Small language models learn enhanced reasoning skills from medical textbooks. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.00376) [Huggingface](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdmis-lab\u002Fmeerkat-instructions)\n* **MMedC**: Towards building multilingual language model for medicine. [[paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-024-52417-z) [[code]](https:\u002F\u002Fgithub.com\u002FMAGIC-AI4Med\u002FMMedLM) [[Huggingface]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHenrychur\u002FMMedC)\n* **MedTrinity-25M**: A large-scale multimodal dataset with multigranular annotations for medicine. 2024. [github](https:\u002F\u002Fgithub.com\u002FUCSC-VLAA\u002FMedTrinity-25M) [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.02900)\n* **cMeKG**: Chinese medical knowledge graph. 2023. [github](https:\u002F\u002Fgithub.com\u002Fking-yyf\u002FCMeKG_tools)\n* **CMD.**: Chinese medical dialogue data. 2023. [repository](https:\u002F\u002Fgithub.com\u002FToyhom\u002FChinese-medical-dialogue-data)\n* **BianQueCorpus**: BianQue: Balancing the questioning and suggestion ability of health LLMs with multi-turn health conversations polished by ChatGPT. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.15896)\n* **MD-EHR**: ClinicalGPT: Large language models finetuned with diverse medical data and comprehensive evaluation. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.09968)\n* **VariousMedQA**: Multi-scale attentive interaction networks for Chinese medical question answer selection. 2018. [paper](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8548603\u002F)\n* **VariousMedQA**: What disease does this patient have? A large-scale open domain question answering dataset from medical exams. 2021. [paper](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F11\u002F14\u002F6421)\n* **MedDialog**: MedDialog: Two large-scale medical dialogue datasets. 2020. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.03329)\n* **ChiMed**: Qilin-Med: Multi-stage knowledge injection advanced medical large language model. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09089)\n* **ChiMed-VL**: Qilin-Med-VL: Towards Chinese large vision-language model for general healthcare. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.17956)\n* **Healthcare Magic**: Healthcare Magic. [link](https:\u002F\u002Fwww.healthcaremagic.com\u002F), [Huggingface](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fwangrongsheng\u002FHealthCareMagic-100k-en)\n* **ICliniq**: ICliniq. [platform](https:\u002F\u002Fwww.icliniq.com\u002F)\n* **Hybrid SFT**: HuatuoGPT, towards taming language models to be doctors. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15075)\n* **PMC-15M**: Large-scale domain-specific pretraining for biomedical vision-language processing. 2023. [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.00915)\n* **MedQuAD**: 
基于问题蕴涵的问答方法。2019年。[论文](https:\u002F\u002Fbmcbioinformatics.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs12859-019-3119-4?ref=https:\u002F\u002Fgithubhelp.com)\n* **VariousMedQA**: Visual med-alpaca：具有视觉能力的参数高效生物医学LLM。2023年。[仓库](https:\u002F\u002Fgithub.com\u002Fcambridgeltl\u002Fvisual-med-alpaca)\n* **CMtMedQA**: Zhongjing：通过专家反馈和真实世界多轮对话提升大语言模型的中文医学能力。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03549)\n* **MTB**: Med-flamingo：一种多模态医学少样本学习模型。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15189)\n* **PMC-OA**: Pmc-clip：利用生物医学文献进行对比语言-图像预训练。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.07240)\n* **Medical Meadow**: MedAlpaca——一个开源的医学对话AI模型及训练数据集合。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.08247)\n* **Literature**: S2ORC：语义学者开放研究语料库。2019年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02782)\n* **MedC-I**: Pmc-llama：在医学论文上进一步微调llama模型。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.14454)\n* **ShareGPT**: Sharegpt。2023年。[平台](https:\u002F\u002Fsharegpt.com\u002F)\n* **PubMed**: 美国国立卫生研究院。PubMed数据。位于美国国家医学图书馆。2022年。[数据库](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002Fdownload\u002F)\n* **MedQA**: 这位患者患有何种疾病？来自医学考试的大规模开放域问答数据集。2021年。[论文](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F11\u002F14\u002F6421)\n* **MultiMedQA**: 利用大语言模型实现专家级医学问答。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.09617)\n* **MultiMedBench**: 朝着通用生物医学AI方向发展。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14334)\n* **MedInstruct-52**: 针对医疗应用的指令微调大语言模型。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.14558)\n* **eICU-CRD**: eicu合作研究数据库，一个免费提供的多中心重症监护研究数据库。2018年。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsdata2018178)\n* **MIMIC-IV**: MIMIC-IV，一个可自由访问的电子健康记录数据集。2023年。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41597-022-01899-x) [数据库](https:\u002F\u002Fphysionet.org\u002Fcontent\u002Fmimiciv\u002F2.2\u002F)\n* **PMC-Patients**: 
16.7万份公开的患者摘要。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.13876) [数据库](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fzhengyun21\u002FPMC-Patients)\n  \n## 🗂️ 下游生物医学任务\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_88518667ecfb.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### Huggingface排行榜\n* **开放医学-LLM排行榜**: MedQA (USMLE)、PubMedQA、MedMCQA以及与医学和生物学相关的MMLU子集。[排行榜](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fopenlifescienceai\u002Fopen_medical_llm_leaderboard)\n* **ReXrank**: 一个用于AI驱动放射科报告生成的公开排行榜。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.15122) [代码](https:\u002F\u002Fgithub.com\u002Frajpurkarlab\u002FReXrank)\n\n### 生成任务\n\n#### 文本摘要\n* **PubMed**: 美国国立卫生研究院。PubMed 数据。载于美国国家医学图书馆。[数据库](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002Fdownload\u002F)\n* **PMC**: 美国国立卫生研究院。PubMed Central 数据。载于美国国家医学图书馆。[数据库](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002F)\n* **CORD-19**: Cord-19：2020 年新冠肺炎开放研究数据集。[论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC7251955\u002F)\n* **MentSum**: Mentsum：用于探索心理健康在线帖子摘要的资源，2022 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.00856)\n* **MeQSum**: 关于消费者健康问题的摘要，2019 年。[论文](https:\u002F\u002Faclanthology.org\u002FP19-1215\u002F)\n* **MedQSum**: 提升大型语言模型在医疗问答中的实用性：一种患者健康问题摘要方法。[[论文]](https:\u002F\u002Fdoi.org\u002F10.1109\u002FSITA60746.2023.10373720) [[代码]](https:\u002F\u002Fgithub.com\u002Fzekaouinoureddine\u002FMedQSum)\n\n#### 文本简化\n* **MultiCochrane**: 多语种医学文本简化，2023 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12532)\n* **AutoMeTS**: AutoMeTS：用于医学文本简化的人工智能自动补全工具，2020 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.10573)\n\n#### 问答任务\n* **CareQA**: CareQA：基于西班牙专科医疗培训准入考试（FSE）的多选题问答数据集。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.01886) [数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHPAI-BSC\u002FCareQA)\n* **BioASQ-QA**: 
BioASQ-QA：2023 年手动整理的生物医学问答语料库。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41597-023-02068-4)\n* **emrQA**: emrqa：2018 年用于电子病历问答的大规模语料库。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.00732)\n* **CliCR**: CliCR：2018 年用于机器阅读理解的临床病例报告数据集。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09720)\n* **PubMedQA**: Pubmedqa：2019 年用于生物医学研究问答的数据集。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1909.06146)\n* **COVID-QA**: COVID-QA：2020 年针对新冠肺炎的问答数据集。[论文](https:\u002F\u002Faclanthology.org\u002F2020.nlpcovid19-acl.18\u002F)\n* **MASH-QA**: 具有长跨度多答案的问答任务，2020 年。[论文](https:\u002F\u002Faclanthology.org\u002F2020.findings-emnlp.342\u002F)\n* **Health-QA**: 2019 年用于医疗问答的层次化注意力检索模型。[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3308558.3313699)\n* **MedQA**: 这位患者患有何种疾病？2021 年来自医学考试的大规模开放域问答数据集。[论文](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F11\u002F14\u002F6421)\n* **MedMCQA**: Medmcqa：2022 年用于医学领域问答的大规模多学科多项选择数据集。[论文](https:\u002F\u002Fproceedings.mlr.press\u002Fv174\u002Fpal22a.html)\n* **MMLU（临床知识）**: 测量大规模多任务语言理解能力，2020 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.03300)\n* **MMLU（大学医学）**: 测量大规模多任务语言理解能力，2020 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.03300)\n* **MMLU（专业医学）**: 测量大规模多任务语言理解能力，2020 年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.03300)\n* 【**ArXiv 2024**】MediQ：用于自适应且可靠的临床推理的问答型大语言模型。[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.00922) [[代码]](https:\u002F\u002Fgithub.com\u002Fstellalisy\u002FMediQ)\n* **EWS_v5_USONLY_final**: 紧急战地外科问答数据集（v5），2025 年。[数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPaxrad\u002FEWS_v5_USONLY_final)\n\n### 区分性任务\n\n#### 实体抽取\n* [**Arxiv, 2024.10**] 命名临床实体识别基准 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.05046) [排行榜](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fm42-health\u002Fclinical_ner_leaderboard)\n* 
**NCBI疾病**：NCBI疾病语料库——用于疾病名称识别和概念归一化的资源，2014年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046413001974)\n* **JNLPBA**：2004年JNLPBA生物实体识别任务介绍。[论文](https:\u002F\u002Faclanthology.org\u002FW04-1213.pdf)\n* **GENIA**：GENIA语料库——用于生物文本挖掘的语义标注语料库，2003年。[论文](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FJin-Dong-Kim-2\u002Fpublication\u002F10667350_GENIA_corpus-A_semantically_annotated_corpus_for_bio-textmining\u002Flinks\u002F00b49520d9a33ae419000000\u002FGENIA-corpus-A-semantically-annotated-corpus-for-bio-textmining.pdf)\n* **BC5CDR**：BioCreative V CDR任务语料库——用于化学疾病关系抽取的资源，2016年。[论文](https:\u002F\u002Facademic.oup.com\u002Fdatabase\u002Farticle-abstract\u002Fdoi\u002F10.1093\u002Fdatabase\u002Fbaw068\u002F2630414?ref=https%3A%2F%2Fgithubhelp.com&login=true)\n* **BC4CHEMD**：CHEMDNER化学与药物语料库及其标注原则，2015年。[论文](https:\u002F\u002Fjcheminf.biomedcentral.com\u002Farticles\u002F10.1186\u002F1758-2946-7-S1-S2)\n* **BioRED**：BioRED——丰富的生物医学关系抽取数据集，2022年。[论文](https:\u002F\u002Facademic.oup.com\u002Fbib\u002Farticle-abstract\u002F23\u002F5\u002Fbbac282\u002F6645993)\n* **CMeEE**：Cblue——中文生物医学语言理解评估基准，2021年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.08087)\n* **NLM-Chem-BC7**：NLM-Chem-BC7——用于生物医学文章中化学实体标注与索引的手动标注全文资源，2022年。[论文](https:\u002F\u002Facademic.oup.com\u002Fdatabase\u002Farticle-abstract\u002Fdoi\u002F10.1093\u002Fdatabase\u002Fbaac102\u002F6858529)\n* **ADE**：开发用于支持从医疗病例报告中自动提取药物相关不良反应的基准语料库，2012年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046412000615)\n* **2012 i2b2**：评估临床文本中的时间关系：2012年i2b2挑战赛，2013年。[论文](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F20\u002F5\u002F806\u002F726374)\n* **2014 i2b2\u002FUTHealth（赛道1）**：为去标识化而标注纵向临床叙述：2014年i2b2\u002FUTHealth语料库，2015年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046415001823)\n* **2018 
n2c2（赛道2）**：2018年n2c2电子健康记录中不良药物事件与用药信息抽取共享任务，2020年。[论文](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F27\u002F1\u002F3\u002F5581277)\n* **Cadec**：Cadec——不良药物事件标注语料库，2015年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046415000532)\n* **DDI**：Semeval-2013任务9：从生物医学文本中抽取药物-药物相互作用（ddiextraction 2013），2013年。[论文](https:\u002F\u002Fe-archivo.uc3m.es\u002Fhandle\u002F10016\u002F20455)\n* **PGR**：人类表型-基因关系银标准语料库，2019年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.10728)\n* **EU-ADR**：EU-ADR语料库——标注了药物、疾病、靶点及其相互关系，2012年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046412000573)\n* [**BioCreative VII挑战赛，2021年**] 使用Transformer网络和多任务学习在推文中检测药物。[[论文]]( https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.13726) [[代码]]( https:\u002F\u002Fgithub.com\u002FMachine-Learning-for-Medical-Language\u002FSMMH-Medication-Detection)\n\n#### 关系抽取\n* **BC5CDR**：BioCreative V CDR任务语料库——用于化学疾病关系抽取的资源，2016年。[论文](https:\u002F\u002Facademic.oup.com\u002Fdatabase\u002Farticle-abstract\u002Fdoi\u002F10.1093\u002Fdatabase\u002Fbaw068\u002F2630414?ref=https%3A%2F%2Fgithubhelp.com&login=true)\n* **BioRED**：BioRED——丰富的生物医学关系抽取数据集，2022年。[论文](https:\u002F\u002Facademic.oup.com\u002Fbib\u002Farticle-abstract\u002F23\u002F5\u002Fbbac282\u002F6645993)\n* **ADE**：开发用于支持从医疗病例报告中自动提取药物相关不良反应的基准语料库，2012年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046412000615)\n* **2018 n2c2（赛道2）**：2018年n2c2电子健康记录中不良药物事件与用药信息抽取共享任务，2020年。[论文](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F27\u002F1\u002F3\u002F5581277)\n* **2010 i2b2\u002FVA**：2010年i2b2\u002FVA临床文本中的概念、断言和关系挑战赛，2011年。[论文](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F18\u002F5\u002F552\u002F830538)\n* **ChemProt**：2017年BioCreative 
VI化学-蛋白质相互作用赛道概述。[数据库](https:\u002F\u002Fbiocreative.bioinformatics.udel.edu\u002Fnews\u002Fcorpora\u002Fchemprot-corpus-biocreative-vi\u002F)\n* **GDA**：Renet——一种基于深度学习从文献中提取基因-疾病关联的方法，2019年。[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-030-17083-7_17)\n* **DDI**：Semeval-2013任务9：从生物医学文本中抽取药物-药物相互作用（ddiextraction 2013），2013年。[论文](https:\u002F\u002Fe-archivo.uc3m.es\u002Fhandle\u002F10016\u002F20455)\n* **GAD**：遗传关联数据库，2004年。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fng0504-431)\n* **2012 i2b2**：评估临床文本中的时间关系：2012年i2b2挑战赛，2013年。[论文](https:\u002F\u002Facademic.oup.com\u002Fjamia\u002Farticle-abstract\u002F20\u002F5\u002F806\u002F726374)\n* **PGR**：人类表型-基因关系银标准语料库，2019年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.10728)\n* **EU-ADR**：EU-ADR语料库——标注了药物、疾病、靶点及其相互关系，2012年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046412000573)\n\n#### 文本分类\n* **OpiateID**: 识别社区社交媒体帖子中关于药物使用、滥用和成瘾的自我披露。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09066) [代码](https:\u002F\u002Fgithub.com\u002Fyangalan123\u002FOpioidID)\n* **ADE**: 开发用于支持从医学病例报告中自动提取药物相关不良反应的基准语料库，2012年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046412000615)\n* **2014 i2b2\u002FUTHealth (Track 2)**: 为去标识化标注纵向临床叙述：2014年i2b2\u002FUTHealth语料库，2015年。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1532046415001823)\n* **HoC**: 根据癌症标志物对科学文献进行自动语义分类，2016年。[论文](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle-abstract\u002F32\u002F3\u002F432\u002F1743783)\n* **OHSUMED**: OHSUMED：交互式检索评估及用于研究的新大型测试集，1994年。[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-1-4471-2099-5_20)\n* **WNUT-2020 Task 2**: WNUT-2020任务2：识别具有信息量的COVID-19英文推文，2020年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.08232)\n* **Medical Abstracts**: 
评估无监督文本分类：零样本和基于相似度的方法，2022年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.16285)\n* **MIMIC-III**: MIMIC-III，一个可自由访问的重症监护数据库，2016年。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsdata201635)\n\n#### 自然语言推理\n* **MedNLI**: 临床领域自然语言推理的经验教训，2018年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1808.06752)\n* **BioNLI**: BioNLI：利用词汇语义约束生成用于对抗性示例的生物医学NLI数据集，2022年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.14814)\n\n#### 语义文本相似度\n* **MedSTS**: MedSTS：用于临床语义文本相似度的资源，2020年。[论文](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs10579-018-9431-1)\n* **2019 n2c2\u002FOHNLP**: 2019年n2c2\u002Fohnlp临床语义文本相似度赛道：概述，2020年。[论文](https:\u002F\u002Fmedinform.jmir.org\u002F2020\u002F11\u002Fe23375)\n* **BIOSSES**: BIOSSES：用于生物医学领域的语义句子相似度估计系统，2017年。[论文](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle-abstract\u002F33\u002F14\u002Fi49\u002F3953954)\n\n#### 信息检索\n* **TREC-COVID**: TREC-COVID：构建大流行信息检索测试集，2021年。[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3451964.3451965)\n* **NFCorpus**: 用于医学信息检索的全文学习排序数据集，2016年。[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-30671-1_58)\n* **BioASQ (BEIR)**: 用于信息检索模型零样本评估的异构基准，2021年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08663)\n\n\n\n## ✨ 临床应用实用指南\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_def02b00fd14.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### 检索增强生成\n* [**Arxiv, 2024**] 医学图RAG：通过图检索增强生成实现安全的医学大型语言模型。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.04187v1)\n* [**NEJM AI, 2024**] GPT-4用于医学肿瘤学指南的信息检索与比较。[论文](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Fabs\u002F10.1056\u002FAIcs2300235)\n* [**Arxiv, 2023**] 思考与检索：一种基于假设知识图谱增强的医学大型语言模型。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15883.pdf)\n* [**JASN, 2023**] 
检索、总结与验证：ChatGPT将如何影响从医学文献中获取信息？[论文](https:\u002F\u002Fjournals.lww.com\u002Fjasn\u002Ffulltext\u002F2023\u002F08000\u002Fretrieve,_summarize,_and_verify__how_will_chatgpt.4.aspx)\n\n### 医学决策\n* [**NAACL Findings, 2024**] 识别社区社交媒体帖子中关于药物使用、滥用和成瘾的自我披露。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09066) [代码](https:\u002F\u002Fgithub.com\u002Fyangalan123\u002FOpioidID)\n* [**Nature, 2023**] **NYUTron** 健康系统规模的语言模型是通用预测引擎 [论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06160-y)\n* [**Arxiv, 2023**] 将医学知识图谱融入大型语言模型以进行诊断预测。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14321)\n* [**Arxiv, 2023**] ChatCAD\u002FChatCAD+：利用大型语言模型在医学影像上进行交互式计算机辅助诊断。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.07257) [代码](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n* [**Cancer Inform, 2023**] 设计一种基于深度学习的资源高效转移性乳腺癌诊断系统：减少临床诊断的长期延误并提高发展中国家患者的生存率。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.02597)\n* [**Nature Medicine, 2023**] 大型语言模型在医学中的应用。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02448-8)\n* [**Nature Medicine, 2022**] 人工智能在健康与医学中的应用。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-021-01614-0)\n\n### 临床编码\n* [**NEJM AI, 2024**] 大型语言模型是糟糕的医疗编码员——医疗编码查询的基准测试。[论文](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIdbp2300040)\n* [**JMAI, 2023**] 将大型语言模型人工智能应用于视网膜国际疾病分类（ICD）编码。[论文](https:\u002F\u002Fjmai.amegroups.org\u002Farticle\u002Fview\u002F8198\u002Fhtml)\n* [**ClinicalNLP Workshop, 2022**] PLM-ICD：使用预训练语言模型进行自动ICD编码。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05289) [代码](https:\u002F\u002Fgithub.com\u002FMiuLab\u002FPLM-ICD)\n\n### 临床报告生成\n* [**《自然医学》，2024年**] 经过适配的大型语言模型在临床文本摘要生成方面可超越医学专家。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-024-02855-5)\n* [**Arxiv，2023年**] GPT-4V（视觉）能否服务于医疗应用？GPT-4V在多模态医学诊断中的案例研究。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09909)\n* [**Arxiv，2023年**] 
Qilin-Med-VL：面向通用医疗保健的中文大型视觉—语言模型。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.17956)\n* [**Arxiv，2023年**] 针对医学报告生成定制通用基础模型。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05642)\n* [**Arxiv，2023年**] 向放射学领域的通用基础模型迈进。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.02463) [代码](https:\u002F\u002Fgithub.com\u002Fchaoyi-wu\u002FRadFM)\n* [**Arxiv，2023年**] 临床文本摘要生成：适配大型语言模型的表现可超越人类专家。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.07430) [项目](https:\u002F\u002Fstanfordmimi.github.io\u002Fclin-summ\u002F) [代码](https:\u002F\u002Fgithub.com\u002FStanfordMIMI\u002Fclin-summ)\n* [**Arxiv，2023年**] MAIRA-1：用于放射学报告生成的专业化大型多模态模型。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.13668) [项目](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Fproject-maira\u002F)\n* [**Arxiv，2023年**] 放射学报告生成中临床医生与专业基础模型之间的共识、分歧与协同作用。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18260.pdf)\n* [**《柳叶刀·数字健康》，2023年**] 使用ChatGPT撰写患者门诊信件。[论文](https:\u002F\u002Fwww.thelancet.com\u002Fjournals\u002Flandig\u002Farticle\u002FPIIS2589-7500(23)00048-1\u002Ffulltext)\n* [**《柳叶刀·数字健康》，2023年**] ChatGPT：出院小结的未来吗？[论文](https:\u002F\u002Fwww.thelancet.com\u002Fjournals\u002Flandig\u002Farticle\u002FPIIS2589-7500(23)00021-3\u002Ffulltext)\n* [**Arxiv，2023年2月5日**] **ChatCAD\u002FChatCAD+**：利用大型语言模型进行医学影像的交互式计算机辅助诊断。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.07257) [代码](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n\n### 医学教育\n* [**JMIR，2023年**] 大型语言模型在医学教育中的机遇、挑战与未来方向。[论文](https:\u002F\u002Fmededu.jmir.org\u002F2023\u002F1\u002Fe48291\u002F)\n* [**JMIR，2023年**] 生成式语言模型在医学教育中的兴起。[论文](https:\u002F\u002Fmededu.jmir.org\u002F2023\u002F1\u002Fe48163)\n* [**韩国医学教育杂志，2023年**] 大型语言模型对医学教育的潜在影响。[论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10020064\u002F)\n* [**Healthcare，2023年**] 利用生成式AI和大型语言模型：医疗整合的全面路线图。[论文](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10606429\u002F)\n\n### 医疗机器人\n* 
[**ICARM，2023年**] 用于机器人手术中器械分割的嵌套U型结构。[论文](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10218893\u002F)\n* [**Appl. Sci.，2023年**] 智慧医院中随机环境下的带时间窗多趟次自主移动机器人调度问题。[论文](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F13\u002F17\u002F9879)\n* [**Arxiv，2023年**] GRID：基于场景图的指令驱动型机器人任务规划。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.07726)\n* [**I3CE，2023年**] 对建筑领域人工智能协作机器人的信任：一项定性实证分析。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14846)\n* [**STAR，2016年**] 用于医疗康复的先进机器人技术。[论文](https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.1007\u002F978-3-319-19896-5.pdf)\n\n### 医学语言翻译\n* [**New Biotechnology，2023年**] 基于自然语言处理的标准医学术语机器翻译：范围界定综述。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS1871678423000432)\n* [**JMIR，2023年**] 生成式语言模型在医学教育中的兴起。[论文](https:\u002F\u002Fmededu.jmir.org\u002F2023\u002F1\u002Fe48163)\n\n### 心理健康支持\n* [**Arxiv，2024年**] 大型语言模型在心理健康护理中的应用：范围界定综述。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.02984)\n* [**Arxiv，2023年**] PsyChat：以客户为中心的心理健康支持对话系统。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.04262) [代码](https:\u002F\u002Fgithub.com\u002Fqiuhuachuan\u002FPsyChat)\n* [**Arxiv，2023年**] 数字心理健康领域中大型语言模型的益处与危害。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.14693)\n* [**CIKM，2023年**] ChatCounselor：用于心理健康支持的大型语言模型。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.15461) [代码](https:\u002F\u002Fgithub.com\u002FEmoCareAI\u002FChatPsychiatrist)\n* [**HCII，2023年**] 告诉我，你最害怕什么？探讨人机聊天互动中主体表征对信息披露的影响。[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-031-35894-4_13)\n* [**IJSR，2023年**] 由人形社交机器人提供的简短幸福感训练课程：一项试点随机对照试验。[论文](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs12369-023-01054-5)\n* [**CHB，2015年**] 与人工智能的真实对话：人与人在线交流及人与聊天机器人交流的比较。[论文](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0747563215001247)\n\n## ⚔️ 面临挑战的实用指南\n\n\u003Cdiv align=center>\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_4aa6a2699bae.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### 幻觉问题\n* [**Arxiv，2025年**] 基础模型中的医学幻觉及其对医疗保健的影响。[论文](https:\u002F\u002Fwww.medrxiv.org\u002Fcontent\u002Fmedrxiv\u002Fearly\u002F2025\u002F03\u002F03\u002F2025.02.28.25323115.full.pdf) [代码](https:\u002F\u002Fgithub.com\u002Fmitmedialab\u002Fmedical_hallucination)\n* [**Arxiv，2024年**] 验证链可减少大型语言模型中的幻觉。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11495)\n* [**ACM计算综述，2023年**] 自然语言生成中的幻觉问题综述。[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3571730)\n* [**EMNLP，2023年**] Med-halt：针对大型语言模型的医学领域幻觉测试。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15343)\n* [**Arxiv，2023年**] 大型基础模型中的幻觉问题综述。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.05922) [代码](https:\u002F\u002Fgithub.com\u002Fvr25\u002Fhallucination-foundation-model-survey)\n* [**EMNLP，2023年**] Selfcheckgpt：面向生成式大型语言模型的零资源黑盒幻觉检测工具。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08896)\n* [**EMNLP Findings，2021年**] 检索增强可减少对话中的幻觉。2021年。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.07567)\n\n### 评估基准与指标的缺乏\n* [**Arxiv, 2025.06**] LLMEval-Med：经医生验证的医疗大语言模型真实临床基准 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.04078) [代码](https:\u002F\u002Fgithub.com\u002Fllmeval\u002FLLMEval-Med)\n* [**Arxiv, 2025.05**] HealthBench：面向改善人类健康的大型语言模型评估 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.08775) [代码](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsimple-evals)\n* [**博客, 2024.11**] SymptomCheck 基准。[博客](https:\u002F\u002Fmedask.tech\u002Fblogs\u002Fintroducing-symptomcheck-bench\u002F) [代码](https:\u002F\u002Fgithub.com\u002Fmedaks\u002Fsymptomcheck-bench)\n* [**EMNLP, 2024**] 放射科报告生成的评价指标。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.16845)\n* [**Arxiv, 2024**] GMAI-MMBench：迈向通用医疗人工智能的综合性多模态评估基准。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.03361)\n* [**Arxiv, 2024**] 
临床中的大型语言模型：一个全面的基准测试。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.00716) [代码](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FClinicBench)\n* [**Nature Reviews Bioengineering, 2023**] 医疗领域大型语言模型的基准测试。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs44222-023-00097-7)\n* [**Bioinformatics, 2023**] 关于使用 ChatGPT 进行生物医学文本生成和挖掘的大规模基准研究。[论文](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F39\u002F9\u002Fbtad557\u002F7264174)\n* [**Arxiv, 2023**] 大型语言模型在生物医学自然语言处理中的应用：基准、基线及建议。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16326)\n* [**ACL, 2023**] HaluEval：大型语言模型幻觉的大规模评估基准。[论文](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2023arXiv230511747L\u002Fabstract) [代码](https:\u002F\u002Fgithub.com\u002FRUCAIBox\u002FHaluEval)\n* [**ACL, 2022**] TruthfulQA：衡量模型如何模仿人类错误信息。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.07958)\n* [**Appl. Sci, 2021**] 这位患者患有何种疾病？来自医学考试的大规模开放域问答数据集。[论文](https:\u002F\u002Fwww.mdpi.com\u002F2076-3417\u002F11\u002F14\u002F6421)\n\n### 领域数据局限性\n* [**Arxiv, 2023**] 只需教科书即可。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.11644)\n* [**Arxiv, 2023**] 模型痴呆症：生成的数据使模型遗忘。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17493)\n\n### 新知识适应\n* [**ACL Findings, 2023**] 检测大型语言模型中的编辑失败：一种改进的特异性基准。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17553)\n* [**EMNLP, 2023**] 编辑大型语言模型：问题、方法与机遇。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13172)\n* [**NeurIPS, 2020**] 面向知识密集型 NLP 任务的检索增强生成。[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2020\u002Fhash\u002F6b493230205f780e1bc26945df7481e5-Abstract.html)\n\n### 行为对齐\n* [**JMIR Medical Education, 2023**] 区分 ChatGPT 生成与人类撰写的医学文本。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.11567)\n* [**Arxiv, 2023**] 语言即奖励：利用人类反馈进行事后微调。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.02676) [代码](https:\u002F\u002Fgithub.com\u002Flhao499\u002Fchain-of-hindsight)\n* [**Arxiv, 2022**] 
使用人类反馈强化学习训练有益且无害的助手。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.05862) [代码](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fhh-rlhf)\n* [**Arxiv, 2022**] 通过有针对性的人类判断改进对话代理的对齐性。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14375)\n* [**ICLR, 2021**] 将 AI 与共同的人类价值观对齐。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2008.02275) [代码](https:\u002F\u002Fgithub.com\u002Fhendrycks\u002Fethics\u002F)\n* [**Arxiv, 2021.12**] WebGPT：借助浏览器和人类反馈进行问答。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332)\n\n### 伦理、法律与安全顾虑\n* [**Arxiv, 2023.10**] 医疗保健领域大型语言模型综述：从数据、技术与应用到问责制与伦理。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05694)\n* [**Arxiv, 2023.8**] “立即做任何事”：表征并评估大型语言模型上的野外越狱提示。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03825) [代码](https:\u002F\u002Fgithub.com\u002Fverazuo\u002Fjailbreak_llms)\n* [**NeurIPS, 2023.7**] 越狱：LLM 安全训练为何失效？[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02483)\n* [**EMNLP, 2023.4**] ChatGPT 上的多步越狱隐私攻击。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.05197)\n* [**Healthcare, 2023.3**] ChatGPT 在医疗教育、研究和实践中的效用：关于其潜在前景与合理担忧的系统综述。[论文](https:\u002F\u002Fwww.mdpi.com\u002F2227-9032\u002F11\u002F6\u002F887)\n* [**Nature News, 2023.1**] ChatGPT 被列为研究论文作者：许多科学家表示反对。[论文](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2023Natur.613..620S\u002Fabstract)\n\n## 🚀 未来发展方向实用指南\n\n\u003Cdiv align=center>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_3f4b89237820.png\" width=\"800px\">\n\u003C\u002Fdiv>\n\n### 新基准的引入\n* [**EMNLP, 2024.11**] 大型语言模型是糟糕的临床决策者：一个全面的基准测试。[论文](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.759.pdf) [代码](https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FClinicBench)\n* [**博客, 2024.11**] SymptomCheck 基准。[博客](https:\u002F\u002Fmedask.tech\u002Fblogs\u002Fintroducing-symptomcheck-bench\u002F) [代码](https:\u002F\u002Fgithub.com\u002Fmedaks\u002Fsymptomcheck-bench)\n* [**Nature Communications, 2024.9**] 
**MMed-Llama3**：迈向构建多语言医学语言模型。[[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-024-52417-z) [[代码]](https:\u002F\u002Fgithub.com\u002FMAGIC-AI4Med\u002FMMedLM) [[Hugging Face]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHenrychur\u002FMMedBench)\n* [**Arxiv, 2023.12**] 为医疗保健领域的 NLP 设计指导原则：以母体健康为例。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11803)\n* [**JCO CCI, 2023**] 利用自然语言处理自动提取接受放疗患者病历中食管炎的存在及其严重程度。[[论文]]( https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F37506330\u002F) [[代码]]( https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FEso_alpha)\n* [**JAMA ONC, 2023**] 使用人工智能聊天机器人提供癌症治疗信息。[[论文]]( https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjamaoncology\u002Ffullarticle\u002F2808731) [[代码]]( https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002FChatGPT_NCCN)\n* [**BioRxiv, 2023**] 关于使用 ChatGPT 进行生物医学文本生成和挖掘的全面基准研究。[论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.04.19.537463.abstract)\n* [**JAMA, 2023**] 大型语言模型在医学中的创建与应用。[论文](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Farticle-abstract\u002F2808296)\n* [**Arxiv, 2023**] 大型语言模型在运动科学与医学中的应用：机遇、风险与考量。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03851)\n\n### 跨学科合作\n* [**JAMA, 2023**] 医学领域中大型语言模型的创建与应用。2023年。[论文](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Farticle-abstract\u002F2808296)\n* [**JAMA Forum, 2023**] ChatGPT与医生的医疗过失风险。[论文](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama-health-forum\u002Ffullarticle\u002F2805334)\n\n### 多模态LLM\n* [**TPAMI, 2025**] 对齐、自编码与提示工程在大型语言模型中的应用：用于新型疾病报告 [论文](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Q8y9iQA3aw_4u98YkgaTas7I9IGfkEJM\u002Fview) [代码](https:\u002F\u002Fgithub.com\u002Fai-in-health\u002FPromptLLM)\n* [**npj Digital Medicine, 2025**] 一种多模态、多领域、多语言的医学基础模型，用于零样本临床诊断 [论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-024-01339-7) [代码](https:\u002F\u002Fgithub.com\u002Fai-in-health\u002FM3FM)\n* [**Nature Medicine, 2024**] 
**BiomedGPT** 一个通用的视觉-语言基础模型，适用于多样化的生物医学任务 [论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-024-03185-2)\n* [**Arxiv, 2023**] VisionFM：一种多模态、多任务的视觉基础模型，用于通用眼科人工智能。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.04992)\n* [**Arxiv, 2023**] 多模态大型语言模型综述。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.13549)\n* [**Arxiv, 2023**] Mm-react：利用ChatGPT进行多模态推理与行动。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11381)\n* [**Int J Oral Sci, 2023**] ChatGPT在塑造牙科未来中的作用：多模态大型语言模型的潜力。[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41368-023-00239-y)\n* [**MIDL, 2023**] 冻结语言模型助力心电图零样本学习。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12311)\n* [**Arxiv, 2023**] 探索与表征大型语言模型在嵌入式系统开发与调试中的应用。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.03817)\n\u003C!-- * GPT-4V在生物医学影像中的整体评估。2023年。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05256v1.pdf) -->\n\n### 医疗代理\n* [**Arxiv, 2025**] 一种协同进化的代理型AI系统，用于医学影像分析 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.20279) [代码](https:\u002F\u002Fgithub.com\u002Fzhihuanglab\u002FTissueLab)\n* [**Arxiv, 2024**] MedAgentBench：一个用于基准测试医疗LLM代理的真实虚拟电子健康记录环境 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14654) [代码](https:\u002F\u002Fgithub.com\u002Fstanfordmlgroup\u002FMedAgentBench)\n* [**Arxiv, 2023**] 基于大型语言模型的代理崛起与潜力：综述。[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07864)\n* [**Arxiv, 2023**] MedAgents：大型语言模型作为协作伙伴，用于零样本医学推理。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10537) [代码](https:\u002F\u002Fgithub.com\u002Fgersteinlab\u002FMedAgents)\n* [**Arxiv, 2023**] GeneGPT：通过领域工具增强大型语言模型，以提升生物医学信息的可访问性。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09667) [代码](https:\u002F\u002Fgithub.com\u002Fncbi\u002FGeneGPT)\n* [**MedRxiv, 2023**] OpenMedCalc：将ChatGPT与临床医生提供的工具相结合，可显著提升其在医学计算任务中的表现。[论文](https:\u002F\u002Fwww.medrxiv.org\u002Fcontent\u002F10.1101\u002F2023.12.13.23299881v1)\n* [**NEJM AI, 2024**] 
Almanac——用于临床医学的检索增强型语言模型。[论文](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300068)\n* [**Arxiv, 2024**] ClinicalAgent：基于大型语言模型推理的临床试验多智能体系统。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.14777)\n* [**Arxiv, 2024**] AgentClinic：一个用于评估模拟临床环境中AI性能的多模态代理基准测试 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.07960)\n* [**Arxiv, 2024**] MDAgents：一种用于医疗决策的LLM自适应协作系统。[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.15155) [代码](https:\u002F\u002Fgithub.com\u002Fmitmedialab\u002FMDAgents)\n* [**Arxiv, 2024**] MediQ：用于自适应且可靠的临床推理的提问型LLM。[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.00922) [[代码]](https:\u002F\u002Fgithub.com\u002Fstellalisy\u002FMediQ)\n\n## 👍 致谢\n* [LLMs实用指南](https:\u002F\u002Fgithub.com\u002FMooler0410\u002FLLMsPracticalGuide)。我们在此基础上构建了代码库，它是一份全面的LLM综述。\n* [大型AI综述](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?arnumber=10261199&tag=1)。健康信息学中的大型AI模型：应用、挑战与未来。\n* [Nature Medicine](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02448-8)。医学领域大型语言模型的综述。\n* [医疗LLM综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05694)。面向医疗保健领域的大型语言模型综述。\n\n## 📑 引用\n如果您觉得我们的仓库对您的工作有所帮助，请考虑引用我们的论文（BibTeX 条目保留论文原始英文信息，以便直接引用），衷心感谢！\n\n```bibtex\n@article{liu2025application,\n  title={Application of large language models in medicine},\n  author={Liu, Fenglin and Zhou, Hongjian and Gu, Boyang and Zou, Xinyu and Huang, Jinfa and Wu, Jinge and Li, Yiru and Chen, Sam S. and Hua, Yining and Zhou, Peilin and Liu, Junling and Mao, Chengfeng and You, Chenyu and Wu, Xian and Zheng, Yefeng and Clifton, Lei and Li, Zheng and Luo, Jiebo and Clifton, David A.},\n  journal={Nature Reviews Bioengineering},\n  year={2025}\n}\n\n@article{zhou2023survey,\n  title={A Survey of Large Language Models in Medicine: Progress, Application, and Challenges},\n  author={Zhou, Hongjian and Liu, Fenglin and Gu, Boyang and Zou, Xinyu and Huang, Jinfa and Wu, Jinge and Li, Yiru and Chen, Sam S. and Zhou, Peilin and Liu, Junling and Hua, Yining and Mao, Chengfeng and Wu, Xian and Zheng, Yefeng and Clifton, Lei and Li, Zheng and Luo, Jiebo and Clifton, David A.},\n  journal={arXiv preprint arXiv:2311.05112},\n  year={2023}\n}\n```\n\n## ♥️ 贡献者\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FMedLLMsPracticalGuide\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_readme_b3898e5232ef.png\" \u002F>\n\u003C\u002Fa>","# MedLLMsPracticalGuide 快速上手指南\n\n**MedLLMsPracticalGuide** 
并非一个单一的可安装软件包，而是一个由牛津大学等机构维护的**医疗大语言模型（Medical LLMs）实战资源清单与综述指南**。它基于发表在《Nature Reviews Bioengineering》上的论文，提供了从模型构建、数据准备到临床应用的完整路线图。\n\n本指南将帮助中国开发者快速利用该仓库中的资源，搭建或微调属于自己的医疗大模型。\n\n## 环境准备\n\n由于该指南涵盖多种模型（如 BioGPT, LLaMA-Med, Med-Gemini 等），环境需求取决于你选择的具体模型。以下是通用的基础环境要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+) 或 macOS\n*   **Python**: 3.9 或更高版本\n*   **GPU**: 建议 NVIDIA GPU，显存至少 16GB（微调大模型建议 24GB+ 或多卡）\n*   **核心依赖**:\n    *   `PyTorch` (建议 2.0+)\n    *   `Transformers` (Hugging Face)\n    *   `Accelerate`\n    *   `PEFT` (用于高效微调)\n\n**前置检查：**\n确保已安装 CUDA 驱动并验证 GPU 状态：\n```bash\nnvidia-smi\npython --version\n```\n\n## 安装步骤\n\n由于这是一个资源列表，你需要根据指南中推荐的模型进行针对性安装。以下以目前最流行的开源医疗模型微调流程为例（基于 Hugging Face 生态）。\n\n### 1. 创建虚拟环境\n```bash\nconda create -n medllm python=3.10 -y\nconda activate medllm\n```\n\n### 2. 安装深度学习框架（推荐使用国内镜像源加速）\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\npip install transformers datasets accelerate peft bitsandbytes -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 3. 克隆资源指南仓库\n获取最新的模型列表和论文更新：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FAI-in-Health\u002FMedLLMsPracticalGuide.git\ncd MedLLMsPracticalGuide\n```\n\n### 4. 获取具体模型示例\n以指南中推荐的 **OpenBioLLM** 或 **LLaMA-3-Med** 系列为例，你可以通过 Hugging Face 直接加载。若需运行本地推理，请安装 `llama-cpp-python` 或 `vllm`（可选）：\n```bash\npip install llama-cpp-python -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n# 或者对于高性能推理\npip install vllm -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\n本部分展示如何使用 Python 和 Hugging Face `transformers` 库，加载指南中推荐的一个预训练医疗模型进行简单的推理测试。\n\n### 示例：加载医疗模型进行问答\n\n以下代码演示了如何加载一个典型的医疗微调模型（以 `aaditya\u002FLlama3-OpenBioLLM-70B` 的量化版或类似小参数模型为例，此处用 `OpenBioLLM-Llama3-8B` 作为轻量级示例）：\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\n# 1. 
定义模型路径 (来自指南推荐的 Hugging Face 模型)\n# 注意：实际使用时请根据显存选择 8B, 70B 或量化版本 (如 -GGUF)\nmodel_name = \"aaditya\u002FLlama3-OpenBioLLM-8B\" \n\n# 2. 加载分词器和模型\nprint(\"Loading tokenizer and model...\")\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=torch.float16,\n    device_map=\"auto\", # 自动分配 GPU\n    trust_remote_code=True\n)\n\n# 3. 构建医疗提示词 (Prompt)\n# 医疗模型通常对指令格式敏感，建议遵循 Chat 模板\nprompt = \"\"\"\nYou are an expert medical assistant. \nQuestion: What are the common symptoms of Type 2 Diabetes?\nAnswer:\n\"\"\"\n\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n\n# 4. 生成回答\nprint(\"Generating response...\")\noutputs = model.generate(\n    **inputs,\n    max_new_tokens=512,\n    temperature=0.7,\n    do_sample=True,\n    pad_token_id=tokenizer.eos_token_id\n)\n\nresponse = tokenizer.decode(outputs[0], skip_special_tokens=True)\nprint(response)\n```\n\n### 进阶：使用指南中的数据集进行微调\n若需按照指南中的 **\"Fine-tuning General LLMs\"** 章节进行操作，可使用 `datasets` 库加载指南推荐的医疗数据（如 PubMedQA, MedQA 等）：\n\n```python\nfrom datasets import load_dataset\n\n# 加载指南中常见的医疗问答数据集\ndataset = load_dataset(\"bigbio\u002Fmed_qa\", split=\"train\")\n\n# 查看数据样例\nprint(dataset[0])\n# 后续可结合 PEFT (LoRA) 进行微调，具体脚本参考仓库中的 Pipeline 指南\n```\n\n> **提示**：详细的技术路线（如从头预训练 Pre-training from Scratch、检索增强生成 RAG 架构等）请参阅仓库根目录下的 `README.md` 中对应的章节链接，或阅读关联的 [arXiv 论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05112.pdf)。","某三甲医院科研团队正计划开发一款辅助医生进行复杂病例分析的医疗大模型，但在技术选型和合规落地阶段陷入停滞。\n\n### 没有 MedLLMsPracticalGuide 时\n- **资源检索如大海捞针**：团队成员需在 arXiv、GitHub 和各大学术会议网站间反复切换，耗时数周仍难以穷尽最新的医疗大模型论文与开源代码。\n- **技术路线盲目试错**：缺乏对现有模型（如针对临床笔记微调或医学影像多模态模型）的系统性对比，导致选择了不适合特定科室数据的基座模型，浪费大量算力资源。\n- **合规风险难以评估**：由于缺少权威的实践指南，团队无法快速识别数据隐私保护、模型幻觉抑制等关键挑战的成熟解决方案，项目因伦理审查顾虑被迫搁置。\n- **前沿动态严重滞后**：仅能关注到个别热门模型，错过了 Nature Reviews Bioengineering 等顶级期刊发布的最新综述与行业趋势判断。\n\n### 使用 MedLLMsPracticalGuide 后\n- **一站式获取权威资源**：直接利用其整理的\"Medical LLMs 
Tree\"和表格，在几分钟内定位到涵盖预训练、微调到评估的全链路高质量论文与代码库。\n- **精准匹配技术方案**：参考指南中按任务类型（如诊断推理、病历生成）分类的模型清单，迅速锁定了适合该院数据结构的最优基座模型，缩短选型周期 80%。\n- **规避落地核心陷阱**：依据指南中总结的挑战与对策章节，提前部署了针对性的去幻觉策略和数据脱敏流程，顺利通过伦理委员会审核。\n- **同步全球最新进展**：通过其持续更新的机制，团队实时掌握了 2025 年最新发布的医疗大模型应用案例，确保技术架构具备前瞻性。\n\nMedLLMsPracticalGuide 将原本需要数月完成的调研工作压缩至数天，为医疗大模型的安全、高效落地提供了不可或缺的导航图。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAI-in-Health_MedLLMsPracticalGuide_e54a88f2.png","AI-in-Health","AI4H Group","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAI-in-Health_bb21c405.png","\"AI for Healthcare\" research.","Oxford University","Oxford, UK",null,"https:\u002F\u002Fgithub.com\u002FAI-in-Health",2007,175,"2026-04-13T21:36:20","MIT","","未说明",{"notes":88,"python":86,"dependencies":89},"该项目是一个医学大语言模型（Medical LLMs）的实用指南和资源列表（Survey\u002FAwesome List），而非一个可直接运行的单一软件工具或代码库。README 中列出了多个独立的预训练模型（如 BioGPT, NYUTron）和微调模型（如 MMed-Llama3, OpenBioLLM），每个模型的具体运行环境需求（操作系统、GPU、内存、依赖库等）需参考其各自提供的论文链接或代码仓库。因此，本 README 文件中未包含统一的安装或运行环境要求。",[],[27,17],[92,93,94,95,96],"ai-in-medicine","clinical-ai","large-language-models","survey","medical-large-language-models","2026-03-27T02:49:30.150509","2026-04-20T12:53:03.889427",[],[]]