[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Jianing-Qiu--Awesome-Healthcare-Foundation-Models":3,"tool-Jianing-Qiu--Awesome-Healthcare-Foundation-Models":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",154349,2,"2026-04-13T23:32:16",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":65,"owner_avatar_url":74,"owner_bio":65,"owner_company":65,"owner_location":65,"owner_email":65,"owner_twitter":65,"owner_website":65,"owner_url":75,"languages":65,"stars":76,"forks":77,"last_commit_at":78,"license":79,"difficulty_score":80,"env_os":81,"env_gpu":82,"env_ram":82,"env_deps":83,"category_tags":86,"github_topics":65,"view_count":32,"oss_zip_url":65,"oss_zip_packed_at":65,"status":17,"created_at":89,"updated_at":90,"faqs":91,"releases":92},7407,"Jianing-Qiu\u002FAwesome-Healthcare-Foundation-Models","Awesome-Healthcare-Foundation-Models",null,"Awesome-Healthcare-Foundation-Models 是一个精心整理的开源资源库，专注于汇聚医疗领域的各类大型人工智能基础模型。它将当前前沿的模型系统性地划分为四大类：大型语言模型、大型视觉模型、大型音频模型以及多模态模型，覆盖了从生物信息学、医学影像诊断到公共卫生和医疗机器人等广泛应用场景。\n\n面对医疗 AI 领域模型种类繁多、分散难寻的现状，该资源库有效解决了研究人员和开发者难以快速定位高质量模型及相关数据集的痛点。它不仅提供了按领域（医疗专用与通用）分类的详细清单，还收录了大规模生物医学数据集、相关法律法规以及最新的综述文章，甚至包含了关于医疗大模型智能体（Agents）的前沿研究成果。\n\n这一工具特别适合医疗 AI 
研究人员、算法工程师、数据科学家以及关注智慧医疗发展的学生使用。无论是希望复现最新算法、寻找特定任务的基础模型，还是想了解行业伦理与安全规范，都能在此找到宝贵线索。其独特的亮点在于不仅罗列模型，更紧跟学术前沿，及时更新如《Nature Machine Intelligence》等顶级期刊的最新论文，并关联了 IEEE 期刊的特刊征稿方向，为用户把握行业趋势提供了权威指引，是探索医疗大模型生态的一站式入口。","Awesome-Healthcare-Foundation-Models 是一个精心整理的开源资源库，专注于汇聚医疗领域的各类大型人工智能基础模型。它将当前前沿的模型系统性地划分为四大类：大型语言模型、大型视觉模型、大型音频模型以及多模态模型，覆盖了从生物信息学、医学影像诊断到公共卫生和医疗机器人等广泛应用场景。\n\n面对医疗 AI 领域模型种类繁多、分散难寻的现状，该资源库有效解决了研究人员和开发者难以快速定位高质量模型及相关数据集的痛点。它不仅提供了按领域（医疗专用与通用）分类的详细清单，还收录了大规模生物医学数据集、相关法律法规以及最新的综述文章，甚至包含了关于医疗大模型智能体（Agents）的前沿研究成果。\n\n这一工具特别适合医疗 AI 研究人员、算法工程师、数据科学家以及关注智慧医疗发展的学生使用。无论是希望复现最新算法、寻找特定任务的基础模型，还是想了解行业伦理与安全规范，都能在此找到宝贵线索。其独特的亮点在于不仅罗列模型，更紧跟学术前沿，及时更新如《Nature Machine Intelligence》等顶级期刊的最新论文，并关联了 IEEE 期刊的特刊征稿方向，为用户把握行业趋势提供了权威指引，是探索医疗大模型生态的一站式入口。","# Awesome-Healthcare-Foundation-Models\n\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n\nCurated list of awesome large AI models (LAMs), or foundation models, in healthcare. We organize the current LAMs into four categories: large language models (LLMs), large vision models (LVMs), large audio models, and large multi-modal models (LMMs). The areas that these LAMs are applied to include but are not limited to bioinformatics, medical diagnosis, medical imaging, medical informatics, medical education, public health, and medical robotics.\n\nWe welcome contributions to this repository to add more resources. Please submit a pull request if you want to contribute!\n\n## News\n\nWe are excited to announce an _IEEE J-BHI_ special issue on **Biomedical and Health Foundation Models**. Please refer to the [call-for-papers](https:\u002F\u002Fwww.embs.org\u002Fjbhi\u002Fwp-content\u002Fuploads\u002Fsites\u002F18\u002F2023\u002F06\u002FJBHI_Foundation_Models_Call-for-Papers.pdf) for more details.\n\nTopics of interest include but are not limited to:\n\n1. Basic research on new theories, principles, and structures of biomedical and health foundation models\n2. 
Basic research on the interpretability and explainability of biomedical and health foundation models\n3. Prompt engineering in biomedical and health foundation models\n4. Data engineering in biomedical and health foundation models\n5. Large-scale biomedical and health dataset\n6. Multi-modal learning and alignment for biomedical and health foundation models\n7. Efficient computing for biomedical and health foundation models\n8. Adversarial robustness of biomedical and health foundation models\n9. Applications of foundation models in biomedical and health informatics\n10. New evaluation paradigms for biomedical and health foundation models\n11. New computer systems for biomedical and health foundation models\n12. Decentralised methods for developing and deploying biomedical and health foundation models\n13. Foundation model ethics, safety, privacy, and regulations in biomedicine and healthcare\n\nPlease help spread the word and contribute if you are interested or already working on these topics!\n\n:star2: Our latest article about LLM agents in medicine is published in **Nature Machine Intelligence**, 2024. Please check it out and hope it is helpful!\n\n**[LLM-based Agentic Systems in Medicine and Healthcare](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-024-00944-1)**\n\u003Cbr \u002F>\nJianing Qiu,\nKyle Lam,\nGuohao Li,\nAmish Acharya,\nTien Yin Wong,\nAra Darzi,\nWu Yuan, and\nEric J. 
Topol\n\u003Cbr \u002F>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJianing-Qiu_Awesome-Healthcare-Foundation-Models_readme_b293dcd5e1ff.png)\n\n## Table of Contents\n\n- [Awesome-Healthcare-Foundation-Models](#awesome-healthcare-foundation-models)\n  - [News](#news)\n  - [Table of Contents](#table-of-contents)\n  - [Survey](#survey)\n  - [Large Language Models](#large-language-models)\n    - [Healthcare Domain](#healthcare-domain)\n    - [General Domain](#general-domain)\n  - [Large Vision Models](#large-vision-models)\n    - [Healthcare Domain](#healthcare-domain-1)\n    - [General Domain](#general-domain-1)\n  - [Large Audio Models](#large-audio-models)\n    - [Healthcare Domain](#healthcare-domain-2)\n    - [General Domain](#general-domain-2)\n  - [Large Multi-modal Models](#large-multi-modal-models)\n    - [Healthcare Domain](#healthcare-domain-3)\n    - [General Domain](#general-domain-3)\n  - [Applications of Large AI Models in Healthcare](#applications-of-large-ai-models-in-healthcare)\n    - [Bioinformatics](#bioinformatics)\n    - [Medical Diagnosis](#medical-diagnosis)\n    - [Medical Imaging](#medical-imaging)\n    - [Medical Informatics](#medical-informatics)\n    - [Medical Education](#medical-education)\n    - [Public Health](#public-health)\n    - [Medical Robotics](#medical-robotics)\n  - [AI Legislation](#ai-legislation)\n  - [Large-scale Datasets in Biomedical and Health Informatics](#large-scale-datasets-in-biomedical-and-health-informatics)\n    - [Open Source](#open-source)\n    - [Private or Upon Approval](#private-or-upon-approval)\n\n## Survey\n\nThis repository is largely based on the following paper:\n\n**[Large AI Models in Health Informatics:\nApplications, Challenges, and the Future](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F10261199)**\n\u003Cbr \u002F>\nJianing Qiu,\nLin Li,\nJiankai Sun,\nJiachuan Peng,\nPeilun Shi,\nRuiyang Zhang,\nYinzhao Dong,\nKyle Lam,\nFrank P.-W. 
Lo,\nBo Xiao,\nWu Yuan,\nNingli Wang,\nDong Xu, and\nBenny Lo\n\u003Cbr \u002F>\n\nIf you find this repository helpful, please consider citing:\n\n```bibtex\n@article{qiu2023large,\n  title={Large ai models in health informatics: Applications, challenges, and the future},\n  author={Qiu, Jianing and Li, Lin and Sun, Jiankai and Peng, Jiachuan and Shi, Peilun and Zhang, Ruiyang and Dong, Yinzhao and Lam, Kyle and Lo, Frank P-W and Xiao, Bo and others},\n  journal={IEEE Journal of Biomedical and Health Informatics},\n  year={2023},\n  publisher={IEEE}\n}\n```\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJianing-Qiu_Awesome-Healthcare-Foundation-Models_readme_db04681ec841.png)\n\n## Large Language Models\n\n### Healthcare Domain\n\n\n- ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes  [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05795)\n- ChiMed-GPT: A Chinese Medical Large Language Model with Full\nTraining Regime and Better Alignment to Human Preferences [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.06025.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsynlp\u002FChiMed-GPT)\n- Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.09617.pdf)\n- KeBioLM: Improving Biomedical Pretrained Language Models with Knowledge [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.10344)\n- BioELMo: Probing Biomedical Embeddings from Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02181)\n- BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model [[Paper]](https:\u002F\u002Faclanthology.org\u002F2022.bionlp-1.9.pdf)\n- ClinicalT5: A Generative Language Model for Clinical Text [[Paper]](https:\u002F\u002Faclanthology.org\u002F2022.findings-emnlp.398.pdf)\n- GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.03540v2.pdf)\n- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.07257.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.11032.pdf)\n- Capabilities of GPT-4 on Medical Challenge Problems [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.13375.pdf)\n- BioBERT: a pre-trained biomedical language representation model for biomedical text mining [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.08746.pdf)\n- Publicly Available Clinical BERT Embeddings [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf)\n- BioMegatron: Larger Biomedical Domain Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.06060.pdf)\n- Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [[Paper]](https:\u002F\u002Faclanthology.org\u002F2020.acl-main.740.pdf)\n- Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-021-00455-y)\n- CPLLM: Clinical Prediction with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295) [[Code]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FCPLLM)\n- DoctorGLM: Fine-tuning your chinese doctor is not a herculean task [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01097) [[Code]](https:\u002F\u002Fgithub.com\u002Fxionghonglin\u002FDoctorGLM)\n- HuatuoGPT, Towards Taming Language Models To Be a Doctor [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15075) [[Code]](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT)\n- BioELECTRA:Pretrained Biomedical text Encoder using Discriminators 
[[Paper]](https:\u002F\u002Faclanthology.org\u002F2021.bionlp-1.16.pdf)\n- LinkBERT: Pretraining Language Models with Document Links [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.15827.pdf)\n- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.10341.pdf)\n- Large Language Models Encode Clinical Knowledge [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.13138.pdf)\n- A large language model for electronic health records [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.15779.pdf)\n- BEHRT: Transformer for Electronic Health Records [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-020-62922-y)\n- Federated Learning of Medical Concepts Embedding using BEHRT [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13052) [[Code]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FFederatedBEHRT)\n- RadBERT: Adapting Transformer-based Language Models to Radiology [[paper]](https:\u002F\u002Fpubs.rsna.org\u002Fdoi\u002Fepdf\u002F10.1148\u002Fryai.210258) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002FUCSD-VA-health\u002FRadBERT-RoBERTa-4m)\n- Highly accurate protein structure prediction with AlphaFold [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-021-03819-2) [[Code]](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold)\n- Accurate prediction of protein structures and interactions using a three-track neural network [[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.abj8754?casa_token=tleEHPOOSr8AAAAA%3AT0eToIMPW0oN1jjIGLs8aPyQK8qbcFIByjT1x4k90tvBAj03SZUzpEinCPe_t-g4ECmjJ9wlj8OwQBs)\n- Protein complex prediction with AlphaFold-Multimer 
[[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v2.abstract)\n- FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.00854) [[Code]](https:\u002F\u002Fgithub.com\u002Fhpcaitech\u002Ffastfold)\n- HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05477) [[Code]](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleHelix)\n- Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.04.502811v3.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fdptech-corp\u002FUni-Fold)\n- OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.11.20.517210v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Faqlaboratory\u002Fopenfold)\n- ManyFold: an efficient and flexible library for training and validating protein folding models [[Paper]](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F39\u002F1\u002Fbtac773\u002F6887136) [[Code]](https:\u002F\u002Fgithub.com\u002Finstadeepai\u002Fmanyfold)\n- ColabFold: making protein folding accessible to all [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-022-01488-1) [[Code]](https:\u002F\u002Fgithub.com\u002Fsokrypton\u002FColabFold)\n- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [[Paper]](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002Fabs\u002F10.1073\u002Fpnas.2016239118) [[Code]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fesm)\n- ProGen: Language Modeling for Protein Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.03497) 
[[Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fprogen)\n- ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.06225) [[Code]](https:\u002F\u002Fgithub.com\u002Fagemagician\u002FProtTrans)\n- Evolutionary-scale prediction of atomic level protein structure with a language model [[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.ade2574)\n- High-resolution de novo structure prediction from primary sequence [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.07.21.500999v1.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002FHeliXonProtein\u002FOmegaFold)\n- Single-sequence protein structure prediction using a language model and deep learning [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41587-022-01432-w)\n- Improved the Protein Complex Prediction with Protein Language Models [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.09.15.508065v2.abstract)\n- MSA Transformer [[Paper]](http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Frao21a.html) [[Code]](https:\u002F\u002Fgithub.com\u002FThe-AI-Summer\u002Fself-attention-cv)\n- Deciphering antibody affinity maturation with language models and weakly supervised learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.07782)\n- xTrimoABFold: De novo Antibody Structure Prediction without MSA [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735)\n- scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentAILabHealthcare\u002FscBERT)\n- Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions 
[[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.15794) [[Code]](https:\u002F\u002Fgithub.com\u002FHazyResearch\u002Fhyena-dna)\n\n### General Domain\n\n- Chatgpt: Optimizing language models for dialogue [[Blog]](https:\u002F\u002Fopenai.com\u002Fblog\u002Fchatgpt\u002F)\n- LLaMA: Open and Efficient Foundation Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.13971.pdf)\n- Scaling Instruction-Finetuned Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.11416.pdf)\n- PaLM: Scaling Language Modeling with Pathways [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02311.pdf)\n- Training Compute-Optimal Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.15556.pdf)\n- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.11990.pdf)\n- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.05100.pdf)\n- LaMDA: Language Models for Dialog Applications [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.08239.pdf)\n- OPT: Open Pre-trained Transformer Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.01068.pdf)\n- Training language models to follow instructions with human feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02155.pdf)\n- Scaling Language Models: Methods, 
Analysis & Insights from Training Gopher [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.11446.pdf)\n- Multitask prompted training enables zero-shot task generalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.08207.pdf)\n- Language Models are Few-Shot Learners [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14165.pdf)\n- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf)\n- RoBERTa: A Robustly Optimized BERT Pretraining Approach [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf)\n- Language Models are Unsupervised Multitask Learners [[Paper]](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)\n- Improving language models by retrieving from trillions of tokens [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.04426.pdf)\n- WebGPT: Browser-assisted question-answering with human feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.09332.pdf)\n- Improving alignment of dialogue agents via targeted human judgements [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14375.pdf)\n- Improving Language Understanding by Generative Pre-Training [[Paper]](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf)\n- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)\n\n## Large Vision Models\n\n### Healthcare Domain\n\n- vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fhtml\u002FWittmann_vesselFM_A_Foundation_Model_for_Universal_3D_Blood_Vessel_Segmentation_CVPR_2025_paper.html) 
[[Code]](https:\u002F\u002Fgithub.com\u002Fbwittmann\u002FvesselFM)\n- VisionFM: Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [[Paper]](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300221) [[Code]](https:\u002F\u002Fgithub.com\u002FABILab-CUHK\u002FVisionFM)\n- RETFound: A foundation model for generalizable disease detection from retinal images [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06555-x)\n- EndoFM: Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.16741) [[Code]](https:\u002F\u002Fgithub.com\u002Fmed-air\u002FEndo-FM)\n- STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06716)\n- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.11925) [[Code]](https:\u002F\u002Fgithub.com\u002Fduyhominhnguyen\u002FLVM-Med)\n- Med3d: Transfer learning for 3d medical image analysis [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00625) [[Code]](https:\u002F\u002Fgithub.com\u002FTencent\u002FMedicalNet)\n- Models genesis: Generic autodidactic models for 3d medical image analysis [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.06912) [[Code]](https:\u002F\u002Fgithub.com\u002FMrGiovanni\u002FModelsGenesis)\n- MICLe: Big self-supervised models advance medical image classifications [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.05224) [[Code]](https:\u002F\u002Fgithub.com\u002Frjrobben\u002FMICLe_pytorch)\n- C2l: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.07423) 
[[Code]](https:\u002F\u002Fgithub.com\u002Ffunnyzhou\u002FC2L_MICCAI2020)\n- MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05352) [[Code]](https:\u002F\u002Fgithub.com\u002Fstanfordmlgroup\u002FMoCo-CXR)\n- Transunet: Transformers make strong encoders for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.04306) [[Code]](https:\u002F\u002Fgithub.com\u002FBeckschen\u002FTransUNet)\n- Transfuse: Fusing transformers and cnns for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.08005) [[Code]](https:\u002F\u002Fgithub.com\u002FRayicer\u002FTransFuse)\n- Medical transformer: Gated axial-attention for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.10662) [[Code]](https:\u002F\u002Fgithub.com\u002Fjeya-maria-jose\u002FMedical-Transformer)\n- UNETR: Transformers for 3D Medical Image Segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10504) [[Code]](https:\u002F\u002Fgithub.com\u002FProject-MONAI\u002Fresearch-contributions\u002Ftree\u002Fmain\u002FUNETR\u002FBTCV)\n- Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.03024) [[Code]](https:\u002F\u002Fgithub.com\u002FYtongXie\u002FCoTr)\n- Swin-unet: Unet-like pure transformer for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.05537) [[Code]](https:\u002F\u002Fgithub.com\u002FHuCaoFighting\u002FSwin-Unet)\n- SAM4Med: Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.12637.pdf)\n- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15220) 
[[Code]](https:\u002F\u002Fgithub.com\u002FCAMMA-public\u002FSurgVLP)\n\n### General Domain\n\n**CNNs**:\n\n- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [[paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F093f65e080a295f8076b1c5722a46aa2-Abstract.html)\n- Big Transfer (BiT): General Visual Representation Learning [[paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123500477.pdf)\n- Designing Network Design Spaces [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fhtml\u002FRadosavovic_Designing_Network_Design_Spaces_CVPR_2020_paper.html)\n- Self-supervised Pretraining of Visual Features in the Wild [[paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2103.01988)\n- EfficientNetV2: Smaller Models and Faster Training [[paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Ftan21a.html)\n- A ConvNet for the 2020s [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLiu_A_ConvNet_for_the_2020s_CVPR_2022_paper.pdf)\n- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [[paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2211.05778)\n\n**Vision Transformers**:\n\n- Generative Pretraining From Pixels [[paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv119\u002Fchen20s.html)\n- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [[paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=YicbFdNTTy&utm_campaign=f86497ed3a-EMAIL_CAMPAIGN_2019_04_24_03_18_COPY_01&utm_medium=email&utm_source=Deep%20Learning%20Weekly&utm_term=0_384567b42d-f86497ed3a-72965345)\n- Transformer in Transformer [[paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Fhash\u002F854d9fca60b4bd07f9bb215d59ef5561-Abstract.html)\n- Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows 
[[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fhtml\u002FLiu_Swin_Transformer_Hierarchical_Vision_Transformer_Using_Shifted_Windows_ICCV_2021_paper.html)\n- Training data-efficient image transformers & distillation through attention [[paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Ftouvron21a.html)\n- Self-supervised Models are Good Teaching Assistants for Vision Transformers [[paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fwu22c.html)\n- Scaling Vision with Sparse Mixture of Experts [[paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Fhash\u002F48237d9f2dea8c74c2a72126cf63d933-Abstract.html)\n- Going Deeper With Image Transformers [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fhtml\u002FTouvron_Going_Deeper_With_Image_Transformers_ICCV_2021_paper.html)\n- Masked Autoencoders Are Scalable Vision Learners [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FHe_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.html)\n- Swin Transformer V2: Scaling Up Capacity and Resolution [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FLiu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.html)\n- Scaling Vision Transformers [[paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FZhai_Scaling_Vision_Transformers_CVPR_2022_paper.html)\n- Efficient Self-supervised Vision Transformers for Representation Learning [[paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=fVu3o-YUGQK)\n- Scaling Vision Transformers to 22 Billion Parameters [[paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05442)\n- H-optimus-0: 1.1B parameter vision transformer trained on a proprietary collection of more than 500,000 H&E stained whole slide histology images 
[[Code]](https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0) [[HuggingFace]](https://huggingface.co/bioptimus/H-optimus-0)

**CNNs + ViTs**:

- CoAtNet: Marrying Convolution and Attention for All Data Sizes [[paper]](https://proceedings.neurips.cc/paper/2021/hash/20568692db622456cc42a2e853ca21f8-Abstract.html)
- LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference [[paper]](https://openaccess.thecvf.com/content/ICCV2021/html/Graham_LeViT_A_Vision_Transformer_in_ConvNets_Clothing_for_Faster_Inference_ICCV_2021_paper.html)
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [[paper]](https://proceedings.mlr.press/v139/d-ascoli21a.html)

## Large Audio Models

### Healthcare Domain

### General Domain

- wav2vec: Unsupervised Pre-training for Speech Recognition [[Paper]](https://arxiv.org/abs/1904.05862) [[Blog]](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
- W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training [[Paper]](https://arxiv.org/abs/2108.06209)
- AudioLM: a Language Modeling Approach to Audio Generation [[Paper]](https://arxiv.org/abs/2209.03143) [[Project]](https://google-research.github.io/seanet/audiolm/examples/) [[Blog]](https://ai.googleblog.com/2022/10/audiolm-language-modeling-approach-to.html)
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [[Paper]](https://arxiv.org/abs/2106.07447) [[HuggingFace]](https://huggingface.co/docs/transformers/model_doc/hubert)
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale [[Paper]](https://arxiv.org/abs/2111.09296) [[Blog]](https://ai.facebook.com/blog/xls-r-self-supervised-speech-processing-for-128-languages/) [[HuggingFace]](https://huggingface.co/facebook/wav2vec2-xls-r-300m)
- MusicLM: Generating Music From Text [[Paper]](https://arxiv.org/abs/2301.11325) [[Project]](https://google-research.github.io/seanet/musiclm/examples/) [[Code]](https://github.com/lucidrains/musiclm-pytorch)
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation [[Paper]](https://arxiv.org/abs/2207.09983) [[Project]](http://dongchaoyang.top/text-to-sound-synthesis-demo/) [[Code]](https://github.com/yangdongchao/Text-to-sound-Synthesis)
- AudioGen: Textually Guided Audio Generation [[Paper]](https://arxiv.org/abs/2209.15352) [[Project]](https://felixkreuk.github.io/audiogen/)
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision [[Paper]](https://arxiv.org/abs/2212.04356) [[Code]](https://github.com/openai/whisper) [[HuggingFace]](https://huggingface.co/openai/whisper-tiny.en)
- Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [[Paper]](https://arxiv.org/abs/2303.01037) [[Blog]](https://ai.googleblog.com/2023/03/universal-speech-model-usm-state-of-art.html)

## Large Multi-modal Models

### Healthcare Domain

- The application of multimodal large language models in medicine [[Paper]](https://www.thelancet.com/journals/lanwpc/article/PIIS2666-6065(24)00042-7/fulltext)
- Foundation models: the future of surgical artificial intelligence?
[[Paper]](https://academic.oup.com/bjs/article/111/4/znae090/7656455)
- Bootstrapping Large Language Models for Radiology Report Generation [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/29826) [[Code]](https://github.com/synlp/R2-LLM)
- Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [[Paper]](https://arxiv.org/abs/2312.08592)
- PLIP: A visual–language foundation model for pathology image analysis using medical Twitter [[Paper]](https://www.nature.com/articles/s41591-023-02504-3)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [[Paper]](https://arxiv.org/pdf/2306.00890.pdf)
- GPT-4 Technical Report [[Paper]](https://arxiv.org/pdf/2303.08774.pdf)
- Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [[Paper]](https://www.nature.com/articles/s41551-022-00936-9)
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [[Paper]](https://arxiv.org/pdf/2010.00747.pdf) [[Code]](https://github.com/edreisMD/ConVIRT-pytorch)
- GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition [[Paper]](https://ieeexplore.ieee.org/document/9710099) [[Code]](https://github.com/marshuang80/gloria)
- RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [[Paper]](https://arxiv.org/abs/2303.00534)
- PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? [[Paper]](https://aclanthology.org/2023.findings-eacl.88/)
- SurgVLP: Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [[Paper]](https://arxiv.org/abs/2307.15220) [[Code]](https://github.com/CAMMA-public/SurgVLP)
- Frontiers in intelligent colonoscopy [[Paper]](https://arxiv.org/abs/2410.17241) [[Code]](https://github.com/ai4colonoscopy/IntelliScope)

### General Domain

**Multi-modal Chatbot**:

- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [[Paper]](https://browse.arxiv.org/pdf/2309.17421.pdf)

**Representation learning**:

- Learning Transferable Visual Models From Natural Language Supervision [[paper]](https://proceedings.mlr.press/v139/radford21a.html)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [[paper]](https://proceedings.mlr.press/v139/jia21b.html)
- Florence: A New Foundation Model for Computer Vision [[paper]](http://arxiv.org/abs/2111.11432)
- Grounded Language-Image Pre-Training [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Li_Grounded_Language-Image_Pre-Training_CVPR_2022_paper.html)
- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training [[paper]](http://arxiv.org/abs/2103.06561)
- FLAVA: A Foundational Language and Vision Alignment Model [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Singh_FLAVA_A_Foundational_Language_and_Vision_Alignment_Model_CVPR_2022_paper.html)
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision [[paper]](https://openreview.net/forum?id=GUrhfTuf_3)
- FILIP: Fine-grained Interactive Language-Image Pre-Training
[[paper]](https://openreview.net/forum?id=cpDhcsEDC2)
- Combined Scaling for Open-Vocabulary Image Classification [[paper]](http://arxiv.org/abs/2111.10050)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [[paper]](https://proceedings.mlr.press/v162/li22n.html)
- PaLI: A Jointly-Scaled Multilingual Language-Image Model [[paper]](http://arxiv.org/abs/2209.06794)
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [[paper]](http://arxiv.org/abs/2211.09807)
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [[paper]](http://arxiv.org/abs/2301.12597)
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm [[paper]](https://openreview.net/forum?id=zq1iJkNk3uN)
- Language Is Not All You Need: Aligning Perception with Language Models [[paper]](http://arxiv.org/abs/2302.14045)
- PaLM-E: An Embodied Multimodal Language Model [[paper]](http://arxiv.org/abs/2303.03378)
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [[paper]](http://arxiv.org/abs/2303.04671)

**Text-to-image generation**:

- Zero-Shot Text-to-Image Generation [[paper]](https://proceedings.mlr.press/v139/ramesh21a.html)
- High-Resolution Image Synthesis With Latent Diffusion Models [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html)
- Hierarchical Text-Conditional Image Generation with CLIP Latents [[paper]](http://arxiv.org/abs/2204.06125)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [[paper]](https://proceedings.mlr.press/v162/nichol22a.html)
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [[paper]](https://openreview.net/forum?id=08Yk-n5l2Al)
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [[paper]](https://openreview.net/forum?id=AFDcYJKhND)

## Applications of Large AI Models in Healthcare

Note that some of the models below were not originally aimed at healthcare applications, but they may have the potential to be transferred to the healthcare domain or to inspire future development.

### Bioinformatics

- GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information [[Paper]](https://arxiv.org/abs/2304.09667)
- Highly accurate protein structure prediction with AlphaFold [[Paper]](https://www.nature.com/articles/s41586-021-03819-2) [[Code]](https://github.com/deepmind/alphafold)
- Accurate prediction of protein structures and interactions using a three-track neural network [[Paper]](https://www.science.org/doi/full/10.1126/science.abj8754)
- Protein complex prediction with AlphaFold-Multimer [[Paper]](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.abstract)
- FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [[Paper]](https://arxiv.org/abs/2203.00854) [[Code]](https://github.com/hpcaitech/fastfold)
- HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [[Paper]](https://arxiv.org/abs/2207.05477) [[Code]](https://github.com/PaddlePaddle/PaddleHelix)
- Uni-Fold: An Open-Source Platform for Developing Protein Folding Models
beyond AlphaFold [[Paper]](https://www.biorxiv.org/content/10.1101/2022.08.04.502811v3.abstract) [[Code]](https://github.com/dptech-corp/Uni-Fold)
- OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [[Paper]](https://www.biorxiv.org/content/10.1101/2022.11.20.517210v2.abstract) [[Code]](https://github.com/aqlaboratory/openfold)
- ManyFold: an efficient and flexible library for training and validating protein folding models [[Paper]](https://academic.oup.com/bioinformatics/article/39/1/btac773/6887136) [[Code]](https://github.com/instadeepai/manyfold)
- ColabFold: making protein folding accessible to all [[Paper]](https://www.nature.com/articles/s41592-022-01488-1) [[Code]](https://github.com/sokrypton/ColabFold)
- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [[Paper]](https://www.pnas.org/doi/abs/10.1073/pnas.2016239118) [[Code]](https://github.com/facebookresearch/esm)
- ProGen: Language Modeling for Protein Generation [[Paper]](https://arxiv.org/abs/2004.03497) [[Code]](https://github.com/lucidrains/progen)
- ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [[Paper]](https://arxiv.org/abs/2007.06225) [[Code]](https://github.com/agemagician/ProtTrans)
- Evolutionary-scale prediction of atomic level protein structure with a language model [[Paper]](https://www.science.org/doi/full/10.1126/science.ade2574)
- High-resolution de novo structure prediction from primary sequence [[Paper]](https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1.abstract) [[Code]](https://github.com/HeliXonProtein/OmegaFold)
- Single-sequence protein structure prediction using a language model and deep learning [[Paper]](https://www.nature.com/articles/s41587-022-01432-w)
- Improved the Protein Complex Prediction with Protein Language Models [[Paper]](https://www.biorxiv.org/content/10.1101/2022.09.15.508065v2.abstract)
- MSA Transformer [[Paper]](http://proceedings.mlr.press/v139/rao21a.html) [[Code]](https://github.com/The-AI-Summer/self-attention-cv)
- Deciphering antibody affinity maturation with language models and weakly supervised learning [[Paper]](https://arxiv.org/abs/2112.07782)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [[Paper]](https://arxiv.org/abs/2212.00735)
- scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [[Paper]](https://arxiv.org/abs/2212.00735) [[Code]](https://github.com/TencentAILabHealthcare/scBERT)
- Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions [[Paper]](https://www.biorxiv.org/content/10.1101/2022.08.06.503062v2.abstract) [[Code]](https://github.com/ml4bio/rna-fm)
- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [[Paper]](https://www.biorxiv.org/content/10.1101/2022.08.06.503062v2.abstract) [[Code]](https://github.com/ml4bio/rna-fm)
- SMILES-BERT: large scale unsupervised pre-training for molecular property prediction [[Paper]](https://par.nsf.gov/servlets/purl/10168888)
[[Code]](https://github.com/uta-smile/SMILES-BERT)
- SMILES Transformer: Pre-trained molecular fingerprint for low data drug discovery [[Paper]](https://arxiv.org/abs/1911.04738) [[Code]](https://github.com/DSPsleeporg/smiles-transformer)
- MolBERT: Molecular representation learning with language models and domain-relevant auxiliary tasks [[Paper]](https://arxiv.org/abs/2011.13230) [[Code]](https://github.com/BenevolentAI/MolBERT)
- AGBT: Algebraic graph-assisted bidirectional transformers for molecular property prediction [[Paper]](https://www.nature.com/articles/s41467-021-23720-w) [[Code]](https://github.com/ChenDdon/AGBTcode)
- GROVER: Self-supervised graph transformer on large-scale molecular data [[Paper]](https://arxiv.org/abs/2007.02835) [[Code]](https://github.com/tencent-ailab/grover)
- MolGPT: molecular generation using a transformer-decoder model [[Paper]](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00600) [[Code]](https://github.com/devalab/molgpt)
- A Model to Search for Synthesizable Molecules [[Paper]](https://arxiv.org/abs/1906.05221) [[Code]](https://github.com/john-bradshaw/molecule-chef)
- Transformer neural network for protein-specific de novo drug generation as a machine translation problem [[Paper]](https://www.nature.com/articles/s41598-020-79682-4)
- DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences [[Paper]](https://arxiv.org/abs/1811.02114) [[Code]](https://github.com/GIST-CSBL/DeepConv-DTI)
- GraphDTA: predicting drug–target binding affinity with graph neural networks [[Paper]](https://pubmed.ncbi.nlm.nih.gov/33119053/) [[Code]](https://github.com/thinng/GraphDTA)
- MolTrans: molecular interaction transformer for drug–target interaction prediction [[Paper]](https://arxiv.org/abs/2004.11424) [[Code]](https://github.com/kexinhuang12345/moltrans)
- Extracting Predictive Representations from Hundreds of Millions of Molecules [[Paper]](https://pubs.acs.org/doi/10.1021/acs.jpclett.1c03058) [[Code]](https://github.com/WeilabMSU/PretrainModels)
- ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties [[Project]](https://admetmesh.scbdd.com/) [[Paper]](https://pubmed.ncbi.nlm.nih.gov/33893803/)
- MPG: Learn molecular representations from large-scale unlabeled molecules for drug discovery [[Paper]](https://arxiv.org/abs/2012.11175)
- MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction [[Paper]](https://academic.oup.com/bib/article-abstract/22/6/bbab152/6265201) [[Code]](https://github.com/ParishadBehnam/MG-BERT)
- PanGu Drug Model: Learn a Molecule Like a Human [[Project]](http://www.pangu-drug.com/) [[Paper]](https://www.biorxiv.org/content/10.1101/2022.03.31.485886v1.full)
- DrugBAN: Interpretable bilinear attention network with domain adaptation improves drug–target prediction [[Paper]](https://www.nature.com/articles/s42256-022-00605-1) [[Code]](https://github.com/peizhenbai/DrugBAN)
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery [[Paper]](https://arxiv.org/abs/2201.09637) [[Code]](https://github.com/tencent-ailab/DrugOOD)

### Medical Diagnosis

- VisionFM: Development and
Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [[Paper]](https://ai.nejm.org/doi/full/10.1056/AIoa2300221) [[Code]](https://github.com/ABILab-CUHK/VisionFM)
- RETFound: A foundation model for generalizable disease detection from retinal images [[Paper]](https://www.nature.com/articles/s41586-023-06555-x)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [[Paper]](https://arxiv.org/pdf/2306.00890.pdf)
- Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [[Paper]](https://www.nature.com/articles/s41551-022-00936-9)
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [[Paper]](https://arxiv.org/pdf/2302.07257.pdf) [[Code]](https://github.com/zhaozh10/ChatCAD)
- BEHRT: Transformer for Electronic Health Records [[Paper]](https://www.nature.com/articles/s41598-020-62922-y)
- Federated Learning of Medical Concepts Embedding using BEHRT [[Paper]](https://arxiv.org/abs/2305.13052) [[Code]](https://github.com/nadavlab/FederatedBEHRT)
- Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [[Paper]](https://www.nature.com/articles/s41746-021-00455-y)
- CPLLM: Clinical Prediction with Large Language Models [[Paper]](https://arxiv.org/abs/2309.11295) [[Code]](https://github.com/nadavlab/CPLLM)
- RadBERT: Adapting Transformer-based Language Models to Radiology [[Paper]](https://pubs.rsna.org/doi/epdf/10.1148/ryai.210258) [[HuggingFace]](https://huggingface.co/UCSD-VA-health/RadBERT-RoBERTa-4m)
- ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [[Paper]](https://arxiv.org/abs/2305.15964) [[Code]](https://github.com/zhaozh10/ChatCAD)

### Medical Imaging

- VisionFM: Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [[Paper]](https://ai.nejm.org/doi/full/10.1056/AIoa2300221) [[Code]](https://github.com/ABILab-CUHK/VisionFM)
- RETFound: A foundation model for generalizable disease detection from retinal images [[Paper]](https://www.nature.com/articles/s41586-023-06555-x)
- Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [[Paper]](https://www.nature.com/articles/s41551-022-00936-9)
- Med3D: Transfer learning for 3D medical image analysis [[Paper]](https://arxiv.org/abs/1904.00625) [[Code]](https://github.com/Tencent/MedicalNet)
- Models Genesis: Generic autodidactic models for 3D medical image analysis [[Paper]](https://arxiv.org/abs/1908.06912) [[Code]](https://github.com/MrGiovanni/ModelsGenesis)
- MICLe: Big self-supervised models advance medical image classifications [[Paper]](https://arxiv.org/abs/2101.05224) [[Code]](https://github.com/rjrobben/MICLe_pytorch)
- C2L: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs by Comparing Image Representations [[Paper]](https://arxiv.org/abs/2007.07423) [[Code]](https://github.com/funnyzhou/C2L_MICCAI2020)
- ConVIRT: Contrastive learning of medical visual representations from paired images and text [[Paper]](https://arxiv.org/pdf/2010.00747.pdf) [[Code]](https://github.com/edreisMD/ConVIRT-pytorch)
- GLoRIA: A
multimodal global-local representation learning framework for label-efficient medical image recognition [[Paper]](https://ieeexplore.ieee.org/document/9710099) [[Code]](https://github.com/marshuang80/gloria)
- MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [[Paper]](https://arxiv.org/abs/2010.05352) [[Code]](https://github.com/stanfordmlgroup/MoCo-CXR)
- TransUNet: Transformers make strong encoders for medical image segmentation [[Paper]](https://arxiv.org/abs/2102.04306) [[Code]](https://github.com/Beckschen/TransUNet)
- TransFuse: Fusing transformers and CNNs for medical image segmentation [[Paper]](https://arxiv.org/abs/2102.08005) [[Code]](https://github.com/Rayicer/TransFuse)
- Medical Transformer: Gated axial-attention for medical image segmentation [[Paper]](https://arxiv.org/abs/2102.10662) [[Code]](https://github.com/jeya-maria-jose/Medical-Transformer)
- UNETR: Transformers for 3D Medical Image Segmentation [[Paper]](https://arxiv.org/abs/2103.10504) [[Code]](https://github.com/Project-MONAI/research-contributions/tree/main/UNETR/BTCV)
- CoTr: Efficiently bridging CNN and transformer for 3D medical image segmentation [[Paper]](https://arxiv.org/abs/2103.03024) [[Code]](https://github.com/YtongXie/CoTr)
- Swin-Unet: Unet-like pure transformer for medical image segmentation [[Paper]](https://arxiv.org/abs/2105.05537) [[Code]](https://github.com/HuCaoFighting/Swin-Unet)
- SAM4Med: Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [[Paper]](https://arxiv.org/pdf/2304.12637.pdf)

### Medical Informatics

- Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models [[Paper]](https://arxiv.org/pdf/2305.09617.pdf)
- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [[Paper]](https://arxiv.org/pdf/2303.11032.pdf)
- Capabilities of GPT-4 on Medical Challenge Problems [[Paper]](https://arxiv.org/pdf/2303.13375.pdf)
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining [[Paper]](https://arxiv.org/pdf/1901.08746.pdf)
- Publicly Available Clinical BERT Embeddings [[Paper]](https://arxiv.org/pdf/1904.03323.pdf)
- BioMegatron: Larger Biomedical Domain Language Model [[Paper]](https://arxiv.org/pdf/2010.06060.pdf)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [[Paper]](https://aclanthology.org/2020.acl-main.740.pdf)
- Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [[Paper]](https://www.nature.com/articles/s41746-021-00455-y)
- CPLLM: Clinical Prediction with Large Language Models [[Paper]](https://arxiv.org/abs/2309.11295) [[Code]](https://github.com/nadavlab/CPLLM)
- BioELECTRA: Pretrained Biomedical text Encoder using Discriminators [[Paper]](https://aclanthology.org/2021.bionlp-1.16.pdf)
- LinkBERT: Pretraining Language Models with Document Links [[Paper]](https://arxiv.org/pdf/2203.15827.pdf)
- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [[Paper]](https://arxiv.org/pdf/2210.10341.pdf)
- Large Language Models Encode Clinical Knowledge [[Paper]](https://arxiv.org/pdf/2212.13138.pdf)
- A
large language model for electronic health records [[Paper]](https://www.nature.com/articles/s41746-022-00742-2)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [[Paper]](https://arxiv.org/pdf/2007.15779.pdf)
- BEHRT: Transformer for Electronic Health Records [[Paper]](https://www.nature.com/articles/s41598-020-62922-y)
- Federated Learning of Medical Concepts Embedding using BEHRT [[Paper]](https://arxiv.org/abs/2305.13052) [[Code]](https://github.com/nadavlab/FederatedBEHRT)

### Medical Education

- GPT-4 Technical Report [[Paper]](https://arxiv.org/pdf/2303.08774.pdf)
- Empowering Beginners in Bioinformatics with ChatGPT [[Paper]](https://www.biorxiv.org/content/10.1101/2023.03.07.531414v1)

### Public Health

- Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [[Paper]](https://arxiv.org/abs/2312.08592)
- Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [[Paper]](https://www.nature.com/articles/s41551-022-00936-9)
- Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning [[Paper]](https://arxiv.org/pdf/2208.12160.pdf)
- ClimaX: A foundation model for weather and climate [[Paper]](https://arxiv.org/pdf/2301.10343.pdf)

### Medical Robotics

- EndoFM: Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train [[Paper]](https://arxiv.org/abs/2306.16741) [[Code]](https://github.com/med-air/Endo-FM)
- Decision Transformer: Reinforcement Learning via Sequence Modeling [[Paper]](https://arxiv.org/abs/2106.01345) [[Code]](https://github.com/kzl/decision-transformer)
- R3M: A Universal Visual Representation for Robot Manipulation [[Paper]](https://arxiv.org/abs/2203.12601) [[Project]](https://sites.google.com/view/robot-r3m/) [[Code]](https://github.com/facebookresearch/r3m)
- MimicPlay: Long-Horizon Imitation Learning by Watching Human Play [[Paper]](https://arxiv.org/abs/2302.12422) [[Project]](https://mimic-play.github.io/)
- PaLM-E: An Embodied Multimodal Language Model [[Paper]](https://arxiv.org/abs/2303.03378) [[Project]](https://palm-e.github.io/) [[Blog]](https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html)
- A Generalist Agent [[Paper]](https://arxiv.org/abs/2205.06175) [[Blog]](https://www.deepmind.com/blog/a-generalist-agent)
- CLIPort: What and Where Pathways for Robotic Manipulation [[Paper]](https://arxiv.org/abs/2109.12098) [[Project]](https://cliport.github.io/) [[Code]](https://github.com/cliport/cliport)
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation [[Paper]](https://arxiv.org/abs/2209.05451) [[Project]](https://peract.github.io/) [[Code]](https://github.com/peract/peract)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [[Paper]](https://arxiv.org/abs/2204.01691) [[Project]](https://say-can.github.io/) [[Code]](https://github.com/google-research/google-research/tree/master/saycan)
- VIMA: General Robot Manipulation with Multimodal Prompts [[Paper]](https://arxiv.org/abs/2210.03094) [[Project]](https://vimalabs.github.io/) [[Code]](https://github.com/vimalabs/VIMA)
- RT-1: Robotics Transformer for Real-World Control
at Scale [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06817) [[Project]](https:\u002F\u002Frobotics-transformer.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Frobotics_transformer)\n- ChatGPT for Robotics: Design Principles and Model Abilities [[Paper]](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fuploads\u002Fprod\u002F2023\u002F02\u002FChatGPT___Robotics.pdf) [[Blog]](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fgroup\u002Fautonomous-systems-group-robotics\u002Farticles\u002Fchatgpt-for-robotics\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPromptCraft-Robotics)\n\n\n## AI Legislation\n\n- AI Act (EU) [[Source]](https:\u002F\u002Fartificialintelligenceact.eu\u002F)\n- A pro-innovation approach to AI regulation (UK) [[Source]](https:\u002F\u002Fassets.publishing.service.gov.uk\u002Fgovernment\u002Fuploads\u002Fsystem\u002Fuploads\u002Fattachment_data\u002Ffile\u002F1146542\u002Fa_pro-innovation_approach_to_AI_regulation.pdf)\n- Blueprint for an AI Bill of Rights (USA) [[Source]](https:\u002F\u002Fwww.whitehouse.gov\u002Fostp\u002Fai-bill-of-rights\u002F)\n- AI Risk Management Framework (USA) [[Source]](https:\u002F\u002Fwww.nist.gov\u002Fitl\u002Fai-risk-management-framework)\n- Provisions on the Administration of Deep Synthesis Internet Information Services (China) [[Source]](https:\u002F\u002Fwww.chinalawtranslate.com\u002Fen\u002Fdeep-synthesis\u002F)\n- Interim Measures for the Management of Generative Artificial Intelligence Services (China) [[Source]](https:\u002F\u002Fwww.chinalawtranslate.com\u002Fen\u002Fgenerative-ai-interim\u002F)\n\n## Large-scale Datasets in Biomedical and Health Informatics\n### Open Source\n\n| Dataset                                                      | Description                                                  |\n| ------------------------------------------------------------ | 
------------------------------------------------------------ |\n| [Big Fantastic Database](https:\u002F\u002Fbfd.mmseqs.com\u002F) | 2.1 B protein sequences, 393 B amino acids |\n| [Observed Antibody Space](https:\u002F\u002Fopig.stats.ox.ac.uk\u002Fwebapps\u002Foas\u002F) | 558 M antibody sequences |\n| [RNAcentral](https:\u002F\u002Frnacentral.org\u002F) | 34 M ncRNA sequences, 22 M secondary structures |\n| [ZINC20](https:\u002F\u002Fzinc20.docking.org\u002F) | 1.4 B compounds from 310 catalogs across 150 companies |\n| [MIMIC-CXR](https:\u002F\u002Fphysionet.org\u002Fcontent\u002Fmimic-cxr\u002F2.0.0\u002F) | 65 K patients, 337 K chest X-ray images and 227 K radiology reports |\n| [MedMNIST v2](https:\u002F\u002Fmedmnist.com\u002F) | 708 K 2D medical images, 10 K 3D medical images |\n| [Medical Meadow](https:\u002F\u002Fgithub.com\u002Fkbressem\u002FmedAlpaca) | 1.5 M data points covering a wide range of medical language processing tasks |\n| [Endo-FM database](https:\u002F\u002Fgithub.com\u002Fmed-air\u002FEndo-FM) | 33 K endoscopic videos, up to 5 M frames |\n| [SurgVLP database](https:\u002F\u002Fgithub.com\u002FCAMMA-public\u002FSurgVLP) | 25 K laparoscopic video-text pairs from 1 K surgical lecture videos |\n| [ColonINST](https:\u002F\u002Fgithub.com\u002Fai4colonoscopy\u002FIntelliScope) | 450 K multimodal instruction tuning pairs in colonoscopy |\n\n### Private or Upon Approval\n| Dataset                                                      | Description                                                  |\n| ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [Mount Sinai ECG Data](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-023-00840-9) | 2.1 M patients, 
containing 8.5 M discrete ECG recordings |\n| [Google DR Dev. Dataset](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Ffullarticle\u002F2588763) | 239 K unique individuals, 1.6 M fundus images |\n| [UF Health IDR Clinical Note Database](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2) | 290 M clinical notes, with up to 82 B medical words |\n| [Clinical Practice Research Datalink](https:\u002F\u002Facademic.oup.com\u002Fije\u002Farticle\u002F44\u002F3\u002F827\u002F632531) | 11.3 M patients covering data on demographics, symptoms, diagnoses, etc. |\n\n\n","# Awesome Healthcare Foundation Models\n\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n\nA curated list of awesome large AI models (LAMs), i.e., foundation models, in healthcare. We classify current LAMs into four categories: large language models (LLMs), large vision models (LVMs), large audio models, and large multimodal models (LMMs). Application areas of these LAMs include, but are not limited to, bioinformatics, medical diagnosis, medical imaging, medical informatics, medical education, public health, and medical robotics.\n\nContributions to this repository are welcome. If you would like to contribute, please submit a pull request!\n\n## News\n\nWe are pleased to announce that the IEEE Journal of Biomedical and Health Informatics will publish a special issue on **Foundation Models in Biomedical and Health Informatics**. For more details, please see the [call for papers](https:\u002F\u002Fwww.embs.org\u002Fjbhi\u002Fwp-content\u002Fuploads\u002Fsites\u002F18\u002F2023\u002F06\u002FJBHI_Foundation_Models_Call-for-Papers.pdf).\n\nTopics of interest include, but are not limited to:\n\n1. Fundamental research on novel theories, principles, and architectures of biomedical and health foundation models\n2. Interpretability and transparency of biomedical and health foundation models\n3. Prompt engineering in biomedical and health foundation models\n4. Data engineering in biomedical and health foundation models\n5. Large-scale biomedical and health datasets\n6. Multimodal learning and alignment for biomedical and health foundation models\n7. Efficient computing for biomedical and health foundation models\n8. Adversarial robustness of biomedical and health foundation models\n9. Applications of foundation models in biomedical and health informatics\n10. New evaluation paradigms for biomedical and health foundation models\n11. Novel computer systems for biomedical and health foundation models\n12. Decentralized approaches to developing and deploying biomedical and health foundation models\n13. 
Ethical, safety, privacy, and regulatory issues of biomedical and health foundation models\n\nIf you are interested in, or already working on, these topics, please help spread the word and consider submitting your work!\n\n:star2: Our latest article on LLM-based agents in medicine was published in Nature Machine Intelligence in 2024. We welcome you to read it and hope you find it helpful!\n\n**[LLM-based agentic systems in medicine and healthcare](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-024-00944-1)**\n\u003Cbr \u002F>\nJianing Qiu,\nKyle Lam,\nGuohao Li,\nAmish Acharya,\nTien Yin Wong,\nAra Darzi,\nWu Yuan, and\nEric J. Topol\n\u003Cbr \u002F>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJianing-Qiu_Awesome-Healthcare-Foundation-Models_readme_b293dcd5e1ff.png)\n\n## Table of Contents\n\n- [Awesome Healthcare Foundation Models](#awesome-healthcare-foundation-models)\n  - [News](#news)\n  - [Table of Contents](#table-of-contents)\n  - [Survey](#survey)\n  - [Large Language Models](#large-language-models)\n    - [Healthcare Domain](#healthcare-domain)\n    - [General Domain](#general-domain)\n  - [Large Vision Models](#large-vision-models)\n    - [Healthcare Domain](#healthcare-domain-1)\n    - [General Domain](#general-domain-1)\n  - [Large Audio Models](#large-audio-models)\n    - [Healthcare Domain](#healthcare-domain-2)\n    - [General Domain](#general-domain-2)\n  - [Large Multi-modal Models](#large-multi-modal-models)\n    - [Healthcare Domain](#healthcare-domain-3)\n    - [General Domain](#general-domain-3)\n  - [Applications of Large AI Models in Healthcare](#applications-of-large-ai-models-in-healthcare)\n    - [Bioinformatics](#bioinformatics)\n    - [Medical Diagnosis](#medical-diagnosis)\n    - [Medical Imaging](#medical-imaging)\n    - [Medical Informatics](#medical-informatics)\n    - [Medical Education](#medical-education)\n    - [Public Health](#public-health)\n    - [Medical Robotics](#medical-robotics)\n  - [AI Legislation](#ai-legislation)\n  - [Large-scale Datasets in Biomedical and Health Informatics](#large-scale-datasets-in-biomedical-and-health-informatics)\n    - [Open Source](#open-source)\n    - [Private or Upon Approval](#private-or-upon-approval)\n\n## Survey\n\nThis repository is largely based on the following paper:\n\n**[Large AI Models in Health Informatics: Applications, Challenges, and the Future](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F10261199)**\n\u003Cbr \u002F>\nJianing Qiu,\nLin Li,\nJiankai Sun,\nJiachuan Peng,\nPeilun Shi,\nRuiyang Zhang,\nYinzhao Dong,\nKyle Lam,\nFrank P.-W. Lo,\nBo Xiao,\nWu Yuan,\nNingli Wang,\nDong Xu, and\nBenny Lo\n\u003Cbr \u002F>\n\nIf you find this repository helpful, please consider citing:\n\n```bibtex\n@article{qiu2023large,\n  title={Large ai models in health informatics: Applications, challenges, and the future},\n  author={Qiu, Jianing and Li, Lin and Sun, Jiankai and Peng, Jiachuan and Shi, 
Peilun and Zhang, Ruiyang and Dong, Yinzhao and Lam, Kyle and Lo, Frank P-W and Xiao, Bo and others},\n  journal={IEEE Journal of Biomedical and Health Informatics},\n  year={2023},\n  publisher={IEEE}\n}\n```\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJianing-Qiu_Awesome-Healthcare-Foundation-Models_readme_db04681ec841.png)\n\n## Large Language Models\n\n### Healthcare Domain\n\n- ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05795)\n- ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.06025.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsynlp\u002FChiMed-GPT)\n- Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.09617.pdf)\n- KeBioLM: Improving Biomedical Pretrained Language Models with Knowledge [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.10344)\n- BioELMo: Probing Biomedical Embeddings from Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02181)\n- BioBART: Pretraining and Evaluation of a Biomedical Generative Language Model [[Paper]](https:\u002F\u002Faclanthology.org\u002F2022.bionlp-1.9.pdf)\n- ClinicalT5: A Generative Language Model for Clinical Text [[Paper]](https:\u002F\u002Faclanthology.org\u002F2022.findings-emnlp.398.pdf)\n- GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.03540v2.pdf)\n- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.07257.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.11032.pdf)\n- Capabilities of GPT-4 on Medical Challenge Problems [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.13375.pdf)\n- BioBERT: a pre-trained biomedical language representation model for biomedical text mining [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.08746.pdf)\n- Publicly Available Clinical BERT Embeddings [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf)\n- BioMegatron: Larger Biomedical Domain Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.06060.pdf)\n- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [[Paper]](https:\u002F\u002Faclanthology.org\u002F2020.acl-main.740.pdf)\n- Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction 
[[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-021-00455-y)\n- CPLLM: Clinical Prediction with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295) [[Code]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FCPLLM)\n- DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01097) [[Code]](https:\u002F\u002Fgithub.com\u002Fxionghonglin\u002FDoctorGLM)\n- HuatuoGPT, Towards Taming Language Models to Be a Doctor [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15075) [[Code]](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT)\n- BioELECTRA: Pretrained Biomedical Text Encoder using Discriminators [[Paper]](https:\u002F\u002Faclanthology.org\u002F2021.bionlp-1.16.pdf)\n- LinkBERT: Pretraining Language Models with Document Links [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.15827.pdf)\n- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.10341.pdf)\n- Large Language Models Encode Clinical Knowledge [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.13138.pdf)\n- A large language model for electronic health records [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.15779.pdf)\n- BEHRT: Transformer for Electronic Health Records [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-020-62922-y)\n- Federated Learning of Medical Concepts Embedding using BEHRT [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13052) [[Code]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FFederatedBEHRT)\n- RadBERT: Adapting Transformer-based Language Models to Radiology [[Paper]](https:\u002F\u002Fpubs.rsna.org\u002Fdoi\u002Fepdf\u002F10.1148\u002Fryai.210258) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002FUCSD-VA-health\u002FRadBERT-RoBERTa-4m)\n- Highly accurate protein structure prediction with AlphaFold [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-021-03819-2) [[Code]](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold)\n- Accurate prediction of protein structures and interactions using a three-track neural network 
[[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.abj8754?casa_token=tleEHPOOSr8AAAAA%3AT0eToIMPW0oN1jjIGLs8aPyQK8qbcFIByjT1x4k90tvBAj03SZUzpEinCPe_t-g4ECmjJ9wlj8OwQBs)\n- Protein complex prediction with AlphaFold-Multimer [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v2.abstract)\n- FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.00854) [[Code]](https:\u002F\u002Fgithub.com\u002Fhpcaitech\u002Ffastfold)\n- HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05477) [[Code]](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleHelix)\n- Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.04.502811v3.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fdptech-corp\u002FUni-Fold)\n- OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.11.20.517210v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Faqlaboratory\u002Fopenfold)\n- ManyFold: an efficient and flexible library for training and validating protein folding models [[Paper]](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F39\u002F1\u002Fbtac773\u002F6887136) [[Code]](https:\u002F\u002Fgithub.com\u002Finstadeepai\u002Fmanyfold)\n- ColabFold: making protein folding accessible to all [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-022-01488-1) [[Code]](https:\u002F\u002Fgithub.com\u002Fsokrypton\u002FColabFold)\n- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [[Paper]](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002Fabs\u002F10.1073\u002Fpnas.2016239118) [[Code]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fesm)\n- ProGen: Language Modeling for Protein Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.03497) [[Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fprogen)\n- ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.06225) 
[[Code]](https:\u002F\u002Fgithub.com\u002Fagemagician\u002FProtTrans)\n- Evolutionary-scale prediction of atomic-level protein structure with a language model [[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.ade2574)\n- High-resolution de novo structure prediction from primary sequence [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.07.21.500999v1.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002FHeliXonProtein\u002FOmegaFold)\n- Single-sequence protein structure prediction using a language model and deep learning [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41587-022-01432-w)\n- Improved protein complex prediction with protein language models [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.09.15.508065v2.abstract)\n- MSA Transformer [[Paper]](http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Frao21a.html) [[Code]](https:\u002F\u002Fgithub.com\u002FThe-AI-Summer\u002Fself-attention-cv)\n- Deciphering antibody affinity maturation with language models and weakly supervised learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.07782)\n- xTrimoABFold: De novo Antibody Structure Prediction without MSA [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735)\n- scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentAILabHealthcare\u002FscBERT)\n- Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.15794) [[Code]](https:\u002F\u002Fgithub.com\u002FHazyResearch\u002Fhyena-dna)\n\n### General Domain\n\n- ChatGPT: Optimizing Language Models for Dialogue [[Blog]](https:\u002F\u002Fopenai.com\u002Fblog\u002Fchatgpt\u002F)\n- LLaMA: Open and Efficient Foundation Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.13971.pdf)\n- Scaling Instruction-Finetuned Language Models 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.11416.pdf)\n- PaLM: Scaling Language Modeling with Pathways [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02311.pdf)\n- Training Compute-Optimal Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.15556.pdf)\n- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.11990.pdf)\n- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.05100.pdf)\n- LaMDA: Language Models for Dialog Applications [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.08239.pdf)\n- OPT: Open Pre-trained Transformer Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.01068.pdf)\n- Training language models to follow instructions with human feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02155.pdf)\n- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.11446.pdf)\n- Multitask Prompted Training Enables Zero-Shot Task Generalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.08207.pdf)\n- Language Models are Few-Shot Learners [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14165.pdf)\n- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.10683.pdf)\n- RoBERTa: A Robustly Optimized BERT Pretraining Approach [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf)\n- Language Models are Unsupervised Multitask Learners [[Paper]](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)\n- Improving language models by retrieving from trillions of tokens [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.04426.pdf)\n- WebGPT: Browser-assisted question-answering with human feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.09332.pdf)\n- Improving alignment of dialogue agents via targeted human judgements [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14375.pdf)\n- Improving Language Understanding by Generative Pre-Training [[Paper]](https:\u002F\u002Fs3-us-west-2.amazonaws.com\u002Fopenai-assets\u002Fresearch-covers\u002Flanguage-unsupervised\u002Flanguage_understanding_paper.pdf)\n- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)\n\n## Large Vision Models\n\n### Healthcare Domain\n\n- vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fhtml\u002FWittmann_vesselFM_A_Foundation_Model_for_Universal_3D_Blood_Vessel_Segmentation_CVPR_2025_paper.html) [[Code]](https:\u002F\u002Fgithub.com\u002Fbwittmann\u002FvesselFM)\n- VisionFM: Development and Validation of a Multimodal Multitask Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [[Paper]](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300221) [[Code]](https:\u002F\u002Fgithub.com\u002FABILab-CUHK\u002FVisionFM)\n- RETFound: A foundation model for generalizable disease detection from retinal images [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06555-x)\n- Endo-FM: Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-training [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.16741) [[Code]](https:\u002F\u002Fgithub.com\u002Fmed-air\u002FEndo-FM)\n- STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.06716)\n- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.11925) [[Code]](https:\u002F\u002Fgithub.com\u002Fduyhominhnguyen\u002FLVM-Med)\n- Med3D: Transfer learning for 3D medical image analysis [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00625) [[Code]](https:\u002F\u002Fgithub.com\u002FTencent\u002FMedicalNet)\n- Models Genesis: Generic autodidactic models for 3D medical image analysis [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.06912) [[Code]](https:\u002F\u002Fgithub.com\u002FMrGiovanni\u002FModelsGenesis)\n- MICLe: Big self-supervised models advance medical image classification [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.05224) [[Code]](https:\u002F\u002Fgithub.com\u002Frjrobben\u002FMICLe_pytorch)\n- C2L: Comparing to Learn: Surpassing ImageNet pretraining on radiographs by comparing image representations [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.07423) [[Code]](https:\u002F\u002Fgithub.com\u002Ffunnyzhou\u002FC2L_MICCAI2020)\n- MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05352) [[Code]](https:\u002F\u002Fgithub.com\u002Fstanfordmlgroup\u002FMoCo-CXR)\n- TransUNet: Transformers make strong encoders for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.04306) [[Code]](https:\u002F\u002Fgithub.com\u002FBeckschen\u002FTransUNet)\n- 
TransFuse: Fusing Transformers and CNNs for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.08005) [[Code]](https:\u002F\u002Fgithub.com\u002FRayicer\u002FTransFuse)\n- Medical Transformer: Gated axial-attention for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.10662) [[Code]](https:\u002F\u002Fgithub.com\u002Fjeya-maria-jose\u002FMedical-Transformer)\n- UNETR: Transformers for 3D Medical Image Segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10504) [[Code]](https:\u002F\u002Fgithub.com\u002FProject-MONAI\u002Fresearch-contributions\u002Ftree\u002Fmain\u002FUNETR\u002FBTCV)\n- CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.03024) [[Code]](https:\u002F\u002Fgithub.com\u002FYtongXie\u002FCoTr)\n- Swin-Unet: Unet-like pure Transformer for medical image segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.05537) [[Code]](https:\u002F\u002Fgithub.com\u002FHuCaoFighting\u002FSwin-Unet)\n- SAM4Med: Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.12637.pdf)\n- Learning multi-modal representations by watching hundreds of surgical video lectures [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15220) [[Code]](https:\u002F\u002Fgithub.com\u002FCAMMA-public\u002FSurgVLP)\n\n### General Domain\n\n**Convolutional Neural Networks (CNNs)**:\n\n- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F093f65e080a295f8076b1c5722a46aa2-Abstract.html)\n- Big Transfer (BiT): General Visual Representation Learning [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123500477.pdf)\n- Designing Network Design Spaces [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fhtml\u002FRadosavovic_Designing_Network_Design_Spaces_CVPR_2020_paper.html)\n- Self-supervised Pretraining of Visual Features in the Wild [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2103.01988)\n- EfficientNetV2: Smaller Models and Faster Training [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Ftan21a.html)\n- A ConvNet for the 2020s 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLiu_A_ConvNet_for_the_2020s_CVPR_2022_paper.pdf)\n- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2211.05778)\n\n**Vision Transformers (ViTs)**:\n\n- Generative Pretraining From Pixels [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv119\u002Fchen20s.html)\n- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=YicbFdNTTy&utm_campaign=f86497ed3a-EMAIL_CAMPAIGN_2019_04_24_03_18_COPY_01&utm_medium=email&utm_source=Deep%20Learning%20Weekly&utm_term=0_384567b42d-f86497ed3a-72965345)\n- Transformer in Transformer [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Fhash\u002F854d9fca60b4bd07f9bb215d59ef5561-Abstract.html)\n- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fhtml\u002FLiu_Swin_Transformer_Hierarchical_Vision_Transformer_Using_Shifted_Windows_ICCV_2021_paper.html)\n- Training data-efficient image transformers & distillation through attention [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Ftouvron21a.html)\n- Self-supervised Models are Good Teaching Assistants for Vision Transformers [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fwu22c.html)\n- Scaling Vision with Sparse Mixture of Experts [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Fhash\u002F48237d9f2dea8c74c2a72126cf63d933-Abstract.html)\n- Going Deeper With Image Transformers [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fhtml\u002FTouvron_Going_Deeper_With_Image_Transformers_ICCV_2021_paper.html)\n- Masked Autoencoders Are Scalable Vision Learners [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FHe_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.html)\n- Swin Transformer V2: Scaling Up Capacity and Resolution [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FLiu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.html)\n- Scaling Vision Transformers 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FZhai_Scaling_Vision_Transformers_CVPR_2022_paper.html)\n- Efficient Self-supervised Vision Transformers for Representation Learning [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=fVu3o-YUGQK)\n- Scaling Vision Transformers to 22 Billion Parameters [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05442)\n- H-optimus-0: a 1.1-billion-parameter vision transformer trained on a proprietary dataset of over 500,000 H&E-stained whole-slide histology images [[Code]](https:\u002F\u002Fgithub.com\u002Fbioptimus\u002Freleases\u002Ftree\u002Fmain\u002Fmodels\u002Fh-optimus\u002Fv0) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002Fbioptimus\u002FH-optimus-0)\n\n**CNNs + ViTs**:\n\n- CoAtNet: Marrying Convolution and Attention for All Data Sizes [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Fhash\u002F20568692db622456cc42a2e853ca21f8-Abstract.html)\n- LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fhtml\u002FGraham_LeViT_A_Vision_Transformer_in_ConvNets_Clothing_for_Faster_Inference_ICCV_2021_paper.html)\n- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fd-ascoli21a.html)\n\n## Large Audio Models\n\n### Healthcare Domain\n\n### General Domain\n\n- wav2vec: Unsupervised Pre-training for Speech Recognition [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.05862) [[Blog]](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fwav2vec-20-learning-the-structure-of-speech-from-raw-audio\u002F)\n- W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2108.06209)\n- AudioLM: a Language Modeling Approach to Audio Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.03143) [[Project]](https:\u002F\u002Fgoogle-research.github.io\u002Fseanet\u002Faudiolm\u002Fexamples\u002F) [[Blog]](https:\u002F\u002Fai.googleblog.com\u002F2022\u002F10\u002Faudiolm-language-modeling-approach-to.html)\n- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.07447) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fhubert)\n- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.09296) [[Blog]](https:\u002F\u002Fai.facebook.com\u002Fblog\u002Fxls-r-self-supervised-speech-processing-for-128-languages\u002F) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fwav2vec2-xls-r-300m)\n- MusicLM: Generating Music From Text [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.11325) [[Project]](https:\u002F\u002Fgoogle-research.github.io\u002Fseanet\u002Fmusiclm\u002Fexamples\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fmusiclm-pytorch)\n- Diffsound: Discrete Diffusion Model for Text-to-sound Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.09983) [[Project]](http:\u002F\u002Fdongchaoyang.top\u002Ftext-to-sound-synthesis-demo\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fyangdongchao\u002FText-to-sound-Synthesis)\n- AudioGen: Textually Guided Audio Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.15352) [[Project]](https:\u002F\u002Ffelixkreuk.github.io\u002Faudiogen\u002F)\n- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.04356) [[Code]](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-tiny.en)\n- Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.01037) [[Blog]](https:\u002F\u002Fai.googleblog.com\u002F2023\u002F03\u002Funiversal-speech-model-usm-state-of-art.html)\n\n## Large Multi-modal Models\n\n### Healthcare Domain\n\n- The application of multimodal large language models in medicine [[Paper]](https:\u002F\u002Fwww.thelancet.com\u002Fjournals\u002Flanwpc\u002Farticle\u002FPIIS2666-6065(24)00042-7\u002Ffulltext)\n- Foundation models: the future of surgical artificial intelligence? [[Paper]](https:\u002F\u002Facademic.oup.com\u002Fbjs\u002Farticle\u002F111\u002F4\u002Fznae090\u002F7656455)\n- Bootstrapping Large Language Models for Radiology Report Generation [[Paper]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F29826) [[Code]](https:\u002F\u002Fgithub.com\u002Fsynlp\u002FR2-LLM)\n- Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08592)\n- PLIP: A visual-language foundation model for pathology image analysis using medical Twitter 
[[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02504-3)\n- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00890.pdf)\n- GPT-4 Technical Report [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.08774.pdf)\n- Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41551-022-00936-9)\n- Contrastive Learning of Medical Visual Representations from Paired Images and Text [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.00747.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FedreisMD\u002FConVIRT-pytorch)\n- GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition [[Paper]](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9710099) [[Code]](https:\u002F\u002Fgithub.com\u002Fmarshuang80\u002Fgloria)\n- RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.00534)\n- PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? [[Paper]](https:\u002F\u002Faclanthology.org\u002F2023.findings-eacl.88\u002F)\n- SurgVLP: Learning multi-modal representations by watching hundreds of surgical video lectures [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15220) [[Code]](https:\u002F\u002Fgithub.com\u002FCAMMA-public\u002FSurgVLP)\n- Frontiers in Intelligent Colonoscopy [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.17241) [[Code]](https:\u002F\u002Fgithub.com\u002Fai4colonoscopy\u002FIntelliScope)\n\n### General Domain\n\n**Multimodal Chatbots**\n- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [[Paper]](https:\u002F\u002Fbrowse.arxiv.org\u002Fpdf\u002F2309.17421.pdf)\n\n**Representation Learning**:\n\n- Learning Transferable Visual Models From Natural Language Supervision [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fradford21a.html)\n- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fjia21b.html)\n- Florence: A New Foundation Model for Computer Vision [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2111.11432)\n- Grounded Language-Image Pre-training [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FLi_Grounded_Language-Image_Pre-Training_CVPR_2022_paper.html)\n- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2103.06561)\n- FLAVA: A Foundational Language and Vision Alignment Model 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FSingh_FLAVA_A_Foundational_Language_and_Vision_Alignment_Model_CVPR_2022_paper.html)\n- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=GUrhfTuf_3)\n- FILIP: Fine-grained Interactive Language-Image Pre-Training [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=cpDhcsEDC2)\n- Combined Scaling for Open-Vocabulary Image Classification [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2111.10050)\n- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fli22n.html)\n- PaLI: A Jointly-Scaled Multilingual Language-Image Model [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2209.06794)\n- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2211.09807)\n- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2301.12597)\n- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=zq1iJkNk3uN)\n- Language Is Not All You Need: Aligning Perception with Language Models [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2302.14045)\n- PaLM-E: An Embodied Multimodal Language Model [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03378)\n- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04671)\n\n**Text-to-Image Generation**:\n\n- Zero-Shot Text-to-Image Generation [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Framesh21a.html)\n- High-Resolution Image Synthesis with Latent Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fhtml\u002FRombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html)\n- Hierarchical Text-Conditional Image Generation with CLIP Latents [[Paper]](http:\u002F\u002Farxiv.org\u002Fabs\u002F2204.06125)\n- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fnichol22a.html)\n- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=08Yk-n5l2Al)\n- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=AFDcYJKhND)\n\n## Applications of Large AI Models in Healthcare\n\nPlease note that some of the models below were not originally designed for healthcare applications, but they may have the potential to be transferred to the healthcare domain or to inspire future development there.\n\n### Bioinformatics\n\n- 
GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.09667)\n- Highly accurate protein structure prediction with AlphaFold [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-021-03819-2) [[Code]](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold)\n- Accurate prediction of protein structures and interactions using a three-track neural network [[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.abj8754?casa_token=tleEHPOOSr8AAAAA%3AT0eToIMPW0oN1jjIGLs8aPyQK8qbcFIByjT1x4k90tvBAj03SZUzpEinCPe_t-g4ECmjJ9wlj8OwQBs)\n- Protein complex prediction with AlphaFold-Multimer [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v2.abstract)\n- FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.00854) [[Code]](https:\u002F\u002Fgithub.com\u002Fhpcaitech\u002Ffastfold)\n- HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05477) [[Code]](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleHelix)\n- Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.04.502811v3.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fdptech-corp\u002FUni-Fold)\n- OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.11.20.517210v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Faqlaboratory\u002Fopenfold)\n- ManyFold: an efficient and flexible library for training and validating protein folding models [[Paper]](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F39\u002F1\u002Fbtac773\u002F6887136) [[Code]](https:\u002F\u002Fgithub.com\u002Finstadeepai\u002Fmanyfold)\n- ColabFold: making protein folding accessible to all [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-022-01488-1) [[Code]](https:\u002F\u002Fgithub.com\u002Fsokrypton\u002FColabFold)\n- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [[Paper]](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002Fabs\u002F10.1073\u002Fpnas.2016239118) [[Code]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fesm)\n- 
ProGen: Language Modeling for Protein Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.03497) [[Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fprogen)\n- ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.06225) [[Code]](https:\u002F\u002Fgithub.com\u002Fagemagician\u002FProtTrans)\n- Evolutionary-scale prediction of atomic-level protein structure with a language model [[Paper]](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Ffull\u002F10.1126\u002Fscience.ade2574)\n- High-resolution de novo structure prediction from primary sequence [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.07.21.500999v1.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002FHeliXonProtein\u002FOmegaFold)\n- Single-sequence protein structure prediction using a language model and deep learning [[Paper]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41587-022-01432-w)\n- Improved protein complex prediction with protein language models [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.09.15.508065v2.abstract)\n- MSA Transformer [[Paper]](http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Frao21a.html) [[Code]](https:\u002F\u002Fgithub.com\u002FThe-AI-Summer\u002Fself-attention-cv)\n- Deciphering antibody affinity maturation with language models and weakly supervised learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.07782)\n- xTrimoABFold: De novo Antibody Structure Prediction without MSA [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735)\n- scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00735) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentAILabHealthcare\u002FscBERT)\n- Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [[Paper]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.08.06.503062v2.abstract) [[Code]](https:\u002F\u002Fgithub.com\u002Fml4bio\u002Frna-fm)\n- SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction [[Paper]](https:\u002F\u002Fpar.nsf.gov\u002Fservlets\u002Fpurl\u002F10168888) [[Code]](https:\u002F\u002Fgithub.com\u002Futa-smile\u002FSMILES-BERT)\n- SMILES 
Transformer：预训练的分子指纹，用于低数据量药物发现 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.04738) [[代码]](https:\u002F\u002Fgithub.com\u002FDSPsleeporg\u002Fsmiles-transformer)\n- MolBERT：利用语言模型和领域相关辅助任务进行分子表征学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2011.13230) [[代码]](https:\u002F\u002Fgithub.com\u002FBenevolentAI\u002FMolBERT)\n- AGBT：代数图辅助的双向Transformer，用于分子性质预测 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-021-23720-w) [[代码]](https:\u002F\u002Fgithub.com\u002FChenDdon\u002FAGBTcode)\n- GROVER：在大规模分子数据上应用自监督图Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.02835) [[代码]](https:\u002F\u002Fgithub.com\u002Ftencent-ailab\u002Fgrover)\n- MolGPT：使用Transformer解码器模型进行分子生成 [[论文]](https:\u002F\u002Fpubs.acs.org\u002Fdoi\u002F10.1021\u002Facs.jcim.1c00600) [[代码]](https:\u002F\u002Fgithub.com\u002Fdevalab\u002Fmolgpt)\n- 用于搜索可合成分子的模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.05221) [[代码]](https:\u002F\u002Fgithub.com\u002Fjohn-bradshaw\u002Fmolecule-chef)\n- 将蛋白质特异性从头药物生成视为机器翻译问题的Transformer神经网络 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-020-79682-4)\n- DeepConv-DTI：通过基于蛋白质序列的卷积深度学习预测药物-靶点相互作用 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1811.02114) [[代码]](https:\u002F\u002Fgithub.com\u002FGIST-CSBL\u002FDeepConv-DTI)\n- GraphDTA：利用图神经网络预测药物-靶点结合亲和力 [[论文]](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F33119053\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fthinng\u002FGraphDTA)\n- MolTrans：用于预测药物-靶点相互作用的分子交互Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.11424) [[代码]](https:\u002F\u002Fgithub.com\u002Fkexinhuang12345\u002Fmoltrans)\n- 从数亿个分子中提取预测性表征 [[论文]](https:\u002F\u002Fpubs.acs.org\u002Fdoi\u002F10.1021\u002Facs.jpclett.1c03058) [[代码]](https:\u002F\u002Fgithub.com\u002FWeilabMSU\u002FPretrainModels)\n- ADMETlab 2.0：一个集成的在线平台，用于准确且全面地预测ADMET属性 [[项目]](https:\u002F\u002Fadmetmesh.scbdd.com\u002F) [[论文]](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F33893803\u002F)\n- MPG：从大规模未标记分子中学习分子表征，以支持药物发现 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2012.11175)\n- MG-BERT：利用无监督原子表征学习进行分子性质预测 [[论文]](https:\u002F\u002Facademic.oup.com\u002Fbib\u002Farticle-abstract\u002F22\u002F6\u002Fbbab152\u002F6265201?redirectedFrom=fulltext) [[代码]](https:\u002F\u002Fgithub.com\u002FParishadBehnam\u002FMG-BERT)\n- PanGu药物模型：像人类一样学习分子 [[项目]](http:\u002F\u002Fwww.pangu-drug.com\u002F) [[论文]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.03.31.485886v1.full)\n- DrugBAN：具有领域适应性的可解释双线性注意力网络，可提升药物-靶点预测性能 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-022-00605-1) [[代码]](https:\u002F\u002Fgithub.com\u002Fpeizhenbai\u002FDrugBAN)\n- DrugOOD：面向人工智能辅助药物发现的分布外（OOD）数据集策展者和基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.09637) [[代码]](https:\u002F\u002Fgithub.com\u002Ftencent-ailab\u002FDrugOOD)\n\n### 医学诊断\n\n- VisionFM：用于通用眼科人工智能的多模态多任务视觉基础模型的开发与验证 [[论文]](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300221) [[代码]](https:\u002F\u002Fgithub.com\u002FABILab-CUHK\u002FVisionFM)\n- RETFound：一种用于从视网膜图像中进行可泛化疾病检测的基础模型 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06555-x)\n- LLaVA-Med：在一天内训练一个用于生物医学的大规模语言-视觉助手 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00890.pdf)\n- 通过自监督学习，以专家级水平从未标注的胸部X光片中检测病理 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41551-022-00936-9)\n- ChatCAD：利用大型语言模型对医学影像进行交互式计算机辅助诊断 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.07257.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n- BEHRT：用于电子健康记录的Transformer模型 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-020-62922-y)\n- 基于BEHRT的联邦学习用于医学概念嵌入 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13052) [[代码]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FFederatedBEHRT)\n- Med-BERT：在大规模结构化电子健康记录上预训练的上下文嵌入，用于疾病预测 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-021-00455-y)\n- CPLLM：基于大型语言模型的临床预测 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295) [[代码]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FCPLLM)\n- RadBERT：将基于Transformer的语言模型适配到放射学领域 [[论文]](https:\u002F\u002Fpubs.rsna.org\u002Fdoi\u002Fepdf\u002F10.1148\u002Fryai.210258) [[HuggingFace]](https:\u002F\u002Fhuggingface.co\u002FUCSD-VA-health\u002FRadBERT-RoBERTa-4m)\n- ChatCAD+：迈向使用LLM的通用且可靠的交互式CAD [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15964) [[代码]](https:\u002F\u002Fgithub.com\u002Fzhaozh10\u002FChatCAD)\n\n### 医学影像\n\n- VisionFM：用于通用眼科人工智能的多模态多任务视觉基础模型的开发与验证 [[论文]](https:\u002F\u002Fai.nejm.org\u002Fdoi\u002Ffull\u002F10.1056\u002FAIoa2300221) [[代码]](https:\u002F\u002Fgithub.com\u002FABILab-CUHK\u002FVisionFM)\n- RETFound：一种用于从视网膜图像中进行可泛化疾病检测的基础模型 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-023-06555-x)\n- 通过自监督学习，以专家级水平从未标注的胸部X光片中检测病理 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41551-022-00936-9)\n- Med3d：用于3D医学影像分析的迁移学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.00625) [[代码]](https:\u002F\u002Fgithub.com\u002FTencent\u002FMedicalNet)\n- Models genesis：用于3D医学影像分析的通用自学模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.06912) [[代码]](https:\u002F\u002Fgithub.com\u002FMrGiovanni\u002FModelsGenesis)\n- MICLe：大型自监督模型推动医学影像分类的进步 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2101.05224) [[代码]](https:\u002F\u002Fgithub.com\u002Frjrobben\u002FMICLe_pytorch)\n- C2l：比较学习——通过比较图像表示超越ImageNet预训练的放射影像 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.07423) [[代码]](https:\u002F\u002Fgithub.com\u002Ffunnyzhou\u002FC2L_MICCAI2020)\n- ConVIRT：基于成对图像和文本的医学视觉表征对比学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.11032.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FedreisMD\u002FConVIRT-pytorch)\n- Gloria：一种用于标签高效医学图像识别的多模态全局-局部表征学习框架 [[论文]](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F9710099) [[代码]](https:\u002F\u002Fgithub.com\u002Fmarshuang80\u002Fgloria)\n- MoCo-CXR：MoCo预训练提升胸片模型的表征能力和迁移性 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05352) [[代码]](https:\u002F\u002Fgithub.com\u002Fstanfordmlgroup\u002FMoCo-CXR)\n- Transunet：Transformer作为强大的编码器用于医学影像分割 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.04306) [[代码]](https:\u002F\u002Fgithub.com\u002FBeckschen\u002FTransUNet)\n- Transfuse：将Transformer与CNN融合用于医学影像分割 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.08005) [[代码]](https:\u002F\u002Fgithub.com\u002FRayicer\u002FTransFuse)\n- Medical Transformer：用于医学影像分割的门控轴向注意力 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.10662) [[代码]](https:\u002F\u002Fgithub.com\u002Fjeya-maria-jose\u002FMedical-Transformer)\n- UNETR：用于3D医学影像分割的Transformer模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.10504) [[代码]](https:\u002F\u002Fgithub.com\u002FProject-MONAI\u002Fresearch-contributions\u002Ftree\u002Fmain\u002FUNETR\u002FBTCV)\n- Cotr：高效连接CNN和Transformer用于3D医学影像分割 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.03024) [[代码]](https:\u002F\u002Fgithub.com\u002FYtongXie\u002FCoTr)\n- Swin-unet：类似Unet的纯Transformer用于医学影像分割 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.05537) [[代码]](https:\u002F\u002Fgithub.com\u002FHuCaoFighting\u002FSwin-Unet)\n- SAM4Med：用于医学影像的通用视觉基础模型：以Zero-Shot医学分割中的Segment Anything Model为例 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.12637.pdf)\n\n### 医学信息学\n\n- Med-PaLM 2：基于大型语言模型实现专家级医学问答 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.09617.pdf)\n- DeID-GPT：利用GPT-4实现零样本医学文本去标识化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.11032.pdf)\n- GPT-4在医学挑战性问题上的能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.13375.pdf)\n- BioBERT：用于生物医学文本挖掘的预训练生物医学语言表示模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1901.08746.pdf)\n- 公开可用的临床BERT嵌入 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.03323.pdf)\n- BioMegatron：更大的生物医学领域语言模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.06060.pdf)\n- 不要停止预训练：将语言模型适配到特定领域和任务 
[[论文]](https:\u002F\u002Faclanthology.org\u002F2020.acl-main.740.pdf)\n- Med-BERT：基于大规模结构化电子健康记录的预训练上下文嵌入，用于疾病预测 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-021-00455-y)\n- CPLLM：利用大型语言模型进行临床预测 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11295) [[代码]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FCPLLM)\n- BioELECTRA：使用判别器预训练的生物医学文本编码器 [[论文]](https:\u002F\u002Faclanthology.org\u002F2021.bionlp-1.16.pdf)\n- LinkBERT：利用文档链接进行语言模型预训练 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.15827.pdf)\n- BioGPT：用于生物医学文本生成与挖掘的生成式预训练Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.10341.pdf)\n- 大型语言模型能够编码临床知识 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.13138.pdf)\n- 面向电子健康记录的大规模语言模型 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2)\n- 面向生物医学自然语言处理的领域特定语言模型预训练 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.15779.pdf)\n- BEHRT：用于电子健康记录的Transformer模型 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-020-62922-y)\n- 基于BEHRT的医疗概念嵌入联邦学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13052) [[代码]](https:\u002F\u002Fgithub.com\u002Fnadavlab\u002FFederatedBEHRT)\n\n### 医学教育\n\n- GPT-4技术报告 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.08774.pdf)\n- 利用ChatGPT赋能生物信息学初学者 [[论文]](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.03.07.531414v1)\n\n### 公共卫生\n\n- 多模态ChatGPT在膳食评估中的应用：系统性分析 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08592)\n- 通过自监督学习实现对未标注胸部X光片中病理的专家级检测 [[论文]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41551-022-00936-9)\n- 在被动膳食监测中利用自监督学习对第一人称视角图像进行聚类 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2208.12160.pdf)\n- ClimaX：用于天气和气候的基础模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.10343.pdf)\n\n### 医疗机器人\n\n- EndoFM：基于大规模自监督预训练的内窥镜视频分析基础模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.16741) [[代码]](https:\u002F\u002Fgithub.com\u002Fmed-air\u002FEndo-FM)\n- 决策 Transformer：通过序列建模实现强化学习 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.01345) [[代码]](https:\u002F\u002Fgithub.com\u002Fkzl\u002Fdecision-transformer)\n- R3M：用于机器人操作的通用视觉表征 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.12601) [[项目]](https:\u002F\u002Fsites.google.com\u002Fview\u002Frobot-r3m\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fr3m)\n- MimicPlay：通过观察人类操作实现长时程模仿学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.12422) [[项目]](https:\u002F\u002Fmimic-play.github.io\u002F)\n- PaLM-E：具身多模态语言模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.03378) [[项目]](https:\u002F\u002Fpalm-e.github.io\u002F) [[博客]](https:\u002F\u002Fai.googleblog.com\u002F2023\u002F03\u002Fpalm-e-embodied-multimodal-language.html)\n- 通用智能体 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.06175) [[博客]](https:\u002F\u002Fwww.deepmind.com\u002Fblog\u002Fa-generalist-agent)\n- CLIPort：面向机器人操作的“什么”和“哪里”路径 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.12098) [[项目]](https:\u002F\u002Fcliport.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fcliport\u002Fcliport)\n- Perceiver-Actor：用于机器人操作的多任务Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.05451) [[项目]](https:\u002F\u002Fperact.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fperact\u002Fperact)\n- “照我所能做，而非照我说的做”：将语言锚定（grounding）于机器人的可供性（affordances） [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.01691) [[项目]](https:\u002F\u002Fsay-can.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fsaycan)\n- VIMA：利用多模态提示实现通用机器人操作 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03094) [[项目]](https:\u002F\u002Fvimalabs.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fvimalabs\u002FVIMA)\n- RT-1：用于大规模真实世界控制的机器人Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06817) [[项目]](https:\u002F\u002Frobotics-transformer.github.io\u002F) 
[[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Frobotics_transformer)\n- ChatGPT在机器人领域的应用：设计原则与模型能力 [[论文]](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fuploads\u002Fprod\u002F2023\u002F02\u002FChatGPT___Robotics.pdf) [[博客]](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fgroup\u002Fautonomous-systems-group-robotics\u002Farticles\u002Fchatgpt-for-robotics\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPromptCraft-Robotics)\n\n\n## 人工智能立法\n\n- 人工智能法案（欧盟） [[来源]](https:\u002F\u002Fartificialintelligenceact.eu\u002F)\n- 英国的人工智能监管创新友好型方法 [[来源]](https:\u002F\u002Fassets.publishing.service.gov.uk\u002Fgovernment\u002Fuploads\u002Fsystem\u002Fuploads\u002Fattachment_data\u002Ffile\u002F1146542\u002Fa_pro-innovation_approach_to_AI_regulation.pdf)\n- 美国人工智能权利法案蓝图 [[来源]](https:\u002F\u002Fwww.whitehouse.gov\u002Fostp\u002Fai-bill-of-rights\u002F)\n- 美国人工智能风险管理框架 [[来源]](https:\u002F\u002Fwww.nist.gov\u002Fitl\u002Fai-risk-management-framework)\n- 关于深度合成互联网信息服务管理的规定（中国） [[来源]](https:\u002F\u002Fwww.chinalawtranslate.com\u002Fen\u002Fdeep-synthesis\u002F)\n- 生成式人工智能服务管理暂行办法（中国） [[来源]](https:\u002F\u002Fwww.chinalawtranslate.com\u002Fen\u002Fgenerative-ai-interim\u002F)\n\n## 生物医学与健康信息学中的大规模数据集\n\n### 开源\n\n| 数据集                                                      | 描述                                                  |\n| ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [Big Fantastic 数据库](https:\u002F\u002Fbfd.mmseqs.com\u002F)           | 21亿条蛋白质序列，共3930亿个氨基酸                   |\n| [已知抗体空间](https:\u002F\u002Fopig.stats.ox.ac.uk\u002Fwebapps\u002Foas\u002F) | 5580万条抗体序列                                     |\n| [RNAcentral](https:\u002F\u002Frnacentral.org\u002F)                        | 3400万条非编码RNA序列，2200万个二级结构               |\n| [ZINC20](https:\u002F\u002Fzinc20.docking.org\u002F)                        | 
来自150家公司的310个化合物目录中的14亿种化合物          |\n| [MIMIC-CXR](https:\u002F\u002Fphysionet.org\u002Fcontent\u002Fmimic-cxr\u002F2.0.0\u002F)  | 6.5万名患者，33.7万张胸部X光片及22.7万份放射学报告     |\n| [MedMNIST v2](https:\u002F\u002Fmedmnist.com\u002F)                         | 70.8万张2D医学图像，1万张3D医学图像                    |\n| [Medical Meadow](https:\u002F\u002Fgithub.com\u002Fkbressem\u002FmedAlpaca)      | 包含150万个数据点，涵盖广泛的医学语言处理任务         |\n| [Endo-FM 数据库](https:\u002F\u002Fgithub.com\u002Fmed-air\u002FEndo-FM)       | 3.3万段内窥镜视频，每段最多可达500万帧                 |\n| [SurgVLP 数据库](https:\u002F\u002Fgithub.com\u002FCAMMA-public\u002FSurgVLP)       | 2.5万对腹腔镜视频-文本配对，来源于1000段手术讲座视频   |\n| [ColonINST](https:\u002F\u002Fgithub.com\u002Fai4colonoscopy\u002FIntelliScope) | 45万组结肠镜检查的多模态指令微调样本对                |\n\n### 私有或需审批\n| 数据集                                                      | 描述                                                  |\n| ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [西奈山心电图数据](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-023-00840-9)           | 210万名患者，包含850万份独立的心电图记录|\n| [谷歌糖尿病视网膜病变开发数据集](https:\u002F\u002Fjamanetwork.com\u002Fjournals\u002Fjama\u002Ffullarticle\u002F2588763) | 23.9万名不同个体，160万张眼底照片 |\n| [UF Health IDR 临床笔记数据库](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41746-022-00742-2) | 2.9亿份临床笔记，包含多达820亿个医学术语 |\n| [临床实践研究数据链](https:\u002F\u002Facademic.oup.com\u002Fije\u002Farticle\u002F44\u002F3\u002F827\u002F632531) | 涵盖人口统计、症状、诊断等信息的1130万名患者         |","# Awesome-Healthcare-Foundation-Models 快速上手指南\n\n本项目并非单一的可安装软件包，而是一个精选的**医疗领域基础大模型（LAMs）资源列表**。它涵盖了大型语言模型（LLMs）、视觉模型（LVMs）、音频模型及多模态模型。以下指南将帮助您快速定位所需模型并运行典型的医疗大模型（以列表中热门的 `HuatuoGPT` 或 `ChiMed-GPT` 为例）。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+) 或 macOS。Windows 用户建议使用 WSL2。\n*   **硬件要求**:\n    *   **GPU**: 推荐 NVIDIA GPU，显存至少 16GB（运行 7B 参数模型），若要运行更大参数模型或进行微调，建议 24GB+ (如 RTX 
3090\u002F4090) 或多卡环境。\n    *   **内存**: 系统内存建议 32GB 以上。\n*   **前置依赖**:\n    *   Python 3.8 - 3.10\n    *   CUDA Toolkit (版本需与 PyTorch 匹配，通常建议 11.7 或 12.1)\n    *   Git\n    *   Conda 或 Mamba (推荐用于环境管理)\n\n## 安装步骤\n\n由于本仓库包含多个独立项目，以下以列表中具有代表性的中文医疗大模型 **HuatuoGPT** 为例展示安装流程。其他模型请参考其各自仓库的 `README`。\n\n### 1. 克隆项目代码\n使用 Git 克隆选定的模型仓库（此处以 HuatuoGPT 为例）：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT.git\ncd HuatuoGPT\n```\n\n> **提示**: 如果访问 GitHub 速度较慢，可尝试使用国内镜像源加速，或在 Gitee 上搜索是否有同步镜像。\n\n### 2. 创建虚拟环境\n推荐使用 Conda 创建隔离环境：\n\n```bash\nconda create -n huatuo python=3.9 -y\nconda activate huatuo\n```\n\n### 3. 安装依赖\n安装项目所需的 Python 库。建议配置国内 pip 镜像源以加速下载：\n\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n若需手动安装核心深度学习框架（根据显卡驱动选择对应版本；CUDA 专用 wheel 仅由 PyTorch 官方索引提供，且 `-i` 与 `--index-url` 是同一参数，不能再叠加国内镜像）：\n\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\n### 4. 获取模型权重\n大多数医疗大模型权重托管在 Hugging Face。国内用户建议使用 **ModelScope (魔搭社区)** 或设置镜像加速下载。\n\n**方式 A: 使用 Hugging Face (需网络环境支持)**\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\nmodel_name = \"FreedomIntelligence\u002FHuatuoGPT-7B\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\n```\n\n**方式 B: 使用 ModelScope (推荐国内用户)**\n许多开源模型已同步至 ModelScope，下载速度更快。\n```bash\npip install modelscope -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n然后在代码中使用 modelscope 的 snapshot_download 功能获取权重。\n\n## 基本使用\n\n以下是一个基于 `transformers` 库加载医疗大模型并进行简单问诊的最简示例。\n\n### 最简单的使用示例 (Inference)\n\n创建一个名为 `inference.py` 的文件，输入以下代码：\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\n# 1. 
加载模型和分词器\n# 替换为您实际下载的模型路径或模型 ID\nmodel_path = \"FreedomIntelligence\u002FHuatuoGPT-7B\" \ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\nprint(f\"Loading model to {device}...\")\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_path, \n    device_map=\"auto\", \n    torch_dtype=torch.float16, \n    trust_remote_code=True\n)\nmodel.eval()\n\n# 2. 构建医疗问诊 Prompt\n# 不同模型可能需要特定的 Prompt 模板，请参考具体模型的 README\ninput_text = \"你好，我最近经常头痛，伴有恶心，可能是什么原因？\"\n\n# 3. 生成回复\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(device)\noutputs = model.generate(\n    **inputs,\n    max_new_tokens=512,\n    do_sample=True,\n    temperature=0.7,\n    top_p=0.9,\n    repetition_penalty=1.1\n)\n\nresponse = tokenizer.decode(outputs[0], skip_special_tokens=True)\n\n# 4. 输出结果\nprint(\"-\" * 30)\nprint(\"AI 医生回复:\")\nprint(response)\nprint(\"-\" * 30)\n```\n\n运行脚本：\n\n```bash\npython inference.py\n```\n\n### 进阶资源探索\n本仓库还整理了大量其他领域的模型，您可以直接访问其提供的链接获取特定任务的代码：\n*   **医学影像**: 查看 `Large Vision Models` 章节下的 `ChatCAD` 等项目。\n*   **生物信息学**: 查看 `Large Language Models` 章节下的 `AlphaFold`, `ESM` 等项目。\n*   **数据集**: 参考 `Large-scale Datasets` 章节获取训练数据。\n\n> **注意**: 医疗 AI 模型仅供研究和辅助参考，严禁直接用于临床诊断决策。","某三甲医院科研团队正试图构建一个能同时分析肺部 CT 影像与电子病历文本的多模态辅助诊断系统，以早期筛查肺癌。\n\n### 没有 Awesome-Healthcare-Foundation-Models 时\n- **选型迷茫**：面对海量通用 AI 模型，研究人员难以快速甄别哪些是专门针对医疗数据训练的基础模型，耗费数周时间进行无效调研。\n- **模态割裂**：团队需分别寻找独立的图像模型和文本模型，缺乏现成的多模态（LMM）资源指引，导致影像与病历数据无法有效对齐融合。\n- **合规风险**：由于缺乏对医疗伦理、隐私法规及数据集许可的系统梳理，项目在数据使用阶段面临潜在的法律与合规隐患。\n- **重复造轮子**：因不了解已有的生物信息学或医学教育领域开源成果，团队在数据预处理和特征工程上做了大量重复性工作。\n\n### 使用 Awesome-Healthcare-Foundation-Models 后\n- **精准定位**：直接通过分类目录锁定经过验证的医疗垂直领域大语言模型（LLMs）和视觉模型（LVMs），将技术选型周期从数周缩短至两天。\n- **多模态整合**：利用列表中推荐的大型多模态模型资源，快速搭建了影像 - 文本联合分析架构，显著提升了病灶特征与临床描述的关联精度。\n- **安全落地**：参考库中关于 AI 立法、伦理及安全性的专题板块，提前规避了数据隐私风险，确保系统设计符合医疗行业监管要求。\n- **高效复用**：基于收录的大规模开源数据集和前沿应用案例（如医疗机器人、公共卫生），直接复用了成熟的基线代码与数据处理流程，研发效率提升 
50%。\n\nAwesome-Healthcare-Foundation-Models 通过提供一站式、分类清晰的医疗基础模型生态图谱，帮助研发团队跨越了从“盲目搜索”到“精准落地”的鸿沟，极大加速了智慧医疗系统的创新进程。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FJianing-Qiu_Awesome-Healthcare-Foundation-Models_ed0e0db7.png","Jianing-Qiu","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FJianing-Qiu_dc60ed2e.png","https:\u002F\u002Fgithub.com\u002FJianing-Qiu",509,53,"2026-04-13T18:13:29","MIT",1,"","未说明",{"notes":84,"python":82,"dependencies":85},"该仓库是一个医疗领域大模型（LAMs）的精选列表（Awesome List），而非单一的独立软件工具。它汇总了包括大型语言模型、视觉模型、音频模型及多模态模型在内的众多开源项目链接（如 ClinicalMamba, Med-PaLM 2, AlphaFold 等）。因此，具体的运行环境需求（操作系统、GPU、内存、依赖库等）取决于用户选择运行的列表中哪一个具体模型，需参考各子项目的独立文档。",[],[35,15,87,88],"音频","其他","2026-03-27T02:49:30.150509","2026-04-14T15:31:46.577401",[],[]]
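上文快速上手指南提到“不同模型可能需要特定的 Prompt 模板，请参考具体模型的 README”。下面是一个假设性的最小封装示例，演示把用户问题统一包装成单轮问诊提示词的做法；其中 `build_prompt` 函数与模板格式均为示意，并非任何模型的官方模板，实际使用时请按所选模型的 README 替换模板字符串：

```python
def build_prompt(user_query: str,
                 system_hint: str = "你是一位谨慎的医疗咨询助手，回答仅供参考，不构成诊断。") -> str:
    """将用户问题包装为单轮对话提示词（示意模板，非任何特定模型的官方格式）。"""
    return f"<系统>{system_hint}</系统>\n<用户>{user_query}</用户>\n<助手>"

# 用法示例：先构造完整提示词，再交给 tokenizer 编码
prompt = build_prompt("你好，我最近经常头痛，伴有恶心，可能是什么原因？")
print(prompt)
```

在前文的 `inference.py` 中，将 `input_text` 替换为 `build_prompt(...)` 的返回值，即可在不同模型之间复用同一段推理脚本，仅需调整模板字符串本身。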