[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--AI2BMD":3,"tool-microsoft--AI2BMD":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 
适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 
轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":29,"env_os":94,"env_gpu":95,"env_ram":96,"env_deps":97,"category_tags":103,"github_topics":80,"view_count":10,"oss_zip_url":80,"oss_zip_packed_at":80,"status":22,"created_at":104,"updated_at":105,"faqs":106,"releases":135},3219,"microsoft\u002FAI2BMD","AI2BMD","AI-powered ab initio biomolecular dynamics simulation","AI2BMD 是一款由微软推出的开源程序，旨在利用人工智能技术高效模拟蛋白质分子动力学，并达到第一性原理（ab initio）的计算精度。传统上，要实现如此高精度的生物分子模拟往往需要巨大的计算资源和漫长的时间，而 AI2BMD 通过深度学习模型成功解决了这一效率瓶颈，让研究人员能够在可接受的时间内获得接近量子力学计算级别的准确结果。\n\n该工具特别适合计算生物学、生物物理学领域的科研人员以及关注分子模拟算法的开发者使用。其核心亮点在于将复杂的密度泛函理论（DFT）计算能力与 AI 推理速度相结合，内置了包含约 2000 万种二肽构象的大规模训练数据集，并支持如 Chignolin 等标准蛋白质体系的快速模拟。用户无需配置复杂的环境，只需通过简单的 Python 启动脚本配合 Docker 容器即可运行。目前，AI2BMD 已将其核心研究成果发表于《自然》杂志，为探索蛋白质折叠、药物研发等前沿科学问题提供了强有力的计算支持。","# AI\u003Csup>2\u003C\u002Fsup>BMD: AI-powered *ab initio* biomolecular dynamics simulation\n\n## Contents\n\n- [Overview](#overview)\n- [Get Started](#get-started)\n- [Datasets](#datasets)\n- [System Requirements](#system-requirements)\n- [Advanced Setup](#advanced-setup)\n- [Related Research](#related-research)\n- [Citation](#citation)\n- [License](#license)\n- [Disclaimer](#disclaimer)\n- [Contacts](#contacts)\n\n## Overview\n\nAI\u003Csup>2\u003C\u002Fsup>BMD is a 
program for efficiently simulating protein molecular dynamics with *ab initio* accuracy. This repository contains the simulation program, datasets, and public materials related to AI\u003Csup>2\u003C\u002Fsup>BMD. The main content of AI\u003Csup>2\u003C\u002Fsup>BMD is published in [Nature](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08127-z).\n\nHere is an animation to illustrate how AI\u003Csup>2\u003C\u002Fsup>BMD works.\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F912a3e5a-c465-4dc7-8c2d-9f7807cac2a7\n\n\n\n## Get Started\n\nThe source code of AI\u003Csup>2\u003C\u002Fsup>BMD is hosted in this repository.\nWe package the source code and runtime libraries into a Docker image, and provide a Python launcher program to simplify the setup process.\nTo run the simulation program, you don't need to clone this repository. Simply download `scripts\u002Fai2bmd` and launch it (Python >=3.7 and a Docker environment are required).\n\n\nWe can run a molecular dynamics simulation as follows.\n\n```shell\n# skip the following two lines if you've already set up the launcher\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# download the Chignolin protein structure data file\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig.pdb'\n# download the preprocessed and solvated Chignolin protein structure data files\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq.pdb'\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq-nowat.pdb'\n# pull the docker image from the container registry\ndocker pull ghcr.io\u002Fmicrosoft\u002Fai2bmd:latest\n# launch the 
program, with all simulation parameters set to default values\n# you may need to \"sudo\" the following line if the docker group is not configured for the user\n.\u002Fai2bmd --prot-file chig.pdb --preprocess-dir chig_preprocessed --preeq-steps 0 --sim-steps 1000 --record-per-steps 1\n```\n\nHere we use a very simple protein `Chignolin` as an example.\nThe program will run a simulation with the default parameters.\n\nThe results will be placed in a new directory `Logs-chig`.\nThe directory contains the simulation trajectory file:\n\n- chig-traj.traj: The full trajectory file in ASE binary format.\n\nNote: Currently, AI\u003Csup>2\u003C\u002Fsup>BMD supports MD simulations for proteins with neutral terminal caps (ACE and NME), single chain and standard amino acids.\n\n\n\n## Datasets\n\n### Protein Unit Dataset\n\nThe Protein Unit Dataset covers about 20 million conformations for dipeptides calculated at DFT level. It can be downloaded with the following commands:\n\n```shell\n# skip the following two lines if you've already set up the launcher\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# you may need to \"sudo\" the following line if the docker group is not configured for the user\n.\u002Fai2bmd --download-training-data\n```\n\nWhen it finishes, the current working directory will be populated by the numpy data files (*.npz).\n\n### AIMD-Chig Dataset\n\nThe AIMD-Chig dataset consists of 2 million conformations of the 166-atom `Chignolin`, along with their corresponding potential energy and atomic forces calculated using Density Functional Theory (DFT) at the M06-2X\u002F6-31G* level.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Faimd-chig.png?raw=true\" width=50%>-->\n\n- Read the article [AIMD-Chig: Exploring the conformational space of a 166-atom protein Chignolin with ab initio molecular 
dynamics](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41597-023-02465-9).\n\n- Find the story [The first whole conformational molecular dynamics dataset for proteins at ab initio accuracy and the novel computational technologies behind it](https:\u002F\u002Fbioengineeringcommunity.nature.com\u002Fposts\u002Faimd-chig-exploring-the-conformational-space-of-proteins-at-dft-level).\n\n- Get the dataset [AIMD-Chig](https:\u002F\u002Ffigshare.com\u002Farticles\u002Fdataset\u002F_strong_AIMD-Chig_exploring_the_conformational_space_of_166-atom_protein_strong_em_strong_Chignolin_strong_em_strong_with_strong_em_strong_ab_initio_strong_em_strong_molecular_dynamics_strong_\u002F22786730).\n\n## System Requirements\n\n### Hardware Requirements\n\nThe AI\u003Csup>2\u003C\u002Fsup>BMD program runs on x86-64 GNU\u002FLinux systems.\nWe recommend a machine with the following specs:\n\n- **CPU**: 8+ cores\n- **Memory**: 32+ GB\n- **GPU**: CUDA-enabled GPU with 8+ GB memory\n\nThe program has been tested on the following GPUs:\n- A100\n- V100\n- RTX A6000\n- Titan RTX\n\n### Software Requirements\n\nThe program has been tested on the following systems:\n\n- **OS**: Ubuntu 20.04,  **Docker**: 27.1\n- **OS**: ArchLinux,  **Docker**: 26.1\n\n\n## Advanced Setup\n### Environment\nThe runtime libraries and requirements are packed into a Docker image for convenience. Before launching the Docker image, you need to install Docker (see https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002F for more details) and add the user to the `docker` group with the following commands:\n\n```shell\nsudo groupadd docker\nsudo usermod -aG docker $USER\nnewgrp docker\n```\n\n### Protein File Preparation\n\nThe input file for AI\u003Csup>2\u003C\u002Fsup>BMD should be in `.pdb` format.\nIf hydrogen atoms are missing in the `.pdb` file, hydrogens should be added.\nThen, the protein should be capped with ACE (acetyl) at the N-terminus and NME (N-methyl) at the C-terminus.  
These steps can be done efficiently in PyMOL, using the following commands as a reference.\n\n```python\nimport pymol\nfrom pymol import cmd\n\npymol.finish_launching()\ncmd.load(\"your_protein.pdb\", \"molecule\")\ncmd.h_add(\"molecule\")  # add missing hydrogens\n\n# molecule, chain and resi are placeholders: set them to your object name,\n# chain ID and terminal residue number before building each selection\ncmd.wizard(\"mutagenesis\")\ncmd.get_wizard().set_n_cap(\"acet\")\nselection = \"\u002F%s\u002F\u002F%s\u002F%s\" % (molecule, chain, resi)  # select the N-terminal residue\ncmd.get_wizard().do_select(selection)\ncmd.get_wizard().apply()\n\ncmd.get_wizard().set_c_cap(\"nmet\")\nselection = \"\u002F%s\u002F\u002F%s\u002F%s\" % (molecule, chain, resi)  # select the C-terminal residue\ncmd.get_wizard().do_select(selection)\ncmd.get_wizard().apply()\n\ncmd.set_wizard()\n```\n\nNext, you can use AmberTools' `pdb4amber` utility to adjust atom names in the `.pdb` file, specifically ensuring compatibility for ACE and NME as required by `ai2bmd`. The atom names for ACE and NME should conform to the following:\n\n- ACE: C, O, CH3, H1, H2, H3\n- NME: N, CH3, H, HH31, HH32, HH33\n\n```\npdb4amber -i your_protein.pdb -o processed_your_protein.pdb\n```\n\nIn addition, please verify that there are no `TER` separators in the protein chain. 
The residue numbering should also start from 1 without gaps.\n\n\nAfter completing the above steps, your `.pdb` file should resemble the following format:\n\n```\nATOM      1  H1  ACE     1      10.845   8.614   5.964  1.00  0.00           H\nATOM      2  CH3 ACE     1      10.143   9.373   5.620  1.00  0.00           C\nATOM      3  H2  ACE     1       9.425   9.446   6.437  1.00  0.00           H\nATOM      4  H3  ACE     1       9.643   9.085   4.695  1.00  0.00           H\nATOM      5  C   ACE     1      10.805  10.740   5.408  1.00  0.00           C\nATOM      6  O   ACE     1      10.682  11.417   4.442  1.00  0.00           O\n...\nATOM    170  N   NME    12       9.499   8.258  10.367  1.00  0.00           N\nATOM    171  H   NME    12       9.393   8.028  11.345  1.00  0.00           H\nATOM    172  CH3 NME    12       8.845   7.223   9.569  1.00  0.00           C\nATOM    173 HH31 NME    12       7.842   6.990   9.925  1.00  0.00           H\nATOM    174 HH32 NME    12       8.798   7.589   8.543  1.00  0.00           H\nATOM    175 HH33 NME    12       9.418   6.305   9.435  1.00  0.00           H\nEND\n\n```\n\nYou can also take the protein files in the `examples` folder as a reference. Note that the machine learning potential currently does not handle proteins with disulfide bonds well. We will update it soon.\n\n### Preprocess\nDuring preprocessing, the solvated system is built and then undergoes energy minimization and, depending on the method, pre-equilibration stages. Currently, AI\u003Csup>2\u003C\u002Fsup>BMD provides two preprocessing methods, selected via the argument `preprocess_method`.\n\nIf you choose the `FF19SB` method, the system will go through solvation, energy minimization, heating and several pre-equilibrium stages. 
To accelerate preprocessing with multiple CPU cores and GPUs, obtain the AMBER software packages and modify the corresponding commands in `src\u002FAIMD\u002Fpreprocess.py`.\n\nIf you choose the `AMOEBA` method, the system will go through solvation and energy minimization stages. We highly recommend performing pre-equilibration simulations so that the simulation system is fully relaxed.\n\n### Simulation\nAI\u003Csup>2\u003C\u002Fsup>BMD provides two modes for performing the production simulations via the argument `mode`. In the default `fragment` mode, the protein is fragmented into dipeptides, which are evaluated by the machine learning potential at every simulation step.\n\nAI\u003Csup>2\u003C\u002Fsup>BMD also lets you train the machine learning potential yourself and run simulations without fragmentation. In `visnet` mode, the potential energy and atomic forces of the protein are calculated directly by the ViSNet model on the whole molecule, without fragmentation. To use this mode, train a ViSNet model on data for your molecules, upload the model to `src\u002FViSNet`, and set the argument `ckpt-type` accordingly. In this way, you can use the AI\u003Csup>2\u003C\u002Fsup>BMD simulation program to simulate molecules other than proteins. 
To train the ViSNet model yourself, please check out the branch [ViSNet](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FViSNet) for the source code, instructions on model training, and more technical details.\n\nTo perform the whole AI\u003Csup>2\u003C\u002Fsup>BMD simulation, including preprocessing, please use the following commands as a reference.\n\n```shell\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# you may need to \"sudo\" the following line if the docker group is not configured for the user\n.\u002Fai2bmd --prot-file path\u002Fto\u002Ftarget-protein.pdb --sim-steps nnn  ...\n#        '-------- required argument ---------' '-- optional arguments --'\n#\n# Notable optional arguments:\n#\n# [Simulation directory mapping options]\n#   --base-dir path\u002Fto\u002Fbase-dir    Directory for running simulation (default: current directory)\n#   --log-dir  path\u002Fto\u002Flog-dir     Directory for logs, results (default: base-dir\u002FLogs-protein-name)\n#   --src-dir  path\u002Fto\u002Fsrc-dir     Mount src-dir in place of src\u002F from this repository (default: not used)\n#\n# [Simulation parameter options]\n#   --sim-steps nnn                Simulation steps\n#   --temp-k nnn                   Simulation temperature in Kelvin\n#   --timestep nnn                 Time-step (fs) for simulation\n#   --preeq-steps nnn              Pre-equilibration simulation steps for each constraint\n#   --max-cyc nnn                  Maximum energy minimization cycles in preprocessing\n#   --preprocess-method [method]   The method for preprocess\n#   --mode [mode]                  Use fragmentation or not during the simulation\n#   --record-per-steps nnn         The frequency to save trajectory\n#\n# [Performance tweaks]\n#   --device-strategy [strategy]   The compute device allocation strategy\n#       excess-compute                 Reserves last GPU for 
non-bonded\u002Fsolvent computation\n#       small-molecule                 Maximize resources for model inference\n#       large-molecule                 Improve performance for large molecules\n#   --chunk-size nnn               Number of atoms in each batch (reduces memory consumption)\n#\n# [Additional launcher options]\n#   --software-update              When specified, updates the program in the Docker image before running\n#   --download-training-data       When specified, downloads the AI2BMD training data, and unpacks it in the working directory.\n#                                  Ignores all other options.\n#   --gpus                         Specifies the GPU devices to passthrough to the program. Can be one of the following:\n#                                  all:        Passthrough all available GPUs to the program.\n#                                  none:       Disables GPU passthrough.\n#                                  i[,j,k...]  Passthrough some GPUs. Example: --gpus 0,1\n```\n\n### Post-analysis\nThe simulation trajectory is saved in ASE `.traj` format. To convert it to `.dcd` format for visualization, install MDAnalysis first and use `src\u002Futils\u002Ftraj2dcd.py` as a reference, with the following commands:\n\n```shell\npython traj2dcd.py --input xxx.pdb --output xxx.dcd --pdb xxx.pdb --num-atoms nnn --stride nnn\n\n# arguments\n# --input         The name of the input trajectory file\n# --output        The name of the output trajectory file\n# --pdb           The reference pdb file corresponding to the input trajectory\n# --num-atoms     The number of atoms for protein or the whole solvated system\n# --stride        The frequency to output the trajectory\n```\n\n### Troubleshooting\nSimulations may collapse due to insufficient modeling of proton hopping or improper simulation system settings. Proton hopping occurs frequently, especially for large biomolecules and long simulations. 
Because the training data for the machine learning potential contains only a few cases of proton hopping, the model may encounter \"out-of-distribution\" cases, giving incorrect atomic forces and resulting in a collapse of the simulation. We will continuously update the AI\u003Csup>2\u003C\u002Fsup>BMD potential to improve its predictive ability, and we warmly encourage contributions to the dataset for model fine-tuning.\n\nTo avoid or alleviate simulation collapse, we provide some suggestions: 1) fully relax the simulation system before the production simulation runs; 2) increase the duration of the pre-equilibration simulations; 3) increase the duration of simulations with constraints in the production runs (via the argument `preeq-steps`); 4) restart the simulation from a few steps before the crash; 5) increase the box size of the solvent; 6) adjust other simulation system settings.\n\nBeyond directly performing simulations, we also encourage users to employ AI\u003Csup>2\u003C\u002Fsup>BMD for reweighting existing simulation trajectories and calculating protein properties accordingly.\n\n## Related Research\n\n### Model Architectures\n\n#### ViSNet\n\nViSNet (**V**ector-**S**calar **i**nteractive graph neural **Net**work) is an equivariant geometry-enhanced graph neural network for molecules that significantly alleviates the dilemma between computational costs and the sufficient utilization of geometric information.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fvisnet_arch.png?raw=true\" width=50%>-->\n\n- ViSNet is published in *Nature Communications* [Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-023-43720-2).\n\n- ViSNet is selected as \"Editors' Highlights\" for both [\"**AI and machine learning**\"](https:\u002F\u002Fwww.nature.com\u002Fcollections\u002Fceiajcdbeb) 
and [\"**Biotechnology and methods**\"](https:\u002F\u002Fwww.nature.com\u002Fcollections\u002Fidhhgedgig) fields of Nature Communications.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fai-eh.png?raw=true\" width=50%>-->\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fbio-eh.png?raw=true\" width=50%> -->\n\n- ViSNet won the championship in [The First Global AI Drug Development Competition](https:\u002F\u002Faistudio.baidu.com\u002Fcompetition\u002Fdetail\u002F1012\u002F0\u002Fleaderboard) and was one of the winners in [OGB-LSC @ NeurIPS 2022 PCQM4Mv2 Track](https:\u002F\u002Fogb.stanford.edu\u002Fneurips2022\u002Fresults\u002F)!\n\n- Please check out the branch [ViSNet](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FViSNet) for the source code, instructions on model training, and more technical details.\n\n#### Geoformer\n\nGeoformer (**Geo**metric Trans**former**) is a novel geometric Transformer that models molecular structures for a range of molecular property predictions. Geoformer introduces a novel positional encoding method, Interatomic Positional Encoding (IPE), to parameterize atomic environments in the Transformer. By incorporating IPE, Geoformer captures valuable geometric information beyond pairwise distances within a Transformer-based architecture. 
Geoformer can be regarded as a Transformer variant of ViSNet.\n\n- Geoformer was published at NeurIPS 2023.\n- Read the Geoformer paper [Geometric Transformer with Interatomic Positional Encoding](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FGeoformer\u002FGeoformer.pdf).\n- Please check out the branch [Geoformer](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FGeoformer) for the source code, instructions on model training, and more technical details.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fgeoformer.png?raw=true\" width=50%>-->\n\n#### Fine-grained force metrics for MLFF\n\nMachine learning force fields (MLFFs) have gained popularity in recent years as a cost-effective alternative to *ab initio* molecular dynamics (MD) simulations. Despite their small errors on test sets, MLFFs inherently suffer from generalization and robustness issues during MD simulations.\n\nTo alleviate these issues, we propose the use of global force metrics and fine-grained metrics from elemental and conformational aspects to systematically measure MLFFs for every atom and conformation of molecules. Furthermore, the performance of MLFFs and the stability of MD simulations can be enhanced by employing the proposed force metrics during model training. 
This includes training MLFF models using these force metrics as loss functions, fine-tuning by reweighting samples in the original dataset, and continued training by incorporating additional unexplored data.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fmlff.jpg?raw=true\" width=25%>-->\n\n- Read the Cover Story article [Improving machine learning force fields for molecular dynamics simulations with fine-grained force metrics](https:\u002F\u002Fpubs.aip.org\u002Faip\u002Fjcp\u002Farticle-abstract\u002F159\u002F3\u002F035101\u002F2902663\u002FImproving-machine-learning-force-fields-for?redirectedFrom=fulltext) .\n\n#### Stochastic lag time parameterization for Markov State Model\n\nMarkov state models (MSMs) play a key role in studying protein conformational dynamics. A sliding count window with a fixed lag time is commonly used to sample sub-trajectories for transition counting and MSM construction. However, sub-trajectories sampled with a fixed lag time may not perform well under different selections of lag time, requiring strong prior experience and resulting in less robust estimations.\n\nTo alleviate this, we propose a novel stochastic method based on a Poisson process to generate perturbative lag times for sub-trajectory sampling and use it to construct a Markov chain. Comprehensive evaluations on the double-well system, WW domain, BPTI, and RBD–ACE2 complex of SARS-CoV-2 reveal that our algorithm significantly increases the robustness and accuracy of the constructed MSM without disrupting its Markovian properties. 
Furthermore, the advantages of our algorithm are especially pronounced for slow dynamic modes in complex biological processes.\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fmarkov.jpg?raw=true\" width=25%>-->\n\n- Read the Cover Story article [Stochastic Lag Time Parameterization for Markov State Models of Protein Dynamics](https:\u002F\u002Fpubs.acs.org\u002Fdoi\u002F10.1021\u002Facs.jpcb.2c03711).\n\n- Find an application case in studying the Spike-ACE2 complex structure for the highly infectious mechanism of Omicron: [Structural insights into the SARS-CoV-2 Omicron RBD-ACE2 interaction](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41422-022-00644-8).\n\n\n## Citation\n(#: co-first author; *: corresponding author)\n\nTong Wang#\\*, Xinheng He#, Mingyu Li#, Yatao Li#, Ran Bi, Yusong Wang, Chaoran Cheng, Xiangzhen Shen, Jiawei Meng, He Zhang, Haiguang Liu, Zun Wang, Shaoning Li, Bin Shao\\*, Tie-Yan Liu. Ab initio characterization of protein molecular dynamics with AI\u003Csup>2\u003C\u002Fsup>BMD. Nature 2024.\n\nYusong Wang#, Tong Wang#\\*, Shaoning Li#, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao*, Tie-Yan Liu, Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing, Nature Communications, 15.1 (2024): 313.\n\nTong Wang#\\*, Xinheng He#, Mingyu Li#, Bin Shao*, Tie-Yan Liu. AIMD-Chig: Exploring the conformational space of a 166-atom protein Chignolin with ab initio molecular dynamics, Scientific Data 10, 549 (2023).\n\nYusong Wang#, Shaoning Li#, Tong Wang*, Bin Shao, Nanning Zheng, Tie-Yan Liu. Geometric Transformer with Interatomic Positional Encoding. NeurIPS 2023.\n\nZun Wang#, Hongfei Wu#, Lixin Sun, Xinheng He, Zhirong Liu, Bin Shao, Tong Wang*, Tie-Yan Liu. 
Improving machine learning force fields for molecular dynamics simulations with fine-grained force metrics, The Journal of Chemical Physics, Volume 159, Issue 3, Cover Story.\n\nShiqi Gong#, Xinheng He#, Qi Meng, Zhiming Ma, Bin Shao*, Tong Wang*, Tie-Yan Liu. Stochastic Lag Time Parameterization for Markov State Models of Protein Dynamics, The Journal of Physical Chemistry B 2022 126 (46), Cover Story, 2022.\n\n## License\n\nCopyright (c) Microsoft Corporation. All rights reserved.\n\nLicensed under the [MIT](LICENSE) license.\n\n## Disclaimer\n\nAI\u003Csup>2\u003C\u002Fsup>BMD is a research project. It is not an officially supported Microsoft product.\n\n## Contacts\n\nPlease contact \u003CA href=\"mailto:tongwang.bio@outlook.com\">Tong Wang\u003C\u002FA> (Project Lead) and \u003CA href=\"mailto:biran@microsoft.com\">Ran Bi\u003C\u002FA> for any questions, suggestions, and technical support.\n","# AI\u003Csup>2\u003C\u002Fsup>BMD：人工智能驱动的从头计算生物分子动力学模拟\n\n## 目录\n\n- [概述](#overview)\n- [快速入门](#get-started)\n- [数据集](#datasets)\n- [系统要求](#system-requirements)\n- [高级设置](#advanced-setup)\n- [相关研究](#related-research)\n- [引用](#citation)\n- [许可证](#license)\n- [免责声明](#disclaimer)\n- [联系方式](#contacts)\n\n## 概述\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 是一款能够以从头计算精度高效模拟蛋白质分子动力学的程序。本仓库包含该模拟程序、数据集以及与 AI\u003Csup>2\u003C\u002Fsup>BMD 相关的公开资料。AI\u003Csup>2\u003C\u002Fsup>BMD 的主要内容已发表在《Nature》期刊上（https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08127-z）。\n\n以下动画展示了 AI\u003Csup>2\u003C\u002Fsup>BMD 的工作原理。\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F912a3e5a-c465-4dc7-8c2d-9f7807cac2a7\n\n\n\n## 快速入门\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 的源代码托管于此仓库中。\n我们已将源代码和运行时库打包成 Docker 镜像，并提供一个 Python 启动脚本，以简化安装流程。\n要运行该模拟程序，您无需克隆此仓库。只需下载 `scripts\u002Fai2bmd` 并执行即可（需 Python >=3.7 和 Docker 环境）。\n\n我们可以按照如下步骤运行一次分子动力学模拟：\n\n```shell\n# 如果已设置好启动脚本，可跳过以下两行\nwget 
'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# 下载 Chignolin 蛋白质结构数据文件\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig.pdb'\n# 下载预处理并溶剂化的 Chignolin 蛋白质结构数据文件\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq.pdb'\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq-nowat.pdb'\n# 从容器注册表拉取 Docker 镜像\ndocker pull ghcr.io\u002Fmicrosoft\u002Fai2bmd:latest\n# 启动程序，所有模拟参数均采用默认值\n# 如果用户未加入 docker 组，可能需要使用 sudo 执行以下命令\n.\u002Fai2bmd --prot-file chig.pdb --preprocess-dir chig_preprocessed --preeq-steps 0 --sim-steps 1000 --record-per-steps 1\n```\n\n这里我们以一个非常简单的蛋白质 `Chignolin` 作为示例。\n程序将使用默认参数进行模拟。\n\n结果将保存在一个名为 `Logs-chig` 的新目录中。该目录包含模拟轨迹文件：\n\n- chig-traj.traj：ASE 二进制格式的完整轨迹文件。\n\n注意：目前，AI\u003Csup>2\u003C\u002Fsup>BMD 支持具有中性末端封端（ACE 和 NME）、单链且由标准氨基酸组成的蛋白质的 MD 模拟。\n\n\n\n## 数据集\n\n### 蛋白质单元数据集\n\n蛋白质单元数据集涵盖了约 2000 万个在 DFT 水平下计算得到的二肽构象。可通过以下命令下载：\n\n```shell\n# 如果已设置好启动脚本，可跳过以下两行\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# 如果用户未加入 docker 组，可能需要使用 sudo 执行以下命令\n.\u002Fai2bmd --download-training-data\n```\n\n下载完成后，当前工作目录中将生成多个 numpy 数据文件 (*.npz)。\n\n### AIMD-Chig 数据集\n\nAIMD-Chig 数据集包含 166 个原子的 `Chignolin` 蛋白质的 200 万个构象，以及它们对应的势能和原子力，这些数据均由密度泛函理论 (DFT) 在 M06-2X\u002F6-31G* 水平下计算得出。\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Faimd-chig.png?raw=true\" width=50%>-->\n\n- 阅读文章 [AIMD-Chig：利用从头计算分子动力学探索 166 原子蛋白质 Chignolin 的构象空间]（https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41597-023-02465-9）。\n\n- 查看报道 
[首个达到从头计算精度的蛋白质完整构象分子动力学数据集及其背后的新型计算技术](https:\u002F\u002Fbioengineeringcommunity.nature.com\u002Fposts\u002Faimd-chig-exploring-the-conformational-space-of-proteins-at-dft-level)。\n\n- 获取数据集 [AIMD-Chig](https:\u002F\u002Ffigshare.com\u002Farticles\u002Fdataset\u002F_strong_AIMD-Chig_exploring_the_conformational_space_of_166-atom_protein_strong_em_strong_Chignolin_strong_em_strong_with_strong_em_strong_ab_initio_strong_em_strong_molecular_dynamics_strong_\u002F22786730)。\n\n## 系统要求\n\n### 硬件要求\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 程序可在 x86-64 架构的 GNU\u002FLinux 系统上运行。\n我们建议使用具备以下配置的机器：\n\n- **CPU**：8 核及以上\n- **内存**：32 GB 及以上\n- **GPU**：支持 CUDA 的 GPU，显存 8 GB 及以上\n\n该程序已在以下 GPU 上测试通过：\n- A100\n- V100\n- RTX A6000\n- Titan RTX\n\n### 软件要求\n\n该程序已在以下系统上测试通过：\n\n- **操作系统**：Ubuntu 20.04，**Docker**：27.1\n- **操作系统**：ArchLinux，**Docker**：26.1\n\n\n## 高级设置\n### 环境配置\n为方便使用，运行时库及依赖项已被打包至 Docker 镜像中。在启动 Docker 镜像之前，您需要安装 Docker 软件（详情请参阅 https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002F），并将当前用户添加到 docker 组中，具体操作如下：\n\n```shell\nsudo groupadd docker\nsudo usermod -aG docker $USER\nnewgrp docker\n```\n\n### 蛋白质文件准备\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 的输入文件应为 `.pdb` 格式。如果 `.pdb` 文件中缺少氢原子，需添加氢原子。随后，应在蛋白质的 N 端用 ACE（乙酰基）封端，在 C 端用 NME（N-甲基）封端。这些步骤可使用 PyMOL 软件通过以下命令高效完成。\n\n```python\nimport pymol\nfrom pymol import cmd\npymol.finish_launching()\ncmd.load(\"your_protein.pdb\",\"molecule\")\ncmd.h_add(\"molecule\") # 添加氢原子\n\n# 注意：以下 molecule、chain、resi 为占位变量，运行前需按实际结构赋值\ncmd.wizard(\"mutagenesis\")\ncmd.get_wizard().set_n_cap(\"acet\")\nselection = \"\u002F%s\u002F\u002F%s\u002F%s\" % (molecule, chain, resi) # 选择 N 端\ncmd.get_wizard().do_select(selection)\ncmd.get_wizard().apply()\n\ncmd.get_wizard().set_c_cap(\"nmet\")\nselection = \"\u002F%s\u002F\u002F%s\u002F%s\" % (molecule, chain, resi) # 选择 C 端\ncmd.get_wizard().do_select(selection)\ncmd.get_wizard().apply()\n\ncmd.set_wizard()\n```\n\n接下来，可以使用 AmberTools 的 `pdb4amber` 工具调整 `.pdb` 文件中的原子名称，特别是确保 ACE 和 NME 的命名符合 `ai2bmd` 的要求。ACE 和 NME 的原子名称应符合以下规范：\n\n- ACE：C、O、CH3、H1、H2、H3\n- 
NME：N、CH3、H、HH31、HH32、HH33\n\n```\npdb4amber -i your_protein.pdb -o processed_your_protein.pdb\n```\n\n此外，请确认蛋白质链中不存在 `TER` 分隔符，并且残基编号应从 1 开始，不得有间隙。\n\n完成上述步骤后，您的 `.pdb` 文件应类似于以下格式：\n\n```\nATOM      1  H1  ACE     1      10.845   8.614   5.964  1.00  0.00           H\nATOM      2  CH3 ACE     1      10.143   9.373   5.620  1.00  0.00           C\nATOM      3  H2  ACE     1       9.425   9.446   6.437  1.00  0.00           H\nATOM      4  H3  ACE     1       9.643   9.085   4.695  1.00  0.00           H\nATOM      5  C   ACE     1      10.805  10.740   5.408  1.00  0.00           C\nATOM      6  O   ACE     1      10.682  11.417   4.442  1.00  0.00           O\n...\nATOM    170  N   NME    12       9.499   8.258  10.367  1.00  0.00           N\nATOM    171  H   NME    12       9.393   8.028  11.345  1.00  0.00           H\nATOM    172  CH3 NME    12       8.845   7.223   9.569  1.00  0.00           C\nATOM    173 HH31 NME    12       7.842   6.990   9.925  1.00  0.00           H\nATOM    174 HH32 NME    12       8.798   7.589   8.543  1.00  0.00           H\nATOM    175 HH33 NME    12       9.418   6.305   9.435  1.00  0.00           H\nEND\n\n```\n\n您也可以参考 `examples` 文件夹中的蛋白质文件。需要注意的是，目前机器学习势能对含有二硫键的蛋白质支持尚不完善，我们将在近期进行更新。\n\n### 预处理\n在预处理阶段，系统会先构建溶剂化体系，然后进行能量最小化和若干预平衡阶段。目前，AI\u003Csup>2\u003C\u002Fsup>BMD 通过参数 `preprocess_method` 提供两种预处理方法。\n\n如果您选择 `FF19SB` 方法，系统将依次经历溶剂化、能量最小化、加热以及多个预平衡阶段。为了利用多核 CPU 和 GPU 加速预处理过程，您需要获取 AMBER 软件包，并修改 `src\u002FAIMD\u002Fpreprocess.py` 中的相关命令。\n\n如果您选择 `AMOEBA` 方法，系统将仅进行溶剂化和能量最小化阶段。我们强烈建议您进行预平衡模拟，以使模拟系统充分松弛。\n\n### 模拟\nAI\u003Csup>2\u003C\u002Fsup>BMD 通过参数 `mode` 提供两种生产模拟模式。默认的 `fragment` 模式表示蛋白质会被分割成二肽，然后在每一步模拟中由机器学习势能进行计算。\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 也支持用户自行训练机器学习势能，并在不进行碎片化的情况下进行模拟。`visnet` 模式表示蛋白质的整体势能和原子力将由 ViSNet 模型直接计算，无需进行碎片化。使用此模式时，您需要自行使用分子数据训练 ViSNet 模型，将其上传至 `src\u002FViSNet`，并为参数 `ckpt-type` 设置相应的值。这样，您就可以使用 AI\u003Csup>2\u003C\u002Fsup>BMD 模拟程序来模拟除蛋白质以外的任何分子。如需自行训练 ViSNet 模型，请参阅分支 
[ViSNet](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FViSNet)，其中包含源代码、模型训练说明及其他技术细节。\n\n要执行包括预处理在内的完整 AI\u003Csup>2\u003C\u002Fsup>BMD 模拟，可参考以下命令。\n\n```shell\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\nchmod +x ai2bmd\n# 如果用户未配置 docker 组，可能需要使用 sudo 运行以下命令\n.\u002Fai2bmd --prot-file path\u002Fto\u002Ftarget-protein.pdb --sim-steps nnn  ...\n#        '-------- 必填参数 ---------' '-- 可选参数 --'\n#\n# 常见可选参数：\n#\n# 【模拟目录映射选项】\n#   --base-dir path\u002Fto\u002Fbase-dir    模拟运行目录（默认为当前目录）\n#   --log-dir  path\u002Fto\u002Flog-dir     日志及结果保存目录（默认为 base-dir\u002FLogs-蛋白质名）\n#   --src-dir  path\u002Fto\u002Fsrc-dir     将本地 src 目录挂载到仓库的 src\u002F 目录位置（默认不启用）\n#\n# 【模拟参数选项】\n#   --sim-steps nnn                模拟步数\n#   --temp-k nnn                   模拟温度（单位：开尔文）\n#   --timestep nnn                 模拟时间步长（单位：飞秒）\n#   --preeq-steps nnn              每个约束条件下的预平衡模拟步数\n#   --max-cyc nnn                  预处理中能量最小化的最大循环次数\n#   --preprocess-method [method]   预处理方法\n#   --mode [mode]                  模拟过程中是否进行碎片化\n#   --record-per-steps nnn         轨迹保存频率\n#\n# 【性能优化选项】\n#   --device-strategy [strategy]   计算设备分配策略\n#       excess-compute                 保留最后一块 GPU 用于非键合及溶剂相关计算\n#       small-molecule                 最大化资源用于模型推理\n#       large-molecule                 提升大分子模拟性能\n#   --chunk-size nnn               每批处理的原子数（减少内存消耗）\n#\n# 【其他启动选项】\n#   --software-update              若指定，则在运行前更新 Docker 镜像中的程序\n#   --download-training-data       若指定，则下载 AI2BMD 训练数据并解压到工作目录，忽略其他所有选项。\n#   --gpus                         指定传递给程序的 GPU 设备，可选：\n#                                  all:        将所有可用 GPU 传递给程序。\n\n#                                  none:       禁用 GPU 直通。\n#                                  i[,j,k...]  
直通部分 GPU。示例：--gpus 0,1\n```\n\n### 后处理分析\n模拟轨迹的格式为 ASE 格式的 `.traj` 文件。若需将其转换为 `.dcd` 格式以便进行可视化，可先安装 MDAnalysis，并以 `src\u002Futils\u002Ftraj2dcd.py` 脚本作为参考，执行以下命令：\n\n```shell\npython traj2dcd.py --input xxx.traj --output xxx.dcd --pdb xxx.pdb --num-atoms nnn --stride nnn\n\n# 参数说明：\n# --input         输入轨迹文件的名称\n# --output        输出轨迹文件的名称\n# --pdb           与输入轨迹对应的参考 PDB 文件\n# --num-atoms     蛋白质或整个溶剂化体系的原子数\n# --stride        输出轨迹的频率\n```\n\n### 故障排除\n由于对质子跳跃行为的建模不足或模拟体系设置不当，模拟可能会崩溃。质子跳跃现象在大型生物分子和长时间模拟中尤为常见。由于机器学习势能训练过程中见到的质子跳跃案例很少，模型可能遇到“分布外”情况，从而给出错误的原子作用力，最终导致模拟崩溃。我们将持续更新 AI\u003Csup>2\u003C\u002Fsup>BMD 势能，提升其预测能力；同时也强烈建议大家贡献数据集，以进一步微调模型。\n\n为避免或缓解模拟崩溃，我们提供以下建议：1) 在正式生产模拟运行前充分弛豫模拟体系；2) 延长预平衡模拟的时间；3) 在正式生产模拟中增加施加约束条件的模拟时长（通过参数 `preeq-steps`）；4) 从崩溃前几步处重新启动模拟；5) 扩大溶剂盒子的尺寸；6) 调整其他模拟体系设置。\n\n除了直接进行模拟之外，我们也鼓励用户利用 AI\u003Csup>2\u003C\u002Fsup>BMD 对现有模拟轨迹进行重加权，并据此计算蛋白质的相关性质。\n\n## 相关研究\n\n### 模型架构\n\n#### ViSNet\n\nViSNet（**V**ector-**S**calar **i**nteractive graph neural **Net**work）是一种等变的、基于几何信息增强的分子图神经网络，能够显著缓解计算成本与充分利用几何信息之间的矛盾。\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fvisnet_arch.png?raw=true\" width=50%>-->\n\n- ViSNet 已发表在《Nature Communications》上，论文题目为《通过等变向量-标量交互消息传递增强分子的几何表征》（[链接](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-023-43720-2)）。\n\n- ViSNet 同时入选了《Nature Communications》“编辑精选”栏目，分别在“**人工智能与机器学习**”（[链接](https:\u002F\u002Fwww.nature.com\u002Fcollections\u002Fceiajcdbeb)）和“**生物技术与方法**”（[链接](https:\u002F\u002Fwww.nature.com\u002Fcollections\u002Fidhhgedgig)）两个领域。\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fai-eh.png?raw=true\" width=50%>-->\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fbio-eh.png?raw=true\" width=50%> -->\n\n- ViSNet 在【首届全球AI药物研发大赛】中荣获冠军，并在【OGB-LSC @ NeurIPS 2022 
PCQM4Mv2赛道】中跻身获奖者行列！\n\n- 欢迎访问分支 [ViSNet](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FViSNet)，获取源代码、模型训练说明及更多技术细节。\n\n#### Geoformer\n\nGeoformer（**Geo**metric Trans**former**）是一种新颖的几何Transformer模型，用于高效建模分子结构，以实现多种分子性质的预测。Geoformer 引入了一种新的位置编码方法——原子间位置编码（IPE），用于在Transformer架构中参数化原子环境。通过引入IPE，Geoformer能够在基于Transformer的架构中捕捉到超越两两距离的宝贵几何信息。Geoformer可以被视为ViSNet的一种Transformer变体。\n\n- Geoformer 已发表于NeurIPS 2023。\n- 请阅读Geoformer的论文《带有原子间位置编码的几何Transformer》（[链接](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FGeoformer\u002FGeoformer.pdf)）。\n- 欢迎访问分支 [Geoformer](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Ftree\u002FGeoformer)，获取源代码、模型训练说明及更多技术细节。\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fgeoformer.png?raw=true\" width=50%>-->\n\n#### 面向MLFF的细粒度力场指标\n\n近年来，机器学习力场（MLFF）作为一种经济高效的替代方案，逐渐取代从头算分子动力学（MD）模拟而受到广泛关注。然而，尽管MLFF在测试集上的误差较小，但在实际的MD模拟过程中，其泛化能力和鲁棒性往往存在明显不足。\n\n为解决这些问题，我们提出了一种综合性的评估框架，通过全局力场指标以及从元素和构象角度出发的细粒度指标，系统地衡量MLFF在每个原子和每种分子构象上的表现。此外，在模型训练过程中引入这些力场指标，可以进一步提升MLFF的性能和MD模拟的稳定性。具体方法包括：将这些力场指标作为损失函数来训练MLFF模型；通过对原始数据集中的样本进行重加权来进行微调；以及结合额外的未探索数据继续训练。\n\n\u003C!--\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fmlff.jpg?raw=true\" width=25%>-->\n\n- 请阅读封面文章《利用细粒度力场指标改进用于分子动力学模拟的机器学习力场》（[链接](https:\u002F\u002Fpubs.aip.org\u002Faip\u002Fjcp\u002Farticle-abstract\u002F159\u002F3\u002F035101\u002F2902663\u002FImproving-machine-learning-force-fields-for?redirectedFrom=fulltext)）。\n\n#### 马尔可夫状态模型的随机延迟时间参数化\n马尔可夫状态模型（MSM）在研究蛋白质构象动力学中起着关键作用。通常，人们会使用一个固定延迟时间的滑动计数窗口来采样子轨迹，以便进行转移计数和构建MSM。然而，采用固定延迟时间采样的子轨迹在不同延迟时间选择下可能表现不佳，这不仅需要丰富的先验经验，还可能导致估计结果不够稳健。\n\n为此，我们提出了一种基于泊松过程的新型随机方法，用于生成扰动式的延迟时间来采样子轨迹，并以此构建马尔可夫链。针对双势阱系统、WW结构域、BPTI以及SARS-CoV-2的RBD–ACE2复合物等复杂体系的全面评估表明，我们的算法能够在不破坏马尔可夫性质的前提下，显著提高所构建MSM的稳健性和准确性。此外，该算法的优势在复杂生物过程中的慢动态模式中尤为突出。\n\n\u003C!--\u003Cimg 
src=\"https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fblob\u002Fresources\u002Fimages\u002Fmarkov.jpg?raw=true\" width=25%>-->\n\n- 请阅读封面文章《蛋白质动力学马尔可夫状态模型的随机延迟时间参数化》（[链接](https:\u002F\u002Fpubs.acs.org\u002Fdoi\u002F10.1021\u002Facs.jpcb.2c03711)）。\n- 另外，您还可以参考一项应用案例：通过研究Omicron变异株的Spike-ACE2复合物结构，揭示其高传染性机制——《关于SARS-CoV-2 Omicron RBD-ACE2相互作用的结构见解》（[链接](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41422-022-00644-8)）。\n\n## 引用\n(#：共同第一作者；*：通讯作者）\n\nTong Wang#\\*, Xinheng He#, Mingyu Li#, Yatao Li#, Ran Bi, Yusong Wang, Chaoran Cheng, Xiangzhen Shen, Jiawei Meng, He Zhang, Haiguang Liu, Zun Wang, Shaoning Li, Bin Shao\\*, Tie-Yan Liu. 利用AI\u003Csup>2\u003C\u002Fsup>BMD对蛋白质分子动力学进行从头计算表征。Nature 2024。\n\nYusong Wang#, Tong Wang#\\*, Shaoning Li#, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao*, Tie-Yan Liu. 通过等变向量-标量交互消息传递增强分子的几何表征，Nature Communications, 15.1 (2024): 313。\n\nTong Wang#\\*, Xinheng He#, Mingyu Li#, Bin Shao*, Tie-Yan Liu. AIMD-Chig：利用从头算分子动力学探索166个原子的Chignolin蛋白构象空间，Scientific Data 10, 549 (2023)。\n\nYusong Wang#, Shaoning Li#, Tong Wang*, Bin Shao, Nanning Zheng, Tie-Yan Liu. 带有原子间位置编码的几何Transformer。NeurIPS 2023。\n\nZun Wang#, Hongfei Wu#, Lixin Sun, Xinheng He, Zhirong Liu, Bin Shao, Tong Wang*, Tie-Yan Liu. 利用细粒度力场指标改进用于分子动力学模拟的机器学习力场，The Journal of Chemical Physics，第159卷，第3期，封面文章。\n\nShiqi Gong#, Xinheng He#, Qi Meng, Zhiming Ma, Bin Shao*, Tong Wang*, Tie-Yan Liu. 
蛋白质动力学马尔可夫状态模型的随机延迟时间参数化，The Journal of Physical Chemistry B 2022年第126卷第46期，封面文章，2022年。\n\n## 许可证\n\n版权所有© 微软公司。保留所有权利。\n\n根据[MIT](LICENSE)许可证授权。\n\n## 免责声明\n\nAI\u003Csup>2\u003C\u002Fsup>BMD 是一项研究项目。它并非微软官方支持的产品。\n\n## 联系方式\n\n如有任何问题、建议或技术支持，请联系 \u003CA href=\"mailto:tongwang.bio@outlook.com\">Tong Wang\u003C\u002FA>（项目负责人）和 \u003CA href=\"mailto:biran@microsoft.com\">Ran Bi\u003C\u002FA>。","# AI2BMD 快速上手指南\n\nAI2BMD 是一款基于人工智能的从头算（*ab initio*）生物分子动力学模拟程序，能够以量子力学精度高效模拟蛋白质动力学。本指南将帮助您快速在本地环境中部署并运行该工具。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: x86-64 GNU\u002FLinux (推荐 Ubuntu 20.04 或 ArchLinux)\n- **CPU**: 8 核及以上\n- **内存**: 32 GB 及以上\n- **GPU**: 支持 CUDA 的显卡，显存 8 GB 及以上\n  - *已测试型号*: A100, V100, RTX A6000, Titan RTX\n\n### 前置依赖\n- **Python**: 版本 >= 3.7\n- **Docker**: 已安装并配置好用户权限\n  - 若未配置 Docker 用户组，请执行以下命令：\n    ```shell\n    sudo groupadd docker\n    sudo usermod -aG docker $USER\n    newgrp docker\n    ```\n\n## 安装步骤\n\nAI2BMD 采用 Docker 容器化部署，无需克隆整个代码仓库，只需下载启动脚本即可。\n\n1. **下载启动脚本并赋予执行权限**：\n   ```shell\n   wget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fscripts\u002Fai2bmd'\n   chmod +x ai2bmd\n   ```\n\n2. **拉取 Docker 镜像**：\n   首次运行时脚本会自动拉取，也可手动预拉取：\n   ```shell\n   docker pull ghcr.io\u002Fmicrosoft\u002Fai2bmd:latest\n   ```\n\n## 基本使用\n\n以下示例演示如何对一个小蛋白 `Chignolin` 进行简单的动力学模拟。\n\n### 1. 准备数据文件\n下载示例蛋白结构文件及预处理文件：\n```shell\n# 下载原始蛋白结构\nwget 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig.pdb'\n\n# 下载预处理后的溶剂化结构文件\nmkdir -p chig_preprocessed\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq.pdb'\nwget --directory-prefix=chig_preprocessed 'https:\u002F\u002Fraw.githubusercontent.com\u002Fmicrosoft\u002FAI2BMD\u002Fmain\u002Fexamples\u002Fchig_preprocessed\u002Fchig-preeq-nowat.pdb'\n```\n\n### 2. 
运行模拟\n执行以下命令启动模拟（使用默认参数，模拟 1000 步）：\n```shell\n.\u002Fai2bmd --prot-file chig.pdb --preprocess-dir chig_preprocessed --preeq-steps 0 --sim-steps 1000 --record-per-steps 1\n```\n> **注意**：如果当前用户未加入 docker 组，可能需要在命令前加 `sudo`。\n\n### 3. 查看结果\n模拟完成后，结果将保存在 `Logs-chig` 目录中：\n- `chig-traj.traj`: ASE 二进制格式的全轨迹文件。\n\n### 输入文件注意事项\n目前 AI2BMD 仅支持以下类型的蛋白质模拟：\n- 单链蛋白\n- 标准氨基酸\n- 末端需封闭：N 端为 ACE (乙酰基)，C 端为 NME (N-甲基)\n- 暂不支持二硫键（后续版本将更新支持）\n\n若需处理自己的 PDB 文件，请确保补充氢原子并按上述要求添加末端封闭基团（可参考 PyMOL 或 AmberTools 进行处理）。","某生物医药研发团队正致力于解析一种新型小蛋白药物在原子层面的折叠路径与动态稳定性，以指导后续的分子优化设计。\n\n### 没有 AI2BMD 时\n- **计算成本极高**：传统从头算（ab initio）分子动力学模拟依赖密度泛函理论（DFT），对包含上百个原子的蛋白质进行纳秒级模拟，往往需要占用超级计算机数周甚至数月的算力资源。\n- **精度与效率难以兼得**：为了缩短时间，研究人员被迫使用经验力场代替量子力学计算，但这牺牲了电子层面的高精度，无法准确捕捉化学键断裂或复杂的电荷转移过程。\n- **数据采样不足**：受限于昂贵的计算开销，只能运行极短时间的模拟或极少次数的重复实验，导致无法充分探索蛋白质广阔的构象空间，遗漏关键的中间态结构。\n- **技术门槛高**：搭建高精度的量子化学计算环境复杂，参数调试繁琐，严重拖慢了从假设提出到验证的迭代周期。\n\n### 使用 AI2BMD 后\n- **仿真速度飞跃**：AI2BMD 利用深度学习模型替代耗时的 DFT 实时计算，在保持从头算精度的前提下，将模拟速度提升了数个数量级，使原本需数月的任务在普通工作站上几天即可完成。\n- **兼顾量子精度**：无需妥协于经验力场，AI2BMD 直接输出符合 DFT 级别（如 M06-2X\u002F6-31G*）的势能面和原子受力，精准还原蛋白质折叠过程中的微观量子效应。\n- **充分构象采样**：高效的计算能力支持运行更长时程的轨迹模拟和更多样本的并行测试，帮助团队完整绘制出蛋白质折叠的自由能景观，发现以往被忽略的亚稳态。\n- **部署便捷高效**：通过 Docker 镜像和简单的 Python 启动脚本，研究人员无需深入配置复杂的量子化学软件栈，即可快速复现 Nature 论文级别的模拟结果。\n\nAI2BMD 成功打破了量子精度与计算效率之间的长期壁垒，让科研团队能以低成本实现“显微镜级”的蛋白质动态观测。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_AI2BMD_e3060680.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[86],{"name":87,"color":88,"percentage":89},"Python","#3572A5",100,571,82,"2026-03-19T07:37:31","MIT","Linux (x86-64)","必需，需支持 CUDA 的 NVIDIA GPU，显存 8GB+。已测试型号：A100, V100, RTX A6000, Titan RTX","32GB+",{"notes":98,"python":99,"dependencies":100},"该工具通过 Docker 镜像分发，无需手动安装复杂的 Python 
依赖库，但宿主机器必须安装 Docker 并将用户加入 docker 组。输入蛋白文件 (.pdb) 需预先处理：添加氢原子，并在 N 端和 C 端分别加上 ACE 和 NME 封端基团，且需符合特定的原子命名规范（建议使用 PyMOL 和 AmberTools 处理）。目前不支持含二硫键的蛋白质。","3.7+",[101,102],"Docker (26.1+)","NVIDIA Container Toolkit (隐含需求)",[18],"2026-03-27T02:49:30.150509","2026-04-06T06:44:08.872216",[107,112,117,122,126,131],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},14833,"运行模拟时出现\"Solvent dynamic component Tinker terminated abnormally\"错误或卡在 80-100 步怎么办？","这通常是因为 GPU 配置问题。首先检查终端输出是否包含\"ERROR:root:tinker-GPU is specified, but there's no GPU. Reverting back to CPU\"。如果有，说明 GPU 未被正确识别。请确保：1. 您的显卡型号在 README 支持列表中（例如 RTX 3090、P6000 可能不支持，建议更换为 V100 或 A100）；2. Docker 已正确配置 GPU 权限（使用 --gpus all 参数）；3. 无需在 Docker 内部额外设置参数，只需确保宿主机 GPU 可用且型号受支持。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fissues\u002F27",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},14834,"遇到\"[Errno 32] Broken pipe\"错误导致模拟终止的原因是什么？","该错误通常由 GPU 兼容性问题引起。即使 Docker 拉取成功，如果使用的显卡（如 P6000）不在 Tinker-GPU 的支持列表中，也会导致此错误。解决方案是更换为支持的显卡型号（如 V100 或 A100），并确保在运行命令中添加了 --gpus all 参数以启用 GPU 加速。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fissues\u002F25",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},14835,"如何处理 pdb4amber 处理后产生的 ACE\u002FNME 残基原子命名不匹配问题？","pdb4amber 生成的文件中，ACE 和 NME 残基的氢原子名称（如 H1, H2, HN2）可能与 AI2BMD 要求的名称（如 H, HH31, HH32, HH33）不一致。虽然可以手动修改 PDB 文件中的原子名称，但如果预处理后仍报错，可能需要检查预处理脚本中的盒子大小设置。对于大蛋白（如超过 300 个氨基酸），建议在预处理脚本中将默认的盒子大小（box size，默认值 20）改小（例如改为 10），以避免体积过大导致的计算错误。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fissues\u002F29",{"id":123,"question_zh":124,"answer_zh":125,"source_url":121},14836,"如何在 Docker 容器中运行本地修改过的 preprocess.py 脚本？","默认情况下，Docker 容器会使用镜像内的初始脚本。若要运行本地修改后的 src\u002FAIMD\u002Fpreprocess.py，需要利用 Docker 的卷挂载功能。在启动命令中，将本地修改后的脚本路径挂载到容器内对应的路径，覆盖原文件。例如，确保启动命令中包含类似 -v \u002Flocal\u002Fpath\u002Fto\u002Fsrc\u002FAIMD\u002Fpreprocess.py:\u002Fai2bmd\u002FAIMD\u002Fpreprocess.py 
的参数（具体容器内路径需根据项目结构确认），这样容器启动时就会读取您本地的修改版本而不是镜像内的默认版本。",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},14837,"Pretrained-3D-ViSNet 模型在验证集上的表现较差（如 MAE 约为 0.091）是否正常？","根据项目方的解释，预训练过程可以从\"对比蒸馏\"（contrastive distillation）的角度理解。学生网络（2D net）的性能指标（如 0.091）并不直接代表最终蒸馏效果的失败，而是预训练阶段的一种中间状态或特定评估方式的结果。该数值反映了学生网络在特定任务上的表现，需结合最终的蒸馏后模型性能来综合评估，单看此数值不能断定蒸馏工作效果不好。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FAI2BMD\u002Fissues\u002F2",{"id":132,"question_zh":133,"answer_zh":134,"source_url":116},14838,"哪些 GPU 型号支持 AI2BMD 的 Tinker-GPU 加速？","并非所有 NVIDIA GPU 都受支持。根据用户反馈，RTX 3090 和 P6000 在某些版本中可能导致\"Broken pipe\"或回退到 CPU 的错误。官方推荐并验证支持的型号包括 V100 和 A100。在运行前，请务必查阅项目 README 文档中的支持列表（support list），确保您的硬件在列，否则程序可能会自动回退到 CPU 模式或直接报错终止。",[136,141,146,151],{"id":137,"version":138,"summary_zh":139,"released_at":140},81694,"v1.1.0","添加新的检查点 (`--ckpt-type de11d1421ccda37ffab07d7403c8f5bb`)，该检查点能够处理含有胱氨酸（CYX）残基的蛋白质。","2025-02-18T07:57:45",{"id":142,"version":143,"summary_zh":144,"released_at":145},81695,"v1.0.0","各种生活质量修复","2025-02-18T07:46:27",{"id":147,"version":148,"summary_zh":149,"released_at":150},81696,"v0.1.0","AI2BMD 初次公开发布","2025-02-18T07:40:03",{"id":152,"version":153,"summary_zh":80,"released_at":154},81697,"ViSNet-v1.0","2023-11-03T10:18:32"]