[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-google-deepmind--alphafold":3,"tool-google-deepmind--alphafold":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85013,2,"2026-04-06T11:09:19",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74991,"2026-04-06T23:16:49",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 
设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65644,"2026-04-06T10:25:08",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":100,"forks":101,"last_commit_at":102,"license":103,"difficulty_score":104,"env_os":105,"env_gpu":106,"env_ram":107,"env_deps":108,"category_tags":117,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":22,"created_at":118,"updated_at":119,"faqs":120,"releases":150},4731,"google-deepmind\u002Falphafold","alphafold","Open source code for AlphaFold 2.","AlphaFold 是由 DeepMind 开发的开源人工智能系统，核心功能是根据氨基酸序列高精度预测蛋白质的三维结构。它成功解决了生物学领域长期存在的“蛋白质折叠问题”，即如何从一维基因信息推导出决定生物功能的复杂空间形态，其预测精度在多项国际竞赛中已达到实验级水平。\n\n这套代码实现了 AlphaFold 2 的完整推理流程，并包含了用于预测蛋白质复合物的 AlphaFold-Multimer 模型。其独特的技术亮点在于深度学习架构与进化生物学数据的深度融合：通过挖掘多序列比对中的共进化信号，结合注意力机制神经网络，无需依赖大量已知结构模板即可实现从头预测。\n\nAlphaFold 主要面向生物信息学研究人员、计算生物学家及具备 Linux 操作基础的开发者使用。由于运行该系统需要配置 Docker 环境、下载数百 GB 的遗传数据库，并依赖高性能 NVIDIA GPU 进行计算，它对硬件资源和工程部署能力有一定门槛，因此不太适合无编程背景的普通用户或仅需简单可视化设计的设计师。对于科研团队而言，AlphaFold 提供了强大的本地化部署方案，能够加速新药研发、酶设计及基础生命科学探索","AlphaFold 是由 DeepMind 开发的开源人工智能系统，核心功能是根据氨基酸序列高精度预测蛋白质的三维结构。它成功解决了生物学领域长期存在的“蛋白质折叠问题”，即如何从一维基因信息推导出决定生物功能的复杂空间形态，其预测精度在多项国际竞赛中已达到实验级水平。\n\n这套代码实现了 AlphaFold 2 的完整推理流程，并包含了用于预测蛋白质复合物的 AlphaFold-Multimer 模型。其独特的技术亮点在于深度学习架构与进化生物学数据的深度融合：通过挖掘多序列比对中的共进化信号，结合注意力机制神经网络，无需依赖大量已知结构模板即可实现从头预测。\n\nAlphaFold 主要面向生物信息学研究人员、计算生物学家及具备 Linux 操作基础的开发者使用。由于运行该系统需要配置 Docker 环境、下载数百 GB 的遗传数据库，并依赖高性能 NVIDIA GPU 进行计算，它对硬件资源和工程部署能力有一定门槛，因此不太适合无编程背景的普通用户或仅需简单可视化设计的设计师。对于科研团队而言，AlphaFold 提供了强大的本地化部署方案，能够加速新药研发、酶设计及基础生命科学探索，是连接基因序列与生命功能的关键桥梁。","![header](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphafold_readme_f0835d8a2191.jpg)\n\n# AlphaFold\n\nThis package provides an implementation of the inference pipeline of AlphaFold\nv2. For simplicity, we refer to this model as AlphaFold throughout the rest of\nthis document.\n\nWe also provide:\n\n1.  An implementation of AlphaFold-Multimer. This represents a work in progress\n    and AlphaFold-Multimer isn't expected to be as stable as our monomer\n    AlphaFold system. [Read the guide](#updating-existing-installation) for how\n    to upgrade and update code.\n2.  The [technical note](docs\u002Ftechnical_note_v2.3.0.md) containing the models\n    and inference procedure for an updated AlphaFold v2.3.0.\n3.  
A [CASP15 baseline](docs\u002Fcasp15_predictions.zip) set of predictions along\n    with documentation of any manual interventions performed.\n\nAny publication that discloses findings arising from using this source code or\nthe model parameters should [cite](#citing-this-work) the\n[AlphaFold paper](https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41586-021-03819-2) and, if\napplicable, the\n[AlphaFold-Multimer paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v1).\n\nPlease also refer to the\n[Supplementary Information](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41586-021-03819-2\u002FMediaObjects\u002F41586_2021_3819_MOESM1_ESM.pdf)\nfor a detailed description of the method.\n\n**You can use a slightly simplified version of AlphaFold with\ncommunity-supported versions (see below).\n\nIf you have any questions, please contact the AlphaFold team at\n[alphafold@deepmind.com](mailto:alphafold@deepmind.com).\n\n![CASP14 predictions](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphafold_readme_f6135c1ce55e.gif)\n\n## Installation and running your first prediction\n\nYou will need a machine running Linux, AlphaFold does not support other\noperating systems. Full installation requires up to 3 TB of disk space to keep\ngenetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs\nwith more memory can predict larger protein structures).\n\nPlease follow these steps:\n\n1.  Install [Docker](https:\u002F\u002Fwww.docker.com\u002F).\n\n    *   Install\n        [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html)\n        for GPU support.\n    *   Setup running\n        [Docker as a non-root user](https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002Flinux-postinstall\u002F#manage-docker-as-a-non-root-user).\n\n1.  Clone this repository and `cd` into it.\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold.git\n    cd .\u002Falphafold\n    ```\n\n1.  Download genetic databases and model parameters:\n\n    *   Install `aria2c`. On most Linux distributions it is available via the\n        package manager as the `aria2` package (on Debian-based distributions\n        this can be installed by running `sudo apt install aria2`).\n        Same for `rsync`.\n\n    *   Please use the script `scripts\u002Fdownload_all_data.sh` to download and set\n        up full databases. This may take substantial time (download size is 556\n        GB), so we recommend running this script in the background:\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR> > download.log 2> download_all.log &\n    ```\n\n    *   **Note: The download directory `\u003CDOWNLOAD_DIR>` should *not* be a\n        subdirectory in the AlphaFold repository directory.** If it is, the\n        Docker build will be slow as the large databases will be copied into the\n        docker build context.\n\n    *   It is possible to run AlphaFold with reduced databases; please refer to\n        the [complete documentation](#genetic-databases).\n\n1.  Check that AlphaFold will be able to use a GPU by running:\n\n    ```bash\n    docker run --rm --gpus all nvidia\u002Fcuda:11.0-base nvidia-smi\n    ```\n\n    The output of this command should show a list of your GPUs. 
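\n\n    If you also want to confirm GPU visibility from Python, the minimal sketch below may help; it is not part of the official setup steps and assumes a CUDA-enabled JAX build is installed in the environment where you run it (AlphaFold runs its model through JAX).\n\n    ```python\n    # Hedged sketch: assumes a CUDA-enabled JAX install; not an official setup step.\n    import jax\n\n    devices = jax.devices()  # every device JAX can see\n    print(devices)\n    print('GPU visible:', any(d.platform in ('gpu', 'cuda') for d in devices))\n    ```\n\n    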
If it doesn't,\n    check if you followed all steps correctly when setting up the\n    [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html)\n    or take a look at the following\n    [NVIDIA Docker issue](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker\u002Fissues\u002F1447#issuecomment-801479573).\n\n    If you wish to run AlphaFold using Singularity (a common containerization\n    platform on HPC systems) we recommend using some of the third party\n    Singularity setups as linked in\n    https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F10 or\n    https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F24.\n\n1.  Build the Docker image:\n\n    ```bash\n    docker build -f docker\u002FDockerfile -t alphafold .\n    ```\n\n    If you encounter the following error:\n\n    ```\n    W: GPG error: https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fcuda\u002Frepos\u002Fubuntu1804\u002Fx86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\n    E: The repository 'https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fcuda\u002Frepos\u002Fubuntu1804\u002Fx86_64 InRelease' is not signed.\n    ```\n\n    use the workaround described in\n    https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F463#issuecomment-1124881779.\n\n1.  Install the `run_docker.py` dependencies. Note: You may optionally wish to\n    create a\n    [Python Virtual Environment](https:\u002F\u002Fdocs.python.org\u002F3\u002Ftutorial\u002Fvenv.html)\n    to prevent conflicts with your system's Python environment.\n\n    ```bash\n    pip3 install -r docker\u002Frequirements.txt\n    ```\n\n1.  Make sure that the output directory exists (the default is `\u002Ftmp\u002Falphafold`)\n    and that you have sufficient permissions to write into it.\n\n1.  Run `run_docker.py` pointing to a FASTA file containing the protein\n    sequence(s) for which you wish to predict the structure (`--fasta_paths`\n    parameter). AlphaFold will search for the available templates before the\n    date specified by the `--max_template_date` parameter; this could be used to\n    avoid certain templates during modeling. `--data_dir` is the directory with\n    downloaded genetic databases and `--output_dir` is the absolute path to the\n    output directory.\n\n    ```bash\n    python3 docker\u002Frun_docker.py \\\n      --fasta_paths=your_protein.fasta \\\n      --max_template_date=2022-01-01 \\\n      --data_dir=$DOWNLOAD_DIR \\\n      --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n    ```\n\n1.  Once the run is over, the output directory shall contain predicted\n    structures of the target protein. 
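\n\n    A quick way to confirm that the run finished and produced ranked models is sketched below; it assumes the default output layout described in the [AlphaFold output](#alphafold-output) section, with a per-target subdirectory named after the input FASTA file (here `your_protein`).\n\n    ```python\n    # Hedged sketch: assumes the default output layout described later in this\n    # README, with a per-target subdirectory named after the input FASTA file.\n    import glob\n    import os\n\n    out_dir = '\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\u002Fyour_protein'\n    ranked = sorted(glob.glob(os.path.join(out_dir, 'ranked_*.pdb')))\n    print(ranked)  # expect ranked_0.pdb (highest confidence) through ranked_4.pdb\n    ```\n\n    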
Please check the documentation below for\n    additional options and troubleshooting tips.\n\n### Genetic databases\n\nThis step requires `aria2c` to be installed on your machine.\n\nAlphaFold needs multiple genetic (sequence) databases to run:\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F),\n*   [MGnify](https:\u002F\u002Fwww.ebi.ac.uk\u002Fmetagenomics\u002F),\n*   [PDB70](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Fdata\u002Fhhsuite\u002Fdatabases\u002Fhhsuite_dbs\u002F),\n*   [PDB](https:\u002F\u002Fwww.rcsb.org\u002F) (structures in the mmCIF format),\n*   [PDB seqres](https:\u002F\u002Fwww.rcsb.org\u002F) – only for AlphaFold-Multimer,\n*   [UniRef30 (FKA UniClust30)](https:\u002F\u002Funiclust.mmseqs.com\u002F),\n*   [UniProt](https:\u002F\u002Fwww.uniprot.org\u002Funiprot\u002F) – only for AlphaFold-Multimer,\n*   [UniRef90](https:\u002F\u002Fwww.uniprot.org\u002Fhelp\u002Funiref).\n\nWe provide a script `scripts\u002Fdownload_all_data.sh` that can be used to download\nand set up all of these databases:\n\n*   Recommended default:\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR>\n    ```\n\n    will download the full databases.\n\n*   With `reduced_dbs` parameter:\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR> reduced_dbs\n    ```\n\n    will download a reduced version of the databases to be used with the\n    `reduced_dbs` database preset. This shall be used with the corresponding\n    AlphaFold parameter `--db_preset=reduced_dbs` later during the AlphaFold run\n    (please see [AlphaFold parameters](#running-alphafold) section).\n\n:ledger: **Note: The download directory `\u003CDOWNLOAD_DIR>` should *not* be a\nsubdirectory in the AlphaFold repository directory.** If it is, the Docker build\nwill be slow as the large databases will be copied during the image creation.\n\nWe don't provide exactly the database versions used in CASP14 – see the\n[note on reproducibility](#note-on-casp14-reproducibility). Some of the\ndatabases are mirrored for speed, see [mirrored databases](#mirrored-databases).\n\n:ledger: **Note: The total download size for the full databases is around 556 GB\nand the total size when unzipped is 2.62 TB. Please make sure you have a large\nenough hard drive space, bandwidth and time to download. We recommend using an\nSSD for better genetic search performance.**\n\n:ledger: **Note: If the download directory and datasets don't have full read and\nwrite permissions, it can cause errors with the MSA tools, with opaque\n(external) error messages. Please ensure the required permissions are applied,\ne.g. 
with the `sudo chmod 755 --recursive \"$DOWNLOAD_DIR\"` command.**\n\nThe `download_all_data.sh` script will also download the model parameter files.\nOnce the script has finished, you should have the following directory structure:\n\n```\n$DOWNLOAD_DIR\u002F                             # Total: ~ 2.62 TB (download: 556 GB)\n    bfd\u002F                                   # ~ 1.8 TB (download: 271.6 GB)\n        # 6 files.\n    mgnify\u002F                                # ~ 120 GB (download: 67 GB)\n        mgy_clusters_2022_05.fa\n    params\u002F                                # ~ 5.3 GB (download: 5.3 GB)\n        # 5 CASP14 models,\n        # 5 pTM models,\n        # 5 AlphaFold-Multimer models,\n        # LICENSE,\n        # = 16 files.\n    pdb70\u002F                                 # ~ 56 GB (download: 19.5 GB)\n        # 9 files.\n    pdb_mmcif\u002F                             # ~ 238 GB (download: 43 GB)\n        mmcif_files\u002F\n            # About 199,000 .cif files.\n        obsolete.dat\n    pdb_seqres\u002F                            # ~ 0.2 GB (download: 0.2 GB)\n        pdb_seqres.txt\n    small_bfd\u002F                             # ~ 17 GB (download: 9.6 GB)\n        bfd-first_non_consensus_sequences.fasta\n    uniref30\u002F                              # ~ 206 GB (download: 52.5 GB)\n        # 7 files.\n    uniprot\u002F                               # ~ 105 GB (download: 53 GB)\n        uniprot.fasta\n    uniref90\u002F                              # ~ 67 GB (download: 34 GB)\n        uniref90.fasta\n```\n\n`bfd\u002F` is only downloaded if you download the full databases, and `small_bfd\u002F`\nis only downloaded if you download the reduced databases.\n\n### Model parameters\n\nWhile the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold\nparameters and CASP15 prediction data are made available under the terms of the\nCC BY 4.0 license. Please see the [Disclaimer](#license-and-disclaimer) below\nfor more detail.\n\nThe AlphaFold parameters are available from\nhttps:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-12-06.tar, and\nare downloaded as part of the `scripts\u002Fdownload_all_data.sh` script. This script\nwill download parameters for:\n\n*   5 models which were used during CASP14, and were extensively validated for\n    structure prediction quality (see Jumper et al. 2021, Suppl. Methods 1.12\n    for details).\n*   5 pTM models, which were fine-tuned to produce pTM (predicted TM-score) and\n    (PAE) predicted aligned error values alongside their structure predictions\n    (see Jumper et al. 2021, Suppl. Methods 1.9.7 for details).\n*   5 AlphaFold-Multimer models that produce pTM and PAE values alongside their\n    structure predictions.\n\n### Updating existing installation\n\nIf you have a previous version you can either reinstall fully from scratch\n(remove everything and run the setup from scratch) or you can do an incremental\nupdate that will be significantly faster but will require a bit more work. Make\nsure you follow these steps in the exact order they are listed below:\n\n1.  **Update the code.**\n    *   Go to the directory with the cloned AlphaFold repository and run `git\n        fetch origin main` to get all code updates.\n1.  
**Update the UniProt, UniRef, MGnify and PDB seqres databases.**\n    *   Remove `\u003CDOWNLOAD_DIR>\u002Funiprot`.\n    *   Run `scripts\u002Fdownload_uniprot.sh \u003CDOWNLOAD_DIR>`.\n    *   Remove `\u003CDOWNLOAD_DIR>\u002Funiclust30`.\n    *   Run `scripts\u002Fdownload_uniref30.sh \u003CDOWNLOAD_DIR>`.\n    *   Remove `\u003CDOWNLOAD_DIR>\u002Funiref90`.\n    *   Run `scripts\u002Fdownload_uniref90.sh \u003CDOWNLOAD_DIR>`.\n    *   Remove `\u003CDOWNLOAD_DIR>\u002Fmgnify`.\n    *   Run `scripts\u002Fdownload_mgnify.sh \u003CDOWNLOAD_DIR>`.\n    *   Remove `\u003CDOWNLOAD_DIR>\u002Fpdb_mmcif`. It is needed to have PDB SeqRes and\n        PDB from exactly the same date. Failure to do this step will result in\n        potential errors when searching for templates when running\n        AlphaFold-Multimer.\n    *   Run `scripts\u002Fdownload_pdb_mmcif.sh \u003CDOWNLOAD_DIR>`.\n    *   Run `scripts\u002Fdownload_pdb_seqres.sh \u003CDOWNLOAD_DIR>`.\n1.  **Update the model parameters.**\n    *   Remove the old model parameters in `\u003CDOWNLOAD_DIR>\u002Fparams`.\n    *   Download new model parameters using\n        `scripts\u002Fdownload_alphafold_params.sh \u003CDOWNLOAD_DIR>`.\n1.  **Follow [Running AlphaFold](#running-alphafold).**\n\n#### Using deprecated model weights\n\nTo use the deprecated v2.2.0 AlphaFold-Multimer model weights:\n\n1.  Change `SOURCE_URL` in `scripts\u002Fdownload_alphafold_params.sh` to\n    `https:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-03-02.tar`,\n    and download the old parameters.\n2.  Change the `_v3` to `_v2` in the multimer `MODEL_PRESETS` in `config.py`.\n\nTo use the deprecated v2.1.0 AlphaFold-Multimer model weights:\n\n1.  Change `SOURCE_URL` in `scripts\u002Fdownload_alphafold_params.sh` to\n    `https:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-01-19.tar`,\n    and download the old parameters.\n2.  Remove the `_v3` in the multimer `MODEL_PRESETS` in `config.py`.\n\n## Running AlphaFold\n\n**The simplest way to run AlphaFold is using the provided Docker script.** This\nwas tested on Google Cloud with a machine using the `nvidia-gpu-cloud-image`\nwith 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional\n3 TB disk, and an A100 GPU. For your first run, please follow the instructions\nfrom\n[Installation and running your first prediction](#installation-and-running-your-first-prediction)\nsection.\n\n1.  By default, Alphafold will attempt to use all visible GPU devices. To use a\n    subset, specify a comma-separated list of GPU UUID(s) or index(es) using the\n    `--gpu_devices` flag. See\n    [GPU enumeration](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Fuser-guide.html#gpu-enumeration)\n    for more details.\n\n1.  You can control which AlphaFold model to run by adding the `--model_preset=`\n    flag. We provide the following models:\n\n    *   **monomer**: This is the original model used at CASP14 with no\n        ensembling.\n\n    *   **monomer\\_casp14**: This is the original model used at CASP14 with\n        `num_ensemble=8`, matching our CASP14 configuration. This is largely\n        provided for reproducibility as it is 8x more computationally expensive\n        for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).\n\n    *   **monomer\\_ptm**: This is the original CASP14 model fine tuned with the\n        pTM head, providing a pairwise confidence measure. 
It is slightly less\n        accurate than the normal monomer model.\n\n    *   **multimer**: This is the [AlphaFold-Multimer](#citing-this-work) model.\n        To use this model, provide a multi-sequence FASTA file. In addition, the\n        UniProt database should have been downloaded.\n\n1.  You can control MSA speed\u002Fquality tradeoff by adding\n    `--db_preset=reduced_dbs` or `--db_preset=full_dbs` to the run command. We\n    provide the following presets:\n\n    *   **reduced\\_dbs**: This preset is optimized for speed and lower hardware\n        requirements. It runs with a reduced version of the BFD database. It\n        requires 8 CPU cores (vCPUs), 8 GB of RAM, and 600 GB of disk space.\n\n    *   **full\\_dbs**: This runs with all genetic databases used at CASP14.\n\n    Running the command above with the `monomer` model preset and the\n    `reduced_dbs` data preset would look like this:\n\n    ```bash\n    python3 docker\u002Frun_docker.py \\\n      --fasta_paths=T1050.fasta \\\n      --max_template_date=2020-05-14 \\\n      --model_preset=monomer \\\n      --db_preset=reduced_dbs \\\n      --data_dir=$DOWNLOAD_DIR \\\n      --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n    ```\n\n1.  After generating the predicted model, AlphaFold runs a relaxation step to\n    improve local geometry. By default, only the best model (by pLDDT) is\n    relaxed (`--models_to_relax=best`), but also all of the models\n    (`--models_to_relax=all`) or none of the models (`--models_to_relax=none`)\n    can be relaxed.\n\n1.  The relaxation step can be run on GPU (faster, but could be less stable) or\n    CPU (slow, but stable). This can be controlled with\n    `--enable_gpu_relax=true` (default) or `--enable_gpu_relax=false`.\n\n1.  AlphaFold can reuse MSAs (multiple sequence alignments) for the same\n    sequence via `--use_precomputed_msas=true` option; this can be useful for\n    trying different AlphaFold parameters. This option assumes that the\n    directory structure generated by the first AlphaFold run in the output\n    directory exists and that the protein sequence is the same.\n\n### Running AlphaFold-Multimer\n\nAll steps are the same as when running the monomer system, but you will have to\n\n*   provide an input fasta with multiple sequences,\n*   set `--model_preset=multimer`,\n\nAn example that folds a protein complex `multimer.fasta`:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=multimer.fasta \\\n  --max_template_date=2020-05-14 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\nBy default the multimer system will run 5 seeds per model (25 total predictions)\nfor a small drop in accuracy you may wish to run a single seed per model. This\ncan be done via the `--num_multimer_predictions_per_model` flag, e.g. set it to\n`--num_multimer_predictions_per_model=1` to run a single seed per model.\n\n### AlphaFold prediction speed\n\nThe table below reports prediction runtimes for proteins of various lengths. We\nonly measure unrelaxed structure prediction with three recycles while excluding\nruntimes from MSA and template search. When running `docker\u002Frun_docker.py` with\n`--benchmark=true`, this runtime is stored in `timings.json`. All runtimes are\nfrom a single A100 NVIDIA GPU. Prediction speed on A100 for smaller structures\ncan be improved by increasing `global_config.subbatch_size` in\n`alphafold\u002Fmodel\u002Fconfig.py`.\n\nNo. 
residues | Prediction time (s)\n-----------: | ------------------:\n100          | 4.9\n200          | 7.7\n300          | 13\n400          | 18\n500          | 29\n600          | 36\n700          | 53\n800          | 60\n900          | 91\n1,000        | 96\n1,100        | 140\n1,500        | 280\n2,000        | 450\n2,500        | 969\n3,000        | 1,240\n3,500        | 2,465\n4,000        | 5,660\n4,500        | 12,475\n5,000        | 18,824\n\n### Examples\n\nBelow are examples of how to use AlphaFold in different scenarios.\n\n#### Folding a monomer\n\nSay we have a monomer with the sequence `\u003CSEQUENCE>`. The input fasta should be:\n\n```fasta\n>sequence_name\n\u003CSEQUENCE>\n```\n\nThen run the following command:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=monomer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=monomer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### Folding a homomer\n\nSay we have a homomer with 3 copies of the same sequence `\u003CSEQUENCE>`. The input\nfasta should be:\n\n```fasta\n>sequence_1\n\u003CSEQUENCE>\n>sequence_2\n\u003CSEQUENCE>\n>sequence_3\n\u003CSEQUENCE>\n```\n\nThen run the following command:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=homomer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### Folding a heteromer\n\nSay we have an A2B3 heteromer, i.e. with 2 copies of `\u003CSEQUENCE A>` and 3 copies\nof `\u003CSEQUENCE B>`. The input fasta should be:\n\n```fasta\n>sequence_1\n\u003CSEQUENCE A>\n>sequence_2\n\u003CSEQUENCE A>\n>sequence_3\n\u003CSEQUENCE B>\n>sequence_4\n\u003CSEQUENCE B>\n>sequence_5\n\u003CSEQUENCE B>\n```\n\nThen run the following command:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=heteromer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### Folding multiple monomers one after another\n\nSay we have two monomers, `monomer1.fasta` and `monomer2.fasta`.\n\nWe can fold both sequentially by using the following command:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=monomer1.fasta,monomer2.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=monomer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### Folding multiple multimers one after another\n\nSay we have two multimers, `multimer1.fasta` and `multimer2.fasta`.\n\nWe can fold both sequentially by using the following command:\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=multimer1.fasta,multimer2.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n### AlphaFold output\n\nThe outputs will be saved in a subdirectory of the directory provided via the\n`--output_dir` flag of `run_docker.py` (defaults to `\u002Ftmp\u002Falphafold\u002F`). 
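\n\nAs a quick preview of the per-file descriptions that follow, the sketch below loads one `result_model_*.pkl` file and reads out its confidence arrays; it is illustrative only, assumes the default `\u002Ftmp\u002Falphafold` location and a target named `your_protein`, and the pTM and PAE fields are present only for pTM and multimer models.\n\n```python\n# Hedged sketch: reads confidence outputs from a result_model_*.pkl file.\n# Field names follow the descriptions below; the pTM and PAE entries exist\n# only for pTM and multimer models.\nimport pickle\n\nimport numpy as np\n\nwith open('\u002Ftmp\u002Falphafold\u002Fyour_protein\u002Fresult_model_1.pkl', 'rb') as f:\n    result = pickle.load(f)\n\nplddt = result['plddt']  # shape [N_res], values 0-100, higher means more confident\nprint('mean pLDDT:', float(np.mean(plddt)))\n\nif 'predicted_aligned_error' in result:  # pTM and multimer models only\n    pae = result['predicted_aligned_error']  # shape [N_res, N_res]\n    print('pTM:', float(result['ptm']), 'max PAE:', float(pae.max()))\n```\n\n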
The\noutputs include the computed MSAs, unrelaxed structures, relaxed structures,\nranked structures, raw model outputs, prediction metadata, and section timings.\nThe `--output_dir` directory will have the following structure:\n\n```\n\u003Ctarget_name>\u002F\n    features.pkl\n    ranked_{0,1,2,3,4}.pdb\n    ranking_debug.json\n    relax_metrics.json\n    relaxed_model_{1,2,3,4,5}.pdb\n    result_model_{1,2,3,4,5}.pkl\n    timings.json\n    unrelaxed_model_{1,2,3,4,5}.pdb\n    msas\u002F\n        bfd_uniref_hits.a3m\n        mgnify_hits.sto\n        uniref90_hits.sto\n```\n\nThe contents of each output file are as follows:\n\n*   `features.pkl` – A `pickle` file containing the input feature NumPy arrays\n    used by the models to produce the structures.\n*   `unrelaxed_model_*.pdb` – A PDB format text file containing the predicted\n    structure, exactly as outputted by the model.\n*   `relaxed_model_*.pdb` – A PDB format text file containing the predicted\n    structure, after performing an Amber relaxation procedure on the unrelaxed\n    structure prediction (see Jumper et al. 2021, Suppl. Methods 1.8.6 for\n    details).\n*   `ranked_*.pdb` – A PDB format text file containing the predicted structures,\n    after reordering by model confidence. Here `ranked_i.pdb` should contain the\n    prediction with the (`i + 1`)-th highest confidence (so that `ranked_0.pdb`\n    has the highest confidence). To rank model confidence, we use predicted LDDT\n    (pLDDT) scores (see Jumper et al. 2021, Suppl. Methods 1.9.6 for details).\n    If `--models_to_relax=all` then all ranked structures are relaxed. If\n    `--models_to_relax=best` then only `ranked_0.pdb` is relaxed (the rest are\n    unrelaxed). If `--models_to_relax=none`, then the ranked structures are all\n    unrelaxed.\n*   `ranking_debug.json` – A JSON format text file containing the pLDDT values\n    used to perform the model ranking, and a mapping back to the original model\n    names.\n*   `relax_metrics.json` – A JSON format text file containing relax metrics, for\n    instance remaining violations.\n*   `timings.json` – A JSON format text file containing the times taken to run\n    each section of the AlphaFold pipeline.\n*   `msas\u002F` - A directory containing the files describing the various genetic\n    tool hits that were used to construct the input MSA.\n*   `result_model_*.pkl` – A `pickle` file containing a nested dictionary of the\n    various NumPy arrays directly produced by the model. In addition to the\n    output of the structure module, this includes auxiliary outputs such as:\n\n    *   Distograms (`distogram\u002Flogits` contains a NumPy array of shape [N_res,\n        N_res, N_bins] and `distogram\u002Fbin_edges` contains the definition of the\n        bins).\n    *   Per-residue pLDDT scores (`plddt` contains a NumPy array of shape\n        [N_res] with the range of possible values from `0` to `100`, where `100`\n        means most confident). This can serve to identify sequence regions\n        predicted with high confidence or as an overall per-target confidence\n        score when averaged across residues.\n    *   Present only if using pTM models: predicted TM-score (`ptm` field\n        contains a scalar). 
As a predictor of a global superposition metric,\n        this score is designed to also assess whether the model is confident in\n        the overall domain packing.\n    *   Present only if using pTM models: predicted pairwise aligned errors\n        (`predicted_aligned_error` contains a NumPy array of shape [N_res,\n        N_res] with the range of possible values from `0` to\n        `max_predicted_aligned_error`, where `0` means most confident). This can\n        serve for a visualisation of domain packing confidence within the\n        structure.\n\nThe pLDDT confidence measure is stored in the B-factor field of the output PDB\nfiles (although unlike a B-factor, higher pLDDT is better, so care must be taken\nwhen using for tasks such as molecular replacement).\n\nThis code has been tested to match mean top-1 accuracy on a CASP14 test set with\npLDDT ranking over 5 model predictions (some CASP targets were run with earlier\nversions of AlphaFold and some had manual interventions; see our forthcoming\npublication for details). Some targets such as T1064 may also have high\nindividual run variance over random seeds.\n\n## Inferencing many proteins\n\nThe provided inference script is optimized for predicting the structure of a\nsingle protein, and it will compile the neural network to be specialized to\nexactly the size of the sequence, MSA, and templates. For large proteins, the\ncompile time is a negligible fraction of the runtime, but it may become more\nsignificant for small proteins or if the multi-sequence alignments are already\nprecomputed. In the bulk inference case, it may make sense to use our\n`make_fixed_size` function to pad the inputs to a uniform size, thereby reducing\nthe number of compilations required.\n\nWe do not provide a bulk inference script, but it should be straightforward to\ndevelop on top of the `RunModel.predict` method with a parallel system for\nprecomputing multi-sequence alignments. Alternatively, this script can be run\nrepeatedly with only moderate overhead.\n\n## Note on CASP14 reproducibility\n\nAlphaFold's output for a small number of proteins has high inter-run variance,\nand may be affected by changes in the input data. The CASP14 target T1064 is a\nnotable example; the large number of SARS-CoV-2-related sequences recently\ndeposited changes its MSA significantly. This variability is somewhat mitigated\nby the model selection process; running 5 models and taking the most confident.\n\nTo reproduce the results of our CASP14 system as closely as possible you must\nuse the same database versions we used in CASP. 
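\n\nRelated to the note above that pLDDT is stored in the B-factor column of the output PDB files, the sketch below pulls per-residue pLDDT out of `ranked_0.pdb` with Biopython (Biopython is listed in the acknowledgements, but this snippet is illustrative only and not an official AlphaFold utility).\n\n```python\n# Hedged sketch: per-residue pLDDT from the B-factor column of a ranked_*.pdb\n# file, read with Biopython (illustrative; not part of the AlphaFold pipeline).\nimport statistics\n\nfrom Bio.PDB import PDBParser\n\nstructure = PDBParser(QUIET=True).get_structure('pred', 'ranked_0.pdb')\nplddt = [\n    next(iter(residue)).get_bfactor()  # all atoms of a residue share the same pLDDT\n    for residue in structure.get_residues()\n]\nprint(len(plddt), 'residues, mean pLDDT:', statistics.mean(plddt))\n```\n\n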
These may not match the default\nversions downloaded by our scripts.\n\nFor genetics:\n\n*   UniRef90:\n    [v2020_01](https:\u002F\u002Fftp.uniprot.org\u002Fpub\u002Fdatabases\u002Funiprot\u002Fprevious_releases\u002Frelease-2020_01\u002Funiref\u002F)\n*   MGnify:\n    [v2018_12](http:\u002F\u002Fftp.ebi.ac.uk\u002Fpub\u002Fdatabases\u002Fmetagenomics\u002Fpeptide_database\u002F2018_12\u002F)\n*   Uniclust30: [v2018_08](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Funiclust\u002F2018_08\u002F)\n*   BFD: [only version available](https:\u002F\u002Fbfd.mmseqs.com\u002F)\n\nFor templates:\n\n*   PDB: (downloaded 2020-05-14)\n*   PDB70:\n    [2020-05-13](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Fdata\u002Fhhsuite\u002Fdatabases\u002Fhhsuite_dbs\u002Fold-releases\u002Fpdb70_from_mmcif_200513.tar.gz)\n\nAn alternative for templates is to use the latest PDB and PDB70, but pass the\nflag `--max_template_date=2020-05-14`, which restricts templates only to\nstructures that were available at the start of CASP14.\n\n## Citing this work\n\nIf you use the code or data in this package, please cite:\n\n```bibtex\n@Article{AlphaFold2021,\n  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\\v{Z}}{\\'\\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},\n  journal = {Nature},\n  title   = {Highly accurate protein structure prediction with {AlphaFold}},\n  year    = {2021},\n  volume  = {596},\n  number  = {7873},\n  pages   = {583--589},\n  doi     = {10.1038\u002Fs41586-021-03819-2}\n}\n```\n\nIn addition, if you use the AlphaFold-Multimer mode, please cite:\n\n```bibtex\n@article {AlphaFold-Multimer2021,\n  author       = {Evans, Richard and O{\\textquoteright}Neill, Michael and Pritzel, Alexander and Antropova, Natasha and Senior, Andrew and Green, Tim and {\\v{Z}}{\\'\\i}dek, Augustin and Bates, Russ and Blackwell, Sam and Yim, Jason and Ronneberger, Olaf and Bodenstein, Sebastian and Zielinski, Michal and Bridgland, Alex and Potapenko, Anna and Cowie, Andrew and Tunyasuvunakool, Kathryn and Jain, Rishub and Clancy, Ellen and Kohli, Pushmeet and Jumper, John and Hassabis, Demis},\n  journal      = {bioRxiv},\n  title        = {Protein complex prediction with AlphaFold-Multimer},\n  year         = {2021},\n  elocation-id = {2021.10.04.463034},\n  doi          = {10.1101\u002F2021.10.04.463034},\n  URL          = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2021\u002F10\u002F04\u002F2021.10.04.463034},\n  eprint       = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2021\u002F10\u002F04\u002F2021.10.04.463034.full.pdf},\n}\n```\n\n## Community contributions\n\nColab notebooks provided by the community (please note that these notebooks may\nvary from our full AlphaFold system and we did not validate their accuracy):\n\n*   The\n    [ColabFold AlphaFold2 
notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fsokrypton\u002FColabFold\u002Fblob\u002Fmain\u002FAlphaFold2.ipynb)\n    by Martin Steinegger, Sergey Ovchinnikov and Milot Mirdita, which uses an\n    API hosted at the Södinglab based on the MMseqs2 server\n    [(Mirdita et al. 2019, Bioinformatics)](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F35\u002F16\u002F2856\u002F5280135)\n    for the multiple sequence alignment creation.\n\n## Acknowledgements\n\nAlphaFold communicates with and\u002For references the following separate libraries\nand packages:\n\n*   [Abseil](https:\u002F\u002Fgithub.com\u002Fabseil\u002Fabseil-py)\n*   [Biopython](https:\u002F\u002Fbiopython.org)\n*   [Colab](https:\u002F\u002Fresearch.google.com\u002Fcolaboratory\u002F)\n*   [Docker](https:\u002F\u002Fwww.docker.com)\n*   [HH Suite](https:\u002F\u002Fgithub.com\u002Fsoedinglab\u002Fhh-suite)\n*   [HMMER Suite](http:\u002F\u002Feddylab.org\u002Fsoftware\u002Fhmmer)\n*   [Haiku](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdm-haiku)\n*   [JAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax\u002F)\n*   [Kalign](https:\u002F\u002Fmsa.sbc.su.se\u002Fcgi-bin\u002Fmsa.cgi)\n*   [matplotlib](https:\u002F\u002Fmatplotlib.org\u002F)\n*   [ML Collections](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fml_collections)\n*   [NumPy](https:\u002F\u002Fnumpy.org)\n*   [OpenMM](https:\u002F\u002Fgithub.com\u002Fopenmm\u002Fopenmm)\n*   [OpenStructure](https:\u002F\u002Fopenstructure.org)\n*   [pymol3d](https:\u002F\u002Fgithub.com\u002Favirshup\u002Fpy3dmol)\n*   [Sonnet](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fsonnet)\n*   [TensorFlow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow)\n*   [Tree](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Ftree)\n*   [tqdm](https:\u002F\u002Fgithub.com\u002Ftqdm\u002Ftqdm)\n\nWe thank all their contributors and maintainers!\n\n## Get in Touch\n\nIf you have any questions not covered in this overview, please contact the\nAlphaFold team at [alphafold@deepmind.com](mailto:alphafold@deepmind.com).\n\nWe would love to hear your feedback and understand how AlphaFold has been useful\nin your research. Share your stories with us at\n[alphafold@deepmind.com](mailto:alphafold@deepmind.com).\n\n## License and Disclaimer\n\nThis is not an officially supported Google product.\n\nCopyright 2022 DeepMind Technologies Limited.\n\nAlphaFold 2 and its output are for theoretical modeling only. They are not\nintended, validated, or approved for clinical use. You should not use the\nAlphaFold 2 or its output for clinical purposes or rely on them for medical or\nother professional advice. Any content regarding those topics is provided for\ninformational purposes only and is not a substitute for advice from a qualified\nprofessional.\n\nOutput of AlphaFold 2 are predictions with varying levels of confidence and\nshould be interpreted carefully. Use discretion before relying on, publishing,\ndownloading or otherwise using AlphaFold 2 and its output.\n\n### AlphaFold Code License\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use\nthis file except in compliance with the License. You may obtain a copy of the\nLicense at https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0.\n\nUnless required by applicable law or agreed to in writing, software distributed\nunder the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR\nCONDITIONS OF ANY KIND, either express or implied. 
See the License for the\nspecific language governing permissions and limitations under the License.\n\n### Model Parameters License\n\nThe AlphaFold parameters are made available under the terms of the Creative\nCommons Attribution 4.0 International (CC BY 4.0) license. You can find details\nat: https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode\n\n### Third-party software\n\nUse of the third-party software, libraries or code referred to in the\n[Acknowledgements](#acknowledgements) section above may be governed by separate\nterms and conditions or license provisions. Your use of the third-party\nsoftware, libraries or code is subject to any such terms and you should check\nthat you can comply with any applicable restrictions or terms and conditions\nbefore use.\n\n### Mirrored Databases\n\nThe following databases have been mirrored by DeepMind, and are available with\nreference to the following:\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F) (unmodified), by Steinegger M. and Söding J.,\n    available under a\n    [Creative Commons Attribution-ShareAlike 4.0 International License](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F).\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F) (modified), by Steinegger M. and Söding J.,\n    modified by DeepMind, available under a\n    [Creative Commons Attribution-ShareAlike 4.0 International License](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F).\n    See the Methods section of the\n    [AlphaFold proteome paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-021-03828-1)\n    for details.\n\n*   [Uniref30: v2021_03](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Funiclust\u002F2021_03\u002F)\n    (unmodified), by Mirdita M. et al., available under a\n    [Creative Commons Attribution-ShareAlike 4.0 International License](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F).\n\n*   [MGnify: v2022_05](http:\u002F\u002Fftp.ebi.ac.uk\u002Fpub\u002Fdatabases\u002Fmetagenomics\u002Fpeptide_database\u002F2022_05\u002FREADME.txt)\n    (unmodified), by Mitchell AL et al., available free of all copyright\n    restrictions and made fully and freely available for both non-commercial and\n    commercial use under\n    [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F).\n","![header](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphafold_readme_f0835d8a2191.jpg)\n\n# AlphaFold\n\n本软件包提供了 AlphaFold v2 推理流程的实现。为简便起见，在本文档的其余部分中，我们将该模型简称为 AlphaFold。\n\n我们还提供：\n\n1.  AlphaFold-Multimer 的实现。这仍处于开发阶段，因此 AlphaFold-Multimer 的稳定性预计不如我们的单体 AlphaFold 系统。请参阅[更新现有安装指南](#updating-existing-installation)，了解如何升级和更新代码。\n2.  [技术说明](docs\u002Ftechnical_note_v2.3.0.md)，其中包含更新后的 AlphaFold v2.3.0 模型及推理流程。\n3.  
一套[CASP15 基线预测](docs\u002Fcasp15_predictions.zip)，以及对任何人工干预操作的说明文档。\n\n任何披露使用本源代码或模型参数所得成果的出版物，均应[引用](#citing-this-work)《AlphaFold 论文》（https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41586-021-03819-2），并在适用时引用《AlphaFold-Multimer 论文》（https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v1）。\n\n此外，请参阅[补充信息](https:\u002F\u002Fstatic-content.springer.com\u002Fesm\u002Fart%3A10.1038%2Fs41586-021-03819-2\u002FMediaObjects\u002F41586_2021_3819_MOESM1_ESM.pdf)，以获取方法的详细描述。\n\n**您也可以使用社区支持版本中的简化版 AlphaFold（见下文）。**\n\n如有任何疑问，请联系 AlphaFold 团队：[alphafold@deepmind.com](mailto:alphafold@deepmind.com)。\n\n![CASP14 预测](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphafold_readme_f6135c1ce55e.gif)\n\n## 安装与首次运行预测\n\n您需要一台运行 Linux 的机器；AlphaFold 不支持其他操作系统。完整安装需要最多 3 TB 的磁盘空间来存储遗传数据库（建议使用 SSD 存储），并配备现代 NVIDIA GPU（具有更大显存的 GPU 可以预测更大的蛋白质结构）。\n\n请按照以下步骤操作：\n\n1.  安装 [Docker](https:\u002F\u002Fwww.docker.com\u002F)。\n\n    *   安装[NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html)，以支持 GPU。\n    *   设置以非 root 用户身份运行 Docker：[非 root 用户管理 Docker](https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstall\u002Flinux-postinstall\u002F#manage-docker-as-a-non-root-user)。\n\n1.  克隆此仓库并进入该目录。\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold.git\n    cd .\u002Falphafold\n    ```\n\n1.  下载遗传数据库和模型参数：\n\n    *   安装 `aria2c`。在大多数 Linux 发行版中，可通过包管理器以 `aria2` 软件包的形式获得（例如，在基于 Debian 的发行版上，可运行 `sudo apt install aria2` 进行安装）。`rsync` 同理。\n\n    *   请使用脚本 `scripts\u002Fdownload_all_data.sh` 来下载并设置完整的数据库。这可能需要较长时间（下载大小为 556 GB），因此建议在后台运行该脚本：\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR> > download.log 2> download_all.log &\n    ```\n\n    *   **注意：下载目录 `\u003CDOWNLOAD_DIR>` 不应位于 AlphaFold 仓库目录的子目录中。** 如果是，则 Docker 构建过程会非常缓慢，因为大型数据库会被复制到 Docker 构建上下文中。\n\n    *   也可以使用精简版数据库运行 AlphaFold；请参阅[完整文档](#genetic-databases)。\n\n1.  通过运行以下命令检查 AlphaFold 是否能够使用 GPU：\n\n    ```bash\n    docker run --rm --gpus all nvidia\u002Fcuda:11.0-base nvidia-smi\n    ```\n\n    该命令的输出应显示您的 GPU 列表。如果未显示，请检查是否已正确完成[NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html)的安装步骤，或查看以下[NVIDIA Docker 问题](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnvidia-docker\u002Fissues\u002F1447#issuecomment-801479573)。\n\n    如果您希望在 HPC 系统上使用 Singularity（一种常见的容器化平台）运行 AlphaFold，建议参考以下链接中的第三方 Singularity 配置：https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F10 或 https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F24。\n\n1.  构建 Docker 镜像：\n\n    ```bash\n    docker build -f docker\u002FDockerfile -t alphafold .\n    ```\n\n    如果遇到以下错误：\n\n    ```\n    W: GPG 错误：https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fcuda\u002Frepos\u002Fubuntu1804\u002Fx86_64 InRelease：以下签名无法验证，因为公钥不可用：NO_PUBKEY A4B469963BF863CC\n    E: 存储库 'https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fcuda\u002Frepos\u002Fubuntu1804\u002Fx86_64 InRelease' 未签名。\n    ```\n\n    请使用 https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F463#issuecomment-1124881779 中描述的解决方法。\n\n1.  安装 `run_docker.py` 的依赖项。注意：您可以选择创建一个[Python 虚拟环境](https:\u002F\u002Fdocs.python.org\u002F3\u002Ftutorial\u002Fvenv.html)，以避免与系统 Python 环境发生冲突。\n\n    ```bash\n    pip3 install -r docker\u002Frequirements.txt\n    ```\n\n1.  
确保输出目录存在（默认为 `\u002Ftmp\u002Falphafold`），并且您拥有写入该目录的权限。\n\n1.  运行 `run_docker.py`，指定包含您希望预测结构的蛋白质序列的 FASTA 文件（`--fasta_paths` 参数）。AlphaFold 将在 `--max_template_date` 参数指定的日期之前搜索可用模板；这可用于在建模过程中排除某些模板。`--data_dir` 是下载的遗传数据库所在目录，而 `--output_dir` 是输出目录的绝对路径。\n\n    ```bash\n    python3 docker\u002Frun_docker.py \\\n      --fasta_paths=your_protein.fasta \\\n      --max_template_date=2022-01-01 \\\n      --data_dir=$DOWNLOAD_DIR \\\n      --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n    ```\n\n1.  运行完成后，输出目录将包含目标蛋白质的预测结构。有关更多选项和故障排除提示，请参阅下方文档。\n\n### 遗传数据库\n\n此步骤要求您的机器上已安装 `aria2c`。\n\nAlphaFold 运行需要多个遗传（序列）数据库：\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F)，\n*   [MGnify](https:\u002F\u002Fwww.ebi.ac.uk\u002Fmetagenomics\u002F)，\n*   [PDB70](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Fdata\u002Fhhsuite\u002Fdatabases\u002Fhhsuite_dbs\u002F)，\n*   [PDB](https:\u002F\u002Fwww.rcsb.org\u002F)（mmCIF 格式的结构文件），\n*   [PDB seqres](https:\u002F\u002Fwww.rcsb.org\u002F)——仅用于 AlphaFold-Multimer，\n*   [UniRef30（原 UniClust30）](https:\u002F\u002Funiclust.mmseqs.com\u002F)，\n*   [UniProt](https:\u002F\u002Fwww.uniprot.org\u002Funiprot\u002F)——仅用于 AlphaFold-Multimer，\n*   [UniRef90](https:\u002F\u002Fwww.uniprot.org\u002Fhelp\u002Funiref)。\n\n我们提供了一个脚本 `scripts\u002Fdownload_all_data.sh`，可用于下载并设置所有这些数据库：\n\n*   推荐的默认方式：\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR>\n    ```\n\n    将下载完整的数据库。\n\n*   使用 `reduced_dbs` 参数：\n\n    ```bash\n    scripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR> reduced_dbs\n    ```\n\n    将下载一个精简版的数据库，供后续使用 `reduced_dbs` 数据库预设时调用。在运行 AlphaFold 时，请配合相应的参数 `--db_preset=reduced_dbs` 使用（请参阅 [AlphaFold 参数](#running-alphafold) 部分）。\n\n:ledger: **注意：下载目录 `\u003CDOWNLOAD_DIR>` 不应位于 AlphaFold 代码库目录下。** 如果位于子目录中，Docker 镜像构建过程会因复制大型数据库而变得缓慢。\n\n我们并未提供与 CASP14 完全一致的数据库版本——请参阅 [关于可重复性的说明](#note-on-casp14-reproducibility)。部分数据库为提升速度进行了镜像，详情请见 [镜像数据库](#mirrored-databases)。\n\n:ledger: **注意：完整数据库的总下载量约为 556 GB，解压后总大小达 2.62 TB。请确保您有足够的硬盘空间、带宽和时间来完成下载。为了获得更好的基因搜索性能，建议使用 SSD 硬盘。**\n\n:ledger: **注意：如果下载目录及其包含的数据集没有完全的读写权限，可能会导致 MSA 工具出现错误，并产生模糊不清的外部报错信息。请确保已赋予必要的权限，例如使用 `sudo chmod 755 --recursive \"$DOWNLOAD_DIR\"` 命令。**\n\n`download_all_data.sh` 脚本还会下载模型参数文件。脚本执行完毕后，您将得到如下目录结构：\n\n```\n$DOWNLOAD_DIR\u002F                             # 总计：约 2.62 TB（下载：556 GB）\n    bfd\u002F                                   # 约 1.8 TB（下载：271.6 GB）\n        # 6 个文件。\n    mgnify\u002F                                # 约 120 GB（下载：67 GB）\n        mgy_clusters_2022_05.fa\n    params\u002F                                # 约 5.3 GB（下载：5.3 GB）\n        # 5 个 CASP14 模型，\n        # 5 个 pTM 模型，\n        # 5 个 AlphaFold-Multimer 模型，\n        # LICENSE 文件，\n        # 共 16 个文件。\n    pdb70\u002F                                 # 约 56 GB（下载：19.5 GB）\n        # 9 个文件。\n    pdb_mmcif\u002F                             # 约 238 GB（下载：43 GB）\n        mmcif_files\u002F\n            # 约 199,000 个 .cif 文件。\n        obsolete.dat\n    pdb_seqres\u002F                            # 约 0.2 GB（下载：0.2 GB）\n        pdb_seqres.txt\n    small_bfd\u002F                             # 约 17 GB（下载：9.6 GB）\n        bfd-first_non_consensus_sequences.fasta\n    uniref30\u002F                              # 约 206 GB（下载：52.5 GB）\n        # 7 个文件。\n    uniprot\u002F                               # 约 105 GB（下载：53 GB）\n        uniprot.fasta\n    uniref90\u002F                              # 约 67 GB（下载：34 GB）\n        uniref90.fasta\n```\n\n`bfd\u002F` 只有在下载完整数据库时才会被下载；而 `small_bfd\u002F` 
则仅在下载精简数据库时才会被下载。\n\n### 模型参数\n\n尽管 AlphaFold 的代码采用 Apache 2.0 许可证授权，但 AlphaFold 的参数以及 CASP15 的预测数据则依据 CC BY 4.0 许可证条款发布。更多详细信息请参阅下方的 [免责声明](#license-and-disclaimer)。\n\nAlphaFold 参数可从 https:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-12-06.tar 获取，并且会在运行 `scripts\u002Fdownload_all_data.sh` 脚本时一并下载。该脚本将下载以下参数：\n\n*   5 个在 CASP14 中使用并经过广泛验证的结构预测质量的模型（详情请参阅 Jumper 等人 2021 年的研究，补充方法 1.12）。\n*   5 个 pTM 模型，这些模型经过微调，可在进行结构预测的同时输出 pTM（预测 TM 分数）及 PAE（预测对齐误差）值（详情请参阅 Jumper 等人 2021 年的研究，补充方法 1.9.7）。\n*   5 个 AlphaFold-Multimer 模型，它们同样能够在结构预测的同时输出 pTM 和 PAE 值。\n\n### 更新现有安装\n\n如果您已经安装了旧版本，您可以选择从头开始全新安装（删除所有内容并重新运行安装程序），也可以进行增量更新。增量更新速度会快很多，但需要多做一些准备工作。请务必按照以下列出的精确顺序执行这些步骤：\n\n1.  **更新代码。**\n    *   进入克隆的 AlphaFold 仓库目录，运行 `git fetch origin main` 以获取所有代码更新。\n1.  **更新 UniProt、UniRef、MGnify 和 PDB seqres 数据库。**\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Funiprot`。\n    *   运行 `scripts\u002Fdownload_uniprot.sh \u003CDOWNLOAD_DIR>`。\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Funiclust30`。\n    *   运行 `scripts\u002Fdownload_uniref30.sh \u003CDOWNLOAD_DIR>`。\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Funiref90`。\n    *   运行 `scripts\u002Fdownload_uniref90.sh \u003CDOWNLOAD_DIR>`。\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Fmgnify`。\n    *   运行 `scripts\u002Fdownload_mgnify.sh \u003CDOWNLOAD_DIR>`。\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Fpdb_mmcif`。为了确保 PDB SeqRes 和 PDB 数据来自完全相同的日期，必须执行此步骤。如果未按要求操作，在运行 AlphaFold-Multimer 进行模板搜索时可能会出现错误。\n    *   运行 `scripts\u002Fdownload_pdb_mmcif.sh \u003CDOWNLOAD_DIR>`。\n    *   运行 `scripts\u002Fdownload_pdb_seqres.sh \u003CDOWNLOAD_DIR>`。\n1.  **更新模型参数。**\n    *   删除 `\u003CDOWNLOAD_DIR>\u002Fparams` 中的旧模型参数。\n    *   使用 `scripts\u002Fdownload_alphafold_params.sh \u003CDOWNLOAD_DIR>` 下载新的模型参数。\n1.  **按照 [运行 AlphaFold](#running-alphafold) 的说明进行操作。**\n\n#### 使用已弃用的模型权重\n\n要使用已弃用的 v2.2.0 版本 AlphaFold-Multimer 模型权重：\n\n1.  将 `scripts\u002Fdownload_alphafold_params.sh` 中的 `SOURCE_URL` 更改为\n    `https:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-03-02.tar`，\n    并下载旧版参数。\n2.  将 `config.py` 中的 multimer `MODEL_PRESETS` 中的 `_v3` 改为 `_v2`。\n\n要使用已弃用的 v2.1.0 版本 AlphaFold-Multimer 模型权重：\n\n1.  将 `scripts\u002Fdownload_alphafold_params.sh` 中的 `SOURCE_URL` 更改为\n    `https:\u002F\u002Fstorage.googleapis.com\u002Falphafold\u002Falphafold_params_2022-01-19.tar`，\n    并下载旧版参数。\n2.  将 `config.py` 中的 multimer `MODEL_PRESETS` 中的 `_v3` 删除。\n\n## 运行 AlphaFold\n\n**运行 AlphaFold 最简单的方式是使用提供的 Docker 脚本。** 该方法已在 Google Cloud 上经过测试，使用的机器配备 `nvidia-gpu-cloud-image` 镜像，具有 12 个 vCPU、85 GB 内存、100 GB 启动盘，数据库存储在额外的 3 TB 磁盘上，并搭载 A100 GPU。首次运行时，请按照\n[安装与首次预测运行](#installation-and-running-your-first-prediction)\n部分的说明进行操作。\n\n1.  默认情况下，AlphaFold 会尝试使用所有可见的 GPU 设备。若需使用部分 GPU，可通过 `--gpu_devices` 标志指定以逗号分隔的 GPU UUID 或索引列表。有关详细信息，请参阅\n    [GPU 枚举](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Fuser-guide.html#gpu-enumeration)。\n\n1.  您可以通过添加 `--model_preset=` 标志来控制运行的 AlphaFold 模型。我们提供以下模型：\n\n    *   **monomer**：这是 CASP14 时使用的原始模型，未采用集成方法。\n\n    *   **monomer_casp14**：这是 CASP14 时使用的原始模型，设置了 `num_ensemble=8`，与我们的 CASP14 配置一致。此选项主要用于结果的可重复性，尽管其计算成本高出 8 倍，但准确率提升有限（CASP14 数据集上的平均 GDT 提升仅为 0.1）。\n\n    *   **monomer_ptm**：这是在 CASP14 原始模型基础上微调后的版本，增加了 pTM 头部，可提供成对置信度评分。其准确率略低于普通单体模型。\n\n    *   **multimer**：这是 [AlphaFold-Multimer](#citing-this-work) 模型。要使用此模型，需提供包含多个序列的 FASTA 文件，并且必须提前下载 UniProt 数据库。\n\n1.  
您可以通过在运行命令中添加 `--db_preset=reduced_dbs` 或 `--db_preset=full_dbs` 来控制 MSA 的速度与质量之间的权衡。我们提供以下预设：\n\n    *   **reduced_dbs**：此预设针对速度和较低的硬件要求进行了优化，使用精简版的 BFD 数据库。它需要 8 个 CPU 核心（vCPUs）、8 GB 内存和 600 GB 磁盘空间。\n\n    *   **full_dbs**：此预设使用 CASP14 时所用的所有遗传数据库。\n\n    例如，使用 `monomer` 模型预设和 `reduced_dbs` 数据预设运行上述命令时，命令如下：\n\n    ```bash\n    python3 docker\u002Frun_docker.py \\\n      --fasta_paths=T1050.fasta \\\n      --max_template_date=2020-05-14 \\\n      --model_preset=monomer \\\n      --db_preset=reduced_dbs \\\n      --data_dir=$DOWNLOAD_DIR \\\n      --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n    ```\n\n1.  在生成预测模型后，AlphaFold 会执行一个松弛步骤以改善局部几何结构。默认情况下，仅对最佳模型（根据 pLDDT 评分）进行松弛处理（`--models_to_relax=best`），但也可以选择对所有模型（`--models_to_relax=all`）或不进行任何松弛处理（`--models_to_relax=none`）。\n\n1.  松弛步骤可以在 GPU 上运行（速度更快，但稳定性可能较差）或在 CPU 上运行（速度较慢，但更稳定）。这可以通过 `--enable_gpu_relax=true`（默认值）或 `--enable_gpu_relax=false` 来控制。\n\n1.  AlphaFold 可以通过 `--use_precomputed_msas=true` 选项复用同一序列的 MSA（多序列比对），这对于尝试不同的 AlphaFold 参数非常有用。此选项的前提是输出目录中已存在首次运行 AlphaFold 生成的目录结构，且蛋白质序列相同。\n\n### 运行 AlphaFold-Multimer\n\n所有步骤与运行单体系统时相同，但您需要：\n\n*   提供包含多个序列的输入 FASTA 文件，\n*   设置 `--model_preset=multimer`，\n\n以下是一个折叠蛋白质复合物 `multimer.fasta` 的示例：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=multimer.fasta \\\n  --max_template_date=2020-05-14 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n默认情况下，multimer 系统会对每个模型运行 5 个种子（共 25 次预测）。如果您希望在略微降低准确率的情况下减少预测次数，可以将每个模型的种子数设置为 1。这可以通过 `--num_multimer_predictions_per_model` 标志实现，例如将其设置为 `--num_multimer_predictions_per_model=1` 即可让每个模型只运行一个种子。\n\n### AlphaFold 预测速度\n\n下表报告了不同长度蛋白质的预测运行时间。我们仅测量三次迭代的未松弛结构预测，不包括 MSA 和模板搜索的运行时间。当使用 `--benchmark=true` 运行 `docker\u002Frun_docker.py` 时，此运行时间会存储在 `timings.json` 文件中。所有运行时间均来自单个 A100 NVIDIA GPU。对于较小的结构，可以通过在 `alphafold\u002Fmodel\u002Fconfig.py` 中增加 `global_config.subbatch_size` 来提高 A100 上的预测速度。\n\n残基数 | 预测时间 (秒)\n-----------: | ------------------:\n100          | 4.9\n200          | 7.7\n300          | 13\n400          | 18\n500          | 29\n600          | 36\n700          | 53\n800          | 60\n900          | 91\n1,000        | 96\n1,100        | 140\n1,500        | 280\n2,000        | 450\n2,500        | 969\n3,000        | 1,240\n3,500        | 2,465\n4,000        | 5,660\n4,500        | 12,475\n5,000        | 18,824\n\n### 示例\n\n以下是不同场景下使用 AlphaFold 的示例。\n\n#### 折叠单体\n\n假设我们有一个序列为 `\u003CSEQUENCE>` 的单体。输入的 FASTA 文件应为：\n\n```fasta\n>sequence_name\n\u003CSEQUENCE>\n```\n\n然后运行以下命令：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=monomer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=monomer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### 折叠同源多聚体\n\n假设我们有一个由 3 个相同序列 `\u003CSEQUENCE>` 组成的同源多聚体。输入的 FASTA 文件应为：\n\n```fasta\n>sequence_1\n\u003CSEQUENCE>\n>sequence_2\n\u003CSEQUENCE>\n>sequence_3\n\u003CSEQUENCE>\n```\n\n然后运行以下命令：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=homomer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### 折叠异源多聚体\n\n假设我们有一个 A2B3 异源多聚体，即包含 2 个 `\u003CSEQUENCE A>` 和 3 个 `\u003CSEQUENCE B>`。输入的 FASTA 文件应为：\n\n```fasta\n>sequence_1\n\u003CSEQUENCE A>\n>sequence_2\n\u003CSEQUENCE A>\n>sequence_3\n\u003CSEQUENCE 
B>\n>sequence_4\n\u003CSEQUENCE B>\n>sequence_5\n\u003CSEQUENCE B>\n```\n\n然后运行以下命令：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=heteromer.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### 依次折叠多个单体\n\n假设我们有两个单体文件：`monomer1.fasta` 和 `monomer2.fasta`。\n\n我们可以使用以下命令依次折叠这两个单体：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=monomer1.fasta,monomer2.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=monomer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n#### 依次折叠多个多聚体\n\n假设我们有两个多聚体文件：`multimer1.fasta` 和 `multimer2.fasta`。\n\n我们可以使用以下命令依次折叠这两个多聚体：\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=multimer1.fasta,multimer2.fasta \\\n  --max_template_date=2021-11-01 \\\n  --model_preset=multimer \\\n  --data_dir=$DOWNLOAD_DIR \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n### AlphaFold 输出\n\n输出将保存在通过 `run_docker.py` 的 `--output_dir` 标志指定的目录下的子目录中（默认为 `\u002Ftmp\u002Falphafold\u002F`）。输出内容包括计算得到的多序列比对（MSA）、未弛豫结构、弛豫结构、按置信度排序的结构、原始模型输出、预测元数据以及各步骤的耗时信息。`--output_dir` 目录的结构如下：\n\n```\n\u003Ctarget_name>\u002F\n    features.pkl\n    ranked_{0,1,2,3,4}.pdb\n    ranking_debug.json\n    relax_metrics.json\n    relaxed_model_{1,2,3,4,5}.pdb\n    result_model_{1,2,3,4,5}.pkl\n    timings.json\n    unrelaxed_model_{1,2,3,4,5}.pdb\n    msas\u002F\n        bfd_uniref_hits.a3m\n        mgnify_hits.sto\n        uniref90_hits.sto\n```\n\n每个输出文件的内容说明如下：\n\n*   `features.pkl` – 一个 `pickle` 文件，包含模型用于生成结构的输入特征 NumPy 数组。\n*   `unrelaxed_model_*.pdb` – 一个 PDB 格式的文本文件，包含模型直接输出的预测结构。\n*   `relaxed_model_*.pdb` – 一个 PDB 格式的文本文件，包含对未弛豫结构预测进行 Amber 弛豫处理后的最终结构（详细信息请参见 Jumper 等人 2021 年的研究，补充方法 1.8.6）。\n*   `ranked_*.pdb` – 一个 PDB 格式的文本文件，包含按模型置信度重新排序后的预测结构。其中 `ranked_i.pdb` 应该是置信度第 (`i + 1`) 高的预测结构（即 `ranked_0.pdb` 具有最高置信度）。模型置信度的排序依据是预测的 LDDT 分数（pLDDT；详细信息请参见 Jumper 等人 2021 年的研究，补充方法 1.9.6）。如果使用 `--models_to_relax=all`，则所有排序后的结构都会被弛豫；如果使用 `--models_to_relax=best`，则仅对 `ranked_0.pdb` 进行弛豫，其余均为未弛豫状态；若使用 `--models_to_relax=none`，则所有排序后的结构均保持未弛豫状态。\n*   `ranking_debug.json` – 一个 JSON 格式的文本文件，包含用于模型排序的 pLDDT 值，以及这些值与原始模型名称之间的映射关系。\n*   `relax_metrics.json` – 一个 JSON 格式的文本文件，包含弛豫过程中的各项指标，例如残留的违反约束情况等。\n*   `timings.json` – 一个 JSON 格式的文本文件，记录 AlphaFold 流程中各个步骤的执行时间。\n*   `msas\u002F` – 一个目录，包含用于构建输入 MSA 的各类基因工具比对结果文件。\n*   `result_model_*.pkl` – 一个 `pickle` 文件，包含由模型直接生成的各种嵌套 NumPy 数组。除了结构模块的输出外，还包括以下辅助输出：\n\n    *   距离图（`distogram\u002Flogits` 是一个形状为 [N_res, N_res, N_bins] 的 NumPy 数组，`distogram\u002Fbin_edges` 则定义了 bin 的边界）。\n    *   每个残基的 pLDDT 分数（`plddt` 是一个形状为 [N_res] 的 NumPy 数组，取值范围为 `0` 至 `100`，其中 `100` 表示置信度最高）。这可用于识别预测置信度较高的序列区域，或通过对所有残基取平均来作为整体目标的置信度评分。\n    *   仅在使用 pTM 模型时存在：预测的 TM 得分（`ptm` 字段为一个标量）。作为全局叠加指标的预测值，该分数还用于评估模型对整体结构域堆叠的信心。\n    *   仅在使用 pTM 模型时存在：预测的成对对齐误差（`predicted_aligned_error` 是一个形状为 [N_res, N_res] 的 NumPy 数组，取值范围为 `0` 至 `max_predicted_aligned_error`，其中 `0` 表示置信度最高）。这可用于可视化结构中各结构域堆叠的置信度。\n\n输出的 PDB 文件中，pLDDT 置信度指标存储在 B-factor 字段中（但与传统的 B-factor 不同，pLDDT 值越高表示置信度越高，因此在使用该字段进行分子置换等任务时需特别注意）。\n\n经测试，本代码在 CASP14 测试集上采用 pLDDT 排序结合 5 次模型预测的结果，其平均 top-1 准确率与官方结果一致（部分 CASP 目标曾使用早期版本的 AlphaFold 进行预测，且部分目标还进行了人工干预；具体细节请参见我们即将发表的论文）。然而，某些目标如 T1064 可能会因随机种子的不同而出现较大的单次运行方差。\n\n## 多蛋白推理\n\n提供的推理脚本针对单个蛋白质的结构预测进行了优化，它会编译神经网络以专门适应输入序列、MSA 
和模板的具体尺寸。对于大型蛋白质而言，编译时间在整个运行时间中占比极小，但对于小型蛋白质或已预先计算好多序列比对的情况，编译时间可能会变得较为显著。在批量推理场景下，可以考虑使用我们的 `make_fixed_size` 函数将输入数据填充至统一尺寸，从而减少所需的编译次数。\n\n我们并未提供专门的批量推理脚本，但基于 `RunModel.predict` 方法并结合并行系统预先计算多序列比对，开发此类脚本应较为容易。此外，也可以重复调用该脚本，仅带来适度的额外开销。\n\n## 关于 CASP14 可复现性的说明\n\nAlphaFold 对少数蛋白质的输出具有较高的运行间差异，且可能受到输入数据变化的影响。CASP14 目标 T1064 是一个典型例子：近期大量 SARS-CoV-2 相关序列的加入显著改变了其 MSA。尽管通过运行多个模型并选择置信度最高的结果可以在一定程度上缓解这种变异性，但要尽可能复现我们 CASP14 系统的成果，仍需使用我们在 CASP 中所使用的相同数据库版本。这些版本可能与我们脚本默认下载的版本不一致。\n\n基因数据库方面：\n\n*   UniRef90：\n    [v2020_01](https:\u002F\u002Fftp.uniprot.org\u002Fpub\u002Fdatabases\u002Funiprot\u002Fprevious_releases\u002Frelease-2020_01\u002Funiref\u002F)\n*   MGnify：\n    [v2018_12](http:\u002F\u002Fftp.ebi.ac.uk\u002Fpub\u002Fdatabases\u002Fmetagenomics\u002Fpeptide_database\u002F2018_12\u002F)\n*   Uniclust30：[v2018_08](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Funiclust\u002F2018_08\u002F)\n*   BFD：[唯一可用版本](https:\u002F\u002Fbfd.mmseqs.com\u002F)\n\n模板数据库方面：\n\n*   PDB：（2020-05-14 下载）\n*   PDB70：\n    [2020-05-13](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Fdata\u002Fhhsuite\u002Fdatabases\u002Fhhsuite_dbs\u002Fold-releases\u002Fpdb70_from_mmcif_200513.tar.gz)\n\n另一种方案是使用最新的 PDB 和 PDB70 数据库，但同时添加 `--max_template_date=2020-05-14` 标志，以限制模板仅使用在 CASP14 开始时已存在的结构。\n\n## 引用本工作\n\n如果您使用本软件包中的代码或数据，请引用以下文献：\n\n```bibtex\n@Article{AlphaFold2021,\n  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\\v{Z}}{\\'\\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},\n  journal = {Nature},\n  title   = {Highly accurate protein structure prediction with {AlphaFold}},\n  year    = {2021},\n  volume  = {596},\n  number  = {7873},\n  pages   = {583--589},\n  doi     = {10.1038\u002Fs41586-021-03819-2}\n}\n```\n\n此外，如果您使用 AlphaFold-Multimer 模式，请引用以下文献：\n\n```bibtex\n@article {AlphaFold-Multimer2021,\n  author       = {Evans, Richard and O{\\textquoteright}Neill, Michael and Pritzel, Alexander and Antropova, Natasha and Senior, Andrew and Green, Tim and {\\v{Z}}{\\'\\i}dek, Augustin and Bates, Russ and Blackwell, Sam and Yim, Jason and Ronneberger, Olaf and Bodenstein, Sebastian and Zielinski, Michal and Bridgland, Alex and Potapenko, Anna and Cowie, Andrew and Tunyasuvunakool, Kathryn and Jain, Rishub and Clancy, Ellen and Kohli, Pushmeet and Jumper, John and Hassabis, Demis},\n  journal      = {bioRxiv},\n  title        = {Protein complex prediction with AlphaFold-Multimer},\n  year         = {2021},\n  elocation-id = {2021.10.04.463034},\n  doi          = {10.1101\u002F2021.10.04.463034},\n  URL          = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2021\u002F10\u002F04\u002F2021.10.04.463034},\n  eprint       = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002Fearly\u002F2021\u002F10\u002F04\u002F2021.10.04.463034.full.pdf},\n}\n```\n\n## 社区贡献\n\n社区提供的 Colab 笔记本（请注意，这些笔记本可能与我们的完整 AlphaFold 系统有所不同，我们并未验证其准确性）：\n\n*   [ColabFold AlphaFold2 
笔记本](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fsokrypton\u002FColabFold\u002Fblob\u002Fmain\u002FAlphaFold2.ipynb)，由 Martin Steinegger、Sergey Ovchinnikov 和 Milot Mirdita 提供，该笔记本使用基于 MMseqs2 服务器的 Södinglab 托管 API 来进行多序列比对的构建。\n    [(Mirdita et al. 2019, Bioinformatics)](https:\u002F\u002Facademic.oup.com\u002Fbioinformatics\u002Farticle\u002F35\u002F16\u002F2856\u002F5280135)\n\n## 致谢\n\nAlphaFold 与以下独立的库和软件包进行通信和\u002F或引用：\n\n*   [Abseil](https:\u002F\u002Fgithub.com\u002Fabseil\u002Fabseil-py)\n*   [Biopython](https:\u002F\u002Fbiopython.org)\n*   [Colab](https:\u002F\u002Fresearch.google.com\u002Fcolaboratory\u002F)\n*   [Docker](https:\u002F\u002Fwww.docker.com)\n*   [HH Suite](https:\u002F\u002Fgithub.com\u002Fsoedinglab\u002Fhh-suite)\n*   [HMMER Suite](http:\u002F\u002Feddylab.org\u002Fsoftware\u002Fhmmer)\n*   [Haiku](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdm-haiku)\n*   [JAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax\u002F)\n*   [Kalign](https:\u002F\u002Fmsa.sbc.su.se\u002Fcgi-bin\u002Fmsa.cgi)\n*   [matplotlib](https:\u002F\u002Fmatplotlib.org\u002F)\n*   [ML Collections](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fml_collections)\n*   [NumPy](https:\u002F\u002Fnumpy.org)\n*   [OpenMM](https:\u002F\u002Fgithub.com\u002Fopenmm\u002Fopenmm)\n*   [OpenStructure](https:\u002F\u002Fopenstructure.org)\n*   [pymol3d](https:\u002F\u002Fgithub.com\u002Favirshup\u002Fpy3dmol)\n*   [Sonnet](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fsonnet)\n*   [TensorFlow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow)\n*   [Tree](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Ftree)\n*   [tqdm](https:\u002F\u002Fgithub.com\u002Ftqdm\u002Ftqdm)\n\n我们感谢所有这些项目的贡献者和维护者！\n\n## 联系我们\n\n如果您有任何本概述未涵盖的问题，请联系 AlphaFold 团队，邮箱地址为：[alphafold@deepmind.com](mailto:alphafold@deepmind.com)。\n\n我们非常期待您的反馈，并希望了解 AlphaFold 如何在您的研究中发挥作用。请将您的故事分享给我们，发送至：[alphafold@deepmind.com](mailto:alphafold@deepmind.com)。\n\n## 许可与免责声明\n\n本项目并非 Google 官方支持的产品。\n\n版权所有 © 2022 DeepMind Technologies Limited。\n\nAlphaFold 2 及其输出仅用于理论建模，不适用于临床用途，也未经验证或批准用于临床。您不应将 AlphaFold 2 或其输出用于临床目的，亦不应依赖其结果获取医疗或其他专业建议。任何与此类主题相关的内容仅供信息参考，不能替代合格专业人士的建议。\n\nAlphaFold 2 的预测结果具有不同程度的置信度，应谨慎解读。在依赖、发表、下载或以其他方式使用 AlphaFold 2 及其输出之前，请务必谨慎判断。\n\n### AlphaFold 代码许可\n\n本文件根据 Apache License, Version 2.0（“许可证”）授权。除非符合许可证的规定，否则不得使用本文件。您可以在 https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0 获取许可证副本。\n\n除非适用法律另有要求或双方另有约定，否则根据本许可证分发的软件按“原样”提供，不附带任何形式的保证或条件，无论是明示还是暗示。有关许可证的具体语言、权限及限制，请参阅许可证条款。\n\n### 模型参数许可\n\nAlphaFold 的模型参数依据 Creative Commons Attribution 4.0 International (CC BY 4.0) 许可协议提供。详细信息请访问：https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode\n\n### 第三方软件\n\n上述【致谢】部分提及的第三方软件、库或代码的使用，可能受单独的条款、条件或许可协议约束。您对这些第三方软件、库或代码的使用需遵守相应的规定，并在使用前确认自己能够遵守所有适用的限制和条款。\n\n### 镜像数据库\n\n以下数据库已由 DeepMind 进行镜像，并可根据如下说明使用：\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F)（未修改），由 Steinegger M. 和 Söding J. 提供，采用\n    [知识共享署名-相同方式共享 4.0 国际许可协议](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F)。\n\n*   [BFD](https:\u002F\u002Fbfd.mmseqs.com\u002F)（已修改），由 Steinegger M. 和 Söding J. 提供，经 DeepMind 修改后发布，采用\n    [知识共享署名-相同方式共享 4.0 国际许可协议](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F)。\n    详情请参阅\n    [AlphaFold 蛋白质组论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-021-03828-1) 的方法部分。\n\n*   [Uniref30: v2021_03](http:\u002F\u002Fwwwuser.gwdg.de\u002F~compbiol\u002Funiclust\u002F2021_03\u002F)\n    （未修改），由 Mirdita M. 
等人提供，采用\n    [知识共享署名-相同方式共享 4.0 国际许可协议](http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-sa\u002F4.0\u002F)。\n\n*   [MGnify: v2022_05](http:\u002F\u002Fftp.ebi.ac.uk\u002Fpub\u002Fdatabases\u002Fmetagenomics\u002Fpeptide_database\u002F2022_05\u002FREADME.txt)\n    （未修改），由 Mitchell AL 等人提供，不受任何版权限制，完全免费向公众开放，可用于非商业及商业用途，采用\n    [CC0 1.0 协议（即放弃所有权利的公共领域奉献）](https:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002F)。","# AlphaFold 快速上手指南\n\n本指南基于 AlphaFold v2 官方文档整理，旨在帮助开发者快速在 Linux 环境下部署并运行蛋白质结构预测。\n\n## 环境准备\n\n在开始之前，请确保您的机器满足以下硬件和软件要求：\n\n*   **操作系统**：必须为 **Linux**（不支持 Windows 或 macOS）。\n*   **存储空间**：完整安装需要约 **3 TB** 磁盘空间用于存放遗传数据库（强烈建议使用 **SSD** 以提升搜索性能）。\n*   **GPU**：需要现代 NVIDIA GPU（显存越大，可预测的蛋白质结构越大）。\n*   **前置依赖**：\n    *   [Docker](https:\u002F\u002Fwww.docker.com\u002F)\n    *   [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html)（用于 GPU 支持）\n    *   `aria2` 和 `rsync`（用于下载数据）\n    *   Python 3 及 `pip3`\n\n> **注意**：请确保已配置好 Docker 以非 root 用户运行，并验证 GPU 可在 Docker 容器中被识别。\n\n## 安装步骤\n\n### 1. 克隆代码库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold.git\ncd .\u002Falphafold\n```\n\n### 2. 安装基础工具\n在大多数 Linux 发行版中，可通过包管理器安装所需工具：\n```bash\nsudo apt update\nsudo apt install aria2 rsync\n```\n\n### 3. 下载遗传数据库和模型参数\n此步骤耗时较长（完整数据库下载量约 556 GB，解压后约 2.62 TB）。\n**重要提示**：`\u003CDOWNLOAD_DIR>` 目录**不能**是 AlphaFold 代码库的子目录，否则会导致 Docker 构建极慢。\n\n```bash\n# 将 \u003CDOWNLOAD_DIR> 替换为您拥有足够空间的实际路径，例如 \u002Fdata\u002Falphafold_db\nscripts\u002Fdownload_all_data.sh \u003CDOWNLOAD_DIR> > download.log 2> download_all.log &\n```\n*如需节省空间，可添加 `reduced_dbs` 参数下载精简版数据库，但需配合相应运行参数使用。*\n\n### 4. 验证 GPU 支持\n运行以下命令确认 Docker 能正确调用 GPU：\n```bash\ndocker run --rm --gpus all nvidia\u002Fcuda:11.0-base nvidia-smi\n```\n如果输出显示了您的 GPU 列表，则环境配置成功。\n\n### 5. 构建 Docker 镜像\n```bash\ndocker build -f docker\u002FDockerfile -t alphafold .\n```\n*若遇到 GPG 签名错误，请参考官方 Issue #463 中的解决方案。*\n\n### 6. 
安装运行脚本依赖\n建议在 Python 虚拟环境中安装：\n```bash\npip3 install -r docker\u002Frequirements.txt\n```\n\n## 基本使用\n\n准备好包含蛋白质序列的 FASTA 文件（例如 `your_protein.fasta`）后，即可运行预测。\n\n**运行命令示例：**\n\n```bash\npython3 docker\u002Frun_docker.py \\\n  --fasta_paths=your_protein.fasta \\\n  --max_template_date=2022-01-01 \\\n  --data_dir=\u003CDOWNLOAD_DIR> \\\n  --output_dir=\u002Fhome\u002Fuser\u002Fabsolute_path_to_the_output_dir\n```\n\n**参数说明：**\n*   `--fasta_paths`：输入的蛋白质序列文件路径。\n*   `--max_template_date`：限制模板使用的截止日期，避免数据泄露。\n*   `--data_dir`：步骤 3 中下载的遗传数据库所在目录的绝对路径。\n*   `--output_dir`：预测结果输出的绝对路径（请确保该目录存在且有写入权限）。\n\n运行完成后，预测的蛋白质结构文件将保存在输出目录中。","某生物医药公司的结构生物学团队正急需解析一种新型病毒蛋白的三维结构，以加速抗病毒药物的靶点筛选与设计。\n\n### 没有 alphafold 时\n- **耗时漫长**：依赖传统的 X 射线晶体学或冷冻电镜实验，从蛋白表达、纯化到最终解析结构，往往需要数月甚至数年的反复试错。\n- **成本高昂**：实验过程需要昂贵的专业设备、大量试剂消耗以及高水平技术人员的持续投入，单次失败的经济损失巨大。\n- **成功率低**：对于难以结晶或稳定性差的膜蛋白等目标，传统实验方法经常无法获得高质量数据，导致项目被迫停滞。\n- **信息滞后**：在等待实验结果的漫长周期中，药物研发进程被迫中断，错失抢占治疗窗口的最佳时机。\n\n### 使用 alphafold 后\n- **极速预测**：只需输入氨基酸序列，alphafold 即可在数小时至数天内利用 GPU 集群输出高精度的三维结构模型，将周期从“月级”压缩至“天级”。\n- **成本骤降**：无需前期投入巨额实验经费，仅需常规的服务器算力资源，即可对成千上万个潜在靶点进行低成本的大规模虚拟筛选。\n- **突破瓶颈**：即使面对传统手段难以攻克的复杂蛋白或多聚体（通过 AlphaFold-Multimer），alphafold 也能提供可靠的构象预测，填补实验空白。\n- **迭代高效**：研究人员可基于预测结果快速设计突变体或优化药物分子，实现“预测 - 验证 - 优化”的敏捷研发闭环。\n\nalphafold 将蛋白质结构解析从一项高门槛的实验艺术转变为高效的计算流程，彻底重塑了现代药物发现的速度与广度。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphafold_f6135c1c.gif","google-deepmind","Google DeepMind","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgoogle-deepmind_06b1dd17.png","",null,"https:\u002F\u002Fwww.deepmind.com\u002F","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind",[85,89,92,96],{"name":86,"color":87,"percentage":88},"Python","#3572A5",97.5,{"name":90,"color":91,"percentage":10},"Shell","#89e051",{"name":93,"color":94,"percentage":95},"Dockerfile","#384d54",0.4,{"name":97,"color":98,"percentage":99},"Jupyter Notebook","#DA5B0B",0.1,14437,2597,"2026-04-06T13:42:40","Apache-2.0",4,"Linux","必需。需要现代 NVIDIA GPU（支持 CUDA），需安装 NVIDIA Container Toolkit。显存大小未明确具体数值，但说明显存越大可预测的蛋白质结构越大。README 中构建镜像时使用的测试基础镜像为 nvidia\u002Fcuda:11.0-base。","未说明",{"notes":109,"python":110,"dependencies":111},"1. 不支持 macOS 和 Windows，仅支持 Linux。2. 必须使用 Docker 运行（或在 HPC 系统上使用 Singularity）。3. 完整遗传数据库下载量约 556 GB，解压后占用高达 2.62 TB 磁盘空间，强烈建议使用 SSD 存储以提升性能。4. 也可选择下载缩减版数据库以减少空间占用。5. 
模型参数和代码分别遵循 CC BY 4.0 和 Apache 2.0 许可证。","未说明（需通过 pip3 安装依赖，建议使用 Python 虚拟环境）",[112,113,114,115,116],"Docker","NVIDIA Container Toolkit","aria2c","rsync","docker\u002Frequirements.txt 中定义的库",[18],"2026-03-27T02:49:30.150509","2026-04-07T09:48:16.113409",[121,126,131,136,141,146],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},21501,"升级到新版本（如 2.2.4）后，AlphaFold Multimer 的预测评分（iptm+ptm）显著下降怎么办？","这是一个已知问题，通常会在后续版本中得到修复。用户反馈显示，将版本升级到 2.3.1 或更高版本后，预测评分恢复正常，且在不同 GPU（如 GTX 1080 Ti, RTX A6000, RTX 2070）上表现一致。建议遇到此问题的用户直接升级 AlphaFold 到最新版本。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphafold\u002Fissues\u002F597",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},21502,"运行较长序列（约 650 个残基以上）时遇到\"ValueError: Cannot create a tensor proto whose content is larger than 2GB\"错误如何解决？","该错误是由于大序列导致内存溢出引起的。官方已通过提交记录（commit 0be2b30b98f0da7aecb973bde04758fae67eb913）修复了此问题，主要优化了大型 Jackhmmer 输出时的内存使用。建议用户更新 AlphaFold 代码库到包含此修复的最新版本。如果无法立即更新，可以尝试清理 fork 并应用减少内存占用的相关补丁。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphafold\u002Fissues\u002F71",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},21503,"运行 AlphaFold 时出现\"RuntimeError: HHblits failed\"错误该如何排查？","此错误通常由数据库文件或权限问题引起。建议采取以下步骤：\n1. 单独运行 hhblits 命令（添加 -oa3m 选项）以获取更详细的错误信息。\n2. 检查数据库文件（如 uniref30 和 BFD）是否完整且未损坏。\n3. 确认数据库文件的读取和执行权限。如果数据库存储在外部存储中，确保当前用户有权限访问。可以尝试对每个数据库文件执行 `chmod 755` 或 `chmod 444`，或联系文件所有者授予权限。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphafold\u002Fissues\u002F97",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},21504,"遇到\"AttributeError: 'Config' object has no attribute 'jax_experimental_name_stack'\"错误是什么原因？","该错误通常是由于 JAX 版本不兼容导致的。AlphaFold 特定版本依赖于特定范围的 JAX 版本。当安装的 JAX 版本过新或过旧，缺少 `jax_experimental_name_stack` 属性时会报此错。解决方法是检查 AlphaFold 的 `requirements.txt` 或安装文档，卸载当前 JAX 并安装与 AlphaFold 版本严格匹配的 JAX、Jaxlib 和 Haiku 版本。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphafold\u002Fissues\u002F635",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},21505,"AlphaFold 绑定的 OpenMM 版本过旧（7.5.1），是否可以升级到最新版本？","AlphaFold 早期版本确实锁定了 OpenMM 7.5.1 并需要应用二硫键相关的补丁。但在较新的 OpenMM 版本中，该修复已原生包含。虽然社区建议移除版本锁定以使用最新版，但需注意官方可能因稳定性原因暂时维持锁定。如果自行升级，请确保移除旧的补丁文件，并测试二硫键功能是否正常。对于普通用户，建议等待官方更新依赖或使用官方推荐的 Docker 镜像以避免环境冲突。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphafold\u002Fissues\u002F404",{"id":147,"question_zh":148,"answer_zh":149,"source_url":135},21506,"如何在非 Docker 环境下调试 MSA 生成工具（如 Jackhmmer 或 HHblits）的失败问题？","如果在运行 AlphaFold 流水线时遇到 MSA 工具失败，可以在非 Docker 环境下单独调用这些工具进行调试。例如，对于 HHblits，可以使用 `-oa3m` 参数手动运行命令，观察标准输出和错误日志。这有助于判断是数据库路径配置错误、文件格式问题还是资源不足（如内存或磁盘空间）。同时，确保环境变量中正确设置了数据库路径，并且数据库文件具有正确的读取权限。",[151,156,161,166,171,176,181,186,191,196,201,206,211],{"id":152,"version":153,"summary_zh":154,"released_at":155},127495,"v2.3.2","**更新日志**\n\n- 在 Colab 中使用 `shutil` 实现更稳健的下载功能（感谢 @gmihaila）。\n- 在 `run_alphafold.py` 中新增仅对最佳未松弛模型执行松弛步骤的功能。\n- 改进了排名输出的相关文档（感谢 @ulupo）。\n- 从结果 `.pkl` 文件中移除了对 JAX 的依赖。\n- 将 TensorFlow 更新至 2.11.0。\n- 改进了安装 `aria2c` 的说明文档（感谢 @janxkoci）。\n- 使 mmCIF 解析中的 `_chem_comp.type` 逻辑不区分大小写。\n- 改进了在 Colab 中单元按错误顺序提交时的错误提示信息（例如运行时重启的情况）。\n- 修复了错误的类型注解。\n- 将 Colab 环境中的 Python 版本升级至 3.9。\n- 提升了针对 bfloat16 的掩码 softmax 的鲁棒性。\n- 在 Colab 环境中升级 `pyopenssl` 以修复 `cryptography` 依赖问题。","2023-04-05T09:45:53",{"id":157,"version":158,"summary_zh":159,"released_at":160},127496,"v2.3.1","版本 v2.3.1 包含若干小幅更新。\n\n**变更日志**\n\n- 增加推理时丢弃（dropout）的选项。\n- 在 README 中添加 A100 推理耗时信息。\n- 加快 Colab 中多聚体的 MSA 查找速度。\n- 将 Colab 中允许的最大序列长度提高至 4,000。\n- 改进 AlphaFold 
的首次安装和运行说明文档。\n- 文档优化及其他小修复（感谢 @eltociear）。\n- 将部分违规计算固定在 CPU 上执行，以解决松弛阶段的 GPU 内存问题。","2023-01-12T11:32:14",{"id":162,"version":163,"summary_zh":164,"released_at":165},127497,"v2.3.0","版本 v2.3.0 更新了 AlphaFold-Multimer 模型参数。这些新模型预计在处理大型蛋白质复合物时会更加准确，但其模型架构和训练方法与我们先前发布的 AlphaFold-Multimer 论文保持一致。更多详情请参阅 [v2.3.0 发布说明](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fblob\u002Fmain\u002Fdocs\u002Ftechnical_note_v2.3.0.md)。\n\n得益于多项内存优化，AlphaFold-Multimer 现在所需的 GPU 内存更少，因此能够处理更长的蛋白质序列。\n\n此外，还进行了一系列其他错误修复和小幅改进。\n\n**变更日志**\n\n* 新增了在大型蛋白质复合物上精度更高的 AlphaFold-Multimer 模型。\n* 在循环迭代中添加了早停机制。\n* 在 pdb_seqres 下载脚本中增加了对非蛋白质序列的过滤，以避免模板搜索出错。\n* 修复了一个 bug：组氨酸残基在松弛后有时会出现原子坐标互换的问题（感谢 @avwillems）。\n* 将 MGnify 更新至 2022_05，UniRef90 更新至 2022_01，UniClust30 更新至 2021_03，Colab 笔记本中的 UniProt 数据更新至 2021_04。\n* 在多聚体推理中使用 `bf16` 数据类型——降低了 GPU 内存占用。\n* 在 `LayerNorm` 中使用 `bf16` 时，将其上采样为 `fp32`；并将 `hk.LayerNorm` 替换为 `common_modules.LayerNorm`。\n* 为与 AlphaFold Colab 笔记本保持一致，将 Jax 更新至 0.3.25，Haiku 更新至 0.0.9。\n* 将 `TriangleMultiplication` 改为使用融合投影，并进行了其他多项内存优化。\n* 将 AlphaFold Colab 笔记本中的 Python 版本升级至 3.8。\n* 改进了 AlphaFold Colab 笔记本的可用性：现在支持最多 20 条链的多聚体结构，提高了序列长度上限，可控制循环迭代次数，并新增了在多聚体模型上运行单条链的选项。\n* 松弛指标现保存在 `relax_metrics.json` 文件中。\n* 解决了一些 Jax 的弃用错误（感谢 @jinmingyi1998）。\n* 进行了多项文档和代码改进（感谢 @mathe42）。","2022-12-13T11:52:50",{"id":167,"version":168,"summary_zh":169,"released_at":170},127498,"v2.2.4","版本 v2.2.4 是一个错误修复版本\n\n**变更日志**\n* 升级第三方库版本：jax 0.3.17、absl-py 1.0.0、haiku 0.0.7、numpy 1.21.6、tensorflow 2.9.0\n* 调整 `jnp.take` 的实现，以适应新版本 jax 的行为，详情请参阅 https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F513（感谢 @sokrypton）。\n* 通过移除包缓存来减小 Docker 镜像的大小，详情请参阅 https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fpull\u002F526（感谢 @TheDen）。\n* 修复 `backbone_loss` 中的错误参数，详情请参阅 https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Falphafold\u002Fissues\u002F570（感谢 @sokrypton）。","2022-09-21T16:56:55",{"id":172,"version":173,"summary_zh":174,"released_at":175},127499,"v2.2.3","版本 v2.2.3 是一个错误修复版本。\n\n**变更日志**\n* 将 Conda 版本固定为 4.13.0，以避免 Docker\u002FColab 部署问题（感谢 @Meghpal 和 @michaelkeith18）。\n* 将 Colab PAE 的 JSON 输出格式更改为与 AlphaFold 蛋白质结构数据库（AFDB）新版本中使用的格式一致的新格式。有关新格式的说明，请参阅 [AFDB 常见问题解答](https:\u002F\u002Falphafold.ebi.ac.uk\u002Ffaq\u002F#faq-7)。\n* 添加 AFDB 的自述文件。\n* 改进了类型提示。\n* 修复了测试，并优化了内部测试基础设施。\n* 修复了由于 https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax\u002Fissues\u002F11142 导致的 Dockerfile 中断问题。","2022-08-25T10:46:29",{"id":177,"version":178,"summary_zh":179,"released_at":180},127500,"v2.2.2","一个小型的错误修复版本，修复了在 v2.2.1 中引入的一个 bug（感谢 @lucajovine）。","2022-06-13T15:55:30",{"id":182,"version":183,"summary_zh":184,"released_at":185},127501,"v2.2.1","版本 v2.2.1 是一个 bug 修复版本。\n\n**变更日志**\n\n* 从 CUDA 11.1 更新至 11.1.1，以解决公钥相关问题。\n* 将 protobuf 版本固定为 3.20.1（感谢 @britnyblu、@ShoufaChen、@aputron）。\n* 在 README 中明确说明 AlphaFold 仅支持在 Linux 系统下运行。\n* 修复了 `jax.tree_multimap` 已弃用的警告。\n* 在 `run_alphafold_test` 中不再复用临时输出目录（感谢 @branfosj）。\n* 修正了 `setup.py` 中的版本号（感谢 @cmeesters）。","2022-06-13T10:57:39",{"id":187,"version":188,"summary_zh":189,"released_at":190},127502,"v2.2.0","版本 v2.2.0 更新了 AlphaFold-Multimer 模型参数。这些新模型平均碰撞数显著减少，且精度略有提升。更多详情请参阅更新后的 [AlphaFold-Multimer 论文](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.10.04.463034v2)。\n\n此外，还进行了一系列其他错误修复和小幅改进。\n\n**变更日志**\n* 新增 AlphaFold-Multimer 模型，平均碰撞数大幅减少，精度略有提高。\n* 使用 `DeviceRequest` 而非 `runtime=nvidia` 来向容器暴露 GPU（感谢 @aburger）。\n* 简化了 Docker 中文件的挂载方式。\n* 移除了 `GlobalAttention` 中未使用的 `bias` 参数（感谢 @breadbread1984）。\n* 移除了原核生物 MSA 
配对算法，因为其并未在平均意义上提升精度。\n* 增加了每模型可使用多个随机种子的功能，以匹配 AlphaFold-Multimer 论文中的设置。\n* 修复了因错误跳过层而导致在使用回收机制训练的模型中，当 `num_recycle=0` 时性能下降的问题（感谢 @sokrypton）。\n* 在 `sharded_map` 中添加了 `split_rng=False`（当前默认值），以支持 Haiku 的新版本发布。\n* 移除了 `amber_minimize.py` 中未使用的代码。","2022-03-10T15:38:38",{"id":192,"version":193,"summary_zh":194,"released_at":195},127503,"v2.1.2","版本 v2.1.2 是一个修复错误的发布版本，同时也包含了之前的许可证变更。\n\n**更新日志**\n* 将 AlphaFold 参数的许可证从 **CC BY-NC 4.0** 更新为 **CC BY 4.0**。实际模型参数未作任何更改。\n* 放松阶段现在默认在 GPU 上运行，因此速度大约提升了 3 倍。您可以通过 `enable_gpu_relax` 标志来控制此行为（感谢 @ojcharles）。\n* 现在可以使用 `run_relax` 标志来禁用放松阶段（感谢 @bkpoon）。\n* Docker 中的 AlphaFold 现在以当前用户身份运行，而不是以 root 用户身份运行，您可以通过 `docker_user` 标志来控制这一点（感谢 @akors）。\n* 在读取原始 Stockholm 文件时截断 MSA，以防止内存不足问题。这将有助于处理由 Jackhmmer 找到的超大规模 MSA 的情况（感谢 @hegelab）。\n* 将 Dockerfile 中的 CUDA 版本更新至 11.1，并修复 JAX 版本（感谢 @chrisroat）。\n* 对 README、Colab 和标志文档进行了小幅改进。","2022-01-28T10:01:10",{"id":197,"version":198,"summary_zh":199,"released_at":200},127504,"v2.1.1","版本 v2.1.1 是 AlphaFold-Multimer 发布版（v2.1.0）的一个错误修复版本。\n\n**变更日志：**\n\n* 修复了一个 bug，该 bug 会导致当多聚体输入 FASTA 文件的序列描述中包含 SwissProt 标识符时程序崩溃（感谢 @arashnh11 和 @RodenLuo）。\n* 修复了 Colab 笔记本中单链 PAE 可视化的一个 bug（感谢 @Alleko）。\n* 对 README 进行了一些澄清和补充。","2021-11-05T10:06:11",{"id":202,"version":203,"summary_zh":204,"released_at":205},127505,"v2.1.0","Version 2.1.0 adds the AlphaFold-Multimer model and fixes a number of issues reported in the last few months.\r\n\r\n**Change log:**\r\n* [new feature] AlphaFold-Multimer data pipeline, model and metrics have been added. Use `model_preset=multimer` to run with AlphaFold-Multimer.\r\n* [change] AlphaFold-Multimer no longer pre-processes the features via TensorFlow but instead does it in the JAX module code.\r\n* Added a note and a check that the directory with data is outside the AlphaFold repository for faster Docker builds (thanks @jamespeapen).\r\n* Advertise Python 3.7, 3.8, 3.9, 3.10 in setup.py (thanks @anukaal).\r\n* Added an FAQ explaining that the Colab on free tier can time out (thanks @mooingcat).\r\n* Stop using hardcoded `\u002Ftmp`, instead use `$TMPDIR` (thanks @meson800, @EricDeveaud).\r\n* Make run_docker fully configurable via flags: `data_dir`, `docker_image_name`, `output_dir` (thanks @akors, @chrisroat).\r\n* Look for stereo_chemical_props.txt relative to the residue_constants module (thanks @rjgildea).\r\n* Crop UniRef90 MSA to 10,000 sequences to prevent hitting the 2 GB proto field limit and use less memory (thanks @biohegedus and @chrisroat).\r\n* Finding third party tool binaries is now more robust and gives you better errors if any are missing (thanks @FanchTheSystem).\r\n* Refactor and a few fixes and usability improvements in the AlphaFold Colab.\r\n","2021-11-02T17:14:38",{"id":207,"version":208,"summary_zh":209,"released_at":210},127506,"v2.0.1","Version 2.0.1 is mainly a bug fix release. 
We thank everyone who reported issues and proposed solutions.\r\n\r\n**Change log:**\r\n* [new feature] Added AlphaFold Colab notebook that enables convenient folding from your browser.\r\n* [new feature] The `reduced_dbs` preset was added together with small BFD.\r\n* Some of the genetic databases are now mirrored on GCP.\r\n* Added a missing `data\u002F__init__.py` and `model\u002Ftf\u002F__init__.py` files.\r\n* README fixes and additions.\r\n* Switched to using cudnn base image based on Ubuntu 18.04.\r\n* Switched to `tensorflow-cpu` since we don't need a GPU when running the data pipeline.\r\n* Improved logging in the AlphaFold pipeline.\r\n* Fixed a few typos and added and fixed a few comments. \r\n* Added pLDDT in the B-factor column of the output PDBs.\r\n* Skip obsolete PDB templates that don't have a replacement.\r\n* Small test improvements.","2021-09-30T10:40:21",{"id":212,"version":213,"summary_zh":81,"released_at":214},127507,"v2.0.0","2021-07-16T14:58:00"]
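上文的“依次折叠多个单体/多聚体”与“多蛋白推理”小节提到，可以通过重复调用 `docker/run_docker.py` 来批量处理多个蛋白，且额外开销有限。下面给出一个示意性的批量提交草稿：它只是把文档中已有的命令行参数包了一层循环，并非仓库自带的批量脚本；其中 `/data/alphafold_db`、`/home/user/af_out`、`/home/user/fastas` 等路径均为假设的示例路径，请按自己的环境替换。

```python
# 示意草稿（非仓库自带脚本）：依次对目录中的每个 FASTA 文件调用 docker/run_docker.py。
# 所用参数均与上文示例一致；路径为示例，请自行替换。
import subprocess
from pathlib import Path

DATA_DIR = "/data/alphafold_db"        # 即 <DOWNLOAD_DIR>（示例路径）
OUTPUT_DIR = "/home/user/af_out"       # 结果输出目录（示例路径）
FASTA_DIR = Path("/home/user/fastas")  # 存放待预测序列的目录（示例路径）

for fasta in sorted(FASTA_DIR.glob("*.fasta")):
    cmd = [
        "python3", "docker/run_docker.py",
        f"--fasta_paths={fasta}",
        "--max_template_date=2021-11-01",
        "--model_preset=monomer",
        "--db_preset=reduced_dbs",
        f"--data_dir={DATA_DIR}",
        f"--output_dir={OUTPUT_DIR}",
    ]
    print("运行：", " ".join(cmd))
    subprocess.run(cmd, check=True)  # 任一预测失败时立即终止，便于排查
```

如果同一批序列需要尝试不同的模型参数，可按上文说明在重复运行时加上 `--use_precomputed_msas=true`，以复用已生成的 MSA、显著缩短总耗时。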
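预测完成后，往往需要快速检查各模型的置信度。下面是一个基于上文“AlphaFold 输出”一节所述文件（`ranking_debug.json`、`result_model_*.pkl` 中的 `plddt`、`ptm`、`predicted_aligned_error` 字段）的最小示意脚本，并非官方工具；示例中的目标目录路径是假设值，`ptm` 与 `predicted_aligned_error` 仅在 pTM / Multimer 模型的输出中存在，且较旧版本生成的 pkl 文件可能仍需安装 JAX 才能反序列化。

```python
# 最小示意脚本（非官方工具）：读取 AlphaFold 输出目录中的置信度信息。
import json
import pickle
from pathlib import Path

import numpy as np

# 示例路径，请替换为实际的 <output_dir>/<target_name>
target_dir = Path("/tmp/alphafold/T1050")

# ranking_debug.json 记录了用于排序的置信度及模型排序结果（键为 "order"）
ranking = json.loads((target_dir / "ranking_debug.json").read_text())
best_model = ranking["order"][0]
print("置信度最高的模型：", best_model)

# 对应的 result_<model_name>.pkl；具体文件名取决于所用的模型预设
with open(target_dir / f"result_{best_model}.pkl", "rb") as f:
    result = pickle.load(f)

plddt = np.asarray(result["plddt"])  # 形状 [N_res]，取值 0–100，越高越可信
print(f"残基数：{plddt.shape[0]}，平均 pLDDT：{plddt.mean():.1f}")

# 以下字段仅在 pTM / Multimer 模型中存在
if "ptm" in result:
    print("预测 TM 分数 (pTM)：", float(result["ptm"]))
if "predicted_aligned_error" in result:
    pae = np.asarray(result["predicted_aligned_error"])  # 形状 [N_res, N_res]
    print("PAE 矩阵形状：", pae.shape, "，最大值：", float(pae.max()))
```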
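对于 pTM 或 Multimer 模型的结果，还可以把 `predicted_aligned_error` 画成热图，直观查看结构域之间相对位置的置信度（上文提到 `0` 表示置信度最高）。下面同样只是一个示意性的绘图片段：假设已安装 matplotlib，且 `result` 即上例中加载的字典；这不是 AlphaFold 自带的功能，`max_predicted_aligned_error` 字段若不存在则退回用矩阵最大值作为色标上限。

```python
# 示意片段（非 AlphaFold 自带脚本）：将 PAE 矩阵绘制为热图，需要安装 matplotlib。
import matplotlib.pyplot as plt
import numpy as np

def plot_pae(result, out_png="pae.png"):
    """result 为上例中加载的 result_<model_name>.pkl 字典。"""
    pae = np.asarray(result["predicted_aligned_error"])  # [N_res, N_res]，单位为埃
    vmax = float(result.get("max_predicted_aligned_error", pae.max()))
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(pae, vmin=0.0, vmax=vmax, cmap="Greens_r")
    ax.set_xlabel("残基编号")
    ax.set_ylabel("残基编号")
    fig.colorbar(im, ax=ax, label="Predicted aligned error (Å)")
    fig.savefig(out_png, dpi=150, bbox_inches="tight")
    plt.close(fig)
```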