[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-google-deepmind--alphagenome_research":3,"tool-google-deepmind--alphagenome_research":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 
解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 
是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":93,"env_os":94,"env_gpu":95,"env_ram":96,"env_deps":97,"category_tags":111,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":22,"created_at":112,"updated_at":113,"faqs":114,"releases":140},1789,"google-deepmind\u002Falphagenome_research","alphagenome_research","Research code accompanying AlphaGenome ","alphagenome_research 是谷歌 DeepMind 推出的开源研究代码库，旨在配合其统一的 DNA 序列模型 AlphaGenome 使用。它专注于解析长达 100 万个碱基对的 DNA 序列，以单碱基分辨率精准预测基因表达、剪接模式、染色质特征及染色体接触图谱等多种生物学功能，从而帮助科研人员深入理解基因组运作机制并评估调控变异的影响。\n\n该工具主要解决了传统方法难以在长序列范围内高精度预测复杂基因组功能的难题，为遗传学研究和疾病机理探索提供了强大的计算支持。它特别适合生物信息学家、计算生物学家以及从事基因组学研究的专业开发者使用。用户可利用其提供的 JAX 实现模型、变体评分器及数据加载工具，在本地复现研究或进行二次开发；同时也为普通研究人员提供了 Colab 笔记本，便于快速上手体验模型推理与变异分析。\n\n技术亮点方面，alphagenome_research 基于高效的 JAX 框架构建，支持在 NVIDIA H100 GPU 或 TPU 上运行，并提供了便捷的 API 封装类，让用户能轻松调用预训练权重进行“计算机模拟诱变”等高级分析。虽然直接运行模型对硬件有","alphagenome_research 是谷歌 DeepMind 推出的开源研究代码库，旨在配合其统一的 DNA 序列模型 AlphaGenome 使用。它专注于解析长达 100 万个碱基对的 DNA 
序列，以单碱基分辨率精准预测基因表达、剪接模式、染色质特征及染色体接触图谱等多种生物学功能，从而帮助科研人员深入理解基因组运作机制并评估调控变异的影响。\n\n该工具主要解决了传统方法难以在长序列范围内高精度预测复杂基因组功能的难题，为遗传学研究和疾病机理探索提供了强大的计算支持。它特别适合生物信息学家、计算生物学家以及从事基因组学研究的专业开发者使用。用户可利用其提供的 JAX 实现模型、变体评分器及数据加载工具，在本地复现研究或进行二次开发；同时也为普通研究人员提供了 Colab 笔记本，便于快速上手体验模型推理与变异分析。\n\n技术亮点方面，alphagenome_research 基于高效的 JAX 框架构建，支持在 NVIDIA H100 GPU 或 TPU 上运行，并提供了便捷的 API 封装类，让用户能轻松调用预训练权重进行“计算机模拟诱变”等高级分析。虽然直接运行模型对硬件有一定要求，但官方也推荐通过云端 API 方式降低使用门槛，让非专业硬件用户也能受益于这一前沿成果。","![AlphaGenome header image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphagenome_research_readme_23f783662f15.png)\n\n# AlphaGenome Research\n\n![Presubmit Checks](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Factions\u002Fworkflows\u002Fpresubmit_checks.yml\u002Fbadge.svg)\n\n[**Model Weights**](#model-weights) | [**Installation**](#installation) |\n[**Quick Start**](#quick-start) |\n[**Documentation**](https:\u002F\u002Fwww.alphagenomedocs.com\u002F) |\n[**Community**](https:\u002F\u002Fwww.alphagenomecommunity.com) |\n[**Terms of Use**](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms)\n\nAlphaGenome is a unified DNA sequence model designed to advance regulatory\nvariant-effect prediction and shed light on genome function. 
It analyzes DNA\nsequences of up to 1 million base pairs to deliver predictions at single\nbase-pair resolution across diverse modalities, including gene expression,\nsplicing patterns, chromatin features, and contact maps.\n\nThis repository provides the following research code:\n\n-   An implementation of the AlphaGenome model, written in\n    [JAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax).\n-   An implementation of the\n    [AlphaGenome API](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome) with\n    accompanying variant scorers.\n-   A dataset loader for reading AlphaGenome training data from TFRecords.\n-   Colab notebooks for\n    [running](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fquick_start.ipynb)\n    the model and\n    [analysing](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fvariant_eval_examples.ipynb)\n    evaluations.\n\nWe strongly recommend using our\n[AlphaGenome API](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome) to interact\nwith the model without needing specialized hardware.\n\n## Installation\n\n\u003C!-- mdformat off(disable for [!TIP] format) -->\n\n> [!TIP]\n> We strongly recommend you create a\n> [Python Virtual Environment](https:\u002F\u002Fdocs.python.org\u002F3\u002Ftutorial\u002Fvenv.html) to\n> prevent conflicts with your system's Python environment.\n\n\u003C!-- mdformat on -->\n\nTo install, clone a local copy of this repository and run `pip install`:\n\n```bash\n$ git clone https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research.git\n$ pip install -e .\u002Falphagenome_research\n```\n\nThis will install any required dependencies, including this repository in\n[development 
mode](https:\u002F\u002Fsetuptools.pypa.io\u002Fen\u002Flatest\u002Fuserguide\u002Fdevelopment_mode.html).\n\n### Model weights\n\nTo use our pre-trained model weights, you can download them from either:\n\n-   [Kaggle](https:\u002F\u002Fwww.kaggle.com\u002Fmodels\u002Fgoogle\u002Falphagenome) or\n-   [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fgoogle\u002Falphagenome)\n\nBoth require accepting our non-commercial\n[model terms](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms).\nRequests are processed immediately.\n\n### Model requirements\n\nIn order to run the model, we recommend running with at least an\n[NVIDIA H100 GPU](https:\u002F\u002Fdocs.cloud.google.com\u002Fcompute\u002Fdocs\u002Fgpus#h100-gpus).\nPlease ensure CUDA, cuDNN and JAX are correctly installed; the\n[JAX installation documentation](https:\u002F\u002Fdocs.jax.dev\u002Fen\u002Flatest\u002Finstallation.html#nvidia-gpu)\nis a useful resource in this regard.\n\nFor training, we recommend running on\n[Tensor Processing Units (TPUs) v3](https:\u002F\u002Fdocs.cloud.google.com\u002Ftpu\u002Fdocs\u002Fv3)\nor higher.\n\n## Quick start\n\nThe easiest way to interact with the AlphaGenome model is using the provided\n[DNA Model class](\u002Fsrc\u002Falphagenome_research\u002Fmodel\u002Fdna_model.py). 
This wraps the core model and provides a\nmore intuitive set of functions for creating predictions, scoring variants,\nperforming in silico mutagenesis (ISM) and more.\n\nIt also provides the following factory functions to create a model instance\nusing our pre-trained weights:\n\n```python\nfrom alphagenome_research.model import dna_model\n\n# To download from Kaggle:\nmodel = dna_model.create_from_kaggle('all_folds')\n\n# or Hugging Face:\nmodel = dna_model.create_from_huggingface('all_folds')\n```\n\nHere's an example of making a variant prediction using model weights downloaded\nfrom Kaggle:\n\n```python\nfrom alphagenome.data import genome\nfrom alphagenome.visualization import plot_components\nfrom alphagenome_research.model import dna_model\nimport matplotlib.pyplot as plt\n\nmodel = dna_model.create_from_kaggle('all_folds')\n\ninterval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)\nvariant = genome.Variant(\n    chromosome='chr22',\n    position=36201698,\n    reference_bases='A',\n    alternate_bases='C',\n)\n\noutputs = model.predict_variant(\n    interval=interval,\n    variant=variant,\n    ontology_terms=['UBERON:0001157'],\n    requested_outputs=[dna_model.OutputType.RNA_SEQ],\n)\n\nplot_components.plot(\n    [\n        plot_components.OverlaidTracks(\n            tdata={\n                'REF': outputs.reference.rna_seq,\n                'ALT': outputs.alternate.rna_seq,\n            },\n            colors={'REF': 'dimgrey', 'ALT': 'red'},\n        ),\n    ],\n    interval=outputs.reference.rna_seq.interval.resize(2**15),\n    # Annotate the location of the variant as a vertical line.\n    annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],\n)\nplt.show()\n```\n\nFor further examples, please see our\n[quick-start](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fquick_start.ipynb)\nnotebook.\n\n## Citing 
AlphaGenome\n\nIf you use AlphaGenome in your research, please cite using:\n\n\u003C!-- disableFinding(SNIPPET_INVALID_LANGUAGE) -->\n\n```bibtex\n@article{alphagenome,\n  title={Advancing regulatory variant effect prediction with {AlphaGenome}},\n  author={Avsec, {\\v Z}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R. and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and Thomas, Raina and Dutordoir, Vincent and Perino, Matteo and De, Soham and Karollus, Alexander and Gayoso, Adam and Sargeant, Toby and Mottram, Anne and Wong, Lai Hong and Drot{\\'a}r, Pavol and Kosiorek, Adam and Senior, Andrew and Tanburn, Richard and Applebaum, Taylor and Basu, Souradeep and Hassabis, Demis and Kohli, Pushmeet},\n  journal={Nature},\n  volume={649},\n  number={8099},\n  pages={1206--1218},\n  year={2026},\n  doi={10.1038\u002Fs41586-025-10014-0},\n  publisher={Nature Publishing Group UK London}\n}\n```\n\n\u003C!-- enableFinding(SNIPPET_INVALID_LANGUAGE) -->\n\n## Acknowledgements\n\nAlphaGenome's model release uses the following libraries and packages:\n\n*   [Abseil](https:\u002F\u002Fgithub.com\u002Fabseil\u002Fabseil-py)\n*   [anndata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fanndata)\n*   [Chex](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fchex)\n*   [Einshape](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Feinshape)\n*   [Etils](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fetils)\n*   [Haiku](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fdm-haiku)\n*   [huggingface_hub](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fhuggingface_hub)\n*   [JAX](https:\u002F\u002Fgithub.com\u002Fjax-ml\u002Fjax)\n*   [jaxtyping](https:\u002F\u002Fgithub.com\u002Fpatrick-kidger\u002Fjaxtyping)\n*   [kagglehub](https:\u002F\u002Fgithub.com\u002FKaggle\u002Fkagglehub)\n*   [NumPy](https:\u002F\u002Fnumpy.org\u002F)\n*   
[Optax](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Foptax)\n*   [Orbax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Forbax)\n*   [pandas](https:\u002F\u002Fpandas.pydata.org\u002F)\n*   [pyBigWig](https:\u002F\u002Fgithub.com\u002Fdeeptools\u002FpyBigWig)\n*   [pyarrow](https:\u002F\u002Farrow.apache.org\u002F)\n*   [pyfaidx](https:\u002F\u002Fgithub.com\u002Fmdshw5\u002Fpyfaidx)\n*   [PyRanges](https:\u002F\u002Fgithub.com\u002Fpyranges\u002Fpyranges)\n*   [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n*   [typeguard](https:\u002F\u002Fgithub.com\u002Fagronholm\u002Ftypeguard)\n\nWe thank all their contributors and maintainers!\n\n## License and Disclaimer\n\nCopyright 2026 Google LLC\n\nAll software is licensed under the Apache License, Version 2.0 (Apache 2.0); you\nmay not use this except in compliance with the Apache 2.0 license. You may\nobtain a copy of the Apache 2.0 license at:\nhttps:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0. As noted above, model weights are\navailable via Kaggle and Hugging Face and are subject to the model terms at:\nhttps:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms.\n\nCode examples and documentation to help you use the AlphaGenome model are\nlicensed under the Creative Commons Attribution 4.0 International License\n(CC-BY). 
You may obtain a copy of the CC-BY license at:\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode.\n\nUnless set out below under the *Training Data*, *Evaluation Data* or *Training\nand Evaluation Data* headings, all other materials are licensed under the\nCreative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC).\nYou may obtain a copy of the CC-BY-NC license at:\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002Flegalcode.\n\nUnless required by applicable law or agreed to in writing, all software and\nmaterials distributed here under the Apache 2.0 or CC-BY licenses are\ndistributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,\neither express or implied. See the licenses for the specific language governing\npermissions and limitations under those licenses.\n\nThis is not an official Google product.\n\n### Training Data\n\n**FANTOM5:** This data has been reprocessed. The original FANTOM5 data is made\navailable at https:\u002F\u002Ffantom.gsc.riken.jp\u002F5\u002F under a CC-BY license (see link\nabove for a copy). Citation: *Lizio M, et al. Update of the FANTOM web resource:\nexpansion to provide additional transcriptome atlases. Nucleic Acids Res. 47:\nD752–D758 (2019). https:\u002F\u002Fdoi.org\u002F10.1093\u002Fnar\u002Fgky1099.*\n\n**4D Nucleome:** This data has been reprocessed using the method set out in the\n‘Methods’ section of the accompanying paper. The original 4D Nucleome data is\navailable from the 4DN Data Portal at https:\u002F\u002Fdata.4dnucleome.org\u002F and subject\nto the Data Use Guidelines found there. 
The 4DN Data Portal is part of 4DN,\ncitation 4DN White Paper (https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature23884) and 4DN\nData Portal Paper (https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-022-29697-4).\n\n### Evaluation Data\n\n**This data includes: (i) list of variants; (ii) target values; and (iii)\nAlphaGenome predicted score.**\n\n**CAGI:** CAGI data can be obtained from\ngenomeinterpretation.org\u002Fchallenges.html and is subject to the terms found here:\nhttp:\u002F\u002Fwww.genomeinterpretation.org\u002Fdata-use-agreement.html.\n\n**GTEx v8:** GTEx v8 data can be obtained from: gtexportal.org\u002Fhome. The data\nused for the work described in this paper was obtained from:\nhttps:\u002F\u002Fgithub.com\u002Fcalico\u002Fborzoi. Please visit the GTEx Portal for the most up\nto date and accurate version of this data.\n\n**GTEx v8 reprocessed into EMBL-EBI eQTL catalogue:** Data originally made\navailable at the GTEx Portal (see above) with modifications made by EMBL-EBI,\nand provided under a CC-BY-4.0 license a copy of which can be found at\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode. Citation: *Kerimov, N.,\nHayhurst, J.D., Peikova, K. et al. A compendium of uniformly processed human\ngene expression and splicing quantitative trait loci. Nat Genet 53, 1290–1299\n(2021). https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-021-00924-w.*\n\n**ChromBPNet:** ChromBPNet data can be obtained at\nhttps:\u002F\u002Fwww.synapse.org\u002FSynapse:syn59449898\u002Ffiles\u002F. Citation: *Pampari, A. et\nal. ChromBPNet: bias factorized, base-resolution deep learning models of\nchromatin accessibility reveal cis-regulatory sequence syntax, transcription\nfactor footprints and regulatory variants. 
BioRxiv, 2024–12 (2025).*\n\n**ClinVar:** ClinVar data can be found at:\nhttps:\u002F\u002Fftp.ncbi.nlm.nih.gov\u002Fpub\u002Fclinvar\u002Fvcf_GRCh38\u002F subject to this Data Use\nPolicy https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fclinvar\u002Fdocs\u002Fmaintenance_use\u002F. Citation:\n*Landrum, M. J. et al. ClinVar: improving access to variant interpretations and\nsupporting evidence. Nucleic Acids Res. 2018 Jan 4;46(D1):D1062-D1067. doi:\n10.1093\u002Fnar\u002Fgkx1153.*\n\n**MFASS:** MFASS data can be found at https:\u002F\u002Fgithub.com\u002FKosuriLab\u002FMFASS.\nCitation: *A Multiplexed Assay for Exon Recognition Reveals that an\nUnappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing\nDisruptions; Chong, Rockie et al.; Molecular Cell, Volume 73, Issue 1, 183 -\n194.e8.*\n\n**eQTL:** eQTL data is provided with a CC-BY-4.0 license a copy of which can be\nfound here: https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode. Citation:\n*Kerimov, N., Hayhurst, J.D., Peikova, K. et al. A compendium of uniformly\nprocessed human gene expression and splicing quantitative trait loci. Nat Genet\n53, 1290–1299 (2021). https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-021-00924-w.*\n\n**Open Targets:** Open Targets data can be obtained at\nhttps:\u002F\u002Fplatform-docs.opentargets.org\u002Flicence and is provided with a Creative\nCommons 1.0 Universal license, a copy of which can be found here:\nhttps:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002Flegalcode.\n\n**PolyA site annotations:** PolyA site annotations can be obtained here:\nhttps:\u002F\u002Fexon.apps.wistar.org\u002Fpolya_db\u002Fv3\u002F. The data used for this project is a\nreprocessed version which can be found:\nhttps:\u002F\u002Fstorage.googleapis.com\u002Fseqnn-share\u002Fhelper\u002Fpolyadb_human_v3.csv.gz.\nCitation: *Linder, J., Srivastava, D., Yuan, H. et al. 
Predicting RNA-seq\ncoverage from DNA sequence as a unifying model of gene regulation. Nat Genet 57,\n949–961 (2025). https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-024-02053-6.*\n\n### Training & Evaluation Data\n\n**ENCODE:** This data has been reprocessed. The original ENCODE data is made\navailable at https:\u002F\u002Fwww.encodeproject.org\u002Fhelp\u002Fgetting-started\u002F#download\npursuant to the Data Use Policy at\nhttps:\u002F\u002Fwww.encodeproject.org\u002Fhelp\u002Fciting-encode\u002F. The specific data can be\nfound cited in the Supplementary Tables published as part of the *Advancing\nregulatory variant effect prediction with AlphaGenome* paper. The data is\npresented by the ENCODE Consortium, whose most recent publications are:\n\n-   ENCODE integrative analysis (PMID: 22955616; PMCID: PMC3439153)\n-   ENCODE portal (PMID: 41168159; PMCID: PMC12575607; PMID: 31713622; PMCID:\n    PMC7061942)\n-   ENCODE uniform processing pipelines:\n    https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.04.04.535623.\n\n**GENCODE:** Copyright of the released GENCODE dataset is © 2024 EMBL-EBI. 
A\nmodified version of the GENCODE dataset (which can be found here:\nhttps:\u002F\u002Fwww.gencodegenes.org\u002Fhuman\u002Freleases.html), is made available with\nreference to the following:\n\n-   Copyright © 2024 EMBL-EBI\n-   The GENCODE dataset is subject to the EMBL-EBI terms of use, available at\n    https:\u002F\u002Fwww.ebi.ac.uk\u002Fabout\u002Fterms-of-use.\n-   Citation: Frankish A, et al (2018) GENCODE reference annotation for the\n    human and mouse genome.\n-   Further details about GENCODE can be found at\n    https:\u002F\u002Fwww.gencodegenes.org\u002Fhuman\u002Freleases.html, with additional citation\n    information at https:\u002F\u002Fwww.gencodegenes.org\u002Fpages\u002Fpublications.html and\n    further acknowledgements can be found at\n    https:\u002F\u002Fwww.gencodegenes.org\u002Fpages\u002Fgencode.html.\n\nTo prepare the dataset, the team followed the method set out here:\nhttps:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome\u002Fblob\u002Fmain\u002Fscripts\u002Fprocess_gtf.py\n\n### Third-party software\n\nYour use of any third-party software, libraries or code referenced in the\nmaterials in this repository (including the libraries listed in the\n[Acknowledgments](#acknowledgements) section) may be governed by separate terms\nand conditions or license provisions. 
Your use of the third-party software,\nlibraries or code is subject to any such terms and you should check that you can\ncomply with any applicable restrictions or terms and conditions before use.\n","![AlphaGenome 页眉图片](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphagenome_research_readme_23f783662f15.png)\n\n# AlphaGenome 研究\n\n![预提交检查](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Factions\u002Fworkflows\u002Fpresubmit_checks.yml\u002Fbadge.svg)\n\n[**模型权重**](#model-weights) | [**安装**](#installation) |\n[**快速入门**](#quick-start) |\n[**文档**](https:\u002F\u002Fwww.alphagenomedocs.com\u002F) |\n[**社区**](https:\u002F\u002Fwww.alphagenomecommunity.com) |\n[**使用条款**](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms)\n\nAlphaGenome 是一种统一的 DNA 序列模型，旨在推进调控性变异效应预测，并揭示基因组功能。它能够分析长达 100 万个碱基对的 DNA 序列，在单碱基对分辨率下提供跨多种模态的预测结果，包括基因表达、剪接模式、染色质特征和接触图谱等。\n\n本仓库提供了以下研究代码：\n\n- AlphaGenome 模型的实现，使用 [JAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax) 编写。\n- 包含配套变异评分器的\n  [AlphaGenome API](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome) 实现。\n- 用于从 TFRecords 中读取 AlphaGenome 训练数据的数据集加载器。\n- Colab 笔记本，用于\n  [运行](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fquick_start.ipynb)\n  模型，以及\n  [分析](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fvariant_eval_examples.ipynb)\n  评估结果。\n\n我们强烈建议使用我们的\n[AlphaGenome API](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome)，无需专用硬件即可与模型交互。\n\n## 安装\n\n\u003C!-- mdformat off(disable for [!TIP] format) -->\n\n> [!TIP]\n> 我们强烈建议您创建一个\n> [Python 虚拟环境](https:\u002F\u002Fdocs.python.org\u002F3\u002Ftutorial\u002Fvenv.html)，\n> 以避免与系统 Python 环境发生冲突。\n\n\u003C!-- mdformat on -->\n\n要安装，请克隆本仓库的本地副本并运行 `pip install`：\n\n```bash\n$ git clone 
https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research.git\n$ pip install -e .\u002Falphagenome_research\n```\n\n这将安装所有必需的依赖项，包括以\n[开发模式](https:\u002F\u002Fsetuptools.pypa.io\u002Fen\u002Flatest\u002Fuserguide\u002Fdevelopment_mode.html) 安装本仓库。\n\n### 模型权重\n\n要使用我们预训练的模型权重，您可以从以下任一平台下载：\n\n-   [Kaggle](https:\u002F\u002Fwww.kaggle.com\u002Fmodels\u002Fgoogle\u002Falphagenome) 或\n-   [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fgoogle\u002Falphagenome)\n\n两者均要求接受我们的非商业\n[模型条款](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms)。请求将立即处理。\n\n### 模型要求\n\n为了运行模型，我们建议至少使用\n[NVIDIA H100 GPU](https:\u002F\u002Fdocs.cloud.google.com\u002Fcompute\u002Fdocs\u002Fgpus#h100-gpus)。\n请确保正确安装 CUDA、cuDNN 和 JAX；在此方面，\n[JAX 安装文档](https:\u002F\u002Fdocs.jax.dev\u002Fen\u002Flatest\u002Finstallation.html#nvidia-gpu)\n是一个有用的参考资料。\n\n对于训练，我们建议使用\n[Tensor Processing Units (TPUs) v3](https:\u002F\u002Fdocs.cloud.google.com\u002Ftpu\u002Fdocs\u002Fv3)\n或更高版本。\n\n## 快速入门\n\n与 AlphaGenome 模型交互最简单的方式是使用提供的\n[DNA Model 类](\u002Fsrc\u002Falphagenome_research\u002Fmodel\u002Fdna_model.py)。该类封装了核心模型，并提供了一组更直观的函数，用于生成预测、对变异进行评分、执行计算机模拟诱变（ISM）等操作。\n\n它还提供了以下工厂函数，用于使用我们预训练的权重创建模型实例：\n\n```python\nfrom alphagenome_research.model import dna_model\n\n# 从 Kaggle 下载：\nmodel = dna_model.create_from_kaggle('all_folds')\n\n# 或者从 Hugging Face：\nmodel = dna_model.create_from_huggingface('all_folds')\n```\n\n以下是使用从 Kaggle 下载的模型权重进行变异预测的示例：\n\n```python\nfrom alphagenome.data import genome\nfrom alphagenome.visualization import plot_components\nfrom alphagenome_research.model import dna_model\nimport matplotlib.pyplot as plt\n\nmodel = dna_model.create_from_kaggle('all_folds')\n\ninterval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)\nvariant = genome.Variant(\n    chromosome='chr22',\n    position=36201698,\n    reference_bases='A',\n    alternate_bases='C',\n)\n\noutputs = model.predict_variant(\n    
interval=interval,\n    variant=variant,\n    ontology_terms=['UBERON:0001157'],\n    requested_outputs=[dna_model.OutputType.RNA_SEQ],\n)\n\nplot_components.plot(\n    [\n        plot_components.OverlaidTracks(\n            tdata={\n                'REF': outputs.reference.rna_seq,\n                'ALT': outputs.alternate.rna_seq,\n            },\n            colors={'REF': 'dimgrey', 'ALT': 'red'},\n        ),\n    ],\n    interval=outputs.reference.rna_seq.interval.resize(2**15),\n    # 在变异位置添加一条垂直线作为标注。\n    annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],\n)\nplt.show()\n```\n\n更多示例，请参阅我们的\n[快速入门](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fquick_start.ipynb)\n笔记本。\n\n## 引用 AlphaGenome\n\n如果您在研究中使用 AlphaGenome，请按以下方式引用：\n\n\u003C!-- disableFinding(SNIPPET_INVALID_LANGUAGE) -->\n\n```bibtex\n@article{alphagenome,\n  title={Advancing regulatory variant effect prediction with {AlphaGenome}},\n  author={Avsec, {\\v Z}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R. 
and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and Thomas, Raina and Dutordoir, Vincent and Perino, Matteo and De, Soham and Karollus, Alexander and Gayoso, Adam and Sargeant, Toby and Mottram, Anne and Wong, Lai Hong and Drot{\\'a}r, Pavol and Kosiorek, Adam and Senior, Andrew and Tanburn, Richard and Applebaum, Taylor and Basu, Souradeep and Hassabis, Demis and Kohli, Pushmeet},\n  journal={Nature},\n  volume={649},\n  number={8099},\n  pages={1206--1218},\n  year={2026},\n  doi={10.1038\u002Fs41586-025-10014-0},\n  publisher={Nature Publishing Group UK London}\n}\n```\n\n\u003C!-- enableFinding(SNIPPET_INVALID_LANGUAGE) -->\n\n## 致谢\n\nAlphaGenome 模型发布使用了以下库和软件包：\n\n*   [Abseil](https:\u002F\u002Fgithub.com\u002Fabseil\u002Fabseil-py)\n*   [anndata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fanndata)\n*   [Chex](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fchex)\n*   [Einshape](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Feinshape)\n*   [Etils](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fetils)\n*   [Haiku](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fdm-haiku)\n*   [huggingface_hub](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fhuggingface_hub)\n*   [JAX](https:\u002F\u002Fgithub.com\u002Fjax-ml\u002Fjax)\n*   [jaxtyping](https:\u002F\u002Fgithub.com\u002Fpatrick-kidger\u002Fjaxtyping)\n*   [kagglehub](https:\u002F\u002Fgithub.com\u002FKaggle\u002Fkagglehub)\n*   [NumPy](https:\u002F\u002Fnumpy.org\u002F)\n*   [Optax](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Foptax)\n*   [Orbax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Forbax)\n*   [pandas](https:\u002F\u002Fpandas.pydata.org\u002F)\n*   [pyBigWig](https:\u002F\u002Fgithub.com\u002Fdeeptools\u002FpyBigWig)\n*   [pyarrow](https:\u002F\u002Farrow.apache.org\u002F)\n*   [pyfaidx](https:\u002F\u002Fgithub.com\u002Fmdshw5\u002Fpyfaidx)\n*   [PyRanges](https:\u002F\u002Fgithub.com\u002Fpyranges\u002Fpyranges)\n*   
[TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n*   [typeguard](https:\u002F\u002Fgithub.com\u002Fagronholm\u002Ftypeguard)\n\n我们感谢所有这些项目的贡献者和维护者！\n\n## 许可与免责声明\n\n版权所有 2026 Google LLC\n\n所有软件均采用 Apache License, Version 2.0（Apache 2.0）许可协议；除非符合 Apache 2.0 许可协议的规定，否则不得使用。您可以在以下网址获取 Apache 2.0 许可协议的副本：\nhttps:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0。如上所述，模型权重可通过 Kaggle 和 Hugging Face 获取，并受以下模型条款约束：\nhttps:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms。\n\n用于帮助您使用 AlphaGenome 模型的代码示例和文档采用知识共享署名 4.0 国际许可协议（CC-BY）授权。您可以在以下网址获取 CC-BY 许可协议的副本：\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode。\n\n除下文“训练数据”、“评估数据”或“训练与评估数据”标题下另有说明外，所有其他材料均采用知识共享署名-非商业性使用 4.0 国际许可协议（CC-BY-NC）授权。您可以在以下网址获取 CC-BY-NC 许可协议的副本：\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002Flegalcode。\n\n除非适用法律要求或双方另有书面约定，否则在此依据 Apache 2.0 或 CC-BY 许可协议分发的所有软件和材料均按“现状”提供，不附带任何形式的明示或默示保证或条件。有关这些许可协议下的具体权限和限制，请参阅相应的许可协议文本。\n\n本产品并非 Google 官方产品。\n\n### 训练数据\n\n**FANTOM5：** 此数据已重新处理。原始 FANTOM5 数据可在 https:\u002F\u002Ffantom.gsc.riken.jp\u002F5\u002F 下以 CC-BY 许可协议获取（请参阅上述链接以获取副本）。引用文献：*Lizio M, et al. 更新 FANTOM 网络资源：扩展以提供更多转录组图谱。Nucleic Acids Res. 
47: D752–D758 (2019)。https:\u002F\u002Fdoi.org\u002F10.1093\u002Fnar\u002Fgky1099.*\n\n**4D 核组：** 此数据已按照随附论文“方法”部分中所述的方法进行重新处理。原始 4D 核组数据可从 4DN 数据门户 https:\u002F\u002Fdata.4dnucleome.org\u002F 获取，并受该网站上提供的数据使用指南约束。4DN 数据门户是 4DN 的一部分，引用文献为 4DN 白皮书（https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature23884）和 4DN 数据门户论文（https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41467-022-29697-4）。\n\n### 评估数据\n\n**此数据包括：(i) 变异列表；(ii) 目标值；以及 (iii) AlphaGenome 预测得分。**\n\n**CAGI：** CAGI 数据可从 genomeinterpretation.org\u002Fchallenges.html 获取，并受此处提供的条款约束：\nhttp:\u002F\u002Fwww.genomeinterpretation.org\u002Fdata-use-agreement.html。\n\n**GTEx v8：** GTEx v8 数据可从 gtexportal.org\u002Fhome 获取。本文所描述工作的数据来源于：\nhttps:\u002F\u002Fgithub.com\u002Fcalico\u002Fborzoi。请访问 GTEx 门户网站以获取最新且最准确的数据版本。\n\n**GTEx v8 重新处理后纳入 EMBL-EBI eQTL 目录：** 原始数据由 GTEx 门户网站提供（见上文），经 EMBL-EBI 修改后发布，并采用 CC-BY-4.0 许可协议授权，其副本可在以下网址找到：\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode。引用文献：*Kerimov, N., Hayhurst, J.D., Peikova, K. et al. 统一处理的人类基因表达及剪接数量性状位点汇编。Nat Genet 53, 1290–1299 (2021)。https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-021-00924-w.*\n\n**ChromBPNet：** ChromBPNet 数据可从\nhttps:\u002F\u002Fwww.synapse.org\u002FSynapse:syn59449898\u002Ffiles\u002F 获取。引用文献：*Pampari, A. et al. ChromBPNet：偏置因子分解、碱基分辨率的染色质可及性深度学习模型揭示顺式调控序列语法、转录因子足迹及调控变异。BioRxiv, 2024–12 (2025)。*\n\n**ClinVar：** ClinVar 数据可在以下网址找到：\nhttps:\u002F\u002Fftp.ncbi.nlm.nih.gov\u002Fpub\u002Fclinvar\u002Fvcf_GRCh38\u002F，并受此数据使用政策约束：\nhttps:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fclinvar\u002Fdocs\u002Fmaintenance_use\u002F。引用文献：*Landrum, M. J. et al. ClinVar：改善变异解读的可及性并提供支持证据。Nucleic Acids Res. 
2018 年 1 月 4 日，46(D1): D1062–D1067。doi：10.1093\u002Fnar\u002Fgkx1153.*\n\n**MFASS：** MFASS 数据可在 https:\u002F\u002Fgithub.com\u002FKosuriLab\u002FMFASS 找到。引用文献：*一种多重化外显子识别检测表明，未被充分认识的罕见遗传变异中有相当一部分会导致大效应的剪接紊乱；Chong, Rockie 等人；Molecular Cell，第 73 卷第 1 期，183–194.e8。*\n\n**eQTL：** eQTL 数据采用 CC-BY-4.0 许可协议授权，其副本可在以下网址找到：\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002Flegalcode。引用文献：*Kerimov, N., Hayhurst, J.D., Peikova, K. et al. 统一处理的人类基因表达及剪接数量性状位点汇编。Nat Genet 53, 1290–1299 (2021)。https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-021-00924-w。*\n\n**Open Targets：** Open Targets 数据可在\nhttps:\u002F\u002Fplatform-docs.opentargets.org\u002Flicence 获取，并采用 CC0 1.0 通用（公共领域贡献）许可协议授权，其副本可在以下网址找到：\nhttps:\u002F\u002Fcreativecommons.org\u002Fpublicdomain\u002Fzero\u002F1.0\u002Flegalcode。\n\n**PolyA 位点注释：** PolyA 位点注释可在以下网址获取：\nhttps:\u002F\u002Fexon.apps.wistar.org\u002Fpolya_db\u002Fv3\u002F。本项目使用的数据是经过重新处理的版本，可在以下地址找到：\nhttps:\u002F\u002Fstorage.googleapis.com\u002Fseqnn-share\u002Fhelper\u002Fpolyadb_human_v3.csv.gz。引用文献：*Linder, J., Srivastava, D., Yuan, H. et al. 
以 DNA 序列预测 RNA-seq 覆盖率作为统一的基因调控模型。Nat Genet 57, 949–961 (2025)。https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41588-024-02053-6。*\n\n### 训练与评估数据\n\n**ENCODE：** 本数据已重新处理。原始 ENCODE 数据可根据数据使用政策（https:\u002F\u002Fwww.encodeproject.org\u002Fhelp\u002Fciting-encode\u002F）在 https:\u002F\u002Fwww.encodeproject.org\u002Fhelp\u002Fgetting-started\u002F#download 上获取。具体数据已在《利用 AlphaGenome 推进调控性变异效应预测》论文的补充表格中列出并引用。该数据由 ENCODE 联盟提供，其最新发表的文献包括：\n\n- ENCODE 整合分析（PMID: 22955616；PMCID: PMC3439153）\n- ENCODE 门户（PMID: 41168159；PMCID: PMC12575607；PMID: 31713622；PMCID: PMC7061942）\n- ENCODE 统一处理流程：\n  https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.04.04.535623。\n\n**GENCODE：** 已发布的 GENCODE 数据集版权归 EMBL-EBI 所有，版权年份为 2024 年。经过修改的 GENCODE 数据集版本（可在 https:\u002F\u002Fwww.gencodegenes.org\u002Fhuman\u002Freleases.html 查阅）在使用时需注明以下信息：\n\n- 版权所有 © 2024 EMBL-EBI\n- GENCODE 数据集受 EMBL-EBI 使用条款约束，相关条款可在 https:\u002F\u002Fwww.ebi.ac.uk\u002Fabout\u002Fterms-of-use 查阅。\n- 引用文献：Frankish A, et al (2018) GENCODE 参考注释——人类与小鼠基因组。\n- 关于 GENCODE 的更多详细信息请参见 https:\u002F\u002Fwww.gencodegenes.org\u002Fhuman\u002Freleases.html，进一步的引用信息可在 https:\u002F\u002Fwww.gencodegenes.org\u002Fpages\u002Fpublications.html 查阅，其他致谢内容则位于 https:\u002F\u002Fwww.gencodegenes.org\u002Fpages\u002Fgencode.html。\n\n为准备该数据集，团队遵循了此处所述的方法：https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome\u002Fblob\u002Fmain\u002Fscripts\u002Fprocess_gtf.py。\n\n### 第三方软件\n\n您对本仓库中材料所引用的任何第三方软件、库或代码的使用（包括“致谢”章节中列出的库），可能受单独的条款与条件或许可协议约束。您对这些第三方软件、库或代码的使用须遵守相应规定，并应在使用前确认自己能够符合所有适用的限制或条款与条件。","# AlphaGenome Research 快速上手指南\n\nAlphaGenome 是一个统一的 DNA 序列模型，旨在推进调控变异效应预测并揭示基因组功能。它能分析长达 100 万碱基对的 DNA 序列，并在单碱基分辨率下提供基因表达、剪接模式、染色质特征等多种模态的预测。\n\n## 环境准备\n\n### 系统要求\n*   **操作系统**: Linux (推荐) 或 macOS。\n*   **GPU**: 运行模型推荐至少使用 **NVIDIA H100 GPU**。请确保已正确安装 CUDA、cuDNN 和 JAX。\n    *   *注：对于训练任务，推荐使用 TPU v3 或更高版本。*\n*   **Python**: 建议使用 Python 3.8 或更高版本。\n\n### 前置依赖\n强烈建议创建独立的 Python 虚拟环境以避免冲突：\n\n```bash\npython3 -m venv 
alphagenome_env\nsource alphagenome_env\u002Fbin\u002Factivate\n```\n\n确保 `pip` 为最新版本：\n```bash\npip install --upgrade pip\n```\n\n## 安装步骤\n\n### 1. 克隆仓库并安装\n从 GitHub 克隆代码库并以开发模式安装：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research.git\npip install -e .\u002Falphagenome_research\n```\n\n此命令将自动安装所有必要的依赖项（包括 JAX, Haiku, TensorFlow 等）。\n\n> **提示**：如果下载速度较慢，可尝试配置国内 pip 镜像源（如清华源）：\n> `pip install -e .\u002Falphagenome_research -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 2. 获取模型权重\n使用前需下载预训练模型权重。您需要先接受 [非商业用途模型条款](https:\u002F\u002Fdeepmind.google.com\u002Fscience\u002Falphagenome\u002Fmodel-terms)。\n\n权重可通过以下任一平台下载（申请后通常立即处理）：\n*   **Kaggle**: https:\u002F\u002Fwww.kaggle.com\u002Fmodels\u002Fgoogle\u002Falphagenome\n*   **Hugging Face**: https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fgoogle\u002Falphagenome\n\n## 基本使用\n\n最简单的交互方式是使用提供的 `DNA Model` 类。以下示例演示如何从 Kaggle 加载模型并进行变异位点预测。\n\n### 代码示例\n\n```python\nfrom alphagenome.data import genome\nfrom alphagenome.visualization import plot_components\nfrom alphagenome_research.model import dna_model\nimport matplotlib.pyplot as plt\n\n# 1. 加载模型 (从 Kaggle 或 Hugging Face)\n# 若使用 Hugging Face，请改用: model = dna_model.create_from_huggingface('all_folds')\nmodel = dna_model.create_from_kaggle('all_folds')\n\n# 2. 定义基因组区间和变异位点\ninterval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)\nvariant = genome.Variant(\n    chromosome='chr22',\n    position=36201698,\n    reference_bases='A',\n    alternate_bases='C',\n)\n\n# 3. 执行变异预测\n# ontology_terms 指定组织类型 (此处为 UBERON:0001157)，requested_outputs 指定输出模态\noutputs = model.predict_variant(\n    interval=interval,\n    variant=variant,\n    ontology_terms=['UBERON:0001157'],\n    requested_outputs=[dna_model.OutputType.RNA_SEQ],\n)\n\n# 4. 
可视化结果\nplot_components.plot(\n    [\n        plot_components.OverlaidTracks(\n            tdata={\n                'REF': outputs.reference.rna_seq,\n                'ALT': outputs.alternate.rna_seq,\n            },\n            colors={'REF': 'dimgrey', 'ALT': 'red'},\n        ),\n    ],\n    interval=outputs.reference.rna_seq.interval.resize(2**15),\n    # 标注变异位置\n    annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],\n)\nplt.show()\n```\n\n更多详细示例和进阶用法，请参考官方提供的 [Colab 快速入门笔记本](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fblob\u002Fmain\u002Fcolabs\u002Fquick_start.ipynb)。","某生物制药公司的基因组学团队正在评估一种罕见遗传病候选药物靶点，需要精准预测患者特定 DNA 变异对基因表达及剪接模式的深层影响。\n\n### 没有 alphagenome_research 时\n- 传统模型仅能分析短片段序列，无法捕捉长达 100 万碱基对范围内的远端调控元件相互作用，导致预测盲区。\n- 研究人员需分别运行多个独立工具来预测基因表达、剪接或染色质特征，数据整合困难且结果往往相互矛盾。\n- 缺乏单碱基分辨率的精细评估，难以区分同义突变与致病突变的细微差异，严重拖慢候选靶点的筛选速度。\n- 本地部署高精度模型依赖昂贵的专用硬件集群，中小规模实验室难以承担算力成本，只能依赖低精度的云端 API。\n\n### 使用 alphagenome_research 后\n- 利用其统一的 DNA 序列模型，直接输入百万级碱基序列，一次性获得涵盖基因表达、剪接模式及接触图谱的全方位高分辨率预测。\n- 通过内置的变体评分器（variant scorers）和计算机诱变功能，快速量化特定突变的功能效应，将数周的实验验证工作压缩至几小时。\n- 基于 JAX 的高效实现配合预训练权重，使团队能在单张 NVIDIA H100 GPU 上完成复杂推理，大幅降低了高性能计算的门槛。\n- 提供的 Colab 笔记本和数据加载器让研究人员能立即复现论文结果，并灵活定制针对特定疾病位点的深度分析流程。\n\nalphagenome_research 将原本碎片化、高门槛的基因组功能分析转化为统一、高效且精准的标准化流程，显著加速了从基因变异发现到药物靶点确证的研发周期。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fgoogle-deepmind_alphagenome_research_23f78366.png","google-deepmind","Google DeepMind","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fgoogle-deepmind_06b1dd17.png","",null,"https:\u002F\u002Fwww.deepmind.com\u002F","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind",[85],{"name":86,"color":87,"percentage":88},"Python","#3572A5",100,704,106,"2026-04-05T06:40:06","Apache-2.0",4,"Linux","推理推荐至少 NVIDIA H100 GPU；需正确安装 CUDA、cuDNN 和 JAX（具体版本未说明，需参考 JAX 官方文档）","未说明",{"notes":98,"python":99,"dependencies":100},"该工具基于 JAX 框架，非 PyTorch。推理强烈推荐使用 NVIDIA H100 GPU，训练推荐 TPU v3 
或更高版本。模型权重需从 Kaggle 或 Hugging Face 下载，并同意非商业用途条款。建议创建 Python 虚拟环境以避免冲突。提供了 Colab 笔记本以便在无专用硬件环境下体验。","未说明 (建议使用 Python 虚拟环境)",[101,102,103,104,105,106,107,108,109,110],"jax","dm-haiku","optax","tensorflow","numpy","pandas","pyBigWig","pyfaidx","huggingface_hub","kagglehub",[18],"2026-03-27T02:49:30.150509","2026-04-06T07:05:56.987033",[115,120,125,130,135],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},9515,"遇到 'INVALID_ARGUMENT: Disallowed host-to-device transfer' 错误怎么办？","这是一个已知的 Bug，原因是 `score_interval` 方法在调用 jit 编译的 `_predict` 函数时，未将 numpy 数组正确转换为 JAX 设备数组（缺少 `jax.device_put()`）。该问题已在提交 be6c782 中修复。请更新到最新版本的 `alphagenome_research` 即可解决。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fissues\u002F8",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},9516,"Quickstart 笔记本中绘图缺少 CTCF 轨道（track）如何解决？","这是因为代码中过滤条件使用了 `.isnull()` 来检查 `genetically_modified` 字段，导致无法正确筛选出 CTCF 数据。正确的做法是将过滤条件改为 `predictions.chip_tf.metadata['genetically_modified'] == False`。该问题已在后续版本中合并修复，如果使用的是旧版本，请手动修改代码或升级库。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fissues\u002F16",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},9517,"加载模型时出现 'CUDA-enabled jaxlib is not installed' 警告或报错怎么办？","这通常意味着 JAX 未正确检测到 CUDA 环境。解决方案如下：\n1. 按照官方指南安装 CUDA（例如 CUDA 13.1）：https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads\n2. 在 `~\u002F.bashrc` 中添加以下环境变量配置：\n   export CUDA13=\u002Fusr\u002Flocal\u002Fcuda-13.1\u002Flib64\n   export LD_PRELOAD=$CUDA13\u002Flibcublas.so.13:$CUDA13\u002FlibcublasLt.so.13\n3. 确保 `LD_LIBRARY_PATH` 没有覆盖 CUDA 库目录。\n4. 
参考 JAX 安装指南重新配置虚拟环境：https:\u002F\u002Fdocs.jax.dev\u002Fen\u002Flatest\u002Finstallation.html\n如果仍无法解决，可以在创建模型时显式传递 CPU 设备参数，但性能会下降。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fissues\u002F13",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},9518,"为什么 DNASE 轨迹的处理包含 1-bp 的偏移逻辑？","这是为了保留数据的碱基分辨率特性并解耦酶切偏差与真实信号。根据论文的补充信息，原始 BAM 文件被转换为碱基分辨率的 count bigWig 文件。对于 DNase-seq 数据，应用了 0\u002F+1 的读取偏移（shifts），而 ATAC-seq 数据则应用了 +4\u002F-4 的偏移。这种处理是在数据预处理阶段进行的，旨在更准确地反映 Tn5 转座酶插入或 DNase-I 切割位点的真实位置。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fissues\u002F19",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},9519,"在小鼠（MUS_MUSCULUS）上运行 predict_interval 时出现维度不匹配错误（AssertionError）怎么办？","这是一个针对非人类物种（如小鼠）进行预测时的已知 Bug，表现为 `augmentation.py` 中的维度不匹配。维护者已确认该问题并发布了修复补丁。如果您遇到此错误，请将 `alphagenome_research` 库更新至最新版本即可解决。","https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Falphagenome_research\u002Fissues\u002F14",[141,146,151],{"id":142,"version":143,"summary_zh":144,"released_at":145},106254,"v0.2.0","### Added\r\n\r\n-   Non-zero mean computation to fine-tuning notebook.\r\n-   Gene LFC loss for GenomeTracksHead.\r\n-   Align alternate predictions for splice junctions when making indel\r\n    predictions.\r\n\r\n### Changed\r\n\r\n-   Fix splice-site and splice-site usage scoring to use the correct gene mask\r\n    extractor.\r\n-   Only consider splice sites that are within a gene body, and overlaps with\r\n    the provided variant.\r\n-   Pass requested outputs to the model prediction call, significantly reducing\r\n    roofline memory consumption by eliding redundant computation.\r\n-   Pad gene masks in gene scoring. 
This reduces the number of jit\r\n    recompilations for different numbers of genes.","2026-04-02T16:00:16",{"id":147,"version":148,"summary_zh":149,"released_at":150},106255,"v0.1.0","### Added\r\n\r\n-   Add code and notebook example for fine-tuning AlphaGenome.\r\n-   Better support for loading pre-trained model with different heads and\r\n    metadata.\r\n\r\n### Changed\r\n\r\n-   Various fixes to better align predictions with the AlphaGenome API (e.g.\r\n    metadata column names).\r\n-   Fixed variant extraction to correctly insert variants on the negative\r\n    strand.","2026-02-20T09:46:36",{"id":152,"version":153,"summary_zh":154,"released_at":155},106256,"v0.0.1","Initial release.","2026-01-28T15:37:50"]
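The v0.2.0 changelog entry "Pad gene masks in gene scoring" refers to a general JAX property: `jax.jit` specializes the compiled function on input shapes, so intervals containing different numbers of genes would each trigger a fresh compilation. Padding the gene masks to a fixed bucket size keeps shapes constant across calls. The sketch below illustrates the idea only; `MAX_GENES`, `score_genes`, and `pad_gene_masks` are hypothetical names for this example, not APIs from the alphagenome_research codebase.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Assumed bucket size for illustration; the real codebase may bucket differently.
MAX_GENES = 8

@jax.jit
def score_genes(track, gene_masks, valid):
    # track: (L,) signal; gene_masks: (MAX_GENES, L) 0/1 masks padded with
    # zero rows; valid: (MAX_GENES,) bool marking which slots hold real genes.
    sums = jnp.sum(gene_masks * track[None, :], axis=1)
    # Zero out the scores of padding slots so they never leak into results.
    return jnp.where(valid, sums, 0.0)

def pad_gene_masks(masks):
    """Pad a variable-length list of (L,) masks to a fixed (MAX_GENES, L) array."""
    n, length = len(masks), masks[0].shape[0]
    padded = np.zeros((MAX_GENES, length), dtype=np.float32)
    padded[:n] = np.stack(masks)
    valid = np.arange(MAX_GENES) < n
    return jnp.asarray(padded), jnp.asarray(valid)

track = jnp.arange(10, dtype=jnp.float32)
# Different gene counts now share one compiled specialization, because the
# padded shapes passed to score_genes are identical on every call.
for n_genes in (2, 3, 5):
    masks = [np.ones(10, dtype=np.float32) for _ in range(n_genes)]
    padded, valid = pad_gene_masks(masks)
    scores = score_genes(track, padded, valid)
```

Without the padding, each distinct `n_genes` would produce a new `(n_genes, L)` mask shape and hence a new trace-and-compile cycle, which is the recompilation cost the changelog entry says the fix avoids.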