[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-snap-stanford--ogb":3,"tool-snap-stanford--ogb":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",144730,2,"2026-04-07T23:26:32",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":75,"owner_url":76,"languages":77,"stars":82,"forks":83,"last_commit_at":84,"license":85,"difficulty_score":32,"env_os":74,"env_gpu":86,"env_ram":87,"env_deps":88,"category_tags":99,"github_topics":100,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":105,"updated_at":106,"faqs":107,"releases":138},5286,"snap-stanford\u002Fogb","ogb","Benchmark datasets, data loaders, and evaluators for graph machine learning","OGB（Open Graph Benchmark）是一个专为图机器学习领域打造的开源基准平台，旨在提供标准化的数据集、数据加载器及评估工具。在图神经网络研究中，过去常面临数据集格式不统一、划分方式随意以及评估指标不一致等痛点，导致不同算法之间难以进行公平对比。OGB 通过提供覆盖节点、链接和图三个层级的预测任务，有效解决了这一难题。\n\n该平台收录了来自科学计算、社交网络及知识图谱等多个领域的丰富数据，规模从小型单卡可处理到需要多卡分布式训练的大型图应有尽有。其核心亮点在于高度兼容主流深度学习框架（如 PyTorch Geometric 和 DGL），用户仅需几行代码即可自动完成数据下载、预处理及标准化划分。此外，OGB 内置了统一的评估器，确保实验结果的可复现性和可比性。\n\nOGB 
非常适合从事图机器学习算法研究的科研人员、希望快速验证模型效果的开发者，以及需要权威基准来评估新方法的工程师使用。无论是学术探索还是工业界应用，OGB 都能帮助用户摆脱繁琐的数据工程负担，将精力集中于核心算法的创新与优化上。","\u003Cp align='center'>\n  \u003Cimg width='40%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_9efe9ec4bd3a.png' \u002F>\n\u003C\u002Fp>\n\n--------------------------------------------------------------------------------\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fogb)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fogb\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fblob\u002Fmaster\u002FLICENSE)\n\n## Overview\n\nThe Open Graph Benchmark (OGB) is a collection of benchmark datasets, data loaders, and evaluators for graph machine learning. Datasets cover a variety of graph machine learning tasks and real-world applications.\nThe OGB data loaders are fully compatible with popular graph deep learning frameworks, including [PyTorch Geometric](https:\u002F\u002Fpytorch-geometric.readthedocs.io\u002Fen\u002Flatest\u002F) and [Deep Graph Library (DGL)](https:\u002F\u002Fwww.dgl.ai\u002F). 
They provide automatic dataset downloading, standardized dataset splits, and unified performance evaluation.\n\n\u003Cp align='center'>\n  \u003Cimg width='80%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_a868b79d2bae.png' \u002F>\n\u003C\u002Fp>\n\nOGB aims to provide graph datasets that cover important graph machine learning tasks, diverse dataset scale, and rich domains.\n\n**Graph ML Tasks:** We cover three fundamental graph machine learning tasks: prediction at the level of nodes, links, and graphs.\n\n**Diverse scale:** Small-scale graph datasets can be processed within a single GPU, while medium- and large-scale graphs might require multiple GPUs or clever sampling\u002Fpartition techniques.\n\n**Rich domains:** Graph datasets come from diverse domains ranging from scientific ones to social\u002Finformation networks, and also include heterogeneous knowledge graphs. \n\n\u003Cp align='center'>\n  \u003Cimg width='70%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_944273fad650.png' \u002F>\n\u003C\u002Fp>\n\nOGB is an on-going effort, and we are planning to increase our coverage in the future.\n\n## Installation\nYou can install OGB using Python's package manager `pip`.\n**If you have previously installed ogb, please make sure you update the version to 1.3.6.**\nThe release note is available [here](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Freleases\u002Ftag\u002F1.3.6).\n\n#### Requirements\n - Python>=3.6\n - PyTorch>=1.6\n - DGL>=0.5.0 or torch-geometric>=2.0.2\n - Numpy>=1.16.0\n - pandas>=0.24.0\n - urllib3>=1.24.0\n - scikit-learn>=0.20.0\n - outdated>=0.2.0\n\n#### Pip install\nThe recommended way to install OGB is using Python's package manager pip:\n```bash\npip install ogb\n```\n\n```bash\npython -c \"import ogb; print(ogb.__version__)\"\n# This should print \"1.3.6\". 
Otherwise, please update the version by\npip install -U ogb\n```\n\n\n#### From source\nYou can also install OGB from source. This is recommended if you want to contribute to OGB.\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\ncd ogb\npip install -e .\n```\n\n## Package Usage\nWe highlight two key features of OGB, namely, (1) easy-to-use data loaders, and (2) standardized evaluators.\n#### (1) Data loaders\nWe prepare easy-to-use PyTorch Geometric and DGL data loaders. We handle dataset downloading as well as standardized dataset splitting.\nBelow, on PyTorch Geometric, we see that a few lines of code is sufficient to prepare and split the dataset! Needless to say, you can enjoy the same convenience for DGL!\n```python\nfrom ogb.graphproppred import PygGraphPropPredDataset\nfrom torch_geometric.loader import DataLoader\n\n# Download and process data at '.\u002Fdataset\u002Fogbg_molhiv\u002F'\ndataset = PygGraphPropPredDataset(name = 'ogbg-molhiv')\n\nsplit_idx = dataset.get_idx_split() \ntrain_loader = DataLoader(dataset[split_idx['train']], batch_size=32, shuffle=True)\nvalid_loader = DataLoader(dataset[split_idx['valid']], batch_size=32, shuffle=False)\ntest_loader = DataLoader(dataset[split_idx['test']], batch_size=32, shuffle=False)\n```\n\n#### (2) Evaluators\nWe also prepare standardized evaluators for easy evaluation and comparison of different methods. 
The evaluator takes `input_dict` (a dictionary whose format is specified in `evaluator.expected_input_format`) as input, and returns a dictionary storing the performance metric appropriate for the given dataset.\nThe standardized evaluation protocol allows researchers to reliably compare their methods.\n```python\nfrom ogb.graphproppred import Evaluator\n\nevaluator = Evaluator(name = 'ogbg-molhiv')\n# You can learn the input and output format specification of the evaluator as follows.\n# print(evaluator.expected_input_format) \n# print(evaluator.expected_output_format) \ninput_dict = {'y_true': y_true, 'y_pred': y_pred}\nresult_dict = evaluator.eval(input_dict) # E.g., {'rocauc': 0.7321}\n```\n\n## Citing OGB \u002F OGB-LSC\nIf you use OGB or [OGB-LSC](https:\u002F\u002Fogb.stanford.edu\u002Fdocs\u002Flsc\u002F) datasets in your work, please cite our papers (Bibtex below).\n```\n@article{hu2020ogb,\n  title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},\n  author={Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},\n  journal={arXiv preprint arXiv:2005.00687},\n  year={2020}\n}\n```\n```\n@article{hu2021ogblsc,\n  title={OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs},\n  author={Hu, Weihua and Fey, Matthias and Ren, Hongyu and Nakata, Maho and Dong, Yuxiao and Leskovec, Jure},\n  journal={arXiv preprint arXiv:2103.09430},\n  year={2021}\n}\n```\n","\u003Cp align='center'>\n  \u003Cimg width='40%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_9efe9ec4bd3a.png' 
\u002F>\n\u003C\u002Fp>\n\n--------------------------------------------------------------------------------\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fogb)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fogb\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fblob\u002Fmaster\u002FLICENSE)\n\n## 概述\n\n开放图基准（OGB）是一套用于图机器学习的基准数据集、数据加载器和评估工具。这些数据集涵盖了多种图机器学习任务和现实世界应用。\nOGB 的数据加载器与流行的图深度学习框架完全兼容，包括 [PyTorch Geometric](https:\u002F\u002Fpytorch-geometric.readthedocs.io\u002Fen\u002Flatest\u002F) 和 [Deep Graph Library (DGL)](https:\u002F\u002Fwww.dgl.ai\u002F)。它们提供自动下载数据集、标准化的数据划分以及统一的性能评估功能。\n\n\u003Cp align='center'>\n  \u003Cimg width='80%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_a868b79d2bae.png' \u002F>\n\u003C\u002Fp>\n\nOGB 的目标是提供覆盖重要图机器学习任务、具有不同规模并涉及丰富领域的图数据集。\n\n**图机器学习任务：** 我们涵盖了三种基本的图机器学习任务：节点级预测、边级预测和图级预测。\n\n**多样化的规模：** 小规模图数据集可以在单个 GPU 上处理，而中大规模图可能需要使用多个 GPU 或巧妙的采样\u002F分区技术。\n\n**丰富的领域：** 图数据集来自不同的领域，从科学领域到社交网络和信息网络，还包括异构知识图谱。\n\n\u003Cp align='center'>\n  \u003Cimg width='70%' src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_readme_944273fad650.png' \u002F>\n\u003C\u002Fp>\n\nOGB 是一项持续进行的工作，我们计划在未来进一步扩大其覆盖范围。\n\n## 安装\n您可以使用 Python 的包管理器 `pip` 来安装 OGB。\n**如果您之前已经安装过 ogb，请确保将其更新至 1.3.6 版本。**\n发布说明请参见 [此处](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Freleases\u002Ftag\u002F1.3.6)。\n\n#### 需求\n - Python>=3.6\n - PyTorch>=1.6\n - DGL>=0.5.0 或 torch-geometric>=2.0.2\n - Numpy>=1.16.0\n - pandas>=0.24.0\n - urllib3>=1.24.0\n - scikit-learn>=0.20.0\n - outdated>=0.2.0\n\n#### 使用 pip 安装\n推荐使用 Python 的包管理器 pip 来安装 OGB：\n```bash\npip install ogb\n```\n\n```bash\npython -c \"import ogb; print(ogb.__version__)\"\n# 这应该打印出 \"1.3.6\"。否则，请通过以下命令更新版本：\npip install -U ogb\n```\n\n\n#### 从源代码安装\n您也可以从源代码安装 OGB。如果您希望为 OGB 做出贡献，建议采用此方法。\n```bash\ngit clone 
https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\ncd ogb\npip install -e .\n```\n\n## 软件包使用\n我们重点介绍 OGB 的两个关键特性，即 (1) 易于使用的数据加载器，以及 (2) 标准化的评估工具。\n#### (1) 数据加载器\n我们提供了易于使用的 PyTorch Geometric 和 DGL 数据加载器。它们负责数据集的下载以及标准化的数据划分。\n以下以 PyTorch Geometric 为例，只需几行代码即可完成数据集的准备和划分！当然，DGL 也同样方便！\n```python\nfrom ogb.graphproppred import PygGraphPropPredDataset\nfrom torch_geometric.loader import DataLoader\n\n# 下载并处理数据至 '.\u002Fdataset\u002Fogbg_molhiv\u002F'\ndataset = PygGraphPropPredDataset(name = 'ogbg-molhiv')\n\nsplit_idx = dataset.get_idx_split() \ntrain_loader = DataLoader(dataset[split_idx['train']], batch_size=32, shuffle=True)\nvalid_loader = DataLoader(dataset[split_idx['valid']], batch_size=32, shuffle=False)\ntest_loader = DataLoader(dataset[split_idx['test']], batch_size=32, shuffle=False)\n```\n\n#### (2) 评估工具\n我们还提供了标准化的评估工具，便于对不同方法进行评估和比较。评估工具接受一个 `input_dict`（其格式由 `evaluator.expected_input_format` 指定）作为输入，并返回一个包含针对特定数据集的性能指标的字典。\n这种标准化的评估协议使研究人员能够可靠地比较各自的方法。\n```python\nfrom ogb.graphproppred import Evaluator\n\nevaluator = Evaluator(name = 'ogbg-molhiv')\n# 您可以通过以下方式了解评估工具的输入和输出格式规范：\n# print(evaluator.expected_input_format) \n# print(evaluator.expected_output_format) \ninput_dict = {'y_true': y_true, 'y_pred': y_pred}\nresult_dict = evaluator.eval(input_dict) # 例如，{'rocauc': 0.7321}\n```\n\n## 引用 OGB \u002F OGB-LSC\n如果您在工作中使用了 OGB 或 [OGB-LSC](https:\u002F\u002Fogb.stanford.edu\u002Fdocs\u002Flsc\u002F) 的数据集，请引用我们的论文（BibTeX 如下）。\n```\n@article{hu2020ogb,\n  title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},\n  author={Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},\n  journal={arXiv preprint arXiv:2005.00687},\n  year={2020}\n}\n```\n```\n@article{hu2021ogblsc,\n  title={OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs},\n  author={Hu, Weihua and Fey, Matthias and Ren, Hongyu and Nakata, Maho and Dong, Yuxiao and Leskovec, 
Jure},\n  journal={arXiv preprint arXiv:2103.09430},\n  year={2021}\n}\n```","# OGB 快速上手指南\n\nOpen Graph Benchmark (OGB) 是一套用于图机器学习的基准数据集、数据加载器和评估器集合。它涵盖了节点、链接和图级别的预测任务，并完美兼容 PyTorch Geometric (PyG) 和 Deep Graph Library (DGL) 框架，提供自动下载、标准化划分及统一性能评估。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS, Windows\n*   **Python**: >= 3.6\n*   **深度学习框架**:\n    *   PyTorch >= 1.6\n    *   以及以下任一图神经网络库：\n        *   DGL >= 0.5.0\n        *   或 torch-geometric >= 2.0.2\n*   **其他依赖**:\n    *   Numpy >= 1.16.0\n    *   pandas >= 0.24.0\n    *   scikit-learn >= 0.20.0\n    *   urllib3 >= 1.24.0\n    *   outdated >= 0.2.0\n\n> **提示**：国内用户建议使用清华源或阿里源加速 Python 包的安装。\n\n## 安装步骤\n\n推荐使用 `pip` 进行安装。如果您之前安装过 OGB，请务必更新至最新版本 (1.3.6)。\n\n### 1. 使用 pip 安装（推荐）\n\n```bash\npip install ogb -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n安装完成后，验证版本是否正确：\n\n```bash\npython -c \"import ogb; print(ogb.__version__)\"\n# 应输出 \"1.3.6\"。如果不是，请运行以下命令升级：\n# pip install -U ogb\n```\n\n### 2. 从源码安装（可选）\n\n如果您需要贡献代码或使用最新开发版，可从 GitHub 克隆源码安装：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\ncd ogb\npip install -e .\n```\n\n## 基本使用\n\nOGB 的核心功能包括**易用的数据加载器**和**标准化的评估器**。以下以 PyTorch Geometric 为例，展示如何加载 `ogbg-molhiv` 数据集并进行评估。\n\n### 1. 数据加载与划分\n\nOGB 会自动处理数据集的下载、预处理以及训练\u002F验证\u002F测试集的标准化划分。\n\n```python\nfrom ogb.graphproppred import PygGraphPropPredDataset\nfrom torch_geometric.loader import DataLoader\n\n# 自动下载并处理数据到 '.\u002Fdataset\u002Fogbg_molhiv\u002F'\ndataset = PygGraphPropPredDataset(name='ogbg-molhiv')\n\n# 获取标准化的数据划分索引\nsplit_idx = dataset.get_idx_split() \n\n# 创建数据加载器\ntrain_loader = DataLoader(dataset[split_idx['train']], batch_size=32, shuffle=True)\nvalid_loader = DataLoader(dataset[split_idx['valid']], batch_size=32, shuffle=False)\ntest_loader = DataLoader(dataset[split_idx['test']], batch_size=32, shuffle=False)\n```\n\n### 2. 
模型评估\n\n使用 OGB 提供的标准化评估器，只需传入预测结果即可得到符合该数据集标准的性能指标（如 ROC-AUC）。\n\n```python\nfrom ogb.graphproppred import Evaluator\n\n# 初始化评估器\nevaluator = Evaluator(name='ogbg-molhiv')\n\n# 查看输入\u002F输出格式规范（可选）\n# print(evaluator.expected_input_format) \n# print(evaluator.expected_output_format) \n\n# 准备数据：y_true 为真实标签，y_pred 为模型预测值\ninput_dict = {'y_true': y_true, 'y_pred': y_pred}\n\n# 执行评估，返回包含指标字典（例如 {'rocauc': 0.7321}）\nresult_dict = evaluator.eval(input_dict)\n```","某生物制药公司的算法团队正致力于利用图神经网络预测新合成分子的抗病毒活性，以加速药物筛选流程。\n\n### 没有 ogb 时\n- **数据获取繁琐**：研究人员需手动从不同源头下载分子数据集，编写复杂的解析脚本清洗数据，耗时数天且容易出错。\n- **评估标准混乱**：团队成员各自定义训练集、验证集和测试集的划分比例，导致模型结果无法横向对比，复现论文性能极其困难。\n- **框架适配成本高**：将原始数据转换为 PyTorch Geometric 或 DGL 所需的格式需要大量样板代码，分散了优化模型架构的精力。\n- **指标计算不一**：缺乏统一的评估器，不同成员使用的评价指标（如 ROC-AUC 计算方式）存在细微差异，误导了模型选型决策。\n\n### 使用 ogb 后\n- **一键加载数据**：仅需几行代码即可自动下载并预处理标准的 `ogbg-molhiv` 分子数据集，将数据准备时间从几天缩短至几分钟。\n- **标准化数据划分**：ogb 提供官方固定的数据集分割索引，确保所有实验在相同的数据分布下进行，结果具备公平的可比性。\n- **无缝框架集成**：内置的数据加载器直接兼容主流图学习框架，自动处理数据格式转换，让团队能专注于核心算法创新。\n- **统一性能评估**：调用内置的标准评估器即可得出权威指标，消除了人为计算误差，快速锁定最优模型方案。\n\nogb 通过提供标准化的数据基准与评估体系，彻底消除了图机器学习研发中的“重复造轮子”现象，让科研人员能全心聚焦于算法突破。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsnap-stanford_ogb_9efe9ec4.png","snap-stanford","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fsnap-stanford_0df96ae4.png","",null,"https:\u002F\u002Fgithub.com\u002Fsnap-stanford",[78],{"name":79,"color":80,"percentage":81},"Python","#3572A5",100,2080,407,"2026-04-06T10:54:03","MIT","未说明（文档仅提及小数据集可在单 GPU 处理，中大规模可能需要多 GPU，无具体型号或显存要求）","未说明",{"notes":89,"python":90,"dependencies":91},"建议安装版本更新至 1.3.6。该工具提供图机器学习的数据集加载器和评估器，兼容 PyTorch Geometric 和 DGL 框架。首次运行会自动下载数据集，中小规模数据集可单卡运行，大规模数据集可能需要多 GPU 或采样\u002F分区技术。",">=3.6",[92,93,94,95,96,97,98],"PyTorch>=1.6","DGL>=0.5.0 或 
torch-geometric>=2.0.2","Numpy>=1.16.0","pandas>=0.24.0","urllib3>=1.24.0","scikit-learn>=0.20.0","outdated>=0.2.0",[16,14],[101,102,103,104],"graph-machine-learning","graph-neural-networks","deep-learning","datasets","2026-03-27T02:49:30.150509","2026-04-08T09:23:08.834479",[108,113,118,123,128,133],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},23956,"导入 ogb.graphproppred 时卡住或报错怎么办？","这是一个常见的依赖加载问题。解决方法是在导入 ogb 之前先显式导入 sklearn。请在代码最开头添加 `import sklearn`。这是因为 `ogb.graphproppred.evaluate.py` 内部调用了 `sklearn.metrics`，预先导入可以解决无限等待或导入错误的问题。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F289",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},23957,"在链接预测任务（如 ogbl-collab）中，GNN 模型是否应该使用完整的邻接矩阵进行训练？","是的，这是预期的行为。对于 `ogbl-collab` 等数据集，数据是按时间（年份）划分的。训练图中的边包含直到某一年为止的所有历史边，而验证集和测试集则是预测未来出现的边。因此，模型在训练时看到完整的历史邻接矩阵（包括那些后来出现在验证\u002F测试集中的边的源和目标节点）是合理的，这符合时间序列预测的设置。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F72",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},23958,"PCQM4Mv2 数据集中的 SDF 文件是否包含氢原子坐标信息？是否需要额外下载 xyz 文件？","不需要额外下载 xyz 文件。SDF 文件本身已经包含了分子所需的所有信息，包括每个原子的 xyz 坐标（含氢原子）。你可以直接使用 SDF 文件进行推理或处理。如果需要使用 xyz 格式，可以通过工具（如 Open Babel）将 SDF 转换为 xyz：`obabel -isdf input.sdf -oxyz output.xyz`。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F336",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},23959,"OGB 排行榜提交后多久能看到结果？如果一周还没显示正常吗？","提交后结果不会立即显示。根据社区反馈，提交后可能需要数天甚至更长时间才能更新到排行榜上。如果提交超过一周仍未显示，建议在 GitHub Issue 上礼貌地提醒维护者（如 @weihua916）进行检查，因为有时可能需要人工触发评估或存在队列延迟。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F488",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},23960,"在标签传播（Label Propagation）等无参数方法中，允许使用验证集标签进行传播吗？","根据 OGB 的最新规则，为了保持基准测试的公平性和一致性，通常禁止在训练过程中直接使用验证集标签，即使对于没有可学习参数的方法（如标签传播）。验证集的主要用途是用于超参数调整和早停。虽然这对于无参数方法可能显得严格，但 OGB 
旨在模拟真实的泛化场景，并计划未来引入隐藏测试集以进一步防止过拟合验证集。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F73",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},23961,"ogbl-biokg 数据集中的测试边是如何构建负样本的？","对于每个测试三元组（头实体，关系，尾实体），通过随机替换头实体或尾实体来生成负样本。具体来说，会随机采样 1,000 个负实体（其中 500 个用于替换头实体，500 个用于替换尾实体），并确保生成的新三元组不在知识图谱中已存在。由于这是一个异构图（Heterogeneous Graph），采样时会考虑实体类型的约束。","https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F92",[139,144,149,154,159,164,169,174,179,184,189,194,199,204,209,214],{"id":140,"version":141,"summary_zh":142,"released_at":143},145502,"1.3.6","请参阅 https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fpull\u002F420","2023-04-07T06:00:20",{"id":145,"version":146,"summary_zh":147,"released_at":148},145503,"1.3.5","请参阅 https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fpull\u002F368","2022-11-02T22:06:17",{"id":150,"version":151,"summary_zh":152,"released_at":153},145504,"1.3.4","本次发布引入了以下两项内容：\n- `ogbl-vessel` 数据集（详情请参见 [这里](https:\u002F\u002Fogb.stanford.edu\u002Fdocs\u002Flinkprop\u002F#ogbl-vessel)）@jqmcginnis\n- 链接预测的排名计算优化 https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fpull\u002F357 @mberr","2022-08-20T11:06:28",{"id":155,"version":156,"summary_zh":157,"released_at":158},145505,"1.3.2","我们已包含两项更新：\n\n- WikiKG90M → WikiKG90Mv2\n- PCQM4M → PCQM4Mv2","2021-09-29T04:42:13",{"id":160,"version":161,"summary_zh":162,"released_at":163},145506,"1.3.1","感谢 DGL 团队，所有 LSC 数据现已托管在 AWS 上。这显著提升了全球范围内的下载速度！底层数据保持完全不变。","2021-04-07T08:38:42",{"id":165,"version":166,"summary_zh":167,"released_at":168},145507,"1.3.0","本次发布包含2021年KDD Cup OGB-LSC赛道的三个大规模数据集。有关这些数据集及KDD Cup的详细信息，请参阅[此处](https:\u002F\u002Fogb.stanford.edu\u002Fkddcup2021\u002F)。","2021-03-15T04:50:05",{"id":170,"version":171,"summary_zh":172,"released_at":173},145508,"1.2.6","当前数据集下载使用的是 http 协议，而非 https。","2021-03-01T05:00:30",{"id":175,"version":176,"summary_zh":177,"released_at":178},145509,"1.2.5","本版本在 `ogbg-code` 
数据集中引入了一项重大改动。\n\n- `ogbg-code` 因输入 AST 中存在预测目标（即方法名）泄露而被弃用。\n- 为此引入了 `ogbg-code2` 数据集，该数据集通过将方法名及其在 AST 中的递归定义替换为特殊标记 `_mask_`，有效解决了这一问题。\n\n我们衷心感谢 Charles Sutton (@casutton) 发现了我们数据集中存在的数据泄露问题。","2021-02-24T21:12:21",{"id":180,"version":181,"summary_zh":182,"released_at":183},145510,"1.2.4","本次发布修复了 `ogbl-wikikg` 和 `ogbl-citation` 数据集中负样本的相关 bug，并推出了它们的新版本：`ogbl-wikikg2` 和 `ogbl-citation2`。旧版本已被弃用。","2020-12-29T18:13:08",{"id":185,"version":186,"summary_zh":187,"released_at":188},145511,"1.2.3","本次发布从以下几个方面增强了 OGB 包的功能：\n\n- 通过使用压缩的二进制文件，使 `ogbn-papers100M` 数据加载更加便捷：https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F46\n- 为外部贡献者引入了 [DatasetSaver](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fblob\u002Fmaster\u002Fogb\u002Fio\u002FREADME.md) 模块：https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F1\n- 使数据集对象与 DGL v0.5 兼容（但异构图数据集不向后兼容）。","2020-09-12T19:08:09",{"id":190,"version":191,"summary_zh":192,"released_at":193},145512,"1.2.2","This release is mainly for **changing the evaluation metric of `ogbg-molpcba` from PRC-AUC to Average Precision (AP)**. AP is shown to be more appropriate to summarize the non-convex nature of the Precision Recall Curve [1]. The leaderboard and our paper have been updated accordingly.\r\n\r\nWe also fix an issue and add a feature:\r\n- Fixed an issue for saving a large library-agnostic data object. https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002Fogb\u002Fissues\u002F48\r\n- Added automatic version check feature so that users will get notified when the package version is outdated.\r\n\r\n\r\n\r\n[1] Jesse Davis and Mark Goadrich. The relationship between Precision-Recall and ROC curves. In International Conference on Machine Learning (ICML), pp. 233–240, 2006.","2020-08-12T06:36:24",{"id":195,"version":196,"summary_zh":197,"released_at":198},145513,"1.2.1","This release fixes bugs in a dataset, evaluator, and data loader.\r\n\r\n- Duplicated edges in `ogbn-mag` are removed. 
The updated dataset will be downloaded and processed automatically as you run your script for `ogbn-mag`. #40 \r\n- Evaluators for `ogbl-collab` and `ogbl-ddi` are updated. Specifically, `ogbl-collab` now uses Hits@50, and `ogbl-ddi` now uses Hits@20.\r\n- DGL data loader bug for `ogbn-mag` and `ogbl-biokg` is fixed. #36 ","2020-06-27T03:24:28",{"id":200,"version":201,"summary_zh":202,"released_at":203},145514,"1.2.0","This is the second major release of OGB, in which we have curated many more exciting graph datasets, including heterogeneous graphs and a web-scale gigantic graph (100+ million nodes, 1+ billion edges).\r\n\r\nFirst, we note that there is **no change** in the datasets released in version `1.1.1`. Therefore, any experimental results obtained using `1.1.1` on the existing datasets are compatible with version `1.2.0`.\r\n\r\nIn this new release, we have additionally released **5 new datasets** listed below.\r\n- `ogbn-papers100M`: Web-scale gigantic paper citation network.\r\n- `ogbn-mag`: Heterogeneous academic graph.\r\n- `ogbl-biokg`: Heterogeneous biomedical knowledge graph.\r\n- `ogbl-ddi`: Drug-drug interaction network.\r\n- `ogbg-code`: Source code Abstract Syntax Trees.","2020-06-11T00:16:58",{"id":205,"version":206,"summary_zh":207,"released_at":208},145515,"1.1.1","The OGB package can now automatically fetch the datasets if they have been updated.","2020-05-05T00:53:18",{"id":210,"version":211,"summary_zh":212,"released_at":213},145516,"1.1.0","## First Major Release\r\n\r\nThis is the first major release of OGB.\r\nA number of changes have been made to the datasets, which are summarized below.\r\n\r\n1. Re-indexed all the nodes in the node\u002Flink datasets (the graphs remain essentially the same).\r\n2. 
In the dataset folders for all the datasets, added a `mapping\u002F` directory that contains information to map node\u002Fedge\u002Fgraph\u002Flabel indices to real-world entities (e.g., mapping from nodes in PPA to unique protein identifiers, mapping from molecular graphs into the SMILES strings).\r\n3. Deleted the `ogbn-proteins` node features, and put them in the species variable.\r\n4. Deleted the `ogbl-reviews` datasets.\r\n5. Added 4 datasets: `ogbn-arxiv`, `ogbl-citation`, `ogbl-collab`, `ogbl-wikikg`.\r\n6. Renamed `ogbg-ppi` to `ogbg-ppa`.\r\n7. Renamed `ogbg-mol-hiv` and `ogbg-mol-pcba` to `ogbg-molhiv` and `ogbg-molpcba`, respectively.\r\n8. Changed the evaluation metric of imbalanced molecule datasets (e.g., pcba) from ROC-AUC to PRC-AUC. \r\n9. Changed the `get_split_edge()` interface in `LinkPropPredDataset`. The downloaded dataset files are also changed accordingly.\r\n10. Added a `num_classes` attribute for multi-class classification datasets.","2020-05-01T22:14:06",{"id":215,"version":216,"summary_zh":217,"released_at":218},145517,"1.0.1","## Minor Changes\r\n\r\nOGB datasets can now be imported more conveniently, *e.g.*:\r\n```python\r\nfrom ogb.graphproppred import GraphPropPredDataset\r\nfrom ogb.graphproppred import PygGraphPropPredDataset\r\nfrom ogb.graphproppred import DglGraphPropPredDataset\r\n```\r\nNote that this will throw an `ImportError` if OGB cannot find installations of PyG or DGL, respectively.","2020-03-23T09:12:43"]