[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-princeton-nlp--ALCE":3,"tool-princeton-nlp--ALCE":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":92,"env_os":93,"env_gpu":94,"env_ram":95,"env_deps":96,"category_tags":106,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":141},5137,"princeton-nlp\u002FALCE","ALCE","[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. 
Paper: https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14627","ALCE 是一个专为评估大语言模型（LLM）生成带引用文本能力而设计的基准测试工具。它的核心目标是解决当前大模型在生成内容时容易“胡编乱造”（幻觉）且缺乏可靠来源佐证的问题，通过自动化手段衡量模型输出在流畅度、事实正确性以及引用质量三个维度的表现。\n\n该工具内置了 ASQA、QAMPARI 和 ELI5 三个专业数据集，并提供了完整的自动评估代码及复现论文基线模型所需的环境。其技术亮点在于不仅支持常规的文本生成评估，还集成了复杂的检索流程（如 BM25 和 GTR 检索），能够模拟从海量语料中查找证据并生成精准引用的全过程，甚至提供了经过重排序的“理想检索”结果以供对比研究。\n\nALCE 主要面向人工智能研究人员、大模型开发者以及从事自然语言处理（NLP）领域的工程师。如果你正在致力于提升大模型的可信度、开发基于检索增强生成（RAG）的应用，或者需要一套严谨的标准来量化模型“言之有据”的能力，ALCE 将是一个非常实用的开源资源。它帮助社区从单纯关注“模型说了什么”，转向更深层地关注“模型的话是否有据可查”。","# Enabling Large Language Models to Generate Text with Citations\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprinceton-nlp_ALCE_readme_b134cdd0d02c.png\" alt=\"ALCE\" width=\"15%\">\u003Cbr>*: ALCE is pronounced as \u002Felk\u002F as ALCE is the Latin word for elk (Europe) or moose (North America).\n\u003C\u002Fp>\n\n\n\nThis repository contains the code and data for paper [Enabling Large Language Models to Generate Text with Citations](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14627). \nIn this paper, we propose ALCE, a benchmark for **A**utomatic **L**LMs' **C**itation Evaluation. \nALCE contains three datasets: ASQA, QAMPARI, and ELI5.\nWe provide automatic evaluation code of LLM generations around three dimensions: fluency, correctness, and citation quality. 
\nThis repository also includes code to reproduce the baselines in our paper.\n\n\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprinceton-nlp_ALCE_readme_75c35e0c592b.png\" alt=\"ALCE\" width=\"100%\">\n\n\n\n\n## Quick Links\n\n  - [Requirements](#requirements)\n  - [Data](#data)\n  - [Code Structure](#code-structure)\n  - [Reproducing Baselines](#reproducing-baselines)\n  - [Evaluation](#evaluation)\n  - [Human Evaluation](#human-evaluation)\n  - [Bug or Questions](#bug-or-questions)\n  - [Citation](#citation)\n\n\n## Requirements\n\nPlease install the latest versions of PyTorch (`torch`), HuggingFace Transformers (`transformers`), HuggingFace Accelerate (`accelerate`), and the OpenAI API package (`openai`). This codebase is tested on \n`torch==2.1.0.dev20230514+cu118`, `transformers==4.28.1`, `accelerate==0.17.1`, and `openai==0.27.4` with Python 3.9.7.\n\n## Data\n\nYou can download datasets (along with retrieval results) by running the following command:\n\n```bash\nbash download_data.sh\n```\n\nAll the data will be stored in `data\u002F`. Our data included top-100 DPR\u002FGTR retrieved results for ASQA and QAMPARI, and top-100 BM25 retrieved results for ELI5. 
We also provide reranked oracle retrieval results, where top-5 passages can achieve the same recall as the original top-100 recall.\n\n### Retrieval\n\nYou can reproduce the passage retrieval step with the following command:\n```bash\npython retrieval.py --data {path\u002Fto\u002Fdata} --retriever {bm25\u002Fgtr} --output_file {path\u002Fto\u002Foutput}\n```\n\nThere are additional packages required for the retrieval steps.\nSpecifically, you need to install `pyserini==0.21.0`(their github [repo](https:\u002F\u002Fgithub.com\u002Fcastorini\u002Fpyserini\u002Ftree\u002Fmaster) is helpful) and `sentence-transformers==2.2.2`.\n\nFor the BM25 retrieval over Common Crawl using Sphere, you must first download the index from the Sphere [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FSphere), and set the environmental variable `BM25_SPHERE_PATH` to the path of the downloaded index.\nSpecifically, you can use the following command:\n```bash\nwget -P faiss_index https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fsphere\u002Fsphere_sparse_index.tar.gz\ntar -xzvf faiss_index\u002Fsphere_sparse_index.tar.gz -C faiss_index\nexport BM25_SPHERE_PATH=$PWD\u002Ffaiss_index\n```\nIt's important to note that given the large size of the corpus, this step is extremely expensive and time-consuming. We found that larger CPU memory tends to help with the speed. 
\n\nFor GTR, we first build an index using the DPR Wikipedia snapshot, which you can obtain using the download script from the DPR [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDPR), and then set the environment variable `DPR_WIKI_TSV` to the path of the tsv file.\nSpecifically, you can use the following commands:\n```bash\nwget https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdpr\u002Fwikipedia_split\u002Fpsgs_w100.tsv.gz\ngzip -d psgs_w100.tsv.gz\nexport DPR_WIKI_TSV=$PWD\u002Fpsgs_w100.tsv\n```\nThen set `GTR_EMB` to the path of the GTR embeddings of the Wikipedia corpus; running the retrieval script for the first time will automatically build and save the index.\nBuilding the dense index can be expensive for GPU memory (we use 80GB GPUs for this) and time-consuming; the entire index will take about 31GB.\nIf you find this step to be too expensive, you can also download it using:\n```bash\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprinceton-nlp\u002Fgtr-t5-xxl-wikipedia-psgs_w100-index\u002Fresolve\u002Fmain\u002Fgtr_wikipedia_index.pkl\nexport GTR_EMB=$PWD\u002Fgtr_wikipedia_index.pkl\n```\n\nTo reproduce the DPR retrieval, we refer to the DPR [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDPR); we used the original DPR checkpoint trained on NQ.\n\n## Code Structure\n\n* `run.py`: run file to reproduce our baseline generations.\n* `eval.py`: eval file to evaluate generations.\n* `prompts`: folder that contains all prompt files.\n* `configs\u002F`: folder that contains all config files to reproduce baselines.\n* `tools\u002F`: misc code (generate summaries\u002Fsnippets, reranking, etc.)\n\n\n## Reproducing Baselines\n\n\nYou can reproduce baselines from our paper by running:\n\n```bash\npython run.py --config configs\u002F{config_name}\n```\n\nYou can also overwrite any arguments in the config file or add new arguments through the command line:\n```\npython run.py --config 
configs\u002F{config_name} --seed 43 --model vicuna-13b\n```\n\nThe naming of config files follows the pattern `{LLM}_{#demos and #passages}_{retriever}_{method}.yaml`. Method names include:\n* `default` corresponds to the **Vanilla** model in our paper.\n* `summary` corresponds to the **Summary** model.\n* `extraction` corresponds to the **Snippet** model. \n* `interact_doc_id` corresponds to the **Interact** model.\n* `interact_search` corresponds to the **InlineSearch** model.\n* `closedbook` corresponds to the **ClosedBook** model.\n\nOur code supports both the OpenAI API and offline HuggingFace models:\n\n* For OpenAI models (for example, ChatGPT), you need to set the environment variables `OPENAI_API_KEY` and `OPENAI_ORG_ID`. If you are using the Azure OpenAI API, you need to set the environment variables `OPENAI_API_KEY` and `OPENAI_API_BASE`. You also need to add the flag `--azure`. \n    * Note that in the Azure OpenAI API, ChatGPT's name is different and you should set it with `--model gpt-35-turbo`. \n* For open-source models, you should set the model name to the input of HuggingFace models' `.from_pretrained` method. This can be either a local directory (e.g. for the older LLaMA models) or a path on the HuggingFace hub. \n\nFor detailed argument usage, please refer to `run.py`.\n\nModel output along with gold answers and run configs will be stored in a json file in `result\u002F`.\n\n\n### Post-hoc citation\n\nFor closed-book models, one can use `post_hoc_cite.py` to add citations in a post-hoc manner (using GTR-large). To run post-hoc citation, execute\n```bash\npython post_hoc_cite.py --f result\u002F{RESULT JSON FILE NAME} --external_docs data\u002F{CORRESPONDING DATA}\n```\n\nThe output file with post-hoc citations will be stored in `result\u002F`, with a suffix `post_hoc_cite.gtr-t5-large-external`.\n\n## Evaluation\n\nALCE evaluation is implemented in `eval.py`. 
\n\nFor ASQA, use the following command\n```bash\npython eval.py --f {path\u002Fto\u002Fresult\u002Ffile} --citations --qa --mauve\n```\n\nFor QAMPARI, use the following command\n```bash\npython eval.py --f {path\u002Fto\u002Fresult\u002Ffile} --citations\n```\n\nFor ELI5, use the following command\n```bash\npython eval.py --f {path\u002Fto\u002Fresult\u002Ffile} --citations --claims_nli --mauve\n```\n\nThe evaluation result will be saved in `result\u002F`, with the same name as the input and a suffix `.score`.\n\n## Human Evaluation\n\nThe results from our human evaluation (Section 6) are located under the directory [`human_eval`](human_eval). \nBoth the data and the analysis are available, please refer to the directory for details. \n\n## Bug or Questions?\n\nIf you have any questions related to the code or the paper, feel free to email Tianyu (`tianyug@cs.princeton.edu`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!\n\n\n\n## Citation\n\nPlease cite our paper if you use ALCE in your work:\n\n```bibtex\n@inproceedings{gao2023enabling,\n   title={Enabling Large Language Models to Generate Text with Citations},\n   author={Gao, Tianyu and Yen, Howard and Yu, Jiatong and Chen, Danqi},\n   year={2023},\n   booktitle={Empirical Methods in Natural Language Processing (EMNLP)},\n}\n```\n","# 使大型语言模型能够生成带引用的文本\n\n\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprinceton-nlp_ALCE_readme_b134cdd0d02c.png\" alt=\"ALCE\" width=\"15%\">\u003Cbr>*: ALCE 的发音为 \u002Felk\u002F，因为 ALCE 是拉丁语中“麋鹿”（欧洲）或“驼鹿”（北美）的意思。\n\u003C\u002Fp>\n\n\n\n此仓库包含论文《使大型语言模型能够生成带引用的文本》（[arXiv:2305.14627](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14627)）的代码和数据。在该论文中，我们提出了 ALCE，一个用于**自**动评估 **大**型语言模型 **引**用的基准测试集。ALCE 包含三个数据集：ASQA、QAMPARI 和 
ELI5。\n我们提供了针对大型语言模型生成内容的自动评估代码，涵盖流畅性、正确性和引用质量三个维度。此外，本仓库还包含用于复现我们论文中基线模型的代码。\n\n\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprinceton-nlp_ALCE_readme_75c35e0c592b.png\" alt=\"ALCE\" width=\"100%\">\n\n\n\n\n## 快速链接\n\n  - [要求](#requirements)\n  - [数据](#data)\n  - [代码结构](#code-structure)\n  - [复现基线](#reproducing-baselines)\n  - [评估](#evaluation)\n  - [人工评估](#human-evaluation)\n  - [问题或疑问](#bug-or-questions)\n  - [引用](#citation)\n\n\n## 要求\n\n请安装最新版本的 PyTorch (`torch`)、HuggingFace Transformers (`transformers`)、HuggingFace Accelerate (`accelerate`) 以及 OpenAI API 包 (`openai`)。本代码库已在以下环境中测试通过：`torch==2.1.0.dev20230514+cu118`、`transformers==4.28.1`、`accelerate==0.17.1` 和 `openai==0.27.4`，Python 版本为 3.9.7。\n\n## 数据\n\n您可以通过运行以下命令下载数据集（连同检索结果）：\n\n```bash\nbash download_data.sh\n```\n\n所有数据将存储在 `data\u002F` 目录下。我们的数据包括 ASQA 和 QAMPARI 的 DPR\u002FGTR 检索结果 top-100，以及 ELI5 的 BM25 检索结果 top-100。我们还提供了重新排序后的最优检索结果，其中 top-5 文档即可达到与原始 top-100 相同的召回率。\n\n### 检索\n\n您可以通过以下命令复现段落检索步骤：\n```bash\npython retrieval.py --data {path\u002Fto\u002Fdata} --retriever {bm25\u002Fgtr} --output_file {path\u002Fto\u002Foutput}\n```\n\n检索步骤还需要一些额外的依赖包。具体来说，您需要安装 `pyserini==0.21.0`（其 GitHub 仓库 [repo](https:\u002F\u002Fgithub.com\u002Fcastorini\u002Fpyserini\u002Ftree\u002Fmaster) 非常有帮助）和 `sentence-transformers==2.2.2`。\n\n对于使用 Sphere 在 Common Crawl 上进行的 BM25 检索，您必须先从 Sphere 的 GitHub 仓库 [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FSphere) 下载索引，并将环境变量 `BM25_SPHERE_PATH` 设置为已下载索引的路径。具体操作如下：\n```bash\nwget -P faiss_index https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fsphere\u002Fsphere_sparse_index.tar.gz\ntar -xzvf faiss_index\u002Fsphere_sparse_index.tar.gz -C faiss_index\nexport BM25_SPHERE_PATH=$PWD\u002Ffaiss_index\n```\n需要注意的是，由于语料库规模庞大，这一步骤非常耗时且成本高昂。我们发现更大的 CPU 内存有助于提高速度。\n\n对于 GTR 检索，我们首先使用 DPR 的 Wikipedia 快照构建索引，您可以从 DPR 的 GitHub 仓库 [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDPR) 下载脚本获取该快照，然后将环境变量 `DPR_WIKI_TSV` 设置为 tsv 
文件的路径。具体操作如下：\n```bash\nwget https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdpr\u002Fwikipedia_split\u002Fpsgs_w100.tsv.gz\ngzip -d psgs_w100.tsv.gz\nexport DPR_WIKI_TSV=$PWD\u002Fpsgs_w100.tsv\n```\n接下来，您需要将 `GTR_EMB` 设置为 Wikipedia 语料库的 GTR 嵌入向量路径，首次运行检索脚本时会自动构建并保存索引。构建密集索引对 GPU 内存的要求较高（我们使用 80GB 的 GPU），并且耗时较长；整个索引大约占用 31GB 的空间。如果您觉得这一步过于昂贵，也可以直接下载：\n```bash\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprinceton-nlp\u002Fgtr-t5-xxl-wikipedia-psgs_w100-index\u002Fresolve\u002Fmain\u002Fgtr_wikipedia_index.pkl\nexport GTR_EMB=$PWD\u002Fgtr_wikipedia_index.pkl\n```\n\n为了复现 DPR 检索，我们参考了 DPR 的 GitHub 仓库 [repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FDPR)，并使用了其在 NQ 数据集上训练的原始 DPR 检查点。\n\n## 代码结构\n\n* `run.py`: 用于复现我们基线生成内容的主程序文件。\n* `eval.py`: 用于评估生成内容的评估文件。\n* `prompts`: 包含所有提示模板文件的文件夹。\n* `configs\u002F`: 包含所有用于复现基线配置文件的文件夹。\n* `tools\u002F`: 其他辅助代码（生成摘要\u002F片段、重新排序等）\n\n\n## 复现基线\n\n\n您可以通过以下命令复现我们论文中的基线模型：\n\n```bash\npython run.py --config configs\u002F{config_name}\n```\n\n您还可以通过命令行覆盖配置文件中的任何参数，或添加新的参数：\n```bash\npython run.py --config configs\u002F{config_name} --seed 43 --model vicuna-13b\n```\n\n配置文件的命名遵循 `{LLM}_{#demos and #passages}_{retriever}_{method}.yaml` 的规则。方法名称包括：\n* `default` 对应于我们论文中的**Vanilla**模型。\n* `summary` 对应于**Summary**模型。\n* `extraction` 对应于**Snippet**模型。\n* `interact_doc_id` 对应于**Interact**模型。\n* `interact_search` 对应于**InlineSearch**模型。\n* `closedbook` 对应于**ClosedBook**模型。\n\n我们的代码同时支持 OpenAI API 和离线的 HuggingFace 模型：\n\n* 对于 OpenAI 模型（例如 ChatGPT），您需要设置环境变量 `OPENAI_API_KEY` 和 `OPENAI_ORG_ID`。如果您使用的是 Azure OpenAI API，则需要设置 `OPENAI_API_KEY` 和 `OPENAI_API_BASE` 环境变量，并添加 `--azure` 标志。\n    * 请注意，在 Azure OpenAI API 中，ChatGPT 的名称有所不同，您应将其设置为 `--model gpt-35-turbo`。\n* 对于开源模型，您应将模型名称设置为 HuggingFace 模型 `.from_pretrained` 方法的输入值。这可以是本地目录（例如较旧的 LLaMA 模型）或指向 HuggingFace Hub 的路径。\n\n有关详细参数用法，请参阅 `run.py`。\n\n模型输出、黄金答案和运行配置将被保存在 `result\u002F` 目录下的 JSON 文件中。\n\n\n### 后置引用\n\n对于闭卷模型，可以使用 `post_hoc_cite.py` 以事后方式添加引用（使用 
GTR-large）。要运行后置引用功能，请执行：\n```bash\npython post_hoc_cite.py --f result\u002F{RESULT JSON FILE NAME} --external_docs data\u002F{CORRESPONDING DATA}\n```\n\n带有后置引用的输出文件将存储在 `result\u002F` 目录下，文件名后缀为 `post_hoc_cite.gtr-t5-large-external`。\n\n## 评估\n\nALCE 的评估实现于 `eval.py` 文件中。\n\n对于 ASQA 数据集，使用以下命令：\n```bash\npython eval.py --f {结果文件路径} --citations --qa --mauve\n```\n\n对于 QAMPARI 数据集，使用以下命令：\n```bash\npython eval.py --f {结果文件路径} --citations\n```\n\n对于 ELI5 数据集，使用以下命令：\n```bash\npython eval.py --f {结果文件路径} --citations --claims_nli --mauve\n```\n\n评估结果将保存在 `result\u002F` 目录下，文件名与输入文件相同，并添加 `.score` 后缀。\n\n## 人工评估\n\n我们的人工评估结果（第 6 节）位于 [`human_eval`](human_eval) 目录下。该目录同时提供了数据和分析报告，详情请参阅该目录。\n\n## 错误或疑问？\n\n如果您对代码或论文有任何疑问，欢迎发送邮件至 Tianyu（`tianyug@cs.princeton.edu`）。如果在使用代码时遇到任何问题，或希望报告错误，您可以提交一个 issue。请尽可能详细地描述问题，以便我们能够更快更好地帮助您！\n\n\n\n## 引用\n\n如果您在工作中使用了 ALCE，请引用我们的论文：\n\n```bibtex\n@inproceedings{gao2023enabling,\n   title={Enabling Large Language Models to Generate Text with Citations},\n   author={Gao, Tianyu and Yen, Howard and Yu, Jiatong and Chen, Danqi},\n   year={2023},\n   booktitle={Empirical Methods in Natural Language Processing (EMNLP)},\n}\n```","# ALCE 快速上手指南\n\nALCE (Automatic LLMs' Citation Evaluation) 是一个用于评估大语言模型生成文本引用质量的基准测试工具。它包含 ASQA、QAMPARI 和 ELI5 三个数据集，并提供自动评估代码以衡量生成内容的流畅度、正确性和引用质量。\n\n## 环境准备\n\n### 系统要求\n- **Python**: 3.9.7 (推荐)\n- **GPU**: 若需重建密集索引（GTR），建议显存 80GB；若使用预下载索引或仅运行推理\u002F评估，常规显卡即可。\n- **内存**: 处理大规模检索索引时，较大的 CPU 内存有助于提升速度。\n\n### 前置依赖\n请确保安装以下核心库的最新版本：\n- PyTorch (`torch`)\n- HuggingFace Transformers (`transformers`)\n- HuggingFace Accelerate (`accelerate`)\n- OpenAI API (`openai`)\n\n参考测试版本：\n- `torch==2.1.0.dev20230514+cu118`\n- `transformers==4.28.1`\n- `accelerate==0.17.1`\n- `openai==0.27.4`\n\n若需复现检索步骤，还需额外安装：\n- `pyserini==0.21.0`\n- `sentence-transformers==2.2.2`\n\n## 安装步骤\n\n1. **克隆仓库**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE.git\n   cd ALCE\n   ```\n\n2. 
**安装 Python 依赖**\n   建议使用国内镜像源加速安装（如清华源）：\n   ```bash\n   pip install torch transformers accelerate openai -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n   \n   如需进行检索实验，请额外安装：\n   ```bash\n   pip install pyserini==0.21.0 sentence-transformers==2.2.2 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n3. **下载数据**\n   运行官方脚本下载数据集及预检索结果（包含 ASQA, QAMPARI, ELI5 的 top-100 检索结果）：\n   ```bash\n   bash download_data.sh\n   ```\n   *注：数据将存储在 `data\u002F` 目录下。*\n\n4. **(可选) 配置检索索引**\n   若需复现 BM25 (Sphere) 或 GTR 检索步骤，需下载大型索引并设置环境变量。\n   \n   **BM25 Sphere 索引:**\n   ```bash\n   wget -P faiss_index https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fsphere\u002Fsphere_sparse_index.tar.gz\n   tar -xzvf faiss_index\u002Fsphere_sparse_index.tar.gz -C faiss_index\n   export BM25_SPHERE_PATH=$PWD\u002Ffaiss_index\n   ```\n\n   **GTR 索引 (推荐直接下载预处理好的索引以节省时间):**\n   ```bash\n   wget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprinceton-nlp\u002Fgtr-t5-xxl-wikipedia-psgs_w100-index\u002Fresolve\u002Fmain\u002Fgtr_wikipedia_index.pkl\n   export GTR_EMB=$PWD\u002Fgtr_wikipedia_index.pkl\n   ```\n\n## 基本使用\n\n### 1. 复现基线模型生成\n通过指定配置文件运行生成任务。配置文件命名规则为 `{LLM}_{demos_passages}_{retriever}_{method}.yaml`。\n\n**示例：运行 Vicuna-13b 模型的 Summary 方法**\n```bash\npython run.py --config configs\u002Fvicuna-13b_summary.yaml --seed 43 --model vicuna-13b\n```\n\n**OpenAI 模型使用说明：**\n若使用 ChatGPT，需设置环境变量：\n```bash\nexport OPENAI_API_KEY=\"your_api_key\"\nexport OPENAI_ORG_ID=\"your_org_id\"\n# 若使用 Azure OpenAI，还需设置 OPENAI_API_BASE 并添加 --azure 标志\n```\n运行命令示例：\n```bash\npython run.py --config configs\u002Fchatgpt_default.yaml --model gpt-35-turbo --azure\n```\n生成结果将保存在 `result\u002F` 目录下的 JSON 文件中。\n\n### 2. 
执行自动评估\n根据数据集类型运行 `eval.py` 进行评估。\n\n**评估 ASQA 数据集:**\n```bash\npython eval.py --f result\u002F{result_file_name}.json --citations --qa --mauve\n```\n\n**评估 QAMPARI 数据集:**\n```bash\npython eval.py --f result\u002F{result_file_name}.json --citations\n```\n\n**评估 ELI5 数据集:**\n```bash\npython eval.py --f result\u002F{result_file_name}.json --citations --claims_nli --mauve\n```\n评估结果将保存为同名的 `.score` 文件。\n\n### 3. (可选) 事后添加引用\n对于闭源模型生成的文本，可使用 `post_hoc_cite.py` 利用 GTR-large 模型事后添加引用：\n```bash\npython post_hoc_cite.py --f result\u002F{RESULT_JSON_FILE} --external_docs data\u002F{CORRESPONDING_DATA}\n```","某科技公司的技术文档团队正利用大语言模型自动撰写行业分析报告，需要确保生成的内容不仅流畅，还必须附带准确的信息来源以供核查。\n\n### 没有 ALCE 时\n- **引用幻觉频发**：模型经常编造不存在的论文标题或链接，导致报告可信度极低，人工核实成本高昂。\n- **评估维度单一**：团队只能凭感觉判断文章是否“通顺”，缺乏对“事实正确性”和“引用质量”的量化标准。\n- **基线对比困难**：在尝试不同检索增强（RAG）策略时，无法客观衡量哪种方案生成的引用更精准，优化方向模糊。\n- **人工审核瓶颈**：由于缺乏自动化评估工具，资深编辑需逐字核对每条引文，严重拖慢了报告发布周期。\n\n### 使用 ALCE 后\n- **精准识别幻觉**：ALCE 自动检测并标记出无依据的引用，迫使模型仅基于检索到的真实段落生成内容，大幅降低造假率。\n- **三维量化评估**：通过流畅度、正确性和引用质量三个维度的自动打分，团队能清晰看到生成内容的具体短板。\n- **科学优化策略**：利用 ALCE 基准测试不同检索器（如 BM25 vs GTR）的效果，快速锁定能提升引用准确率的最佳技术组合。\n- **效率显著提升**：自动化评估替代了大部分初筛工作，编辑只需关注高分报告中的细微逻辑问题，发布速度提升数倍。\n\nALCE 将大模型生成内容从“不可控的黑盒”转变为“可量化、可信赖”的知识生产流程，彻底解决了自动化写作中引用难验证的核心痛点。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fprinceton-nlp_ALCE_75c35e0c.png","princeton-nlp","Princeton Natural Language Processing","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fprinceton-nlp_9459cd72.png","",null,"http:\u002F\u002Fnlp.cs.princeton.edu","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp",[80,84],{"name":81,"color":82,"percentage":83},"Python","#3572A5",99.7,{"name":85,"color":86,"percentage":87},"Shell","#89e051",0.3,512,50,"2026-03-23T08:43:11","MIT",4,"未说明","检索步骤构建稠密索引（GTR）时必需，推荐使用 80GB 显存 GPU；运行基线模型取决于所选 LLM 大小；测试环境为 CUDA 11.8","检索步骤（特别是 BM25 Sphere）需要较大 CPU 内存，具体数值未说明",{"notes":97,"python":98,"dependencies":99},"1. 若使用 OpenAI API 需设置环境变量 OPENAI_API_KEY 和 OPENAI_ORG_ID（Azure 用户需额外配置）。2. 
检索步骤非常耗时且资源消耗大：BM25 Sphere 索引庞大，GTR 稠密索引约 31GB。3. 可下载预构建的 GTR 索引以避免昂贵的构建过程。4. 复现 DPR 检索需参考外部 DPR 仓库。","3.9.7",[100,101,102,103,104,105],"torch==2.1.0.dev20230514+cu118","transformers==4.28.1","accelerate==0.17.1","openai==0.27.4","pyserini==0.21.0","sentence-transformers==2.2.2",[35,14,107],"其他","2026-03-27T02:49:30.150509","2026-04-08T01:09:03.500397",[111,116,121,126,131,136],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},23315,"运行模型时出现警告\"Prompt exceeds max length and return an empty string as answer\"（提示词超出最大长度并返回空字符串），该如何解决？","该警告通常意味着在检索到的前 100 个段落中，模型认为“相关”的段落少于 5 个，导致上下文不足或提示词构建问题。建议尝试更换其他数据集或调整配置文件（configs）中的参数。如果问题频繁出现，可以尝试缩短提示词长度或检查检索到的文档质量。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F24",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},23316,"使用 ChatGPT (gpt-3.5-turbo) 复现实验时，结果比论文报告中低 5-6 分且无法遵循指令（如引用限制），原因是什么？","这可能是由于 API 调用的不稳定性或特定配置导致的。维护者指出，如果在检索到的前 100 个段落中相关文档较少，会导致表现下降。建议尝试其他数据集或配置（如 'default' 模式通常比 'extraction' 模式表现更稳定）。此外，注意检查是否有\"No summary found in document\"的警告，缺少文档摘要可能会导致 ChatGPT 表现变差。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F11",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},23317,"运行复现代码时速度极慢（例如加载 checkpoint 耗时很长或推理需要数百小时），是否正常？","如果不使用批处理（batching），交互式或内联搜索模型的推理速度确实会很慢，但不应达到数百小时的程度。在 A100 GPU 上，Vanilla 方法的推理应在 2 小时内完成。如果速度异常慢（如 363 小时），通常是因为显存不足导致系统频繁使用磁盘交换（offload to disk）。建议尝试使用更小的语言模型，或使用显存更大的 GPU。确保正确配置 `offload_folder` 以避免错误，但需注意这会显著降低速度。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F4",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},23318,"如何获取论文中提到的人工评估结果以及用于评估自动指标准确性的脚本？","维护者已更新仓库，在 `human_eval` 目录中提供了相关脚本。用户可以前往该目录查找用于计算引用召回率（citation recall）、引用精确率（citation precision）、引用不足（insufficient citations）和无关引用（irrelevant 
citations）等指标的代码，以便将人工标注作为金标准来评估自动指标。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F20",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},23319,"加载嵌入文件（如 gtr_wikipedia_index.pkl）时遇到 Key Error 或文件只包含 Git LFS 详情，如何解决？","首先，确保已正确安装 Git LFS 并拉取了大文件，否则 `.pkl` 文件可能只包含指针信息。其次，代码中存在一个已修复的 Bug：在使用 LuceneSearcher 进行稀疏检索时，应使用 `hit.raw` 而不是 `h.raw` 来获取段落文本。如果字典中不包含 `raw` 键，请检查是否需要对稀疏索引进行额外的预处理步骤，或确认使用的是更新后的代码版本。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F8",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},23320,"项目中使用的段落检索（passage retrieval）的语料库和脚本在哪里？","维护者已回应并发布了相关的语料库和检索脚本。用户可以在项目的最新更新或数据发布部分找到用于复现检索步骤的资源。","https:\u002F\u002Fgithub.com\u002Fprinceton-nlp\u002FALCE\u002Fissues\u002F5",[]]