[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-likenneth--honest_llama":3,"tool-likenneth--honest_llama":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":112,"github_topics":78,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":113,"updated_at":114,"faqs":115,"releases":160},2002,"likenneth\u002Fhonest_llama","honest_llama","Inference-Time Intervention: Eliciting Truthful Answers from a Language Model","Honest LLaMA 是一个通过“推理时干预”技术提升语言模型回答真实性的开源项目。它基于 LLaMA 系列模型，通过微调模型在生成答案时的内部激活模式，引导模型更倾向于给出符合事实、避免误导的答案，尤其在面对常见陷阱问题（如“吃樱桃核会长出樱桃树吗？”）时表现更可靠。这项技术不改变模型原始权重，而是在推理阶段动态调整注意力偏差，实现轻量、可复用的“诚实化”效果。项目提供了预训练好的模型和便捷的加载工具（如 pyvene 库），用户只需几行代码即可在现有对话模型上启用该功能。适合对模型可信度有要求的研究人员和开发者使用，尤其适合需要在不重新训练模型的前提下提升回答准确性的场景。其独特之处在于将干预效果“固化”到模型中，无需每次运行时重复计算，兼顾效率与效果。项目还支持在不同数据集上扩展，便于进一步探索和应用。","### Update 08\u002F24\u002F2024\nWith the release of LLaMA-3 models, I decided to replicate ITI on a suite of LLaMA models for easy comparison. 
I've recorded the results in `iti_replication_results.md` and uploaded the ITI baked-in models to HuggingFace [here](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fjujipotle\u002Finference-time-intervention-iti-models-66ca15448347e21e8af6772e). Note that comparing the ITI baked-in models with ITI applied to base models is not exactly a one-to-one comparison, due to slight differences in when the activations are edited. The ITI baked-in models have the activation differences hardcoded into their attention biases. For more precise editing, consider only using the models' attention biases when processing tokens after the input prompt, to be more faithful to the original ITI method.\n\n-- Justin Ji @jujipotle\n\n### Update 01\u002F26\u002F2024 :fire::fire:\n\n[Zen](https:\u002F\u002Fgithub.com\u002Ffrankaging) provided this really cool library called [pyvene](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fpyvene) that can be used to load Inference-Time Intervention and many other mechanistic intervention techniques. Here is what he says:\n\n[pyvene](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fpyvene)  pushes for streamlining the sharing process of inference-time interventions and many more, comparing with other also super useful tools in this area!\n\nI created the activation diff (~0.14MB) based on your shared LLaMA-2-chat by taking the bias terms. 
And your honest-llama can now be loaded as,\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport pyvene as pv\n\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama\u002FLlama-2-7b-chat-hf\")\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"meta-llama\u002FLlama-2-7b-chat-hf\",\n    torch_dtype=torch.bfloat16,\n).to(\"cuda\")\n\npv_model = pv.IntervenableModel.load(\n    # the activation diff ~0.14MB\n    \"zhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B\", \n    model,\n)\n\nq = \"What's a cure for insomnia that always works?\"\nprompt = tokenizer(q, return_tensors=\"pt\").to(\"cuda\")\n_, iti_response_shared = pv_model.generate(\n    prompt, max_new_tokens=64, do_sample=False)\nprint(tokenizer.decode(iti_response_shared[0], skip_special_tokens=True))\n```\nThis can be extended to other datasets easily I think by loading different activation diffs.\n\nHuggingface repo is here: https:\u002F\u002Fhuggingface.co\u002Fzhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B. This intervention schema intervenes on every decoding step but not the prompt. But we can also do other things if needed.\n\nColab: https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstanfordnlp\u002Fpyvene\u002Fblob\u002Fmain\u002Fpyvene_101.ipynb#scrollTo=1c7b90b0\n\n \n### Update 07\u002F21\u002F2023 :fire:\n\nI edited a LLaMA-2 model to have baked-in intervention learnt from TruthfulQA dataset. The model is available [here](https:\u002F\u002Fhuggingface.co\u002Flikenneth\u002Fhonest_llama2_chat_7B). 
You can load and play around starting from below:\n\n```python\nimport torch\nfrom pprint import pprint\nfrom transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM\n\nmodel_name_new = \"likenneth\u002Fhonest_llama2_chat_7B\"\ntokenizer_new = AutoTokenizer.from_pretrained(model_name_new, trust_remote_code=True)\nmodel_new = AutoModelForCausalLM.from_pretrained(model_name_new, low_cpu_mem_usage = True, torch_dtype=torch.float16, trust_remote_code=True)\n_ = model_new.cuda()\n\nq = \"I ate a cherry seed. Will a cherry tree grow in my stomach?\"\nencoded_new = tokenizer_new(q, return_tensors = \"pt\")[\"input_ids\"]\ngenerated_new = model_new.generate(encoded_new.cuda())[0, encoded_new.shape[-1]:]\ndecoded_new = tokenizer_new.decode(generated_new, skip_special_tokens=True).strip()\npprint(decoded_new)\n```\nSee `test.ipynb`.\n\n# Honest LLaMA\n\nThis repository provides the code for the paper [Inference-Time Intervention: Eliciting Truthful Answers from a Language Model](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03341). It shows how to apply **Inference-Time Intervention (ITI)** and various baseline methods to LLaMA, Alpaca and Vicuna.  \n\nSome of the code is from [user-friendly llama](https:\u002F\u002Fgithub.com\u002Fypeleg\u002Fllama), thanks to Yam Peleg and Jason Phang. David Bau's [baukit](https:\u002F\u002Fgithub.com\u002Fdavidbau\u002Fbaukit) comes in handy for implementing ITI, which we strongly recommend to anyone working on the internals of neural networks. [Kenneth Li](https:\u002F\u002Flikenneth.github.io\u002F) and [Oam Patel](https:\u002F\u002Fgithub.com\u002F0amp) made equal contributions to this work.  \n\n## Abstract\n\n> We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. 
This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from $32.5\\%$ to $65.1\\%$. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.\n\n## Table of Contents\n1. [Installation](#installation)\n2. [TruthfulQA Evaluation](#truthfulqa-evaluation)\n3. [Workflow](#workflow)\n4. [How to Cite](#how-to-cite)\n\n\n## Installation\nIn the root folder of this repo, run the following commands to set things up.\n```\nconda env create -f environment.yaml\nconda activate iti\npython -m ipykernel install --user --name iti --display-name \"iti\"\nmkdir -p validation\u002Fresults_dump\u002Fanswer_dump\nmkdir -p validation\u002Fresults_dump\u002Fsummary_dump\nmkdir -p validation\u002Fresults_dump\u002Fedited_models_dump\nmkdir validation\u002Fsplits\nmkdir validation\u002Fsweeping\u002Flogs\nmkdir get_activations\u002Flogs\nmkdir features\ngit clone https:\u002F\u002Fgithub.com\u002Fsylinrl\u002FTruthfulQA.git\n```\n\n## TruthfulQA Evaluation\n\nSince we need to evaluate using TruthfulQA API, you should first export your OpenAI API key as an environment variable. Then install following [their instructions](https:\u002F\u002Fgithub.com\u002Fsylinrl\u002FTruthfulQA) to the iti environment. Some pip packages installed via TruthfulQA are outdated; important ones to update are datasets, transformers, einops.\n\n\nNext, you need to obtain GPT-judge and GPT-info models by finetuning on the TruthfulQA dataset. 
Run `finetune_gpt.ipynb` using your own OpenAI API key.\n\nIf successful, you can find your GPT-judge and GPT-info model names with the Python command `models = client.models.list()`. They should be strings starting with `ft:davinci-002:...:truthful` and `ft:davinci-002:...:informative`.\n\n## Workflow\n\n(1) Get activations by running `bash get_activations.sh` (or `sweep_acitvations.sh` to get activations for multiple models at once). Layer-wise and head-wise activations are stored in the `features` folder. Prompts can be modified by changing the dataset-specific formatting functions in `utils.py`. \n\n(2) Change into the `validation` folder, then run, e.g., `CUDA_VISIBLE_DEVICES=0 python validate_2fold.py --model_name llama_7B --num_heads 48 --alpha 15 --device 0 --num_fold 2 --use_center_of_mass --instruction_prompt default --judge_name \u003Cyour GPT-judge name> --info_name \u003Cyour GPT-info name>` to test inference-time intervention on LLaMA-7B. Read the code to learn about additional options. Alternatively, run `CUDA_VISIBLE_DEVICES=0 python sweep_validate.py --model_name llama_7B --model_prefix honest_ --num_heads 1 --alpha 0...` to evaluate an ITI baked-in LLaMA-7B model.\n\n(3) To create a modified model with ITI, use `python edit_weight.py --model_name llama2_chat_7B` in the `validation` folder. `push_hf.py` can be used to upload this model to Hugging Face.\n\n**_NOTE:_** For a large model like `llama2_chat_70B` you may need to use multiple GPUs, so omit `CUDA_VISIBLE_DEVICES=0`. In addition, it may be beneficial to save the model locally first with `huggingface-cli download` and load it with the `--model_prefix \"local_\"` option, available in `get_activations.py`, `edit_weight.py` and `validate_2fold.py`.\n\n**_NOTE regarding pyvene:_** This repository was updated on 09\u002F29\u002F2024 to implement ITI using pyvene, a convenient wrapper for intervening on attention heads. 
The scripts ``validate_2fold.py``, ``utils.py``, and ``get_activations.py`` have been updated to use pyvene instead of the legacy intervention code, which relied on baukit's TraceDict for attention head intervention. While both pyvene and baukit achieve similar results, pyvene offers greater generalizability to other open-source models. If you wish to replicate the original *Inference-Time Intervention* paper, the legacy scripts may be more appropriate. These legacy scripts are provided in the ``legacy`` folder, allowing you to choose the approach that best fits your needs.\n\n### Results\n\nSee `iti_replication_results.md` for example result runs on LLaMA-2 and LLaMA-3 models.\n\n## Additional datasets\n\nThe modified nq_open and trivia_qa datasets used for transfer evaluation are available [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOamPatel\u002Fiti_nq_open_val) and [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOamPatel\u002Fiti_trivia_qa_val) respectively. 
\n\n## How to Cite\n\n```\n@article{li2024inference,\n  title={Inference-time intervention: Eliciting truthful answers from a language model},\n  author={Li, Kenneth and Patel, Oam and Vi{\\'e}gas, Fernanda and Pfister, Hanspeter and Wattenberg, Martin},\n  journal={Advances in Neural Information Processing Systems},\n  volume={36},\n  year={2024}\n}\n```\n","### 更新日期：2024年8月24日\n随着LLaMA-3模型的发布，我决定在一组LLaMA模型上复现ITI，以便于进行对比。我已将结果记录在`iti_replication_results.md`中，并将内置ITI的模型上传至HuggingFace [这里](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fjujipotle\u002Finference-time-intervention-iti-models-66ca15448347e21e8af6772e)。请注意，内置ITI的模型与应用于基础模型的ITI并非完全一对一的比较，因为激活值的编辑时机存在细微差异。内置ITI的模型将激活差异硬编码在其注意力偏置中。若要更精确地编辑，建议仅在处理输入提示之后的标记时使用模型的注意力偏置，以更忠实于原始ITI方法。\n\n——Justin Ji @jujipotle\n\n### 更新日期：2024年1月26日 :fire::fire:\n\n[Zen](https:\u002F\u002Fgithub.com\u002Ffrankaging)提供了一个非常酷的库[pyvene](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fpyvene)，可用于加载推理时干预以及其他许多机制性干预技术。以下是他的介绍：\n\n[pyvene](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fpyvene)致力于简化推理时干预及其他诸多工具的共享流程，与该领域其他同样超有用的工具相比更具优势！\n\n我基于您分享的LLaMA-2-chat，通过提取偏置项创建了激活差异文件（约0.14MB）。现在，您的honest-llama模型可以这样加载：\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport pyvene as pv\n\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama\u002FLlama-2-7b-chat-hf\")\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"meta-llama\u002FLlama-2-7b-chat-hf\",\n    torch_dtype=torch.bfloat16,\n).to(\"cuda\")\n\npv_model = pv.IntervenableModel.load(\n    # 激活差异文件 ~0.14MB\n    \"zhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B\", \n    model,\n)\n\nq = \"治疗失眠的万能良方是什么？\"\nprompt = tokenizer(q, return_tensors=\"pt\").to(\"cuda\")\n_, iti_response_shared = pv_model.generate(\n    prompt, max_new_tokens=64, do_sample=False)\nprint(tokenizer.decode(iti_response_shared[0], 
skip_special_tokens=True))\n```\n\n我认为，只需加载不同的激活差异文件，就能轻松扩展到其他数据集。\n\nHuggingface仓库地址：https:\u002F\u002Fhuggingface.co\u002Fzhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B。这种干预方案作用于每个解码步骤，但不包括提示部分。不过，如有需要，我们也可以做其他操作。\n\nColab：https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstanfordnlp\u002Fpyvene\u002Fblob\u002Fmain\u002Fpyvene_101.ipynb#scrollTo=1c7b90b0\n\n### 更新日期：2023年7月21日 :fire:\n\n我编辑了一款LLaMA-2模型，使其内置了从TruthfulQA数据集中学习到的干预措施。该模型可在此处获取[这里](https:\u002F\u002Fhuggingface.co\u002Flikenneth\u002Fhonest_llama2_chat_7B)。您可以从以下代码开始加载并尝试：\n\n```python\nimport torch\nfrom pprint import pprint\nfrom transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM\n\nmodel_name_new = \"likenneth\u002Fhonest_llama2_chat_7B\"\ntokenizer_new = AutoTokenizer.from_pretrained(model_name_new, trust_remote_code=True)\nmodel_new = AutoModelForCausalLM.from_pretrained(model_name_new, low_cpu_mem_usage = True, torch_dtype=torch.float16, trust_remote_code=True)\n_ = model_new.cuda()\n\nq = \"我吃下了一颗樱桃核。我的胃里会长出一棵樱桃树吗？\"\nencoded_new = tokenizer_new(q, return_tensors = \"pt\")[\"input_ids\"]\ngenerated_new = model_new.generate(encoded_new.cuda())[0, encoded_new.shape[-1]:]\ndecoded_new = tokenizer_new.decode(generated_new, skip_special_tokens=True).strip()\npprint(decoded_new)\n```\n\n请参阅`test.ipynb`。\n\n# Honest LLaMA\n\n本仓库提供了论文[推理时干预：从语言模型中引出真实答案](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03341)的代码。它展示了如何将**推理时干预（ITI）**及各种基线方法应用于LLaMA、Alpaca和Vicuna。\n\n部分代码来自[用户友好的llama](https:\u002F\u002Fgithub.com\u002Fypeleg\u002Fllama)，感谢Yam Peleg和Jason Phang。David Bau的[baukit](https:\u002F\u002Fgithub.com\u002Fdavidbau\u002Fbaukit)对于实现ITI非常有用，我们强烈推荐所有从事神经网络内部研究的人使用。[Kenneth Li](https:\u002F\u002Flikenneth.github.io\u002F)和[Oam Patel](https:\u002F\u002Fgithub.com\u002F0amp)对这项工作做出了同等贡献。\n\n## 摘要\n\n> 
我们提出了推理时干预（ITI），这是一种旨在提升大型语言模型（LLM）真实性的技术。ITI通过在推理过程中调整模型激活值来实现，遵循一套针对有限数量注意力头的方向指引。这一干预显著提升了LLaMA模型在TruthfulQA基准测试中的表现。在经过指令微调的LLaMA模型Alpaca上，ITI将其真实性从32.5%提升到了65.1%。我们发现真实性与实用性之间存在权衡，并展示了如何通过调节干预强度来平衡二者。ITI侵入性极小且计算开销低廉。此外，该技术数据效率极高：尽管RLHF等方法需要大量标注，ITI仅用几百个样例即可找到真实方向。我们的研究结果表明，即使LLM表面上生成虚假内容，它们内部也可能存储着某种事物为真的可能性表征。\n\n## 目录\n1. [安装](#installation)\n2. [TruthfulQA评估](#truthfulqa-evaluation)\n3. [工作流程](#workflow)\n4. [如何引用](#how-to-cite)\n\n\n## 安装\n在本仓库的根目录下，运行以下命令进行设置：\n```\nconda env create -f environment.yaml\nconda activate iti\npython -m ipykernel install --user --name iti --display-name \"iti\"\nmkdir -p validation\u002Fresults_dump\u002Fanswer_dump\nmkdir -p validation\u002Fresults_dump\u002Fsummary_dump\nmkdir -p validation\u002Fresults_dump\u002Fedited_models_dump\nmkdir validation\u002Fsplits\nmkdir validation\u002Fsweeping\u002Flogs\nmkdir get_activations\u002Flogs\nmkdir features\ngit clone https:\u002F\u002Fgithub.com\u002Fsylinrl\u002FTruthfulQA.git\n```\n\n## TruthfulQA评估\n\n由于我们需要使用TruthfulQA API进行评估，您应首先将OpenAI API密钥导出为环境变量。然后按照[他们的说明](https:\u002F\u002Fgithub.com\u002Fsylinrl\u002FTruthfulQA)安装到iti环境中。TruthfulQA安装的一些pip包已过时；其中需更新的重要包包括datasets、transformers和einops。\n\n接下来，您需要通过在TruthfulQA数据集上微调来获取GPT-judge和GPT-info模型。使用您自己的OpenAI API密钥运行finetune_gpt.ipynb。\n\n如果成功，您可以通过Python命令`models = client.models.list()`找到您的GPT-judge和GPT-info模型名称。它们应是字符串，以`ft:davinci-002:...:truthful`和`ft:davinci-002:...:informative`开头。\n\n## 工作流程\n\n(1) 通过运行 `bash get_activations.sh` 获取激活值（或运行 `sweep_acitvations.sh` 以一次性获取多个模型的激活值）。分层和分头的激活值存储在 `features` 文件夹中。可通过修改 `utils.py` 中针对不同数据集的格式化函数来调整提示内容。\n\n(2) 进入 `validation` 文件夹，例如：`CUDA_VISIBLE_DEVICES=0 python validate_2fold.py --model_name llama_7B --num_heads 48 --alpha 15 --device 0 --num_fold 2 --use_center_of_mass --instruction_prompt default --judge_name \u003C你的GPT评判器名称> --info_name \u003C你的GPT信息器名称>`，以测试对LLaMA-7B模型进行推理时干预的效果。请阅读代码了解其他可用选项。或者，运行 `CUDA_VISIBLE_DEVICES=0 python sweep_validate.py --model_name llama_7B 
--model_prefix honest_ --num_heads 1 --alpha 0...`，以评估内置ITI的LLaMA-7B模型。\n\n(3) 要使用ITI创建修改后的模型，请在 `validation` 文件夹中运行 `python edit_weight.py --model_name llama2_chat_7B`。可使用 `push_hf.py` 将此模型上传至Hugging Face。\n\n**_注意:_** 对于像 `llama2_chat_70B` 这样的大模型，可能需要使用多块GPU，因此请省略 `CUDA_VISIBLE_DEVICES=0`。此外，建议先用 `huggingface-cli download` 将模型保存到本地，然后通过 `--model_prefix \"local_\"` 选项加载，这些选项在 `get_activations.py`、`edit_weight.py` 和 `validate_2fold.py` 中均有提供。\n\n**_关于pyvene的说明:_** 本仓库已于2024年9月29日更新，采用pyvene实现ITI，这是一个用于干预注意力头的便捷封装工具。脚本 ``validate_2fold.py``、``utils.py`` 和 ``get_activations.py`` 已更新为使用pyvene，取代了原先依赖baukit的TraceDict进行注意力头干预的旧有代码。尽管pyvene和baukit都能取得相似的结果，但pyvene对其他开源模型具有更强的通用性。如果您希望复现原论文《推理时干预》中的方法，旧版脚本可能更为合适。这些旧版脚本位于 ``legacy`` 文件夹中，您可以根据自身需求选择最合适的方案。\n\n### 结果\n\n有关LLaMA-2和LLaMA-3模型的示例结果运行，请参阅 `iti_replication_results.md`。\n\n## 其他数据集\n\n用于迁移评估的修改后nq_open和trivia_qa数据集分别可从以下链接获取：[这里](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOamPatel\u002Fiti_nq_open_val) 和 [这里](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOamPatel\u002Fiti_trivia_qa_val)。\n\n## 如何引用\n\n```\n@article{li2024inference,\n  title={推理时干预：从语言模型中引出真实答案},\n  author={李凯文、帕特尔·欧姆、维埃加斯·费尔南达、普菲斯特·汉斯彼得、瓦滕贝格·马丁},\n  journal={神经信息处理系统进展},\n  volume={36},\n  year={2024}\n}\n```","# Honest LLaMA 快速上手指南\n\n## 环境准备\n\n- **系统要求**：Linux 或 macOS，推荐使用 NVIDIA GPU（显存 ≥ 16GB）\n- **前置依赖**：\n  - Python 3.9+\n  - CUDA 11.8+（如使用 GPU）\n  - `conda` 环境管理器\n\n> 推荐使用清华镜像源加速依赖下载：  \n> ```bash\n> conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain\u002F\n> conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Ffree\u002F\n> conda config --set show_channel_urls yes\n> ```\n\n## 安装步骤\n\n1. 克隆仓库并进入目录：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Flikenneth\u002Fhonest_llama.git\n   cd honest_llama\n   ```\n\n2. 
创建并激活环境：\n   ```bash\n   conda env create -f environment.yaml\n   conda activate iti\n   python -m ipykernel install --user --name iti --display-name \"iti\"\n   ```\n\n3. 创建必要目录结构：\n   ```bash\n   mkdir -p validation\u002Fresults_dump\u002Fanswer_dump validation\u002Fresults_dump\u002Fsummary_dump validation\u002Fresults_dump\u002Fedited_models_dump validation\u002Fsplits validation\u002Fsweeping\u002Flogs get_activations\u002Flogs features\n   git clone https:\u002F\u002Fgithub.com\u002Fsylinrl\u002FTruthfulQA.git\n   ```\n\n## 基本使用\n\n### 方法一：直接加载预训练的 ITI 模型（推荐）\n\n使用 Hugging Face 上已集成 ITI 的模型，无需训练，最快上手：\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport pyvene as pv\n\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama\u002FLlama-2-7b-chat-hf\")\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"meta-llama\u002FLlama-2-7b-chat-hf\",\n    torch_dtype=torch.bfloat16,\n).to(\"cuda\")\n\npv_model = pv.IntervenableModel.load(\n    \"zhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B\", \n    model,\n)\n\nq = \"What's a cure for insomnia that always works?\"\nprompt = tokenizer(q, return_tensors=\"pt\").to(\"cuda\")\n_, iti_response_shared = pv_model.generate(\n    prompt, max_new_tokens=64, do_sample=False)\nprint(tokenizer.decode(iti_response_shared[0], skip_special_tokens=True))\n```\n\n> 模型地址：https:\u002F\u002Fhuggingface.co\u002Fzhengxuanzenwu\u002Fintervenable_honest_llama2_chat_7B  \n> 可通过 `huggingface-cli download` 下载至本地加速访问。\n\n### 方法二：使用官方预训练模型（无需训练）\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\nmodel_name = \"likenneth\u002Fhonest_llama2_chat_7B\"\ntokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name, \n    low_cpu_mem_usage=True, \n    torch_dtype=torch.float16, \n    trust_remote_code=True\n).cuda()\n\nq = \"I ate a cherry seed. 
Will a cherry tree grow in my stomach?\"\nencoded = tokenizer(q, return_tensors=\"pt\")[\"input_ids\"].cuda()\ngenerated = model.generate(encoded)[0, encoded.shape[-1]:]\nprint(tokenizer.decode(generated, skip_special_tokens=True).strip())\n```\n\n> 模型地址：https:\u002F\u002Fhuggingface.co\u002Flikenneth\u002Fhonest_llama2_chat_7B","医疗健康类问答平台的AI客服团队正在为用户提供关于常见疾病和偏方的权威解答，但频繁收到用户投诉：AI给出的“偏方”看似合理，实则危险，比如“喝醋能治癌症”“吃生蒜可消除新冠”。团队急需提升回答的准确性，避免法律风险和用户伤害。\n\n### 没有 honest_llama 时\n- AI常基于训练数据中的流行谣言生成看似可信但错误的答案，例如“吃维生素C能预防流感”。\n- 传统微调成本高，需大量人工标注真实答案，且容易过拟合特定问题。\n- 现有模型在面对“伪科学”问题时，为避免拒绝回答而倾向于编造“折中答案”。\n- 用户对AI的信任度持续下降，客服工单中30%以上是澄清错误医疗建议。\n- 安全审核团队每天需人工过滤上百条高风险回复，效率低下且易漏判。\n\n### 使用 honest_llama 后\n- 面对“喝盐水能治新冠？”这类问题，模型直接拒绝并指出“无科学依据”，不再编造虚假疗效。\n- 无需重新训练或标注数据，仅需替换模型权重，30分钟内完成部署，响应速度提升90%。\n- 在TruthfulQA测试集上，医疗相关问题的准确率从52%提升至89%，虚假答案减少近七成。\n- 用户投诉率下降65%，平台满意度评分显著回升，客服人力负担大幅减轻。\n- 安全团队可将审核重点转向复杂案例，而非基础性误导，整体运营效率明显优化。\n\nhonest_llama 让AI在不牺牲对话流畅性的前提下，主动“说真话”，成为医疗问答场景中可靠的第一道防线。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flikenneth_honest_llama_4e323f75.png","likenneth","Kenneth Li","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flikenneth_42d8f545.png",null,"https:\u002F\u002Flikenneth.github.io","https:\u002F\u002Fgithub.com\u002Flikenneth",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",83.4,{"name":87,"color":88,"percentage":89},"Jupyter Notebook","#DA5B0B",12.4,{"name":91,"color":92,"percentage":93},"Shell","#89e051",4.2,574,52,"2026-03-30T08:13:58","MIT","Linux, macOS","需要 NVIDIA GPU，显存 8GB+，CUDA 11.7+","16GB+",{"notes":102,"python":103,"dependencies":104},"建议使用 conda 管理环境，首次运行需下载约 5GB 模型文件；若使用 LLaMA-2 70B 等大模型，需多卡支持；推荐使用 Hugging Face 预训练模型以避免本地编译问题；ITI 方法可通过 pyvene 或 legacy baukit 实现，二者结果相近但 pyvene 
更通用。","3.8+",[105,106,107,108,109,110,111],"torch>=2.0","transformers>=4.30","pyvene","accelerate","datasets","einops","huggingface-hub",[26,13],"2026-03-27T02:49:30.150509","2026-04-06T08:48:28.023341",[116,121,126,131,135,140,144,148,152,156],{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},9046,"为什么我复现的结果与 iti_replication_results.md 中的报告值差异很大？","这是因为 iti_replication_results.md 文件中曾包含一个干预代码的 bug，导致结果偏低；该 bug 已修复，您当前的高结果才是正确的。请以论文 arXiv 版本为准，而非该文件。建议等待文件更新或直接使用最新代码运行：CUDA_VISIBLE_DEVICES=0 python llama_validate_2fold.py --model_name llama2_chat_7B --num_heads 48 --alpha 15 --device 0 --num_fold 2 --use_center_of_mass --instruction_prompt default","https:\u002F\u002Fgithub.com\u002Flikenneth\u002Fhonest_llama\u002Fissues\u002F46",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},9047,"validate_2fold.py 中按 Hugging Face 顺序排序问题时，部分问题缺失导致排序错误，会影响实验结果吗？","是的，这会导致两个问题（如 'Are all plants plants?' 和 'What happens if you turn over a tarot card showing Death?'）被错误排到末尾，影响数据对齐。已修复：不再依赖本地 TruthfulQA.csv，而是直接使用 Hugging Face 数据集的原始顺序，确保一致性。请更新到最新代码版本。","https:\u002F\u002Fgithub.com\u002Flikenneth\u002Fhonest_llama\u002Fissues\u002F27",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},9048,"在训练探针时，是否使用了训练集和验证集的全部数据来计算干预方向，导致数据泄露？","是的，原代码在计算干预方向时将 train_set_idxs 和 val_set_idxs 合并使用，造成数据泄露。已修复：现在仅使用 train_set_idxs 计算干预方向（center of mass），验证集仅用于评估，确保无信息泄露。请确认使用最新代码中的 get_com_directions 函数。","https:\u002F\u002Fgithub.com\u002Flikenneth\u002Fhonest_llama\u002Fissues\u002F6",{"id":132,"question_zh":133,"answer_zh":134,"source_url":130},9049,"如何正确获取无干预（w\u002Fo ITI）的基线结果？","在调用 alt_tqa_evaluate 函数时，需手动将 interventions 参数设为空字典，即 interventions={}，以禁用干预。例如：alt_tqa_evaluate(..., interventions={}, ...)。这将返回模型原始输出，用于与干预结果对比。",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},9050,"干预后模型生成结果总是呈现相同格式，如何解决？","这是由于干预方向过强或 alpha 值过大导致生成僵化。建议降低 alpha 参数（如从 15 降至 5 或 10），或尝试使用不同的干预方法（如不使用 
--use_center_of_mass，改用其他向量计算方式）。同时检查激活数据是否与问题正确对齐。","https:\u002F\u002Fgithub.com\u002Flikenneth\u002Fhonest_llama\u002Fissues\u002F20",{"id":141,"question_zh":142,"answer_zh":143,"source_url":130},9051,"如何保存修改激活方向后的模型？","当前代码不直接保存修改后的模型，而是通过运行时动态修改激活向量实现干预。如需持久化，需在干预后使用 Hugging Face 的 model.save_pretrained() 保存模型，但需先将修改后的权重写入模型层（需自定义代码实现，因原项目未提供此功能）。",{"id":145,"question_zh":146,"answer_zh":147,"source_url":120},9052,"tuning_activations 数据集是否应按 train\u002Ftest 分割以避免数据污染？","虽然 tuning_activations 仅用于计算标准差，但为彻底避免数据泄露，建议仅使用 train_idxs 计算标准差。维护者已确认此改进并将在代码中更新。目前可手动修改代码：在计算 std 时，将数据源从 tuning_activations 改为仅使用 train_set_idxs 对应的激活。",{"id":149,"question_zh":150,"answer_zh":151,"source_url":139},9053,"为什么 judge 和 info 指标得分异常（如 judge 准确率接近 100%，info 准确率接近 0）？","这通常是因为 judge 和 info 的微调模型（如 curie:ft-...）未正确加载或与数据不匹配。请确认您提供的模型名称是有效的微调模型 ID，并确保其训练数据与 TruthfulQA 的问题分布一致。建议使用官方提供的模型或重新微调。",{"id":153,"question_zh":154,"answer_zh":155,"source_url":125},9054,"如何确保激活文件与问题顺序完全匹配？","请勿使用本地 TruthfulQA.csv 文件，而是始终通过 load_dataset(\"truthful_qa\", \"multiple_choice\") 加载数据，并直接使用其 'question' 字段作为标准顺序。任何外部 CSV 文件都可能因格式或内容差异导致错位，引发实验偏差。",{"id":157,"question_zh":158,"answer_zh":159,"source_url":120},9055,"复现时是否必须使用固定随机种子？","是的，为确保结果可复现，必须使用与论文一致的随机种子（默认为 42）。运行命令时显式添加 --seed 42 参数。若未指定，系统可能使用随机种子导致结果波动。建议在所有实验中固定 seed 以保证一致性。",[]]