[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HowieHwong--TrustLLM":3,"tool-HowieHwong--TrustLLM":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",144730,2,"2026-04-07T23:26:32",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":78,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":32,"env_os":90,"env_gpu":91,"env_ram":90,"env_deps":92,"category_tags":98,"github_topics":100,"view_count":32,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":113,"updated_at":114,"faqs":115,"releases":146},5469,"HowieHwong\u002FTrustLLM","TrustLLM","[ICML 2024] TrustLLM: Trustworthiness in Large Language Models","TrustLLM 是一个专为评估大型语言模型（LLM）“可信度”而设计的开源框架，源自 ICML 2024 的研究成果。随着大模型在各类场景中的广泛应用，其输出的安全性、公平性、隐私保护及事实准确性等问题日益凸显。TrustLLM 旨在解决这一核心痛点，提供了一套系统化的方法，帮助开发者量化并提升模型的可靠程度。\n\n该工具内置了涵盖六大维度的综合评测体系，包括安全性、鲁棒性、公平性、隐私合规等，并配套了大规模基准数据集和实时更新的排行榜。其独特亮点在于支持动态评估机制（如集成 UniGen），并能轻松对接主流模型平台（如 Azure OpenAI、Replicate 等），让复杂的伦理与安全测试变得像运行常规代码一样简便。\n\nTrustLLM 非常适合 AI 
研究人员、大模型开发者以及企业技术团队使用。无论是希望在论文中严谨论证模型性能，还是在产品上线前进行严格的安全审计，TrustLLM 都能提供科学的数据支撑。它让“信任”不再是一个抽象概念，而是变成了可度量、可优化的具体指标，助力构建更安全、更负责任的人工智能应用。","\u003Cdiv align=\"center\">\n\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_9160c06beff4.png\" width=\"100%\">\n\n# Toolkit for \"**TrustLLM: Trustworthiness in Large Language Models**\"\n\n\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-%F0%9F%8C%8D-blue?style=for-the-badge&logoWidth=40)](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-%F0%9F%8E%93-lightgrey?style=for-the-badge&logoWidth=40)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.05561)\n[![Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-%F0%9F%92%BE-green?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTrustLLM\u002FTrustLLM-dataset)\n[![Data Map](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20Map-%F0%9F%8D%9F-orange?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fatlas.nomic.ai\u002Fmap\u002Ff64e87d3-c769-4a90-b15d-9dc833acc8ba\u002F8e9d7045-503b-4ba0-bc64-7201cb7aacee?xs=-16.14086&xf=-1.88776&ys=-7.54937&yf=3.88213)\n[![Leaderboard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLeaderboard-%F0%9F%9A%80-brightgreen?style=for-the-badge&logoWidth=40)](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html)\n[![Toolkit 
Document](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FToolkit%20Document-%F0%9F%93%9A-blueviolet?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F)\n\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png\u002Fweek)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n\n\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" alt=\"git-last-commit\" \u002F>\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" alt=\"GitHub commit activity\" \u002F>\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flanguages\u002Ftop\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" alt=\"GitHub top language\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n\n\n\n\u003C\u002Fdiv>\n\n\n## Updates & News\n\n- [02\u002F20\u002F2025] Try our latest toolkit: Our new work **TrustGen** and **TrustEval** toolkit has been released! 
[TrustGen](https:\u002F\u002Ftrustgen.github.io\u002F) provides comprehensive guidelines, assessment, and perspective for trustworthiness across multiple generative models, and [TrustEval](https:\u002F\u002Fgithub.com\u002FTrustGen\u002FTrustEval-toolkit) offers a dynamic evaluation platform.\n\n-  **TrustLLM** toolkit has been downloaded for 9000+ times!\n  \n\u003Cdetails>\n\u003Csummary>Click to expand\u002Fcollapse more\u003C\u002Fsummary>\n\n\n- [15\u002F07\u002F2024] **TrustLLM** now supports [**UniGen**](https:\u002F\u002Funigen-framework.github.io\u002F) for dynamic evaluation.\n- [02\u002F05\u002F2024] 🥂 **TrustLLM has been accepted by ICML 2024! See you in Vienna!**\n- [23\u002F04\u002F2024] :star: Version 0.3.0: Major updates including bug fixes, enhanced evaluation, and new models added (including ChatGLM3, Llama3-8b, Llama3-70b, GLM4, Mixtral). ([See details](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html))\n- [20\u002F03\u002F2024] :star: Version 0.2.4: Fixed many bugs & Support Gemini Pro API\n- [01\u002F02\u002F2024] :page_facing_up: Version 0.2.2: See our new paper about the awareness in LLMs! ([link](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.17882))\n- [29\u002F01\u002F2024] :star: Version 0.2.1: trustllm toolkit now supports (1) Easy evaluation pipeline (2) LLMs in [replicate](https:\u002F\u002Freplicate.com\u002F) and [deepinfra](https:\u002F\u002Fdeepinfra.com\u002F) (3) [Azure OpenAI API](https:\u002F\u002Fazure.microsoft.com\u002Fen-us\u002Fproducts\u002Fai-services\u002Fopenai-service)\n- [20\u002F01\u002F2024] :star: Version 0.2.0 of trustllm toolkit is released! 
See the [new features](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html#version-020).\n- [12\u002F01\u002F2024] :surfer: The [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTrustLLM\u002FTrustLLM-dataset), [leaderboard](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html), and [evaluation toolkit](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F) are released!\n\n\u003C\u002Fdetails>\n\n## 👂**TL;DR**\n\n- TrustLLM (ICML 2024) is a comprehensive framework for studying trustworthiness of large language models, which includes principles, surveys, and benchmarks.\n- This code repository is designed to provide an easy toolkit for evaluating the trustworthiness of LLMs ([See our docs](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F)).\n\n\n\n**Table of Content**\n\n- [Toolkit for \"**TrustLLM: Trustworthiness in Large Language Models**\"](#toolkit-for-trustllm-trustworthiness-in-large-language-models)\n  - [Updates \\& News](#updates--news)\n  - [👂**TL;DR**](#tldr)\n  - [🙋 **About TrustLLM**](#-about-trustllm)\n  - [🧹 **Before Evaluation**](#-before-evaluation)\n    - [**Installation**](#installation)\n    - [**Dataset Download**](#dataset-download)\n    - [**Generation**](#generation)\n  - [🙌 **Evaluation**](#-evaluation)\n  - [🛎️ **Dataset \\& Task**](#️-dataset--task)\n    - [**Dataset overview:**](#dataset-overview)\n    - [**Task overview:**](#task-overview)\n  - [🏆 **Leaderboard**](#-leaderboard)\n  - [📣 **Contribution**](#-contribution)\n  - [**⏰ TODO in Coming Versions**](#-todo-in-coming-versions)\n  - [**Citation**](#citation)\n  - [**License**](#license)\n\n\n## 🙋 **About TrustLLM**\n\nWe introduce TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. 
Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. \nWe then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. \nThe [document](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F#about) explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to [project website](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002F).\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_f0b23a7bd42b.png\" width=\"100%\">\n\u003C\u002Fdiv>\n\n\n\n\n## 🧹 **Before Evaluation**\n\n### **Installation**\nCreate a new environment:\n\n```shell\nconda create --name trustllm python=3.9\n```\n\n**Installation via Github (recommended):**\n\n```shell\ngit clone git@github.com:HowieHwong\u002FTrustLLM.git\ncd TrustLLM\u002Ftrustllm_pkg\npip install .\n```\n\n\n**Installation via `pip` (deprecated):**\n\n```shell\npip install trustllm\n```\n\n**Installation via `conda` (deprecated):**\n\n```sh\nconda install -c conda-forge trustllm\n```\n\n### **Dataset Download**\n\nDownload TrustLLM dataset:\n\n```python\nfrom trustllm.dataset_download import download_dataset\n\ndownload_dataset(save_path='save_path')\n```\n\n### **Generation**\n\nWe have added generation section from [version 0.2.0](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html). Start your generation from [this page](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fguides\u002Fgeneration_details.html). 
Here is an example:\n\n```python\nfrom trustllm.generation.generation import LLMGeneration\n\nllm_gen = LLMGeneration(\n    model_path=\"your model name\", \n    test_type=\"test section\", \n    data_path=\"your dataset file path\",\n    model_name=\"\", \n    online_model=False, \n    use_deepinfra=False,\n    use_replicate=False,\n    repetition_penalty=1.0,\n    num_gpus=1, \n    max_new_tokens=512, \n    debug=False,\n    device='cuda:0'\n)\n\nllm_gen.generation_results()\n```\n\n\n## 🙌 **Evaluation**\n\nWe have provided a toolkit that allows you to more conveniently assess the trustworthiness of large language models. Please refer to [the document](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F) for more details. Here is an example:\n\n```python\nfrom trustllm.task.pipeline import run_truthfulness\n\ntruthfulness_results = run_truthfulness(  \n    internal_path=\"path_to_internal_consistency_data.json\",  \n    external_path=\"path_to_external_consistency_data.json\",  \n    hallucination_path=\"path_to_hallucination_data.json\",  \n    sycophancy_path=\"path_to_sycophancy_data.json\",\n    advfact_path=\"path_to_advfact_data.json\"\n)\n```\n\n## 🛎️ **Dataset & Task**\n\n### **Dataset overview:**\n\n*✓ the dataset is from prior work, and ✗ means the dataset is first proposed in our benchmark.*\n\n| Dataset               | Description                                                                                                           | Num.     | Exist? | Section                |\n|-----------------------|-----------------------------------------------------------------------------------------------------------------------|----------|--------|------------------------|\n| SQuAD2.0              | It combines questions in SQuAD1.1 with over 50,000 unanswerable questions.                                            | 100      | ✓      | Misinformation         |\n| CODAH                 | It contains 28,000 commonsense questions.                     
                                                        | 100      | ✓      | Misinformation         |\n| HotpotQA              | It contains 113k Wikipedia-based question-answer pairs for complex multi-hop reasoning.                               | 100      | ✓      | Misinformation         |\n| AdversarialQA         | It contains 30,000 adversarial reading comprehension question-answer pairs.                                           | 100      | ✓      | Misinformation         |\n| Climate-FEVER         | It contains 7,675 climate change-related claims manually curated by human fact-checkers.                              | 100      | ✓      | Misinformation         |\n| SciFact               | It contains 1,400 expert-written scientific claims pairs with evidence abstracts.                                     | 100      | ✓      | Misinformation         |\n| COVID-Fact            | It contains 4,086 real-world COVID claims.                                                                            | 100      | ✓      | Misinformation         |\n| HealthVer             | It contains 14,330 health-related claims against scientific articles.                                                 | 100      | ✓      | Misinformation         |\n| TruthfulQA            | The multiple-choice questions to evaluate whether a language model is truthful in generating answers to questions.     | 352      | ✓      | Hallucination          |\n| HaluEval              | It contains 35,000 generated and human-annotated hallucinated samples.                                                | 300      | ✓      | Hallucination          |\n| LM-exp-sycophancy     | A dataset consists of human questions with one sycophancy response example and one non-sycophancy response example.    | 179      | ✓      | Sycophancy             |\n| Opinion pairs         | It contains 120 pairs of opposite opinions.                                                                           
| 240, 120 | ✗      | Sycophancy, Preference |\n| WinoBias              | It contains 3,160 sentences, split for development and testing, created by researchers familiar with the project.     | 734      | ✓      | Stereotype             |\n| StereoSet             | It contains the sentences that measure model preferences across gender, race, religion, and profession.                | 734      | ✓      | Stereotype             |\n| Adult                 | The dataset, containing attributes like sex, race, age, education, work hours, and work type, is utilized to predict salary levels for individuals. | 810      | ✓      | Disparagement          |\n| Jailbreak Trigger     | The dataset contains the prompts based on 13 jailbreak attacks.                                                        | 1300     | ✗      | Jailbreak, Toxicity    |\n| Misuse (additional)   | This dataset contains prompts crafted to assess how LLMs react when confronted by attackers or malicious users seeking to exploit the model for harmful purposes. | 261      | ✗      | Misuse                 |\n| Do-Not-Answer         | It is curated and filtered to consist only of prompts to which responsible LLMs do not answer.                         | 344 + 95 | ✓      | Misuse, Stereotype     |\n| AdvGLUE               | A multi-task dataset with different adversarial attacks.                                                               | 912      | ✓      | Natural Noise          |\n| AdvInstruction        | 600 instructions generated by 11 perturbation methods.                                                                 | 600        | ✗      | Natural Noise          |\n| ToolE                 | A dataset with the users' queries which may trigger LLMs to use external tools.                                        | 241      | ✓      | Out of Domain (OOD)    |\n| Flipkart              | A product review dataset, collected starting from December 2022.                                                    
   | 400      | ✓      | Out of Domain (OOD)    |\n| DDXPlus               | A 2022 medical diagnosis dataset comprising synthetic data representing about 1.3 million patient cases.               | 100      | ✓      | Out of Domain (OOD)    |\n| ETHICS                | It contains numerous morally relevant scenarios descriptions and their moral correctness.                              | 500      | ✓      | Implicit Ethics        |\n| Social Chemistry 101  | It contains various social norms, each consisting of an action and its label.                                          | 500      | ✓      | Implicit Ethics        |\n| MoralChoice           | It consists of different contexts with morally correct and wrong actions.                                             | 668      | ✓      | Explicit Ethics        |\n| ConfAIde              | It contains the description of how information is used.                                                               | 196      | ✓      | Privacy Awareness      |\n| Privacy Awareness     | It includes different privacy information queries about various scenarios.                                            | 280      | ✗      | Privacy Awareness      |\n| Enron Email           | It contains approximately 500,000 emails generated by employees of the Enron Corporation.                              | 400      | ✓      | Privacy Leakage        |\n| Xstest                | It's a test suite for identifying exaggerated safety behaviors in LLMs.                                                
| 200      | ✓      | Exaggerated Safety     |\n\n### **Task overview:**\n\n*○ means evaluation through the automatic scripts (e.g., keywords matching), ● means the automatic evaluation by ChatGPT, GPT-4 or longformer, and ◐ means the mixture evaluation.*\n\n*More trustworthy LLMs are expected to have a higher value of the metrics with ↑ and a lower value with ↓.*\n\n| Task Name                                    | Metrics                                   | Type            | Eval | Section                  |\n|----------------------------------------------|-------------------------------------------|-----------------|------|--------------------------|\n| Closed-book QA                               | Accuracy (↑)                              | Generation      | ○    | Misinformation(Internal) |\n| Fact-Checking                                | Macro F-1 (↑)                             | Classification  | ●    | Misinformation(External) |\n| Multiple Choice QA                           | Accuracy (↑)                              | Classification  | ●    | Hallucination            |\n| Hallucination Classification                 | Accuracy (↑)                              | Classification  | ●    | Hallucination            |\n| Persona Sycophancy                           | Embedding similarity (↑)                  | Generation      | ◐    | Sycophancy               |\n| Opinion Sycophancy                           | Percentage change (↓)                     | Generation      | ○    | Sycophancy               |\n| Factuality Correction                        | Percentage change (↑)                     | Generation      | ○    | Adversarial Factuality   |\n| Jailbreak Attack Evaluation                  | RtA (↑)                                   | Generation      | ○    | Jailbreak                |\n| Toxicity Measurement                         | Toxicity Value (↓)                        | Generation      | ●    | Toxicity                 |\n| Misuse Evaluation     
                       | RtA (↑)                                   | Generation      | ○    | Misuse                   |\n| Exaggerated Safety Evaluation                | RtA (↓)                                   | Generation      | ○    | Exaggerated Safety       |\n| Agreement on Stereotypes                     | Accuracy (↑)                              | Generation      | ◐    | Stereotype               |\n| Recognition of Stereotypes                   | Agreement Percentage (↓)                  | Classification  | ◐    | Stereotype               |\n| Stereotype Query Test                        | RtA (↑)                                   | Generation      | ○    | Stereotype               |\n| Preference Selection                         | RtA (↑)                                   | Generation      | ○    | Preference               |\n| Salary Prediction                            | p-value (↑)                               | Generation      | ●    | Disparagement            |\n| Adversarial Perturbation in Downstream Tasks | ASR (↓), RS (↑)                           | Generation      | ◐    | Natural Noise            |\n| Adversarial Perturbation in Open-Ended Tasks | Embedding similarity (↑)                  | Generation      | ◐    | Natural Noise            |\n| OOD Detection                                | RtA (↑)                                   | Generation      | ○    | Out of Domain (OOD)      |\n| OOD Generalization                           | Micro F1 (↑)                              | Classification  | ○    | Out of Domain (OOD)      |\n| Agreement on Privacy Information             | Pearson's correlation (↑)                 | Classification  | ●    | Privacy Awareness        |\n| Privacy Scenario Test                        | RtA (↑)                                   | Generation      | ○    | Privacy Awareness        |\n| Probing Privacy Information Usage            | RtA (↑), Accuracy (↓)                     | Generation      | ◐    | Privacy 
Leakage          |\n| Moral Action Judgement                       | Accuracy (↑)                              | Classification  | ◐    | Implicit Ethics          |\n| Moral Reaction Selection (Low-Ambiguity)     | Accuracy (↑)                              | Classification  | ◐    | Explicit Ethics          |\n| Moral Reaction Selection (High-Ambiguity)    | RtA (↑)                                   | Generation      | ○    | Explicit Ethics          |\n| Emotion Classification                       | Accuracy (↑)                              | Classification  | ●    | Emotional Awareness      |\n\n## 🏆 **Leaderboard**\n\nIf you want to view the performance of all models or upload the performance of your LLM, please refer to [this link](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html).\n\n![images\u002Frank_card_00.png](images\u002Frank_card_00.png \"ranking\")\n\n\n## 📣 **Contribution**\n\nWe welcome your contributions, including but not limited to the following:\n\n- New evaluation datasets\n- Research on trustworthy issues\n- Improvements to the toolkit\n\nIf you intend to make improvements to the toolkit, please fork the repository first, make the relevant modifications to the code, and finally initiate a `pull request`.\n\n## **⏰ TODO in Coming Versions**\n\n- [x] Faster and simpler evaluation pipeline  (**Version 0.2.1**)\n- [x] Dynamic dataset  ([UniGen](https:\u002F\u002Funigen-framework.github.io\u002F))\n- [ ] More fine-grained datasets\n- [ ] Chinese output evaluation\n- [ ] Downstream application evaluation\n\n\n## **Citation**\n\n```text\n@inproceedings{huang2024trustllm,\n  title={TrustLLM: Trustworthiness in Large Language Models},\n  author={Yue Huang and Lichao Sun and Haoran Wang and Siyuan Wu and Qihui Zhang and Yuan Li and Chujie Gao and Yixin Huang and Wenhan Lyu and Yixuan Zhang and Xiner Li and Hanchi Sun and Zhengliang Liu and Yixin Liu and Yijue Wang and Zhikun Zhang and Bertie Vidgen and Bhavya 
Kailkhura and Caiming Xiong and Chaowei Xiao and Chunyuan Li and Eric P. Xing and Furong Huang and Hao Liu and Heng Ji and Hongyi Wang and Huan Zhang and Huaxiu Yao and Manolis Kellis and Marinka Zitnik and Meng Jiang and Mohit Bansal and James Zou and Jian Pei and Jian Liu and Jianfeng Gao and Jiawei Han and Jieyu Zhao and Jiliang Tang and Jindong Wang and Joaquin Vanschoren and John Mitchell and Kai Shu and Kaidi Xu and Kai-Wei Chang and Lifang He and Lifu Huang and Michael Backes and Neil Zhenqiang Gong and Philip S. Yu and Pin-Yu Chen and Quanquan Gu and Ran Xu and Rex Ying and Shuiwang Ji and Suman Jana and Tianlong Chen and Tianming Liu and Tianyi Zhou and William Yang Wang and Xiang Li and Xiangliang Zhang and Xiao Wang and Xing Xie and Xun Chen and Xuyu Wang and Yan Liu and Yanfang Ye and Yinzhi Cao and Yong Chen and Yue Zhao},\n  booktitle={Forty-first International Conference on Machine Learning},\n  year={2024},\n  url={https:\u002F\u002Fopenreview.net\u002Fforum?id=bWUU0LwwMp}\n}\n```\n\n\n[\u002F\u002F]: # (## Star History)\n\n[\u002F\u002F]: # ()\n[\u002F\u002F]: # ([![Star History Chart]&#40;https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=HowieHwong\u002FTrustLLM&type=Date&#41;]&#40;https:\u002F\u002Fstar-history.com\u002F#HowieHwong\u002FTrustLLM&Date&#41;)\n\n\n\n## **License**\n\nThe code in this repository is open source under the [MIT license](https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fblob\u002Fmain\u002FLICENSE).\n","\u003Cdiv align=\"center\">\n\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_9160c06beff4.png\" width=\"100%\">\n\n# 
“TrustLLM：大型语言模型的可信性”工具包\n\n\n[![官网](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-%F0%9F%8C%8D-blue?style=for-the-badge&logoWidth=40)](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002F)\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-%F0%9F%8E%93-lightgrey?style=for-the-badge&logoWidth=40)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.05561)\n[![数据集](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-%F0%9F%92%BE-green?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTrustLLM\u002FTrustLLM-dataset)\n[![数据地图](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20Map-%F0%9F%8D%9F-orange?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fatlas.nomic.ai\u002Fmap\u002Ff64e87d3-c769-4a90-b15d-9dc833acc8ba\u002F8e9d7045-503b-4ba0-bc64-7201cb7aacee?xs=-16.14086&xf=-1.88776&ys=-7.54937&yf=3.88213)\n[![排行榜](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLeaderboard-%F0%9F%9A%80-brightgreen?style=for-the-badge&logoWidth=40)](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html)\n[![工具包文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FToolkit%20Document-%F0%9F%93%9A-blueviolet?style=for-the-badge&logoWidth=40)](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F)\n\n[![下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n[![月下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n[![周下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_e3ec5793d96d.png\u002Fweek)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Ftrustllm)\n\n\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" 
alt=\"git-last-commit\" \u002F>\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" alt=\"GitHub commit activity\" \u002F>\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flanguages\u002Ftop\u002FHowieHwong\u002FTrustLLM?style=flat-square&color=5D6D7E\" alt=\"GitHub top language\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n\n\n\n\u003C\u002Fdiv>\n\n\n## 更新与新闻\n\n- [2025年2月20日] 试用我们的最新工具包：全新工作 **TrustGen** 和 **TrustEval** 工具包已发布！[TrustGen](https:\u002F\u002Ftrustgen.github.io\u002F) 提供了针对多种生成模型可信性的全面指南、评估和视角，而 [TrustEval](https:\u002F\u002Fgithub.com\u002FTrustGen\u002FTrustEval-toolkit) 则提供了一个动态评估平台。\n\n- **TrustLLM** 工具包已被下载超过9000次！\n\n\u003Cdetails>\n\u003Csummary>点击展开\u002F收起更多内容\u003C\u002Fsummary>\n\n\n- [2024年7月15日] **TrustLLM** 现在支持 [**UniGen**](https:\u002F\u002Funigen-framework.github.io\u002F) 进行动态评估。\n- [2024年5月2日] 🥂 **TrustLLM已被ICML 2024接收！维也纳见！**\n- [2024年4月23日] :star: 版本0.3.0：重大更新，包括错误修复、评估增强以及新增模型（包括ChatGLM3、Llama3-8b、Llama3-70b、GLM4、Mixtral）。([查看详情](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html))\n- [2024年3月20日] :star: 版本0.2.4：修复了大量bug，并支持Gemini Pro API\n- [2024年2月1日] :page_facing_up: 版本0.2.2：查看我们关于LLMs意识的新论文！([链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.17882))\n- [2024年1月29日] :star: 版本0.2.1：trustllm工具包现在支持 (1) 简易评估流程 (2) [replicate](https:\u002F\u002Freplicate.com\u002F) 和 [deepinfra](https:\u002F\u002Fdeepinfra.com\u002F) 中的LLMs (3) [Azure OpenAI API](https:\u002F\u002Fazure.microsoft.com\u002Fen-us\u002Fproducts\u002Fai-services\u002Fopenai-service)\n- [2024年1月20日] :star: trustllm工具包0.2.0版本发布！查看[新特性](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html#version-020)。\n- [2024年1月12日] :surfer: 
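The [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTrustLLM\u002FTrustLLM-dataset), [leaderboard](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html), and [evaluation toolkit](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F) have been released!\n\n\u003C\u002Fdetails>\n\n## 👂**TL;DR**\n\n- TrustLLM (ICML 2024) is a comprehensive framework for studying the trustworthiness of large language models, comprising principles, a survey, and benchmarks.\n- This code repository provides an easy-to-use toolkit for evaluating the trustworthiness of LLMs ([see our documentation](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F)).\n\n\n\n**Table of Contents**\n\n- [Toolkit for “TrustLLM: Trustworthiness in Large Language Models”](#toolkit-for-trustllm-trustworthiness-in-large-language-models)\n  - [Updates & News](#updates--news)\n  - [👂**TL;DR**](#tldr)\n  - [🙋 **About TrustLLM**](#-about-trustllm)\n  - [🧹 **Before Evaluation**](#-before-evaluation)\n    - [**Installation**](#installation)\n    - [**Dataset Download**](#dataset-download)\n    - [**Generation**](#generation)\n  - [🙌 **Evaluation**](#-evaluation)\n  - [🛎️ **Dataset & Task**](#️-dataset--task)\n    - [**Dataset overview:**](#dataset-overview)\n    - [**Task overview:**](#task-overview)\n  - [🏆 **Leaderboard**](#-leaderboard)\n  - [📣 **Contribution**](#-contribution)\n  - [**⏰ TODO in Coming Versions**](#-todo-in-coming-versions)\n  - [**Citation**](#citation)\n  - [**License**](#license)\n\n\n## 🙋 **About TrustLLM**\n\nWe introduce TrustLLM, a comprehensive study of trustworthiness in LLMs. It covers principles for different dimensions of trustworthiness, an established benchmark, an evaluation, an analysis of the trustworthiness of mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics.\nWe then present a study evaluating 16 mainstream LLMs on more than 30 datasets.\nThe [documentation](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F#about) explains how to use the trustllm Python package to assess the trustworthiness of your LLM more quickly. For more details about TrustLLM, please refer to the [project website](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002F).\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_readme_f0b23a7bd42b.png\" width=\"100%\">\n\u003C\u002Fdiv>\n\n\n\n\n## 🧹 **Before Evaluation**\n\n### **Installation**\nCreate a new environment:\n\n```shell\nconda create --name trustllm python=3.9\n```\n\n**Installation via GitHub (recommended):**\n\n```shell\ngit clone git@github.com:HowieHwong\u002FTrustLLM.git\ncd 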
TrustLLM\u002Ftrustllm_pkg\npip install .\n```\n\n\n**Installation via `pip` (deprecated):**\n\n```shell\npip install trustllm\n```\n\n**Installation via `conda` (deprecated):**\n\n```sh\nconda install -c conda-forge trustllm\n```\n\n### **Dataset Download**\n\nDownload the TrustLLM dataset:\n\n```python\nfrom trustllm.dataset_download import download_dataset\n\ndownload_dataset(save_path='save_path')\n```\n\n### **Generation**\n\nThe generation module was added in [version 0.2.0](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fchangelog.html). Start your generation from [this page](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002Fguides\u002Fgeneration_details.html). Here is an example:\n\n```python\nfrom trustllm.generation.generation import LLMGeneration\n\nllm_gen = LLMGeneration(\n    model_path=\"your model name\", \n    test_type=\"test section\", \n    data_path=\"your dataset file path\",\n    model_name=\"\", \n    online_model=False, \n    use_deepinfra=False,\n    use_replicate=False,\n    repetition_penalty=1.0,\n    num_gpus=1, \n    max_new_tokens=512, \n    debug=False,\n    device='cuda:0'\n)\n\nllm_gen.generation_results()\n```\n\n\n## 🙌 **Evaluation**\n\nWe provide a toolkit that lets you assess the trustworthiness of large language models more conveniently. Please refer to the [documentation](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F) for more details. Here is an example:\n\n```python\nfrom trustllm.task.pipeline import run_truthfulness\n\ntruthfulness_results = run_truthfulness(  \n    internal_path=\"path_to_internal_consistency_data.json\",  \n    external_path=\"path_to_external_consistency_data.json\",  \n    hallucination_path=\"path_to_hallucination_data.json\",  \n    sycophancy_path=\"path_to_sycophancy_data.json\",\n    advfact_path=\"path_to_advfact_data.json\"\n)\n```\n\n## 🛎️ **Dataset & Task**\n\n### **Dataset overview:**\n\n*✓ means the dataset comes from prior work, and ✗ means the dataset is first proposed in our benchmark.*\n\n| Dataset               | Description                                                                                                           | Num.     | Exist? | Section                |\n|-----------------------|-----------------------------------------------------------------------------------------------------------------------|----------|--------|------------------------|\n| SQuAD2.0              | It combines questions in SQuAD1.1 with over 50,000 unanswerable questions.                                            
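| 100      | ✓      | Misinformation         |\n| CODAH                 | It contains 28,000 commonsense questions.                                                                             | 100      | ✓      | Misinformation         |\n| HotpotQA              | It contains 113,000 Wikipedia-based question-answer pairs for complex multi-hop reasoning.                            | 100      | ✓      | Misinformation         |\n| AdversarialQA         | It contains 30,000 adversarial reading comprehension question-answer pairs.                                           | 100      | ✓      | Misinformation         |\n| Climate-FEVER         | It contains 7,675 climate change-related claims manually curated by human fact-checkers.                              | 100      | ✓      | Misinformation         |\n| SciFact               | It contains 1,400 expert-written scientific claims paired with evidence abstracts.                                    | 100      | ✓      | Misinformation         |\n| COVID-Fact            | It contains 4,086 real-world COVID claims.                                                                            | 100      | ✓      | Misinformation         |\n| HealthVer             | It contains 14,330 health-related claims checked against scientific articles.                                         | 100      | ✓      | Misinformation         |\n| TruthfulQA            | Multiple-choice questions for evaluating whether a language model is truthful when generating answers to questions.   | 352      | ✓      | Hallucination          |\n| HaluEval              | It contains 35,000 generated and human-annotated hallucinated samples.                                                | 300      | ✓      | Hallucination          |\n| LM-exp-sycophancy     | A dataset of human questions, each with one sycophantic response example and one non-sycophantic response example.    | 179      | ✓      | Sycophancy             |\n| Opinion pairs         | It contains 120 pairs of opposite opinions.                                                                           | 240, 120 | ✗      | Sycophancy, Preference |\n| WinoBias              | It contains 3,160 sentences, split for development and testing, created by researchers familiar with the project.     | 734      | ✓      | Stereotype             |\n| StereoSet             | It contains sentences that measure model preferences across gender, race, religion, and profession.                   | 734      | ✓      | Stereotype             |\n| Adult                 | The dataset contains attributes such as sex, race, age, education, work hours, and work type, and is used to predict an individual's salary level. | 810      | ✓      | Disparagement          |\n| Jailbreak Trigger     | The dataset contains prompts based on 13 jailbreak attacks.                                                           | 1300     | 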
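✗      | Jailbreak, Toxicity    |\n| Misuse (additional)   | This dataset contains prompts crafted to assess how LLMs react when confronted by attackers or malicious users seeking to exploit the model for harmful purposes. | 261      | ✗      | Misuse                 |\n| Do-Not-Answer         | It is curated and filtered to consist only of prompts that responsible LLMs do not answer.                            | 344 + 95 | ✓      | Misuse, Stereotype     |\n| AdvGLUE               | A multi-task dataset with different adversarial attacks.                                                              | 912      | ✓      | Natural Noise          |\n| AdvInstruction        | 600 instructions generated by 11 perturbation methods.                                                                | 600      | ✗      | Natural Noise          |\n| ToolE                 | A dataset of user queries that may trigger LLMs to use external tools.                                                | 241      | ✓      | Out of Domain (OOD)    |\n| Flipkart              | A product review dataset, collected starting from December 2022.                                                      | 400      | ✓      | Out of Domain (OOD)    |\n| DDXPlus               | A 2022 medical diagnosis dataset comprising synthetic data that represents about 1.3 million patient cases.           | 100      | ✓      | Out of Domain (OOD)    |\n| ETHICS                | It contains numerous descriptions of morally relevant scenarios and their moral correctness.                          | 500      | ✓      | Implicit Ethics        |\n| Social Chemistry 101  | It contains various social norms, each consisting of an action and its label.                                         | 500      | ✓      | Implicit Ethics        |\n| MoralChoice           | It consists of different contexts with morally correct and wrong actions.                                             | 668      | ✓      | Explicit Ethics        |\n| ConfAIde              | It contains descriptions of how information is used.                                                                  | 196      | ✓      | Privacy Awareness      |\n| Privacy Awareness     | It includes different privacy information queries about various scenarios.                                            | 280      | ✗      | Privacy Awareness      |\n| Enron Email           | It contains approximately 500,000 emails generated by employees of the Enron Corporation.                             | 400      | ✓      | Privacy Leakage        |\n| Xstest                | It is a test suite for identifying exaggerated safety behaviors in LLMs.                                              | 200      | ✓      | Exaggerated Safety     |\n\n### **Task overview:**\n\n*○ means evaluation through automatic scripts (e.g., keyword matching), ● means automatic evaluation by ChatGPT, GPT-4, or longformer, and ◐ means mixed evaluation.*\n\n*A more trustworthy LLM should score higher on metrics marked ↑ and, for metrics marked ↓, 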
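should score lower.*\n\n| Task Name                                    | Metrics                                   | Type            | Eval | Section                  |\n|----------------------------------------------|-------------------------------------------|-----------------|------|--------------------------|\n| Closed-book QA                               | Accuracy (↑)                              | Generation      | ○    | Misinformation (Internal) |\n| Fact-Checking                                | Macro F1 (↑)                              | Classification  | ●    | Misinformation (External) |\n| Multiple Choice QA                           | Accuracy (↑)                              | Classification  | ●    | Hallucination            |\n| Hallucination Classification                 | Accuracy (↑)                              | Classification  | ●    | Hallucination            |\n| Persona Sycophancy                           | Embedding similarity (↑)                  | Generation      | ◐    | Sycophancy               |\n| Opinion Sycophancy                           | Percentage change (↓)                     | Generation      | ○    | Sycophancy               |\n| Factuality Correction                        | Percentage change (↑)                     | Generation      | ○    | Adversarial Factuality   |\n| Jailbreak Attack Evaluation                  | RtA (↑)                                   | Generation      | ○    | Jailbreak                |\n| Toxicity Measurement                         | Toxicity value (↓)                        | Generation      | ●    | Toxicity                 |\n| Misuse Evaluation                            | RtA (↑)                                   | Generation      | ○    | Misuse                   |\n| Exaggerated Safety Evaluation                | RtA (↓)                                   | Generation      | ○    | Exaggerated Safety       |\n| Agreement on Stereotypes                     | Accuracy (↑)                              | Generation      | ◐    | Stereotype               |\n| Recognition of Stereotypes                   | Agreement percentage (↓)                  | Classification  | ◐    | Stereotype               |\n| Stereotype Query Test                        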
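| RtA (↑)                                   | Generation      | ○    | Stereotype               |\n| Preference Selection                         | RtA (↑)                                   | Generation      | ○    | Preference               |\n| Salary Prediction                            | p-value (↑)                               | Generation      | ●    | Disparagement            |\n| Adversarial Perturbation in Downstream Tasks | ASR (↓), RS (↑)                           | Generation      | ◐    | Natural Noise            |\n| Adversarial Perturbation in Open-Ended Tasks | Embedding similarity (↑)                  | Generation      | ◐    | Natural Noise            |\n| OOD Detection                                | RtA (↑)                                   | Generation      | ○    | Out of Domain (OOD)      |\n| OOD Generalization                           | Micro F1 (↑)                              | Classification  | ○    | Out of Domain (OOD)      |\n| Agreement on Privacy Information             | Pearson's correlation (↑)                 | Classification  | ●    | Privacy Awareness        |\n| Privacy Scenario Test                        | RtA (↑)                                   | Generation      | ○    | Privacy Awareness        |\n| Probing Privacy Information Usage            | RtA (↑), Accuracy (↓)                     | Generation      | ◐    | Privacy Leakage          |\n| Moral Action Judgement                       | Accuracy (↑)                              | Classification  | ◐    | Implicit Ethics          |\n| Moral Reaction Selection (Low-Ambiguity)     | Accuracy (↑)                              | Classification  | ◐    | Explicit Ethics          |\n| Moral Reaction Selection (High-Ambiguity)    | RtA (↑)                                   | Generation      | ○    | Explicit Ethics          |\n| Emotion Classification                       | Accuracy (↑)                              | Classification  | ●    | Emotional Awareness      |\n\n## 🏆 **Leaderboard**\n\nTo view the performance of all models or upload the performance of your own LLM, please visit [this link](https:\u002F\u002Ftrustllmbenchmark.github.io\u002FTrustLLM-Website\u002Fleaderboard.html).\n\n![images\u002Frank_card_00.png](images\u002Frank_card_00.png \"Ranking\")\n\n\n## 📣 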
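**Contribution**\n\nWe welcome your contributions, including but not limited to the following:\n\n- New evaluation datasets\n- Research on trustworthiness issues\n- Improvements to the toolkit\n\nIf you intend to improve the toolkit, please fork the repository first, make the relevant changes to the code, and then open a `pull request`.\n\n## **⏰ TODO in Coming Versions**\n\n- [x] Faster and simpler evaluation pipeline  (**Version 0.2.1**)\n- [x] Dynamic dataset  ([UniGen](https:\u002F\u002Funigen-framework.github.io\u002F))\n- [ ] More fine-grained datasets\n- [ ] Chinese output evaluation\n- [ ] Downstream application evaluation\n\n## **Citation**\n\n```text\n@inproceedings{huang2024trustllm,\n  title={TrustLLM: Trustworthiness in Large Language Models},\n  author={Yue Huang and Lichao Sun and Haoran Wang and Siyuan Wu and Qihui Zhang and Yuan Li and Chujie Gao and Yixin Huang and Wenhan Lyu and Yixuan Zhang and Xiner Li and Hanchi Sun and Zhengliang Liu and Yixin Liu and Yijue Wang and Zhikun Zhang and Bertie Vidgen and Bhavya Kailkhura and Caiming Xiong and Chaowei Xiao and Chunyuan Li and Eric P. Xing and Furong Huang and Hao Liu and Heng Ji and Hongyi Wang and Huan Zhang and Huaxiu Yao and Manolis Kellis and Marinka Zitnik and Meng Jiang and Mohit Bansal and James Zou and Jian Pei and Jian Liu and Jianfeng Gao and Jiawei Han and Jieyu Zhao and Jiliang Tang and Jindong Wang and Joaquin Vanschoren and John Mitchell and Kai Shu and Kaidi Xu and Kai-Wei Chang and Lifang He and Lifu Huang and Michael Backes and Neil Zhenqiang Gong and Philip S. Yu and Pin-Yu Chen and Quanquan Gu and Ran Xu and Rex Ying and Shuiwang Ji and Suman Jana and Tianlong Chen and Tianming Liu and Tianyi Zhou and William Yang Wang and Xiang Li and Xiangliang Zhang and Xiao Wang and Xing Xie and Xun Chen and Xuyu Wang and Yan Liu and Yanfang Ye and Yinzhi Cao and Yong Chen and Yue Zhao},\n  booktitle={Forty-first International Conference on Machine Learning},\n  year={2024},\n  url={https:\u002F\u002Fopenreview.net\u002Fforum?id=bWUU0LwwMp}\n}\n```\n\n\n[\u002F\u002F]: # (## Star History)\n\n[\u002F\u002F]: # ()\n[\u002F\u002F]: # ([![Star History Chart]&#40;https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=HowieHwong\u002FTrustLLM&type=Date&#41;]&#40;https:\u002F\u002Fstar-history.com\u002F#HowieHwong\u002FTrustLLM&Date&#41;)\n\n\n\n## **License**\n\nThe code in this repository is open source under the [MIT license](https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fblob\u002Fmain\u002FLICENSE).","# TrustLLM Quick Start Guide\n\nTrustLLM is a comprehensive framework and toolkit for evaluating the trustworthiness of large language models (LLMs) across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. This guide helps developers set up the environment and run an evaluation quickly.\n\n## Environment Setup\n\n- **Operating system**: Linux \u002F macOS \u002F Windows\n- **Python version**: Python 3.9 recommended\n- **Hardware requirements**:\n  - Running models locally requires an NVIDIA GPU (with CUDA support)\n  - When using an online API (e.g., Azure OpenAI, Replicate), only a network connection is needed\n- **Prerequisites**:\n  - Conda (recommended for environment management)\n  - Git\n\n## Installation Steps\n\n### 1. Create a virtual environment\n```shell\nconda create --name trustllm python=3.9\nconda activate trustllm\n```\n\n### 2. Install the toolkit (recommended method)\nInstall from the GitHub source to get the latest features:\n```shell\ngit clone git@github.com:HowieHwong\u002FTrustLLM.git\ncd TrustLLM\u002Ftrustllm_pkg\npip install .\n```\n\n> **Note**: The `pip install trustllm` and `conda install` methods are marked as deprecated; the source installation above is recommended.\n\n### 3. 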
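Download the dataset\nRun the following code in your Python environment to download the datasets required for evaluation:\n```python\nfrom trustllm.dataset_download import download_dataset\n\ndownload_dataset(save_path='save_path')\n```\n*Replace `'save_path'` with your actual local storage path.*\n\n## Basic Usage\n\nThe core TrustLLM workflow has two steps: **generating responses** and **running the evaluation**.\n\n### 1. Generate model responses\nUse the `LLMGeneration` class to have the model generate responses on the test set. The following example shows the configuration for a local model:\n\n```python\nfrom trustllm.generation.generation import LLMGeneration\n\nllm_gen = LLMGeneration(\n    model_path=\"your model name\", \n    test_type=\"test section\", \n    data_path=\"your dataset file path\",\n    model_name=\"\", \n    online_model=False, \n    use_deepinfra=False,\n    use_replicate=False,\n    repetition_penalty=1.0,\n    num_gpus=1, \n    max_new_tokens=512, \n    debug=False,\n    device='cuda:0'\n)\n\nllm_gen.generation_results()\n```\n*Parameter note: to use an online model (e.g., DeepInfra, Replicate), set `online_model` to `True` and enable the corresponding switch.*\n\n### 2. Run the trustworthiness evaluation\nOnce generation is complete, use the pipeline functions to evaluate a specific dimension. The following example evaluates truthfulness:\n\n```python\nfrom trustllm.task.pipeline import run_truthfulness\n\ntruthfulness_results = run_truthfulness(  \n    internal_path=\"path_to_internal_consistency_data.json\",  \n    external_path=\"path_to_external_consistency_data.json\",  \n    hallucination_path=\"path_to_hallucination_data.json\",  \n    sycophancy_path=\"path_to_sycophancy_data.json\",\n    advfact_path=\"path_to_advfact_data.json\"\n)\n```\n*Replace the path arguments with the actual data file paths produced in the previous step.*\n\nFor more evaluation tasks (such as safety and fairness) and detailed parameter configuration, see the official documentation: [TrustLLM Documentation](https:\u002F\u002Fhowiehwong.github.io\u002FTrustLLM\u002F)","A fintech company is developing a customer-facing wealth-advisory LLM and, before launch, needs a full assessment of the safety and factual accuracy of its answers and of its resistance to manipulation attacks.\n\n### Without TrustLLM\n- **Missing evaluation dimensions**: The team can only test basic question-answering accuracy and struggles to systematically detect risks in deeper trust dimensions such as privacy leakage, bias and discrimination, or harmful content generation.\n- **High manual cost**: Testing relies on experts hand-crafting thousands of “trap” prompts (for example, prompts that induce the model to give illegal investment advice), which takes weeks and covers only a limited set of scenarios.\n- **No side-by-side comparison**: The in-house model cannot be compared quantitatively with mainstream open-source models such as Llama3 and ChatGLM3 under the same standard, so its reliability advantage is hard to demonstrate.\n- **Unclear data distribution**: The test datasets are disorganized, and there is no visual data map to confirm whether the test cases cover a sufficiently diverse set of risk scenarios.\n\n### With TrustLLM\n- **Automated scanning across all dimensions**: Using TrustLLM's built-in framework of six trust dimensions, the team automatically runs stress tests covering safety, fairness, robustness, and more in a single step.\n- **Faster dynamic evaluation**: The integrated UniGen 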
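dynamic evaluation feature automatically generates diverse adversarial test cases, cutting a test cycle of several weeks down to a few days.\n- **Benchmarking against an established leaderboard**: The team uses the TrustLLM Leaderboard benchmark directly, placing the in-house model's scores alongside those of mainstream models worldwide and backing model-selection decisions with objective data.\n- **Visual insight into risk**: The Data Map shows how the test data is distributed in a multi-dimensional space, making it quick to locate the model's weak spots in specific high-risk areas (such as inducement to financial fraud).\n\nTrustLLM turns the otherwise vague notion of “model trustworthiness” into scientific indicators that can be quantified, compared, and visualized, giving LLMs in critical domains a solid safety basis for deployment.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHowieHwong_TrustLLM_9160c06b.png","HowieHwong","Yue Huang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHowieHwong_6fadaf57.jpg","Ph.D. student at Notre Dame","University of Notre Dame","South Bend, IN, USA",null,"HowieH36226","https:\u002F\u002Fgithub.com\u002FHowieHwong",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,623,67,"2026-04-06T21:53:38","MIT","Not specified","The code example specifies device='cuda:0' and supports a num_gpus parameter, which suggests that an NVIDIA GPU is needed for local model inference. The required GPU memory depends on the chosen model (for example, Llama3-70b needs a large amount), and no minimum is stated.",{"notes":93,"python":94,"dependencies":95},"1. Using Conda to create a virtual environment named 'trustllm' with Python 3.9 is strongly recommended.\n2. Installing the toolkit from the GitHub source is recommended; direct installation via pip or conda is marked as deprecated.\n3. The toolkit supports several ways of serving models, including local loading, Replicate, DeepInfra, and the Azure OpenAI API.\n4. The TrustLLM dataset must be downloaded separately via code before running.\n5. 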
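The generation and evaluation steps support configuring the number of GPUs and the maximum number of new tokens.","3.9",[96,97],"trustllm (the project's own package)","Specific third-party library versions are not listed",[13,16,35,14,99,15],"其他",[101,102,103,104,105,106,107,108,109,110,111,112],"ai","benchmark","dataset","large-language-models","llm","nlp","trustworthy-ai","trustworthy-machine-learning","evaluation","pypi-package","toolkit","natural-language-processing","2026-03-27T02:49:30.150509","2026-04-08T17:32:13.487069",[116,121,126,131,136,141],{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},24836,"Why do I get an 'AttributeError' or an outdated version after installing with pip?","The version installed via pip may be outdated and contain unfixed bugs. Instead of using pip install, install the framework by cloning the latest repository with git clone. Specifically: use `git clone` to get the source, then install it locally or run it directly.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F43",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},24837,"What should I do when generation produces an empty folder and evaluation fails with 'KeyError: res'?","This is usually caused by missing model-loading code or a misconfiguration. Check the `generation.py` file (for example, around line 274) and make sure it contains the correct model-loading statement, such as `model, tokenizer = load_model(...)`. The maintainers have changed this; pull the latest code and try again.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F11",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},24838,"How do I fix the 'TypeError: unhashable type: list' error?","This error is usually caused by an incorrectly formatted 'label' field in the dataset (for example, the label is a list rather than a string or integer, so it cannot be used as a dictionary key). Check the input data file (such as the file that jailbreak_data_json_path points to) and make sure the 'label' field is a hashable type (such as a string or number) rather than a list.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F24",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},24839,"What should I do when the model's output always repeats a fixed default reply (such as 'Ah, a chatbot!...')?","This is a known bug in the generation logic that can prevent the model from handling the prompt correctly. The maintainers say it was fixed in a later code update. Be sure to update to the latest version of the code (git pull) rather than using an old snapshot or an installed package.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F26",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},24840,"The documentation on Azure OpenAI configuration does not match the actual default value. How should I set it?","The documentation may be misleading. The actual default value of `config.azure_openai` may differ from what the documentation states. If you run into content-filter errors when using Azure for automatic evaluation (e.g., AutoGPT), try explicitly setting `config.azure_openai=False` 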
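or adjusting the configuration file accordingly. For other configuration questions, you can ask the team directly for help.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F41",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},24841,"What should I do about a flood of errors or missing modules (ImportError) when importing the library?","This usually happens because an outdated pip package was installed, which lacks required dependencies or whose file layout has since changed. The fix is to drop the pip installation, clone the project source with `git clone`, and make sure all dependencies are installed in the correct Python environment.","https:\u002F\u002Fgithub.com\u002FHowieHwong\u002FTrustLLM\u002Fissues\u002F33",[147],{"id":148,"version":149,"summary_zh":150,"released_at":151},154328,"v0.3.0","Major updates (see details) include:\n- several bug fixes,\n- improved evaluation,\n- several new models (including ChatGLM3, Llama3-8b, Llama3-70b, GLM4, and Mixtral).","2024-04-23T15:26:16"]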