[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-THU-BPM--MarkLLM":3,"similar-THU-BPM--MarkLLM":100},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":18,"owner_email":18,"owner_twitter":18,"owner_website":18,"owner_url":19,"languages":20,"stars":36,"forks":37,"last_commit_at":38,"license":39,"difficulty_score":40,"env_os":41,"env_gpu":42,"env_ram":41,"env_deps":43,"category_tags":50,"github_topics":53,"view_count":60,"oss_zip_url":18,"oss_zip_packed_at":18,"status":61,"created_at":62,"updated_at":63,"faqs":64,"releases":99},6458,"THU-BPM\u002FMarkLLM","MarkLLM","[EMNLP 2024 Demo] MarkLLM: An Open-Source Toolkit for LLM Watermarking","MarkLLM 是一款专为大型语言模型（LLM）文本水印技术打造的开源工具包，旨在帮助开发者轻松实现生成内容的溯源与版权保护。随着 AI 生成文本的泛滥，如何区分人类创作与机器生成内容成为行业难题，MarkLLM 通过集成多种先进的水印算法，让模型在输出文本时嵌入难以察觉的标记，从而有效解决内容归属验证和防止滥用等问题。\n\n该工具特别适合人工智能研究人员、大模型开发者以及关注内容安全的企业技术团队使用。无论是想要复现前沿论文算法的学者，还是需要在产品中落地水印功能的工程师，都能通过 MarkLLM 快速上手。其核心亮点在于高度的模块化设计，不仅支持多种主流水印方案的即插即用，还提供了从水印嵌入到检测验证的完整流程演示。此外，项目背后团队在 ICLR、ACL 等顶会上发表了多篇关于语义鲁棒性水印和防伪造水印的研究成果，这些前沿技术也在工具中得到了体现或参考。MarkLLM 致力于降低技术门槛，推动文本水印技术在社区中的普及与应用，让 AI 生成内容更加透明可信。","\u003Cdiv align=\"center\">\r\n\r\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_f308cec3ec87.jpg\" style=\"width: 40%;\"\u002F>\r\n\r\n# An Open-Source Toolkit for LLM Watermarking\r\n\r\n[![Homepage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-5F259F?style=for-the-badge&logo=homepage&logoColor=white)](https:\u002F\u002Fgenerative-watermark.github.io\u002F)\r\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.10051) [![HF 
Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF--Models-%23FFD14D?style=for-the-badge&logo=huggingface&logoColor=black)](https:\u002F\u002Fhuggingface.co\u002FGenerative-Watermark-Toolkits)  [![EMNLP](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEMNLP--Demo-%230C2E82.svg?style=for-the-badge&logo=conferene&logoColor=white)](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-demo.7\u002F) [![colab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGoogle--Colab-%23D97700?style=for-the-badge&logo=Google-colab&logoColor=white)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F169MS4dY6fKNPZ7-92ETz1bAm_xyNAs0B?usp=sharing) [![video](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FVideo--Description-%23000000?style=for-the-badge&logo=Airplay-Video&logoColor=white)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QN3BhNvw14E&)\r\n\r\n\u003C\u002Fdiv>\r\n\r\n\r\n> 🎉 **We welcome PRs!** If you have implemented a LLM watermarking algorithm or are interested in contributing one, we'd love to include it in MarkLLM. Join our community and help make text watermarking more accessible to everyone!\r\n\r\n> 🔥 If you are interested in watermarking for diffusion models (image\u002Fvideo watermark), please refer to the [MarkDiffusion](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkDiffusion\u002F) toolkit from our group.\r\n\r\n### 💡 Some other watermark papers from our team that may interest you ✨\r\n\r\n1. 
[\u003Cu>**(ICLR 2024) A Semantic Invariant Robust Watermark for Large Language Models\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06356)\r\n   \r\n   \u003Cspan style=\"color:gray\">Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, Lijie Wen\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FRobust_Watermark?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FRobust_Watermark)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2310.06356-red)](#)\r\n\r\n2. [\u003Cu>**(ICLR 2024) An Unforgeable Publicly Verifiable Watermark for Large Language Models\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16230)\r\n   \r\n   \u003Cspan style=\"color:gray\">Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002Funforgeable_watermark?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002Funforgeable_watermark)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2307.16230-red)](#)\r\n\r\n3. [\u003Cu>**(ACM Computing Surveys) A Survey of Text Watermarking in the Era of Large Language Models\u003C\u002Fu>**](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3691626)\r\n   \r\n   \u003Cspan style=\"color:gray\">Aiwei Liu*, Leyi Pan*, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![Home](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHome-Text_Watermarking_Survey-blue?style=flat&logo=html5)](https:\u002F\u002Fsurvey-text-watermark.github.io\u002F)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2312.07913-red)](#)\r\n\r\n4. 
[\u003Cu>**(ICLR 2025 Spotlight) Can Watermarked LLMs be Identified by Users via Crafted Prompts?\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.03168)\r\n   \r\n   \u003Cspan style=\"color:gray\">Aiwei Liu, Sheng Guan, Yiming Liu, Leyi Pan, Yifei Zhang, Liancheng Fang, Lijie Wen, Philip S. Yu, Xuming Hu\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWatermarked_LLM_Identification?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWatermarked_LLM_Identification)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2410.03168-red)](#)\r\n\r\n5. [\u003Cu>**(ACL 2025 Main) Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11598)\r\n   \r\n   \u003Cspan style=\"color:gray\">Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWatermark-Radioactivity-Attack?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWatermark-Radioactivity-Attack)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2502.11598-red)](#)\r\n\r\n6. [\u003Cu>**(NAACL 2025 Findings) WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.05112)\r\n   \r\n   \u003Cspan style=\"color:gray\">Leyi Pan, Aiwei Liu, Yijian Lu, Zitian Gao, Yichen Di, Lijie Wen, Irwin King, Philip S. 
Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWaterSeeker?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWaterSeeker)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2409.05112-red)](#)\r\n\r\n7. [\u003Cu>**(ACL 2024 Main) An Entropy-based Text Watermarking Detection Method\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13485)\r\n   \r\n   \u003Cspan style=\"color:gray\">Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, Irwin King\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fluyijian3\u002FEWD?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002Fluyijian3\u002FEWD)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2403.13485-red)](#)\r\n  \r\n8. [\u003Cu>**(ACL 2024 Main) Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.14007)\r\n   \r\n   \u003Cspan style=\"color:gray\">Zhiwei He, Binglin Zhou, Hongkun Hao, Aiwei Liu, Xing Wang, Zhaopeng Tu, Zhuosheng Zhang, Rui Wang\u003C\u002Fspan>\r\n   \r\n   [![GitHub Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzwhe99\u002FX-SIR?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002Fzwhe99\u002FX-SIR)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2402.14007-red)](#)\r\n\r\n\r\n### Contents\r\n\r\n- [MarkLLM: An Open-Source Toolkit for LLM Watermarking](#markllm-an-open-source-toolkit-for-llm-watermarking)\r\n    - [Contents](#contents)\r\n    - [Notes](#notes)\r\n    - [Updates](#updates)\r\n    - [Introduction to MarkLLM](#introduction-to-markllm)\r\n      - [Overview](#overview)\r\n      - [Key Features of MarkLLM](#key-features-of-markllm)\r\n    - [How to use the toolkit in your own 
code](#how-to-use-the-toolkit-in-your-own-code)\r\n      - [Setting up the environment](#setting-up-the-environment)\r\n      - [Invoking watermarking algorithms](#invoking-watermarking-algorithms)\r\n      - [Visualizing mechanisms](#visualizing-mechanisms)\r\n      - [Applying evaluation pipelines](#applying-evaluation-pipelines)\r\n    - [More user examples](#more-user-examples)\r\n    - [Demo jupyter notebooks](#demo-jupyter-notebooks)\r\n    - [Citations](#citations)\r\n\r\n### ❗❗❗ Notes\r\nAs the MarkLLM repository content becomes increasingly rich and its size grows larger, we have created a model storage repository on Hugging Face called [Generative-Watermark-Toolkits](https:\u002F\u002Fhuggingface.co\u002FGenerative-Watermark-Toolkits) to facilitate usage. This repository contains various default models for watermarking algorithms that involve self-trained models. We have removed the model weights from the corresponding `model\u002F` folders of these watermarking algorithms in the main repository. **When using the code, please first download the corresponding models from the Hugging Face repository according to the config paths and save them to the `model\u002F` directory before running the code.**\r\n\r\n### Updates\r\n- 🎉 **(2025.09.22)** Add [SemStamp](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03991) watermark method. Thanks Huan Wang for her PR!\r\n- 🎉 **(2025.09.17)** Add [IE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14112) watermark method. Thanks Tianle Gu for her PR!\r\n- 🎉 **(2025.09.14)** Add [Watermark Stealing](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.19361) attack method. Thanks Shuhao Zhang for his PR!\r\n- 🎉 **(2025.07.17)** Add [k-SemStamp](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.11399) watermarking method. Thanks Huan Wang for her PR!\r\n- 🎉 **(2025.07.17)** Add [Adaptive Watermark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.13927) watermarking method. 
Thanks Yepeng Liu for his PR!\r\n- 🎉 **(2025.05.24)** Add [MorphMark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.11541) watermarking method. Thanks Zongqi Wang for his PR!\r\n- 🎉 **(2025.03.12)** Add [Permute-and-Flip](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05864) (PF) watermarking method. Thanks Zian Wang for his PR!\r\n- 🎉 **(2025.02.27)** Add δ-reweight and LLR score detection for Unbiased watermarking method.\r\n- 🎉 **(2025.01.08)** Add AutoConfiguration for watermarking methods.\r\n- 🎉 **(2024.12.21)** Provide example code for integrating VLLM with MarkLLM in `MarkvLLM_demo.py`. Thanks to @zhangjf-nlp for his PR!\r\n- 🎉 **(2024.11.21)** Support distortionary version of [SynthID-Text](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4) method (Nature). \r\n- 🎉 **(2024.11.03)** Add [SynthID-Text](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4) method (Nature) and support detection methods including mean, weighted mean, and bayesian. \r\n- 🎉 **(2024.11.01)** Add [TS-Watermark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.18059) method (ICML 2024). Thanks to Kyle Zheng and Minjia Huo for their PR! \r\n- 🎉 **(2024.10.07)** Provide an alternative, equivalent implementation of the EXP watermarking algorithm (**EXPGumbel**) utilizing Gumbel noise. With this implementation, users should be able to modify the watermark strength by adjusting the sampling temperature in the configuration file.\r\n- 🎉 **(2024.10.07)** Add [Unbiased](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10669) watermarking method.\r\n- 🎉 **(2024.10.06)** We are excited to announce that our paper \"MarkLLM: An Open-Source Toolkit for LLM Watermarking\" has been accepted by **EMNLP 2024 Demo**!\r\n- 🎉 **(2024.08.08)** Add [DiPmark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07710) watermarking method. 
Thanks to Sheng Guan for his PR!\r\n- 🎉 **(2024.08.01)** Released as a [python package](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmarkllm\u002F)! Try `pip install markllm`. We provide a user example at the end of this file.\r\n- 🎉 **(2024.07.13)** Add ITSEdit watermarking method. Thanks to Yiming Liu for his PR!\r\n- 🎉 **(2024.07.09)** Add more hashing schemes for KGW (skip, min, additive, selfhash). Thanks to Yichen Di for his PR!\r\n- 🎉 **(2024.07.08)** Add top-k filter for watermarking methods in Christ family. Thanks to Kai Shi for his PR!\r\n- 🎉 **(2024.07.03)** Updated Back-Translation Attack. Thanks to Zihan Tang for his PR!\r\n- 🎉 **(2024.06.19)** Updated Random Walk Attack from the impossibility results of strong watermarking [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.04378) at [ICML](https:\u002F\u002Fopenreview.net\u002Fpdf\u002Fc85c77848c1a0a1a53da8fb873d2b27c5b8509c1.pdf), 2024. ([Blog](https:\u002F\u002Fkempnerinstitute.harvard.edu\u002Fresearch\u002Fdeeper-learning\u002Fwatermarking-in-the-sand\u002F)). Thanks to Hanlin Zhang for his PR!\r\n- 🎉 **(2024.05.23)** We're thrilled to announce the release of our website demo!\r\n\r\n### Introduction to MarkLLM\r\n\r\n#### Overview\r\n\r\nMarkLLM is an open-source toolkit developed to facilitate the research and application of watermarking technologies within large language models (LLMs). As the use of large language models (LLMs) expands, ensuring the authenticity and origin of machine-generated text becomes critical. MarkLLM simplifies the access, understanding, and assessment of watermarking technologies, making it accessible to both researchers and the broader community.\r\n\r\n\u003Cimg src=\"images\\overview.png\"  alt=\"overview\" style=\"zoom:35%;\" \u002F>\r\n\r\n#### Key Features of MarkLLM\r\n\r\n- **Implementation Framework:** MarkLLM provides a unified and extensible platform for the implementation of various LLM watermarking algorithms. 
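To give a feel for what such a unified, extensible interface implies, below is a minimal, illustrative sketch of registry-based dispatch in the spirit of `AutoWatermark.load(...)`. All names here (`WATERMARK_REGISTRY`, `WatermarkBase`, `ToyKGW`) are hypothetical stand-ins, not the toolkit's actual internals:

```python
# Illustrative sketch only: registry-based dispatch in the spirit of
# MarkLLM's unified AutoWatermark.load(...) entry point. The names below
# are hypothetical and do NOT mirror the toolkit's real implementation.

WATERMARK_REGISTRY = {}

def register(name):
    """Class decorator mapping an algorithm name to its implementation."""
    def wrap(cls):
        WATERMARK_REGISTRY[name] = cls
        return cls
    return wrap

class WatermarkBase:
    """Common interface every watermarking algorithm implements."""
    def __init__(self, config=None):
        self.config = config or {}

    def generate_watermarked_text(self, prompt):
        raise NotImplementedError

    def detect_watermark(self, text):
        raise NotImplementedError

@register("KGW")
class ToyKGW(WatermarkBase):
    """Toy stand-in; the real KGW algorithm biases a pseudorandom 'green list'."""
    def generate_watermarked_text(self, prompt):
        return prompt + " [toy watermarked continuation]"

    def detect_watermark(self, text):
        return {"is_watermarked": "[toy watermarked" in text}

def load(name, config=None):
    """Unified entry point: new algorithms plug in by registering a name."""
    return WATERMARK_REGISTRY[name](config)

wm = load("KGW")
out = wm.generate_watermarked_text("Good Morning.")
result = wm.detect_watermark(out)  # {'is_watermarked': True}
```

The point of this shape is that adding a new algorithm only requires implementing the common interface and registering a name, which is how a single `load('KGW', ...)` call can serve many algorithm families.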
It currently supports around twenty algorithms drawn from two prominent families, facilitating the integration and expansion of watermarking techniques.\r\n\r\n  **Framework Design**:\r\n\r\n  \u003Cdiv align=\"center\">\r\n      \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_7fbd1e78e2dd.png\" alt=\"unified_implementation\" width=\"400\"\u002F>\r\n  \u003C\u002Fdiv>\r\n\r\n  **Currently Supported Algorithms:**\r\n\r\n  | Algorithm Name | Publication | Link |\r\n  | -------------- | ----------- | ---- |\r\n  | KGW | ICML 2023 | [\\[2301.10226\\] A Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.10226) |\r\n  | Unigram | ICLR 2024 | [\\[2306.17439\\] Provable Robust Watermarking for AI-Generated Text (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.17439) |\r\n  | SWEET | ACL 2024 | [\\[2305.15060\\] Who Wrote this Code? 
Watermarking for Code Generation (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15060)                                                            |\r\n  | UPV                | ICLR 2024    | [\\[2307.16230\\] An Unforgeable Publicly Verifiable Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16230)                                           |\r\n  | EWD                | ACL 2024    | [\\[2403.13485\\] An Entropy-based Text Watermarking Detection Method (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13485)                                                              |\r\n  | SIR                | ICLR 2024    | [\\[2310.06356\\] A Semantic Invariant Robust Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06356)                                                  |\r\n  | X-SIR              | ACL 2024    | [\\[2402.14007\\] Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.14007) |\r\n  | DiPmark            | ICML 2024    | [\\[2310.07710\\] A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07710)                           |\r\n  | Unbiased Watermark | ICLR 2024    | [\\[2310.10669\\] Unbiased Watermark for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10669)                                                                     |\r\n  | TS Watermark | ICML 2024    | [\\[2402.18059\\] Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.18059)                                                                     |\r\n  | SynthID-Text | Nature 2024   | [Scalable Watermarking for Identifying Large 
Language Model Outputs (*Nature*)](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4) |\r\n  | PF Watermark | ICLR 2025 | [\\[2402.05864\\] Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05864) |\r\n  | MorphMark | ACL 2025 | [\\[2505.11541\\] MorphMark: Flexible Adaptive Watermarking for Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.11541) |\r\n  | Adaptive Watermark | ICML 2024 | [\\[2401.13927\\] Adaptive Text Watermark for Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.13927) |\r\n  | SemStamp | NAACL 2024 | [\\[2310.03991\\] SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03991) |\r\n  | k-SemStamp | ACL 2024 (Findings) | [\\[2402.11399\\] k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.11399) |\r\n  | EXP\u002FEXPGumbel | Lecture Note | https:\u002F\u002Fwww.scottaaronson.com\u002Ftalks\u002Fwatermark.ppt |\r\n  | EXP-Edit | TMLR 2024 | [\\[2307.15593\\] Robust Distortion-free Watermarks for Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15593) |\r\n  | ITS-Edit | TMLR 2024 | [\\[2307.15593\\] Robust Distortion-free Watermarks for Language Models (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15593) |\r\n  | IE | arXiv preprint | [\\[2505.14112\\] Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14112) 
|\r\n- **Visualization Solutions:** The toolkit includes custom visualization tools that provide clear, insightful views into how different watermarking algorithms operate under various scenarios. These visualizations help demystify the algorithms' mechanisms, making them more understandable for users.\r\n\r\n  \u003Cimg src=\"images\u002Fmechanism_visualization.png\" alt=\"mechanism_visualization\" style=\"zoom:35%;\" \u002F>\r\n- **Evaluation Module:** With 12 evaluation tools covering detectability, robustness, and impact on text quality, MarkLLM stands out in its comprehensive approach to assessing watermarking technologies. It also features customizable automated evaluation pipelines that cater to diverse needs and scenarios, enhancing the toolkit's practical utility.\r\n\r\n  **Tools:**\r\n\r\n  - **Success Rate Calculator of Watermark Detection:** FundamentalSuccessRateCalculator, DynamicThresholdSuccessRateCalculator\r\n  - **Text Editor:** WordDeletion, SynonymSubstitution, ContextAwareSynonymSubstitution, GPTParaphraser, DipperParaphraser, RandomWalkAttack\r\n  - **Text Quality Analyzer:** PPLCalculator, LogDiversityAnalyzer, BLEUCalculator, PassOrNotJudger, GPTDiscriminator\r\n\r\n  **Pipelines:**\r\n\r\n  - **Watermark Detection Pipeline:** WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline\r\n  - **Text Quality Pipeline:** DirectTextQualityAnalysisPipeline, ReferencedTextQualityAnalysisPipeline, ExternalDiscriminatorTextQualityAnalysisPipeline\r\n\r\n### How to use the toolkit in your own code\r\n\r\n#### Setting up the environment\r\n\r\n- Python 3.10\r\n- PyTorch\r\n- `pip install -r requirements.txt`\r\n\r\n*Tips:* If you wish to use the EXPEdit or ITSEdit algorithm, you will need to compile the `.pyx` file first. Taking EXPEdit as an example:\r\n\r\n- run `python watermark\u002Fexp_edit\u002Fcython_files\u002Fsetup.py build_ext --inplace`\r\n- move the generated `.so` file into 
`watermark\u002Fexp_edit\u002Fcython_files\u002F`\r\n\r\n#### Invoking watermarking algorithms\r\n\r\n```python\r\nimport torch\r\nfrom watermark.auto_watermark import AutoWatermark\r\nfrom utils.transformers_config import TransformersConfig\r\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\r\n\r\n# Device\r\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\r\n\r\n# Transformers config\r\ntransformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\r\n                                         tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\r\n                                         vocab_size=50272,\r\n                                         device=device,\r\n                                         max_new_tokens=200,\r\n                                         min_length=230,\r\n                                         do_sample=True,\r\n                                         no_repeat_ngram_size=4)\r\n  \r\n# Load watermark algorithm\r\nmyWatermark = AutoWatermark.load('KGW', \r\n                                 algorithm_config='config\u002FKGW.json',\r\n                                 transformers_config=transformers_config)\r\n\r\n# Prompt\r\nprompt = 'Good Morning.'\r\n\r\n# Generate and detect\r\nwatermarked_text = myWatermark.generate_watermarked_text(prompt)\r\ndetect_result = myWatermark.detect_watermark(watermarked_text)\r\nunwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)\r\ndetect_result = myWatermark.detect_watermark(unwatermarked_text)\r\n```\r\n\r\n#### Visualizing mechanisms\r\n\r\nAssuming you already have a pair of `watermarked_text` and `unwatermarked_text`, and you wish to visualize the differences and specifically highlight the watermark within the watermarked text using a watermarking algorithm, you can utilize the visualization tools available in the `visualize\u002F` directory.\r\n\r\n**KGW 
Family**\r\n\r\n```python\r\nimport torch\r\nfrom visualize.font_settings import FontSettings\r\nfrom watermark.auto_watermark import AutoWatermark\r\nfrom utils.transformers_config import TransformersConfig\r\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\r\nfrom visualize.visualizer import DiscreteVisualizer\r\nfrom visualize.legend_settings import DiscreteLegendSettings\r\nfrom visualize.page_layout_settings import PageLayoutSettings\r\nfrom visualize.color_scheme import ColorSchemeForDiscreteVisualization\r\n\r\n# Load watermark algorithm\r\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\r\ntransformers_config = TransformersConfig(\r\n    \t\t\t\t\t\tmodel=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\r\n                            tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\r\n                            vocab_size=50272,\r\n                            device=device,\r\n                            max_new_tokens=200,\r\n                            min_length=230,\r\n                            do_sample=True,\r\n                            no_repeat_ngram_size=4)\r\nmyWatermark = AutoWatermark.load('KGW', \r\n                                 algorithm_config='config\u002FKGW.json',\r\n                                 transformers_config=transformers_config)\r\n# Get data for visualization\r\nwatermarked_data = myWatermark.get_data_for_visualization(watermarked_text)\r\nunwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)\r\n\r\n# Init visualizer\r\nvisualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),\r\n                                font_settings=FontSettings(), \r\n                                page_layout_settings=PageLayoutSettings(),\r\n                                legend_settings=DiscreteLegendSettings())\r\n# Visualize\r\nwatermarked_img = visualizer.visualize(data=watermarked_data, \r\n                          
             show_text=True, \r\n                                       visualize_weight=True, \r\n                                       display_legend=True)\r\n\r\nunwatermarked_img = visualizer.visualize(data=unwatermarked_data,\r\n                                         show_text=True, \r\n                                         visualize_weight=True, \r\n                                         display_legend=True)\r\n# Save\r\nwatermarked_img.save(\"KGW_watermarked.png\")\r\nunwatermarked_img.save(\"KGW_unwatermarked.png\")\r\n```\r\n\r\n\u003Cdiv align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_5171ae5b4c77.png\" alt=\"1\" width=\"500\" \u002F>\r\n\u003C\u002Fdiv>\r\n\r\n**Christ Family**\r\n\r\n```python\r\nimport torch\r\nfrom visualize.font_settings import FontSettings\r\nfrom watermark.auto_watermark import AutoWatermark\r\nfrom utils.transformers_config import TransformersConfig\r\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\r\nfrom visualize.visualizer import ContinuousVisualizer\r\nfrom visualize.legend_settings import ContinuousLegendSettings\r\nfrom visualize.page_layout_settings import PageLayoutSettings\r\nfrom visualize.color_scheme import ColorSchemeForContinuousVisualization\r\n\r\n# Load watermark algorithm\r\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\r\ntransformers_config = TransformersConfig(\r\n    \t\t\t\t\t\tmodel=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\r\n                            tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\r\n                            vocab_size=50272,\r\n                            device=device,\r\n                            max_new_tokens=200,\r\n                            min_length=230,\r\n                            do_sample=True,\r\n                            no_repeat_ngram_size=4)\r\nmyWatermark = AutoWatermark.load('EXP', \r\n                 
                algorithm_config='config\u002FEXP.json',\r\n                                 transformers_config=transformers_config)\r\n# Get data for visualization\r\nwatermarked_data = myWatermark.get_data_for_visualization(watermarked_text)\r\nunwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)\r\n\r\n# Init visualizer\r\nvisualizer = ContinuousVisualizer(color_scheme=ColorSchemeForContinuousVisualization(),\r\n                                  font_settings=FontSettings(), \r\n                                  page_layout_settings=PageLayoutSettings(),\r\n                                  legend_settings=ContinuousLegendSettings())\r\n# Visualize\r\nwatermarked_img = visualizer.visualize(data=watermarked_data, \r\n                                       show_text=True, \r\n                                       visualize_weight=True, \r\n                                       display_legend=True)\r\n\r\nunwatermarked_img = visualizer.visualize(data=unwatermarked_data,\r\n                                         show_text=True, \r\n                                         visualize_weight=True, \r\n                                         display_legend=True)\r\n# Save\r\nwatermarked_img.save(\"EXP_watermarked.png\")\r\nunwatermarked_img.save(\"EXP_unwatermarked.png\")\r\n```\r\n\r\n\u003Cdiv align=\"center\">\r\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_35e60216f7fa.png\" alt=\"2\" width=\"500\" \u002F>\r\n\u003C\u002Fdiv>\r\n\r\nFor more examples on how to use the visualization tools, please refer to the `test\u002Ftest_visualize.py` script in the project directory.\r\n\r\n#### Applying evaluation pipelines\r\n\r\n**Using Watermark Detection Pipelines**\r\n\r\n```python\r\nimport torch\r\nfrom evaluation.dataset import C4Dataset\r\nfrom watermark.auto_watermark import AutoWatermark\r\nfrom utils.transformers_config import TransformersConfig\r\nfrom transformers import 
AutoModelForCausalLM, AutoTokenizer\r\nfrom evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion\r\nfrom evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator\r\nfrom evaluation.pipelines.detection import WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline, DetectionPipelineReturnType\r\n\r\n# Load dataset\r\nmy_dataset = C4Dataset('dataset\u002Fc4\u002Fprocessed_c4.json')\r\n\r\n# Device\r\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\r\n\r\n# Transformers config\r\ntransformers_config = TransformersConfig(\r\n    model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\r\n    tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\r\n    vocab_size=50272,\r\n    device=device,\r\n    max_new_tokens=200,\r\n    do_sample=True,\r\n    min_length=230,\r\n    no_repeat_ngram_size=4)\r\n\r\n# Load watermark algorithm\r\nmy_watermark = AutoWatermark.load('KGW', \r\n                                  algorithm_config='config\u002FKGW.json',\r\n                                  transformers_config=transformers_config)\r\n\r\n# Init pipelines\r\npipeline1 = WatermarkedTextDetectionPipeline(\r\n    dataset=my_dataset, \r\n    text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],\r\n    show_progress=True, \r\n    return_type=DetectionPipelineReturnType.SCORES) \r\n\r\npipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset, \r\n                                               text_editor_list=[],\r\n                                               show_progress=True,\r\n                                               return_type=DetectionPipelineReturnType.SCORES)\r\n\r\n# Evaluate\r\ncalculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')\r\nprint(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))\r\n```\r\n\r\n**Using Text Quality Analysis 
Pipeline**\r\n\r\n```python\r\nimport torch\r\nfrom evaluation.dataset import C4Dataset\r\nfrom watermark.auto_watermark import AutoWatermark\r\nfrom utils.transformers_config import TransformersConfig\r\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer\r\nfrom evaluation.tools.text_editor import TruncatePromptTextEditor\r\nfrom evaluation.tools.text_quality_analyzer import PPLCalculator\r\nfrom evaluation.pipelines.quality_analysis import DirectTextQualityAnalysisPipeline, QualityPipelineReturnType\r\n\r\n# Load dataset\r\nmy_dataset = C4Dataset('dataset\u002Fc4\u002Fprocessed_c4.json')\r\n\r\n# Device\r\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\r\n\r\n# Transformers config\r\ntransformers_config = TransformersConfig(\r\n    model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\r\n    tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\r\n    vocab_size=50272,\r\n    device=device,\r\n    max_new_tokens=200,\r\n    min_length=230,\r\n    do_sample=True,\r\n    no_repeat_ngram_size=4)\r\n\r\n# Load watermark algorithm\r\nmy_watermark = AutoWatermark.load('KGW', \r\n                                  algorithm_config='config\u002FKGW.json',\r\n                                  transformers_config=transformers_config)\r\n\r\n# Init pipeline\r\nquality_pipeline = DirectTextQualityAnalysisPipeline(\r\n    dataset=my_dataset, \r\n    watermarked_text_editor_list=[TruncatePromptTextEditor()],\r\n    unwatermarked_text_editor_list=[],\r\n    analyzer=PPLCalculator(\r\n        model=AutoModelForCausalLM.from_pretrained('..\u002Fmodel\u002Fllama-7b\u002F', device_map='auto'),\r\n        tokenizer=LlamaTokenizer.from_pretrained('..\u002Fmodel\u002Fllama-7b\u002F'),\r\n        device=device),\r\n    unwatermarked_text_source='natural', \r\n    show_progress=True, \r\n    return_type=QualityPipelineReturnType.MEAN_SCORES)\r\n\r\n# 
Evaluate\r\nprint(quality_pipeline.evaluate(my_watermark))\r\n```\r\n\r\nFor more examples on how to use the pipelines, please refer to the `test\u002Ftest_pipeline.py` script in the project directory.\r\n\r\n**Leveraging example scripts for evaluation**\r\n\r\nIn the `evaluation\u002Fexamples\u002F` directory of our repository, you will find a collection of Python scripts specifically designed for systematic and automated evaluation of various algorithms. By using these examples, you can quickly and effectively gauge the detectability, robustness, and impact on text quality of each algorithm implemented within our toolkit.\r\n\r\nNote: To execute the scripts in `evaluation\u002Fexamples\u002F`, first run the following command to set the environment variable.\r\n\r\n```bash\r\nexport PYTHONPATH=\"path_to_the_MarkLLM_project:$PYTHONPATH\"\r\n```\r\n\r\n### More user examples\r\n\r\nAdditional user examples are available in `test\u002F`. To execute the scripts contained within, first run the following command to set the environment variable.\r\n\r\n```bash\r\nexport PYTHONPATH=\"path_to_the_MarkLLM_project:$PYTHONPATH\"\r\n```\r\n\r\n### Demo Jupyter notebooks\r\n\r\nIn addition to the Colab Jupyter notebook we provide (some models cannot be downloaded there due to storage limits), you can also easily run `MarkLLM_demo.ipynb` on your local machine.\r\n\r\n\r\n### Citations\r\n\r\n```\r\n@inproceedings{pan-etal-2024-markllm,\r\n    title = \"{M}ark{LLM}: An Open-Source Toolkit for {LLM} Watermarking\",\r\n    author = \"Pan, Leyi  and\r\n      Liu, Aiwei  and\r\n      He, Zhiwei  and\r\n      Gao, Zitian  and\r\n      Zhao, Xuandong  and\r\n      Lu, Yijian  and\r\n      Zhou, Binglin  and\r\n      Liu, Shuliang  and\r\n      Hu, Xuming  and\r\n      Wen, Lijie  and\r\n      King, Irwin  and\r\n      Yu, Philip S.\",\r\n    editor = \"Hernandez Farias, Delia Irazu  and\r\n      Hope, Tom  and\r\n      Li, Manling\",\r\n    booktitle = \"Proceedings of the 2024 
Conference on Empirical Methods in Natural Language Processing: System Demonstrations\",\r\n    month = nov,\r\n    year = \"2024\",\r\n    address = \"Miami, Florida, USA\",\r\n    publisher = \"Association for Computational Linguistics\",\r\n    url = \"https:\u002F\u002Faclanthology.org\u002F2024.emnlp-demo.7\",\r\n    pages = \"61--71\",\r\n    abstract = \"Watermarking for Large Language Models (LLMs), which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of LLMs. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily understand, implement and evaluate the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to ensure ease of access. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. 
Our code is available at https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM.\",\r\n}\r\n```\r\n\r\n","\u003Cdiv align=\"center\">\r\n\r\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_f308cec3ec87.jpg\" style=\"width: 40%;\"\u002F>\r\n\r\n# 面向大语言模型水印的开源工具包\r\n\r\n[![主页](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-5F259F?style=for-the-badge&logo=homepage&logoColor=white)](https:\u002F\u002Fgenerative-watermark.github.io\u002F)\r\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.10051) [![HF模型](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF--Models-%23FFD14D?style=for-the-badge&logo=huggingface&logoColor=black)](https:\u002F\u002Fhuggingface.co\u002FGenerative-Watermark-Toolkits)  [![EMNLP](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEMNLP--Demo-%230C2E82.svg?style=for-the-badge&logo=conferene&logoColor=white)](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-demo.7\u002F) [![colab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGoogle--Colab-%23D97700?style=for-the-badge&logo=Google-colab&logoColor=white)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F169MS4dY6fKNPZ7-92ETz1bAm_xyNAs0B?usp=sharing) [![视频](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FVideo--Description-%23000000?style=for-the-badge&logo=Airplay-Video&logoColor=white)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QN3BhNvw14E&)\r\n\r\n\u003C\u002Fdiv>\r\n\r\n\r\n> 🎉 **我们欢迎PR！** 如果你已经实现了一个大语言模型水印算法，或者有兴趣贡献一个，我们非常乐意将其加入MarkLLM。加入我们的社区，帮助让文本水印技术对每个人来说更加易用吧！\r\n\r\n> 🔥 如果你对扩散模型（图像\u002F视频）的水印感兴趣，请参考我们团队的[MarkDiffusion](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkDiffusion\u002F)工具包。\r\n\r\n### 💡 我们团队的其他一些可能让你感兴趣的水印论文 ✨\r\n\r\n1. 
[\u003Cu>**(ICLR 2024) 大语言模型的语义不变鲁棒水印\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06356)\r\n   \r\n   \u003Cspan style=\"color:gray\">刘艾伟、潘雷伊、胡旭明、孟啸、文立杰\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FRobust_Watermark?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FRobust_Watermark)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2310.06356-red)](#)\r\n\r\n2. [\u003Cu>**(ICLR 2024) 大语言模型的不可伪造公开可验证水印\u003C\u002Fu>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16230)\r\n   \r\n   \u003Cspan style=\"color:gray\">刘艾伟、潘雷伊、胡旭明、李书昂、文立杰、Irwin King、Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002Funforgeable_watermark?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002Funforgeable_watermark)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2307.16230-red)](#)\r\n\r\n3. [\u003Cu>**(ACM Computing Surveys) 大语言模型时代文本水印综述\u003C\u002Fu>**](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3691626)\r\n   \r\n   \u003Cspan style=\"color:gray\">刘艾伟*、潘雷伊*、陆一健、李晶晶、胡旭明、张曦、文立杰、Irwin King、熊辉、Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![主页](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHome-Text_Watermarking_Survey-blue?style=flat&logo=html5)](https:\u002F\u002Fsurvey-text-watermark.github.io\u002F)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2312.07913-red)](#)\r\n\r\n4. [\u003Cu>**(ICLR 2025 Spotlight) 用户能否通过精心设计的提示识别出已加水印的大语言模型？\u003C\u002Fu>](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.03168)\r\n   \r\n   \u003Cspan style=\"color:gray\">刘艾伟、关晟、刘一鸣、潘雷伊、张艺飞、方连成、文立杰、Philip S. 
Yu、胡旭明\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWatermarked_LLM_Identification?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWatermarked_LLM_Identification)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2410.03168-red)](#)\r\n\r\n5. [\u003Cu>**(ACL 2025 Main) 大语言模型水印能否稳健地防止未经授权的知识蒸馏？\u003C\u002Fu>](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11598)\r\n   \r\n   \u003Cspan style=\"color:gray\">潘雷伊、刘艾伟、黄世宇、陆一健、胡旭明、文立杰、Irwin King、Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWatermark-Radioactivity-Attack?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWatermark-Radioactivity-Attack)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2502.11598-red)](#)\r\n\r\n6. [\u003Cu>**(NAACL 2025 Findings) WaterSeeker：开创性地高效检测大型文档中的加水印片段\u003C\u002Fu>](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.05112)\r\n   \r\n   \u003Cspan style=\"color:gray\">潘雷伊、刘艾伟、陆一健、高子天、狄一辰、文立杰、Irwin King、Philip S. Yu\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHU-BPM\u002FWaterSeeker?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FWaterSeeker)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2409.05112-red)](#)\r\n\r\n7. [\u003Cu>**(ACL 2024 Main) 基于熵的文本水印检测方法\u003C\u002Fu>](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13485)\r\n   \r\n   \u003Cspan style=\"color:gray\">陆一健、刘艾伟、于典志、李晶晶、Irwin King\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fluyijian3\u002FEWD?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002Fluyijian3\u002FEWD)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2403.13485-red)](#)\r\n  \r\n8. 
[\u003Cu>**(ACL 2024 Main) 水印能否在翻译中存活？关于大语言模型文本水印的跨语言一致性\u003C\u002Fu>](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.14007)\r\n   \r\n   \u003Cspan style=\"color:gray\">何志伟、周炳林、郝洪坤、刘艾伟、王兴、涂兆鹏、张卓生、王睿\u003C\u002Fspan>\r\n   \r\n   [![GitHub星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzwhe99\u002FX-SIR?style=social&logo=github)](https:\u002F\u002Fgithub.com\u002Fzwhe99\u002FX-SIR)\r\n     [![Arxiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2402.14007-red)](#)\r\n\r\n\r\n### 目录\r\n\r\n- [MarkLLM：面向大语言模型水印的开源工具包](#markllm-an-open-source-toolkit-for-llm-watermarking)\r\n    - [目录](#contents)\r\n    - [注释](#notes)\r\n    - [更新](#updates)\r\n    - [MarkLLM简介](#introduction-to-markllm)\r\n      - [概述](#overview)\r\n      - [MarkLLM的主要特性](#key-features-of-markllm)\r\n    - [如何在自己的代码中使用该工具包](#how-to-use-the-toolkit-in-your-own-code)\r\n      - [环境搭建](#setting-up-the-environment)\r\n      - [调用水印算法](#invoking-watermarking-algorithms)\r\n      - [机制可视化](#visualizing-mechanisms)\r\n      - [应用评估流水线](#applying-evaluation-pipelines)\r\n    - [更多用户示例](#more-user-examples)\r\n    - [演示Jupyter笔记本](#demo-jupyter-notebooks)\r\n    - [引用](#citations)\n\n### ❗❗❗ 注意事项\n随着 MarkLLM 仓库内容日益丰富、体积不断增大，我们已在 Hugging Face 上创建了一个名为 [Generative-Watermark-Toolkits](https:\u002F\u002Fhuggingface.co\u002FGenerative-Watermark-Toolkits) 的模型存储库，以方便用户使用。该仓库包含了多种涉及自训练模型的水印算法的默认模型。我们已从主仓库中相应水印算法的 `model\u002F` 文件夹中移除了模型权重。**在使用代码时，请先根据配置路径从 Hugging Face 仓库下载对应的模型，并将其保存到 `model\u002F` 目录下，再运行代码。**\n\n### 更新日志\n- 🎉 **(2025.09.22)** 新增 [SemStamp](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03991) 水印方法。感谢 Huan Wang 的 PR！\n- 🎉 **(2025.09.17)** 新增 [IE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14112) 水印方法。感谢 Tianle Gu 的 PR！\n- 🎉 **(2025.09.14)** 新增 [Watermark Stealing](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.19361) 攻击方法。感谢 Shuhao Zhang 的 PR！\n- 🎉 **(2025.07.17)** 新增 [k-SemStamp](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.11399) 水印方法。感谢 Huan Wang 的 PR！\n- 🎉 
**(2025.07.17)** 新增 [Adaptive Watermark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.13927) 水印方法。感谢 Yepeng Liu 的 PR！\n- 🎉 **(2025.05.24)** 新增 [MorphMark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.11541) 水印方法。感谢 Zongqi Wang 的 PR！\n- 🎉 **(2025.03.12)** 新增 [Permute-and-Flip](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05864) (PF) 水印方法。感谢 Zian Wang 的 PR！\n- 🎉 **(2025.02.27)** 为 Unbiased 水印方法新增 δ-reweight 和 LLR 分数检测功能。\n- 🎉 **(2025.01.08)** 为水印方法添加 AutoConfiguration 功能。\n- 🎉 **(2024.12.21)** 在 `MarkvLLM_demo.py` 中提供将 VLLM 与 MarkLLM 集成的示例代码。感谢 @zhangjf-nlp 的 PR！\n- 🎉 **(2024.11.21)** 支持 [SynthID-Text](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4) 方法（Nature）的失真版本。\n- 🎉 **(2024.11.03)** 新增 [SynthID-Text](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4) 方法（Nature），并支持均值、加权均值和贝叶斯等检测方法。\n- 🎉 **(2024.11.01)** 新增 [TS-Watermark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.18059) 方法（ICML 2024）。感谢 Kyle Zheng 和 Minjia Huo 的 PR！\n- 🎉 **(2024.10.07)** 提供 EXP 水印算法的一种等价替代实现（EXPGumbel），采用 Gumbel 噪声。通过此实现，用户可以通过调整配置文件中的采样温度来改变水印强度。\n- 🎉 **(2024.10.07)** 新增 [Unbiased](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10669) 水印方法。\n- 🎉 **(2024.10.06)** 我们很高兴地宣布，我们的论文“MarkLLM: 一个用于 LLM 水印的开源工具包”已被 **EMNLP 2024 Demo** 接受！\n- 🎉 **(2024.08.08)** 新增 [DiPmark](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07710) 水印方法。感谢 Sheng Guan 的 PR！\n- 🎉 **(2024.08.01)** 作为 [python 包](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmarkllm\u002F) 发布！尝试运行 `pip install markllm`。我们在本文末尾提供了用户示例。\n- 🎉 **(2024.07.13)** 新增 ITSEdit 水印方法。感谢 Yiming Liu 的 PR！\n- 🎉 **(2024.07.09)** 为 KGW 添加更多哈希方案（跳过、最小值、加法、自哈希）。感谢 Yichen Di 的 PR！\n- 🎉 **(2024.07.08)** 为 Christ 系列水印方法添加 top-k 过滤器。感谢 Kai Shi 的 PR！\n- 🎉 **(2024.07.03)** 更新了反向翻译攻击。感谢 Zihan Tang 的 PR！\n- 🎉 **(2024.06.19)** 
根据强水印不可行性结果的相关论文（[ICML](https:\u002F\u002Fopenreview.net\u002Fpdf\u002Fc85c77848c1a0a1a53da8fb873d2b27c5b8509c1.pdf)，2024年；[博客](https:\u002F\u002Fkempnerinstitute.harvard.edu\u002Fresearch\u002Fdeeper-learning\u002Fwatermarking-in-the-sand\u002F)），更新了随机游走攻击。感谢 Hanlin Zhang 的 PR！\n- 🎉 **(2024.05.23)** 我们非常高兴地宣布，我们的网站演示已正式上线！\n\n### MarkLLM 简介\n\n#### 概述\n\nMarkLLM 是一个开源工具包，旨在促进大型语言模型（LLMs）中水印技术的研究与应用。随着大型语言模型的广泛应用，确保机器生成文本的真实性和来源变得至关重要。MarkLLM 简化了对水印技术的访问、理解和评估，使其既适用于研究人员，也便于更广泛的社区使用。\n\n\u003Cimg src=\"images\\overview.png\"  alt=\"overview\" style=\"zoom:35%;\" \u002F>\n\n#### MarkLLM 的主要特性\n\n- **实现框架**：MarkLLM 提供了一个统一且可扩展的平台，用于实现各种 LLM 水印算法。目前支持来自两个重要家族的九种具体算法，从而促进了水印技术的集成与扩展。\n\n  **框架设计**：\n\n  \u003Cdiv align=\"center\">\n      \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_7fbd1e78e2dd.png\" alt=\"unified_implementation\" width=\"400\"\u002F>\n  \u003C\u002Fdiv>\n\n  **当前支持的算法**：\n\n  | 算法名称     | 发表期刊      | 链接                                                                                                                                                                       |\n  | ------------------ | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n  | KGW                | ICML 2023    | [\\[2301.10226\\] 大型语言模型的水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.10226)                                                                            |\n  | Unigram            | ICLR 2024    | [\\[2306.17439\\] 可证明鲁棒的AI生成文本水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.17439)                                                               |\n  | SWEET              | ACL 2024    | [\\[2305.15060\\] 这段代码是谁写的？代码生成中的水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.15060)                                                            |\n  | UPV        
        | ICLR 2024    | [\\[2307.16230\\] 一种不可伪造、可公开验证的大规模语言模型水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16230)                                           |\n  | EWD                | ACL 2024    | [\\[2403.13485\\] 基于熵的文本水印检测方法 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13485)                                                              |\n  | SIR                | ICLR 2024    | [\\[2310.06356\\] 大型语言模型的语义不变鲁棒水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06356)                                                  |\n  | X-SIR              | ACL 2024    | [\\[2402.14007\\] 水印能经受住翻译吗？关于大型语言模型文本水印的跨语言一致性 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.14007) |\n  | DiPmark            | ICML 2024    | [\\[2310.07710\\] 一种具有弹性且易于使用的分布保持型水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07710)                           |\n  | 无偏水印 | ICLR 2024    | [\\[2310.10669\\] 大型语言模型的无偏水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10669)                                                                     |\n  | TS水印 | ICML 2024    | [\\[2402.18059\\] 针对大型语言模型的增强可检测性和语义连贯性的特定于标记的水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.18059)                                                                     |\n  | SynthID-Text | Nature 2024   | [用于识别大型语言模型输出的可扩展水印 (*Nature*)](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-024-08025-4)                                                                     |\n  | PF水印 | ICLR 2025   | [\\[2402.05864\\] 排列翻转：一种最优稳定且可水印的 LLM 解码器](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05864)  \n  | MorphMark | ACL 2025   | [\\[2505.11541\\] MorphMark：大型语言模型的灵活自适应水印](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.11541)                                                                     |\n  | 自适应水印 | ICML 2024   | [\\[2401.13927\\] 大型语言模型的自适应文本水印](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.13927) |\n  | SemStamp | NAACL 2024  | 
[\[2310.03991\] SemStamp：一种具有释义鲁棒性的语义水印，用于文本生成](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03991) |\n  | k-SemStamp | ACL 2024 (Findings)   | [\[2402.11399\] k-SemStamp：一种基于聚类的语义水印，用于检测机器生成文本](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.11399) |\n  | EXP\u002FEXPGumbel      | 讲座笔记 | https:\u002F\u002Fwww.scottaaronson.com\u002Ftalks\u002Fwatermark.ppt |\n  | EXP-Edit           | TMLR 2024 | [\[2307.15593\] 鲁棒无失真的语言模型水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15593) |\n  | ITS-Edit           | TMLR 2024 | [\[2307.15593\] 鲁棒无失真的语言模型水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.15593) |\n  | IE                 | Arxiv 预印本 | [\[2505.14112\] 不可见的熵：迈向安全高效的低熵 LLM 水印 (arxiv.org)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14112) |\n- **可视化解决方案**：该工具包包含自定义可视化工具，能够在不同场景下清晰、深入地展示各类水印算法的工作原理。这些可视化工具有助于揭示算法机制，使用户更容易理解。\n\n  \u003Cimg src=\"images\u002Fmechanism_visualization.png\" alt=\"mechanism_visualization\" style=\"zoom:35%;\" \u002F>\n- **评估模块**：MarkLLM 拥有 12 种评估工具，涵盖可检测性、鲁棒性以及对文本质量的影响，以其全面的评估方法在水印技术评估领域脱颖而出。此外，它还提供可定制的自动化评估流程，以满足不同需求和场景，进一步提升了工具包的实用性。\n\n  **工具**：\n\n  - **水印检测成功率计算器**：FundamentalSuccessRateCalculator、DynamicThresholdSuccessRateCalculator\n  - **文本编辑器**：WordDeletion、SynonymSubstitution、ContextAwareSynonymSubstitution、GPTParaphraser、DipperParaphraser、RandomWalkAttack\n  - **文本质量分析器**：PPLCalculator、LogDiversityAnalyzer、BLEUCalculator、PassOrNotJudger、GPTDiscriminator\n\n  **管道**：\n\n  - **水印检测管道**：WatermarkedTextDetectionPipeline、UnWatermarkedTextDetectionPipeline\n  - **文本质量管道**：DirectTextQualityAnalysisPipeline、ReferencedTextQualityAnalysisPipeline、ExternalDiscriminatorTextQualityAnalysisPipeline\n\n### 如何在您自己的代码中使用该工具包\n\n#### 环境设置\n\n- Python 3.10\n- PyTorch\n- 运行 `pip install -r 
requirements.txt`\n\n*提示：* 如果您希望使用 EXPEdit 或 ITSEdit 算法，您需要导入 `.pyx` 文件。以 EXPEdit 为例：\n\n- 运行 `python watermark\u002Fexp_edit\u002Fcython_files\u002Fsetup.py build_ext --inplace`\n- 将生成的 `.so` 文件移动到 `watermark\u002Fexp_edit\u002Fcython_files\u002F` 目录下。\n\n#### 调用水印算法\n\n```python\nimport torch\nfrom watermark.auto_watermark import AutoWatermark\nfrom utils.transformers_config import TransformersConfig\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\n# 设备\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n# Transformers 配置\ntransformers_config = TransformersConfig(model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n                                         tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n                                         vocab_size=50272,\n                                         device=device,\n                                         max_new_tokens=200,\n                                         min_length=230,\n                                         do_sample=True,\n                                         no_repeat_ngram_size=4)\n  \n# 加载水印算法\nmyWatermark = AutoWatermark.load('KGW', \n                                 algorithm_config='config\u002FKGW.json',\n                                 transformers_config=transformers_config)\n\n# 提示词\nprompt = 'Good Morning.'\n\n# 生成并检测\nwatermarked_text = myWatermark.generate_watermarked_text(prompt)\ndetect_result = myWatermark.detect_watermark(watermarked_text)\nunwatermarked_text = myWatermark.generate_unwatermarked_text(prompt)\ndetect_result = myWatermark.detect_watermark(unwatermarked_text)\n```\n\n#### 可视化机制\n\n假设您已经有一对 `watermarked_text` 和 `unwatermarked_text`，并且希望通过水印算法可视化它们之间的差异，并在加水印的文本中特别突出水印部分，您可以使用 `visualize\u002F` 目录下的可视化工具。\n\n**KGW 家族**\n\n```python\nimport torch\nfrom visualize.font_settings import FontSettings\nfrom watermark.auto_watermark import AutoWatermark\nfrom utils.transformers_config import 
TransformersConfig\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom visualize.visualizer import DiscreteVisualizer\nfrom visualize.legend_settings import DiscreteLegendSettings\nfrom visualize.page_layout_settings import PageLayoutSettings\nfrom visualize.color_scheme import ColorSchemeForDiscreteVisualization\n\n# 加载水印算法\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\ntransformers_config = TransformersConfig(\n    \t\t\t\t\t\tmodel=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n                            tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n                            vocab_size=50272,\n                            device=device,\n                            max_new_tokens=200,\n                            min_length=230,\n                            do_sample=True,\n                            no_repeat_ngram_size=4)\nmyWatermark = AutoWatermark.load('KGW', \n                                 algorithm_config='config\u002FKGW.json',\n                                 transformers_config=transformers_config)\n# 获取用于可视化的数据\nwatermarked_data = myWatermark.get_data_for_visualization(watermarked_text)\nunwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)\n\n# 初始化可视化工具\nvisualizer = DiscreteVisualizer(color_scheme=ColorSchemeForDiscreteVisualization(),\n                                font_settings=FontSettings(), \n                                page_layout_settings=PageLayoutSettings(),\n                                legend_settings=DiscreteLegendSettings())\n# 可视化\nwatermarked_img = visualizer.visualize(data=watermarked_data, \n                                       show_text=True, \n                                       visualize_weight=True, \n                                       display_legend=True)\n\nunwatermarked_img = visualizer.visualize(data=unwatermarked_data,\n                                         show_text=True, \n                
                         visualize_weight=True, \n                                         display_legend=True)\n# 保存\nwatermarked_img.save(\"KGW_watermarked.png\")\nunwatermarked_img.save(\"KGW_unwatermarked.png\")\n```\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_5171ae5b4c77.png\" alt=\"1\" width=\"500\" \u002F>\n\u003C\u002Fdiv>\n\n**Christ 家族**\n\n```python\nimport torch\nfrom visualize.font_settings import FontSettings\nfrom watermark.auto_watermark import AutoWatermark\nfrom utils.transformers_config import TransformersConfig\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom visualize.visualizer import ContinuousVisualizer\nfrom visualize.legend_settings import ContinuousLegendSettings\nfrom visualize.page_layout_settings import PageLayoutSettings\nfrom visualize.color_scheme import ColorSchemeForContinuousVisualization\n\n# 加载水印算法\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\ntransformers_config = TransformersConfig(\n    \t\t\t\t\t\tmodel=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n                            tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n                            vocab_size=50272,\n                            device=device,\n                            max_new_tokens=200,\n                            min_length=230,\n                            do_sample=True,\n                            no_repeat_ngram_size=4)\nmyWatermark = AutoWatermark.load('EXP', \n                                 algorithm_config='config\u002FEXP.json',\n                                 transformers_config=transformers_config)\n# 获取用于可视化的数据\nwatermarked_data = myWatermark.get_data_for_visualization(watermarked_text)\nunwatermarked_data = myWatermark.get_data_for_visualization(unwatermarked_text)\n\n# 初始化可视化工具\nvisualizer = 
ContinuousVisualizer(color_scheme=ColorSchemeForContinuousVisualization(),\n                                  font_settings=FontSettings(), \n                                  page_layout_settings=PageLayoutSettings(),\n                                  legend_settings=ContinuousLegendSettings())\n\n# 可视化\nwatermarked_img = visualizer.visualize(data=watermarked_data, \n                                       show_text=True, \n                                       visualize_weight=True, \n                                       display_legend=True)\n\nunwatermarked_img = visualizer.visualize(data=unwatermarked_data,\n                                         show_text=True, \n                                         visualize_weight=True, \n                                         display_legend=True)\n# 保存\nwatermarked_img.save(\"EXP_watermarked.png\")\nunwatermarked_img.save(\"EXP_unwatermarked.png\")\n```\n\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_readme_35e60216f7fa.png\" alt=\"2\" width=\"500\" \u002F>\n\u003C\u002Fdiv>\n\n有关如何使用可视化工具的更多示例，请参阅项目目录中的 `test\u002Ftest_visualize.py` 脚本。\n\n#### 应用评估流水线\n\n**使用水印检测流水线**\n\n```python\nimport torch\nfrom evaluation.dataset import C4Dataset\nfrom watermark.auto_watermark import AutoWatermark\nfrom utils.transformers_config import TransformersConfig\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom evaluation.tools.text_editor import TruncatePromptTextEditor, WordDeletion\nfrom evaluation.tools.success_rate_calculator import DynamicThresholdSuccessRateCalculator\nfrom evaluation.pipelines.detection import WatermarkedTextDetectionPipeline, UnWatermarkedTextDetectionPipeline, DetectionPipelineReturnType\n\n# 加载数据集\nmy_dataset = C4Dataset('dataset\u002Fc4\u002Fprocessed_c4.json')\n\n# 设备\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n\n# Transformers 配置\ntransformers_config = TransformersConfig(\n    
model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n    tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n    vocab_size=50272,\n    device=device,\n    max_new_tokens=200,\n    do_sample=True,\n    min_length=230,\n    no_repeat_ngram_size=4)\n\n# 加载水印算法\nmy_watermark = AutoWatermark.load('KGW', \n                                  algorithm_config='config\u002FKGW.json',\n                                  transformers_config=transformers_config)\n\n# 初始化流水线\npipeline1 = WatermarkedTextDetectionPipeline(\n    dataset=my_dataset, \n    text_editor_list=[TruncatePromptTextEditor(), WordDeletion(ratio=0.3)],\n    show_progress=True, \n    return_type=DetectionPipelineReturnType.SCORES) \n\npipeline2 = UnWatermarkedTextDetectionPipeline(dataset=my_dataset, \n                                               text_editor_list=[],\n                                               show_progress=True,\n                                               return_type=DetectionPipelineReturnType.SCORES)\n\n# 评估\ncalculator = DynamicThresholdSuccessRateCalculator(labels=['TPR', 'F1'], rule='best')\nprint(calculator.calculate(pipeline1.evaluate(my_watermark), pipeline2.evaluate(my_watermark)))\n```\n\n**使用文本质量分析流水线**\n\n```python\nimport torch\nfrom evaluation.dataset import C4Dataset\nfrom watermark.auto_watermark import AutoWatermark\nfrom utils.transformers_config import TransformersConfig\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom evaluation.tools.text_editor import TruncatePromptTextEditor\nfrom evaluation.tools.text_quality_analyzer import PPLCalculator\nfrom evaluation.pipelines.quality_analysis import DirectTextQualityAnalysisPipeline, QualityPipelineReturnType\n\n# 加载数据集\nmy_dataset = C4Dataset('dataset\u002Fc4\u002Fprocessed_c4.json')\n\n# 设备\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n\n# Transformer 配置\ntransformers_config = TransformersConfig(\n    
model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n    tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n    vocab_size=50272,\n    device=device,\n    max_new_tokens=200,\n    min_length=230,\n    do_sample=True,\n    no_repeat_ngram_size=4)\n\n# 加载水印算法\nmy_watermark = AutoWatermark.load('KGW', \n                                  algorithm_config='config\u002FKGW.json',\n                                  transformers_config=transformers_config)\n\n# 初始化流水线\nquality_pipeline = DirectTextQualityAnalysisPipeline(\n    dataset=my_dataset, \n    watermarked_text_editor_list=[TruncatePromptTextEditor()],\n    unwatermarked_text_editor_list=[],\n    analyzer=PPLCalculator(\n        model=AutoModelForCausalLM.from_pretrained('..\u002Fmodel\u002Fllama-7b\u002F', device_map='auto'),\n        tokenizer=AutoTokenizer.from_pretrained('..\u002Fmodel\u002Fllama-7b\u002F'),\n        device=device),\n    unwatermarked_text_source='natural', \n    show_progress=True, \n    return_type=QualityPipelineReturnType.MEAN_SCORES)\n\n# 评估\nprint(quality_pipeline.evaluate(my_watermark))\n```\n\n有关如何使用这些流水线的更多示例，请参阅项目目录中的 `test\u002Ftest_pipeline.py` 脚本。\n\n**利用示例脚本进行评估**\n\n在我们仓库的 `evaluation\u002Fexamples\u002F` 目录中，您会找到一组专门用于系统化和自动化评估各种算法的 Python 脚本。通过使用这些示例，您可以快速有效地评估我们工具包中每种算法的可检测性、鲁棒性以及对文本质量的影响。\n\n注意：要执行 `evaluation\u002Fexamples\u002F` 中的脚本，首先需要运行以下命令来设置环境变量。\n\n```bash\nexport PYTHONPATH=\"path_to_the_MarkLLM_project:$PYTHONPATH\"\n```\n\n### 更多用户示例\n\n额外的用户示例可在 `test\u002F` 目录中找到。要执行其中包含的脚本，首先需要运行以下命令来设置环境变量。\n\n```bash\nexport PYTHONPATH=\"path_to_the_MarkLLM_project:$PYTHONPATH\"\n```\n\n### 演示 Jupyter 笔记本\n\n除了我们提供的 Colab Jupyter 笔记本之外（由于存储限制，部分模型无法在其中下载），您还可以轻松地在本地机器上运行 `MarkLLM_demo.ipynb`。\n\n### 引用\n\n（引用条目按官方 BibTeX 原文保留英文。）\n\n```\n@inproceedings{pan-etal-2024-markllm,\n    title = \"{M}ark{LLM}: An Open-Source Toolkit for {LLM} Watermarking\",\n    author = \"Pan, Leyi  and Liu, Aiwei  and He, Zhiwei  and Gao, Zitian  and Zhao, Xuandong  and Lu, Yijian  and Zhou, Binglin  and Liu, Shuliang  and Hu, Xuming  and Wen, Lijie  and King, Irwin  and Yu, Philip S.\",\n    editor = 
\"Hernandez Farias, Delia Irazu and Hope, Tom and Li, Manling\",\n    booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2024.emnlp-demo.7\",\n    pages = \"61--71\",\n    abstract = \"Watermarking for Large Language Models (LLMs), which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of LLMs. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily understand, implement, and evaluate the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to lower the barrier to entry. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM provides 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. Our code is available at https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM.\",\n}\n```","# MarkLLM 快速上手指南\n\nMarkLLM 是一个开源的大语言模型（LLM）水印工具包，旨在简化水印技术的接入、理解与评估。它提供了统一的可扩展框架，支持多种主流水印算法及其攻击检测方法。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows\n*   **Python 版本**: Python 3.8 或更高版本\n*   **硬件建议**: 若需运行本地模型推理或训练，建议配备 NVIDIA GPU 及对应的 CUDA 驱动；若仅使用 API 或轻量级检测，CPU 即可。\n*   **前置依赖**: 建议先更新 `pip` 和 `setuptools`。\n\n```bash\npython -m pip install --upgrade pip setuptools wheel\n```\n\n## 2. 安装步骤\n\n您可以通过 PyPI 直接安装稳定版，或从源码安装以获取最新功能。\n\n### 方式一：通过 PyPI 安装（推荐）\n\n这是最快捷的安装方式：\n\n```bash\npip install markllm\n```\n\n> **国内加速提示**：如果您在中国大陆地区，建议使用清华或阿里镜像源加速安装：\n> ```bash\n> pip install markllm -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n### 方式二：从源码安装\n\n如果您需要贡献代码或使用尚未发布的最新特性：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM.git\ncd MarkLLM\npip install -e .\n```\n\n### ⚠️ 重要提示：模型权重下载\n\n由于仓库体积限制，部分涉及自训练模型的水印算法权重未包含在代码库中。\n**在运行代码前**，请务必访问 [Hugging Face - Generative-Watermark-Toolkits](https:\u002F\u002Fhuggingface.co\u002FGenerative-Watermark-Toolkits) 下载所需的默认模型，并将其保存到项目对应的 `model\u002F` 目录下，否则程序可能无法运行。\n\n## 3. 
基本使用\n\nMarkLLM 的设计目标是让开发者能在几行代码内集成水印功能。以下是一个最简单的使用示例，展示如何调用水印算法生成带水印的文本。\n\n### 简单示例：生成带水印的文本\n\n假设您已配置好环境并下载了相关模型，可以参照仓库 README 的方式，通过统一入口 `AutoWatermark` 调用内置的水印算法（例如 KGW 或 Unbiased 等）。下面的示例基于 pip 安装的包名路径（从源码安装并设置 `PYTHONPATH` 时请去掉 `markllm.` 前缀），并假设 `config\u002FKGW.json` 等算法配置文件在当前工作目录下可用：\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom markllm.watermark.auto_watermark import AutoWatermark\nfrom markllm.utils.transformers_config import TransformersConfig\n\n# 设备\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n\n# Transformer 配置（以 facebook\u002Fopt-1.3b 为例）\ntransformers_config = TransformersConfig(\n    model=AutoModelForCausalLM.from_pretrained('facebook\u002Fopt-1.3b').to(device),\n    tokenizer=AutoTokenizer.from_pretrained('facebook\u002Fopt-1.3b'),\n    vocab_size=50272,\n    device=device,\n    max_new_tokens=200,\n    min_length=230,\n    do_sample=True,\n    no_repeat_ngram_size=4)\n\n# 加载水印算法\nmy_watermark = AutoWatermark.load('KGW',\n                                  algorithm_config='config\u002FKGW.json',\n                                  transformers_config=transformers_config)\n\n# 生成带水印的文本\nprompt = 'Good Morning.'\nwatermarked_text = my_watermark.generate_watermarked_text(prompt)\n\nprint(\"生成的带水印文本：\")\nprint(watermarked_text)\n\n# 检测文本中是否包含水印\ndetection_result = my_watermark.detect_watermark(watermarked_text)\n\nprint(\"\\n检测结果：\")\nprint(f\"是否检测到水印：{detection_result['is_watermarked']}\")\nprint(f\"置信度得分：{detection_result['score']}\")\n```\n\n### 进阶：在现有代码中集成\n\n如果您已经在自己的项目中使用 Hugging Face `transformers` 或其他推理框架，MarkLLM 的水印算法在生成阶段以 logits 处理的方式介入采样过程，可与现有生成流程结合；具体的集成接口请以各算法类的源码为准。项目还提供了与 vLLM 集成的示例代码（`MarkvLLM_demo.py`）。\n\n更多详细用法、Jupyter Notebook 演示（`MarkLLM_demo.ipynb`）以及针对不同算法（如 SynthID-Text, DiPmark 等）的特定配置，请参考项目根目录下的 `test\u002F` 与 `config\u002F` 目录或官方文档。","某金融科技公司正在部署自研的合规报告生成大模型，亟需解决内容版权归属及防止模型被非法蒸馏的问题。\n\n### 没有 MarkLLM 时\n- **版权难以举证**：当发现竞争对手发布的研报与自家模型输出高度相似时，因缺乏隐蔽的技术标识，无法从法律层面证明对方窃取了生成内容。\n- **防御手段缺失**：面对黑产通过大量查询进行“知识蒸馏”以复制模型能力的行为，团队没有任何技术机制来追踪或阻断这种未经授权的知识迁移。\n- **算法复现困难**：研究人员想验证最新的学术水印方案（如语义不变性水印），却需要从零阅读论文并复现复杂的数学逻辑，耗时数周且容易出错。\n- **评估标准混乱**：缺乏统一的测试框架，难以量化水印在保持文本流畅度的同时，抵抗改写、翻译等攻击的鲁棒性。\n\n### 使用 MarkLLM 后\n- **隐形确权溯源**：集成 MarkLLM 的水印算法后，模型生成的每份报告都嵌入了人眼不可见但可机器检测的数字指纹，为版权纠纷提供了确凿的技术证据。\n- **主动防御蒸馏**：利用工具内置的防蒸馏水印策略，一旦检测到异常的批量查询试图提取模型知识，系统能迅速识别并标记来源，有效遏制模型被盗用。\n- **一键集成前沿算法**：开发人员直接调用 MarkLLM 封装好的接口，几分钟内即可部署 ICLR 2024 等顶会提出的最新水印方案，无需重复造轮子。\n- **标准化鲁棒评测**：通过工具自带的评估模块，团队能快速测试水印在经历删改、润色后的存活率，确保在不妨碍用户阅读体验的前提下实现强防护。\n\nMarkLLM 将复杂的水印学术研究转化为工业界开箱即用的防御武器，让大模型内容拥有了可验证的“数字身份证”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FTHU-BPM_MarkLLM_5171ae5b.png","THU-BPM","BPM Team of Tsinghua 
University","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FTHU-BPM_e90b57cc.png","",null,"https:\u002F\u002Fgithub.com\u002FTHU-BPM",[21,25,29,33],{"name":22,"color":23,"percentage":24},"Python","#3572A5",93.1,{"name":26,"color":27,"percentage":28},"Jupyter Notebook","#DA5B0B",6.3,{"name":30,"color":31,"percentage":32},"Shell","#89e051",0.3,{"name":34,"color":35,"percentage":32},"Cython","#fedf5b",914,84,"2026-04-10T19:31:40","Apache-2.0",3,"未说明","未说明（通常运行 LLM 水印算法及加载模型需要 NVIDIA GPU，具体显存取决于所选模型大小）",{"notes":44,"python":41,"dependencies":45},"1. 模型存储：由于仓库体积限制，默认模型权重已从主仓库移除。运行代码前，必须根据配置路径从 Hugging Face 仓库（Generative-Watermark-Toolkits）下载相应模型并保存至本地的 `model\u002F` 目录。\n2. 安装方式：支持通过 `pip install markllm` 直接安装 Python 包。\n3. 功能集成：提供了与 vLLM 集成的示例代码（MarkvLLM_demo.py）。\n4. 算法支持：支持多种水印算法（如 KGW, Unbiased, SynthID-Text, DiPmark 等）及攻击方法。",[46,47,48,49],"torch","transformers","accelerate","vllm (可选，用于集成演示)",[51,52],"开发框架","语言模型",[54,55,56,57,58,59],"llm","toolkit","watermark","large-language-models","safety","trustworthy-ai",2,"ready","2026-03-27T02:49:30.150509","2026-04-11T08:04:12.113393",[65,70,75,80,85,90,95],{"id":66,"question_zh":67,"answer_zh":68,"source_url":69},29233,"如何在配置中启用无偏水印（Unbiased Watermark）的 delta-reweight 或 gamma-reweight 算法？","项目已实现论文中描述的 δ-reweight 和 LLR 分数检测方法。您可以通过修改 `config\u002FUnbiased.json` 配置文件中的类型设置来选择使用 \"delta\" 或 \"gamma\" 重加权方法。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F24",{"id":71,"question_zh":72,"answer_zh":73,"source_url":74},29234,"推理时水印（如 KGW 系列）是否依赖于特定模型？如果两个模型架构和训练数据相同但初始化不同，水印能区分它们吗？","不能区分。推理时水印（Inference-time watermarking，如 KGW 或 Christ Family 方法）不修改模型权重，而是在模型输出 logits 上添加偏差或在采样过程中干预。只要对模型输出应用了水印方法，无论具体模型或其训练细节（如初始化）如何，水印都是可检测的。水印本身并非每个模型独有。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F5",{"id":76,"question_zh":77,"answer_zh":78,"source_url":79},29235,"为什么在使用 UPV 方法检测时，返回的 score 字段为 None？","这取决于 UPV 的运行模式：\n1. 
**Network 模式**：score 为 None。因为该方法训练了一个检测网络进行整段文本的二分类，训练过程中不包含置信度信息。\n2. **Key 模式**：会生成 score。此时检测网络被禁用，改用词汇划分器（生成器网络）判断每个 token 位置是红色还是绿色，从而计算出分数。\n如果您需要具体的置信度分数，请切换到 Key 模式。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F51",{"id":81,"question_zh":82,"answer_zh":83,"source_url":84},29236,"运行 EXP-Gumbel 算法检测时，为什么 F1 分数较低（例如只有 0.66）？","F1 分数低通常由以下两个配置错误导致：\n1. **未移除攻击测试组件**：在评估正常情况（无攻击）下的性能时，请确保从文本编辑器列表（text editor list）中移除 `WordDeletion`。该组件仅用于测试鲁棒性。\n2. **阈值方向设置错误**：对于 EXP-Gumbel 算法，p 值越低表示水印存在越强。您必须在 `DynamicThresholdSuccessRateCalculator` 中设置 `reverse=True`。\n修正上述配置后，F1 分数可提升至 0.99 左右。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F38",{"id":86,"question_zh":87,"answer_zh":88,"source_url":89},29237,"通过 pip 安装 markllm 后，导入 `markllm.evaluation.tools` 时报错 `ModuleNotFoundError` 怎么办？","这是由于 pip 打包过程中遗漏了 `tools\u002F` 文件夹导致的。解决方法有两种：\n1. **重新安装**：运行 `pip install` 再次安装，开发者已重新打包并上传了包含该文件夹的新版本。\n2. **源码安装（推荐）**：使用 `git clone` 克隆仓库直接使用。这种方式更灵活，可以调整配置并确保使用的是最新版本的工具包。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F25",{"id":91,"question_zh":92,"answer_zh":93,"source_url":94},29238,"代码中使用了 Pillow 库已废弃的 `getsize()` 和 `textsize()` 方法导致报错，该如何修复？","Pillow 9.0.0+ 版本已移除这些方法，需替换为 `getbbox()`。具体修改如下：\n1. 在 `_calculate_line_space` 方法中：用 `getbbox()` 替代 `getsize()`，并通过 `word_width = bbox[2] - bbox[0]` 计算宽度。\n2. 
在 `visualize` 方法中：用 `getbbox()` 替代 `textsize()`，并通过 `token_width = bbox[2] - bbox[0]` 计算宽度。\n此修改已在 commit d876de6 中完成。","https:\u002F\u002Fgithub.com\u002FTHU-BPM\u002FMarkLLM\u002Fissues\u002F46",{"id":96,"question_zh":97,"answer_zh":98,"source_url":74},29239,"当前的水印算法对不同的推理超参数（如 Beam Search 或不同的 Beam Size）是否具有鲁棒性？","是的，当前的水印算法通常针对不同推理超参数（如采样方法、Beam Size 等）进行了鲁棒性测试。经验结果表明，水印方法对这些变化具有较好的鲁棒性。不过，建议您在具体的使用场景中，针对不同的推理设置测试特定的水印方法，以确保其有效性。",[],[101,112,120,128,136,145],{"id":102,"name":103,"github_repo":104,"description_zh":105,"stars":106,"difficulty_score":40,"last_commit_at":107,"category_tags":108,"status":61},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,"2026-04-06T06:32:30",[109,51,110,111],"Agent","图像","数据工具",{"id":113,"name":114,"github_repo":115,"description_zh":116,"stars":117,"difficulty_score":40,"last_commit_at":118,"category_tags":119,"status":61},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 
艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[51,110,109],{"id":121,"name":122,"github_repo":123,"description_zh":124,"stars":125,"difficulty_score":60,"last_commit_at":126,"category_tags":127,"status":61},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,"2026-04-10T23:33:47",[51,109,52],{"id":129,"name":130,"github_repo":131,"description_zh":132,"stars":133,"difficulty_score":60,"last_commit_at":134,"category_tags":135,"status":61},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[51,110,109],{"id":137,"name":138,"github_repo":139,"description_zh":140,"stars":141,"difficulty_score":60,"last_commit_at":142,"category_tags":143,"status":61},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[144,109,110,51],"插件",{"id":146,"name":147,"github_repo":148,"description_zh":149,"stars":150,"difficulty_score":60,"last_commit_at":151,"category_tags":152,"status":61},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[144,51]]