[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-zilliztech--GPTCache":3,"tool-zilliztech--GPTCache":62},[4,18,28,37,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},10095,"AutoGPT","Significant-Gravitas\u002FAutoGPT","AutoGPT 是一个旨在让每个人都能轻松使用和构建 AI 的强大平台，核心功能是帮助用户创建、部署和管理能够自动执行复杂任务的连续型 AI 智能体。它解决了传统 AI 应用中需要频繁人工干预、难以自动化长流程工作的痛点，让用户只需设定目标，AI 即可自主规划步骤、调用工具并持续运行直至完成任务。\n\n无论是开发者、研究人员，还是希望提升工作效率的普通用户，都能从 AutoGPT 中受益。开发者可利用其低代码界面快速定制专属智能体；研究人员能基于开源架构探索多智能体协作机制；而非技术背景用户也可直接选用预置的智能体模板，立即投入实际工作场景。\n\nAutoGPT 的技术亮点在于其模块化“积木式”工作流设计——用户通过连接功能块即可构建复杂逻辑，每个块负责单一动作，灵活且易于调试。同时，平台支持本地自托管与云端部署两种模式，兼顾数据隐私与使用便捷性。配合完善的文档和一键安装脚本，即使是初次接触的用户也能在几分钟内启动自己的第一个 AI 智能体。AutoGPT 正致力于降低 AI 应用门槛，让人人都能成为 AI 的创造者与受益者。",183572,"2026-04-20T04:47:55",[13,36,27,14,15],"语言模型",{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":10,"last_commit_at":43,"category_tags":44,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 
工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161692,"2026-04-20T11:33:57",[14,13,36],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,27],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":24,"env_os":103,"env_gpu":104,"env_ram":104,"env_deps":105,"category_tags":115,"github_topics":116,"view_count":24,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":134,"updated_at":135,"faqs":136,"releases":166},10166,"zilliztech\u002FGPTCache","GPTCache","Semantic cache for LLMs. Fully integrated with LangChain and llama_index. ","GPTCache 是一个专为大语言模型（LLM）打造的语义缓存库，旨在通过智能存储和复用历史回答来优化应用性能。随着 AI 应用的普及，频繁调用 LLM API 不仅成本高昂，还常面临响应延迟的问题。GPTCache 通过理解问题的语义而非简单的文字匹配，能够识别出含义相似的用户提问并直接返回缓存结果，从而将 API 调用成本降低约 10 倍，同时将响应速度提升高达 100 倍。\n\n这款工具主要面向开发者和研究人员，特别是那些正在使用 LangChain 或 llama_index 框架构建 AI 应用的技术人员。其核心亮点在于“语义缓存”技术，即利用向量相似度判断问题是否等价，即使提问措辞不同也能精准命中缓存。此外，GPTCache 已提供 Docker 服务器镜像，支持跨语言调用，让非 Python 环境也能轻松集成。对于希望在不牺牲回答质量的前提下，显著减少 Token 消耗并提升系统并发能力的团队来说，GPTCache 是一个高效且易于集成的解决方案。","# GPTCache : A Library for Creating Semantic Cache for LLM Queries\nSlash Your LLM API Costs by 10x 💰, Boost Speed by 100x ⚡ \n\n[![Release](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fgptcache?label=Release&color&logo=Python)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgptcache\u002F)\n[![pip download](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fgptcache.svg?color=bright-green&logo=Pypi)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgptcache\u002F)\n[![Codecov](https:\u002F\u002Fimg.shields.io\u002Fcodecov\u002Fc\u002Fgithub\u002Fzilliztech\u002FGPTCache\u002Fdev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fzilliztech\u002FGPTCache)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicense\u002Fmit\u002F)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002Fzilliz_universe.svg?style=social&label=Follow%20%40Zilliz)](https:\u002F\u002Ftwitter.com\u002Fzilliz_universe)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1092648432495251507?label=Discord&logo=discord)](https:\u002F\u002Fdiscord.gg\u002FQ8C6WEjSWV)\n\n🎉 GPTCache has been fully integrated with 🦜️🔗[LangChain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain) ! 
Here are detailed [usage instructions](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fmodules\u002Fmodel_io\u002Fmodels\u002Fllms\u002Fintegrations\u002Fllm_caching#gptcache).\n\n🐳 [The GPTCache server docker image](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fdocs\u002Fusage.md#Use-GPTCache-server) has been released, which means that **any language** will be able to use GPTCache!\n\n📔 This project is undergoing swift development, and as such, the API may be subject to change at any time. For the most up-to-date information, please refer to the latest [documentation]( https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002F) and [release note](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fdocs\u002Frelease_note.md).\n\n**NOTE:** As the number of large models is growing explosively and their API shape is constantly evolving, we no longer add support for new APIs or models. We encourage using the get and set API in gptcache; here is the demo code: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fexamples\u002Fadapter\u002Fapi.py\n\n## Quick Install\n\n`pip install gptcache`\n\n## 🚀 What is GPTCache?\n\nChatGPT and various large language models (LLMs) boast incredible versatility, enabling the development of a wide range of applications. However, as your application grows in popularity and encounters higher traffic levels, the expenses related to LLM API calls can become substantial. Additionally, LLM services might exhibit slow response times, especially when dealing with a significant number of requests.\n\nTo tackle this challenge, we have created GPTCache, a project dedicated to building a semantic cache for storing LLM responses. \n\n## 😊 Quick Start\n\n**Note**:\n\n- You can quickly try GPTCache and put it into a production environment without heavy development. However, please note that the repository is still under heavy development.\n- By default, only a limited number of libraries are installed to support the basic cache functionalities. When you need to use additional features, the related libraries will be **automatically installed**.\n- Make sure that the Python version is **3.8.1 or higher**, check: `python --version`\n- If you encounter issues installing a library due to a low pip version, run: `python -m pip install --upgrade pip`.\n\n### dev install\n\n```bash\n# clone GPTCache repo\ngit clone -b dev https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache.git\ncd GPTCache\n\n# install the repo\npip install -r requirements.txt\npython setup.py install\n```\n\n### example usage\n\nThese examples will help you understand how to use exact and similar matching with caching. You can also run the example on [Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1m1s-iTDfLDk-UwUAQ_L8j1C-gzkcr2Sk?usp=share_link). For more examples, refer to the [Bootcamp](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fchat.html).\n\nBefore running the example, **make sure** the OPENAI_API_KEY environment variable is set by executing `echo $OPENAI_API_KEY`. \n\nIf it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix\u002FLinux\u002FMacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems. 
\n\n> It is important to note that this method is only effective temporarily, so if you want a permanent effect, you'll need to modify the environment variable configuration file. For instance, on a Mac, you can modify the file located at `\u002Fetc\u002Fprofile`.\n\n\u003Cdetails>\n\n\u003Csummary> Click to \u003Cstrong>SHOW\u003C\u002Fstrong> example code \u003C\u002Fsummary>\n\n#### OpenAI API original usage\n\n```python\nimport os\nimport time\n\nimport openai\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\n\nquestion = \"what's chatgpt\"\n\n# OpenAI API original usage\nopenai.api_key = os.getenv(\"OPENAI_API_KEY\")\nstart_time = time.time()\nresponse = openai.ChatCompletion.create(\n  model='gpt-3.5-turbo',\n  messages=[\n    {\n        'role': 'user',\n        'content': question\n    }\n  ],\n)\nprint(f'Question: {question}')\nprint(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\nprint(f'Answer: {response_text(response)}\\n')\n\n```\n\n#### OpenAI API + GPTCache, exact match cache\n\n> If you ask ChatGPT the exact same two questions, the answer to the second question will be obtained from the cache without requesting ChatGPT again.\n\n```python\nimport time\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\nprint(\"Cache loading.....\")\n\n# To use GPTCache, that's all you need\n# -------------------------------------------------\nfrom gptcache import cache\nfrom gptcache.adapter import openai\n\ncache.init()\ncache.set_openai_key()\n# -------------------------------------------------\n\nquestion = \"what's github\"\nfor _ in range(2):\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n      model='gpt-3.5-turbo',\n      messages=[\n        {\n            'role': 'user',\n            'content': question\n        }\n      ],\n    )\n    print(f'Question: {question}')\n    print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n    print(f'Answer: {response_text(response)}\\n')\n```\n\n#### OpenAI API + GPTCache, similar search cache\n\n> After obtaining an answer from ChatGPT in response to several similar questions, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again.\n\n```python\nimport time\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\nfrom gptcache import cache\nfrom gptcache.adapter import openai\nfrom gptcache.embedding import Onnx\nfrom gptcache.manager import CacheBase, VectorBase, get_data_manager\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n\nprint(\"Cache loading.....\")\n\nonnx = Onnx()\ndata_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=onnx.dimension))\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n    )\ncache.set_openai_key()\n\nquestions = [\n    \"what's github\",\n    \"can you explain what GitHub is\",\n    \"can you tell me more about GitHub\",\n    \"what is the purpose of GitHub\"\n]\n\nfor question in questions:\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n        model='gpt-3.5-turbo',\n        messages=[\n            {\n                'role': 'user',\n                'content': question\n            }\n        ],\n    )\n    print(f'Question: {question}')\n    print(\"Time consuming: 
{:.2f}s\".format(time.time() - start_time))\n    print(f'Answer: {response_text(response)}\\n')\n```\n\n#### OpenAI API + GPTCache, use temperature\n\n> You can always pass a parameter of temperature while requesting the API service or model.\n> \n> The range of `temperature` is [0, 2], default value is 0.0.\n> \n> A higher temperature means a higher possibility of skipping cache search and requesting large model directly.\n> When temperature is 2, it will skip cache and send request to large model directly for sure. When temperature is 0, it will search cache before requesting large model service.\n> \n> The default `post_process_messages_func` is `temperature_softmax`. In this case, refer to [API reference](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Freferences\u002Fprocessor.html#module-gptcache.processor.post) to learn about how `temperature` affects output.\n\n```python\nimport time\n\nfrom gptcache import cache, Config\nfrom gptcache.manager import manager_factory\nfrom gptcache.embedding import Onnx\nfrom gptcache.processor.post import temperature_softmax\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\nfrom gptcache.adapter import openai\n\ncache.set_openai_key()\n\nonnx = Onnx()\ndata_manager = manager_factory(\"sqlite,faiss\", vector_params={\"dimension\": onnx.dimension})\n\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n    post_process_messages_func=temperature_softmax\n    )\n# cache.config = Config(similarity_threshold=0.2)\n\nquestion = \"what's github\"\n\nfor _ in range(3):\n    start = time.time()\n    response = openai.ChatCompletion.create(\n        model=\"gpt-3.5-turbo\",\n        temperature = 1.0,  # Change temperature here\n        messages=[{\n            \"role\": \"user\",\n            \"content\": question\n        }],\n    )\n    print(\"Time elapsed:\", round(time.time() - start, 3))\n    print(\"Answer:\", response[\"choices\"][0][\"message\"][\"content\"])\n```\n\n\u003C\u002Fdetails>\n\nTo use GPTCache exclusively, only the following lines of code are required, and there is no need to modify any existing code.\n\n```python\nfrom gptcache import cache\nfrom gptcache.adapter import openai\n\ncache.init()\ncache.set_openai_key()\n```\n\nMore Docs：\n\n- [Usage, how to use GPTCache better](docs\u002Fusage.md)\n- [Features, all features currently supported by the cache](docs\u002Ffeature.md)\n- [Examples, learn better custom caching](examples\u002FREADME.md)\n- [Distributed Caching and Horizontal Scaling ](docs\u002Fhorizontal-scaling-usage.md)\n\n## 🎓 Bootcamp\n\n- GPTCache with **LangChain**\n  - [QA Generation](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fqa_generation.html)\n  - [Question Answering](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fquestion_answering.html)\n  - [SQL Chain](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fsqlite.html)\n  - [BabyAGI User Guide](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fbaby_agi.html)\n- GPTCache with **Llama_index**\n  - [WebPage QA](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fllama_index\u002Fwebpage_qa.html)\n- GPTCache with **OpenAI**\n  - [Chat 
completion](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fchat.html)\n  - [Language Translation](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Flanguage_translate.html)\n  - [SQL Translate](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fsql_translate.html)\n  - [Twitter Classifier](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Ftweet_classifier.html)\n  - [Multimodal: Image Generation](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fimage_generation.html)\n  - [Multimodal: Speech to Text](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fspeech_to_text.html)\n- GPTCache with **Replicate**\n  - [Visual Question Answering](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Freplicate\u002Fvisual_question_answering.html)\n- GPTCache with **Temperature Param**\n  - [OpenAI Chat](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Ftemperature\u002Fchat.html)\n  - [OpenAI Image Creation](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Ftemperature\u002Fcreate_image.html)\n\n## 😎 What can this help with?\nGPTCache offers the following primary benefits:\n\n- **Decreased expenses**: Most LLM services charge fees based on a combination of number of requests and [token count](https:\u002F\u002Fopenai.com\u002Fpricing). GPTCache effectively minimizes your expenses by caching query results, which in turn reduces the number of requests and tokens sent to the LLM service. As a result, you can enjoy a more cost-efficient experience when using the service.\n- **Enhanced performance**: LLMs employ generative AI algorithms to generate responses in real-time, a process that can sometimes be time-consuming. However, when a similar query is cached, the response time significantly improves, as the result is fetched directly from the cache, eliminating the need to interact with the LLM service. In most situations, GPTCache can also provide superior query throughput compared to standard LLM services.\n- **Adaptable development and testing environment**: As a developer working on LLM applications, you're aware that connecting to LLM APIs is generally necessary, and comprehensive testing of your application is crucial before moving it to a production environment. GPTCache provides an interface that mirrors LLM APIs and accommodates storage of both LLM-generated and mocked data. This feature enables you to effortlessly develop and test your application, eliminating the need to connect to the LLM service (see the sketch after this list).\n- **Improved scalability and availability**: LLM services frequently enforce [rate limits](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Frate-limits), which are constraints that APIs place on the number of times a user or client can access the server within a given timeframe. Hitting a rate limit means that additional requests will be blocked until a certain period has elapsed, leading to a service outage. With GPTCache, you can easily scale to accommodate an increasing volume of queries, ensuring consistent performance as your application's user base expands.\n\n
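As a concrete illustration of that mocked-data workflow, GPTCache can also be driven directly through the `put` and `get` adapter API mentioned in the **NOTE** above (see `examples\u002Fadapter\u002Fapi.py` in the repository). A minimal sketch, assuming the default exact-match configuration with `get_prompt` as the pre-processing function:\n\n```python\nfrom gptcache import cache\nfrom gptcache.adapter.api import put, get\nfrom gptcache.processor.pre import get_prompt\n\n# use the raw prompt string itself as the cache key\ncache.init(pre_embedding_func=get_prompt)\n\nput(\"mock question\", \"mock answer\")  # store a mocked response\nprint(get(\"mock question\"))  # served from the cache, no LLM call needed\n```\n\n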
## 🤔 How does it work?\n\nOnline services often exhibit data locality, with users frequently accessing popular or trending content. Cache systems take advantage of this behavior by storing commonly accessed data, which in turn reduces data retrieval time, improves response times, and eases the burden on backend servers. Traditional cache systems typically utilize an exact match between a new query and a cached query to determine if the requested content is available in the cache before fetching the data.\n\nHowever, using an exact match approach for LLM caches is less effective due to the complexity and variability of LLM queries, resulting in a low cache hit rate. To address this issue, GPTCache adopts alternative strategies like semantic caching. Semantic caching identifies and stores similar or related queries, thereby increasing cache hit probability and enhancing overall caching efficiency. \n\nGPTCache employs embedding algorithms to convert queries into embeddings and uses a vector store for similarity search on these embeddings. This process allows GPTCache to identify and retrieve similar or related queries from the cache storage, as illustrated in the [Modules section](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache#-modules). \n\nFeaturing a modular design, GPTCache makes it easy for users to customize their own semantic cache. The system offers various implementations for each module, and users can even develop their own implementations to suit their specific needs (see the sketch at the end of this section).\n\nIn a semantic cache, you may encounter false positives during cache hits and false negatives during cache misses. GPTCache offers three metrics to gauge its performance, which are helpful for developers to optimize their caching systems:\n\n- **Hit Ratio**: This metric quantifies the cache's ability to fulfill content requests successfully, compared to the total number of requests it receives. A higher hit ratio indicates a more effective cache.\n- **Latency**: This metric measures the time it takes for a query to be processed and the corresponding data to be retrieved from the cache. Lower latency signifies a more efficient and responsive caching system.\n- **Recall**: This metric represents the proportion of queries served by the cache out of the total number of queries that should have been served by the cache. Higher recall percentages indicate that the cache is effectively serving the appropriate content.\n\nA [sample benchmark](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002Fgpt-cache\u002Fblob\u002Fmain\u002Fexamples\u002Fbenchmark\u002Fbenchmark_sqlite_faiss_onnx.py) is included to help users get started with assessing the performance of their semantic cache.\n\n
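As an illustration of that modular design, a custom embedding function can be plugged straight into `cache.init`. The toy hash-based embedding below is hypothetical (not shipped with GPTCache) and only meant to sketch the expected shape of the interface; the vector store dimension simply has to match the embedding dimension:\n\n```python\nimport numpy as np\n\nfrom gptcache import cache\nfrom gptcache.manager import CacheBase, VectorBase, get_data_manager\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n\nDIM = 64\n\ndef toy_embedding(text, **_):\n    # hypothetical toy embedding: hash character bigrams into a fixed-size vector\n    vec = np.zeros(DIM, dtype=np.float32)\n    for i in range(len(text) - 1):\n        vec[hash(text[i : i + 2]) % DIM] += 1.0\n    norm = np.linalg.norm(vec)\n    return vec \u002F norm if norm else vec\n\n# the vector store dimension must match the embedding dimension\ndata_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=DIM))\ncache.init(\n    embedding_func=toy_embedding,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n)\n```\n\n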
## 🤗 Modules\n\n![GPTCache Struct](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzilliztech_GPTCache_readme_35facb6a37e3.png)\n\n- **LLM Adapter**: \nThe LLM Adapter is designed to integrate different LLM models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for ChatGPT integration.\n  - [x] Support OpenAI ChatGPT API.\n  - [x] Support [langchain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain).\n  - [x] Support [minigpt4](https:\u002F\u002Fgithub.com\u002FVision-CAIR\u002FMiniGPT-4.git).\n  - [x] Support [Llamacpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp.git).\n  - [x] Support [dolly](https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly.git).\n  - [ ] Support other LLMs, such as Hugging Face Hub, Bard, Anthropic.\n- **Multimodal Adapter (experimental)**: \nThe Multimodal Adapter is designed to integrate different large multimodal models by unifying their APIs and request protocols. GPTCache offers a standardized interface for this purpose, with current support for integrations of image generation and audio transcription.\n  - [x] Support OpenAI Image Create API.\n  - [x] Support OpenAI Audio Transcribe API.\n  - [x] Support Replicate BLIP API.\n  - [x] Support Stability Inference API.\n  - [x] Support Hugging Face Stable Diffusion Pipeline (local inference).\n  - [ ] Support other multimodal services or self-hosted large multimodal models.\n- **Embedding Generator**: \nThis module is created to extract embeddings from requests for similarity search. GPTCache offers a generic interface that supports multiple embedding APIs, and presents a range of solutions to choose from. \n  - [x] Disable embedding. This will turn GPTCache into a keyword-matching cache.\n  - [x] Support OpenAI embedding API.\n  - [x] Support [ONNX](https:\u002F\u002Fonnx.ai\u002F) with the GPTCache\u002Fparaphrase-albert-onnx model.\n  - [x] Support [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F) embedding with transformers, ViTModel, Data2VecAudio.\n  - [x] Support [Cohere](https:\u002F\u002Fdocs.cohere.ai\u002Freference\u002Fembed) embedding API.\n  - [x] Support [fastText](https:\u002F\u002Ffasttext.cc) embedding.\n  - [x] Support [SentenceTransformers](https:\u002F\u002Fwww.sbert.net) embedding.\n  - [x] Support [Timm](https:\u002F\u002Ftimm.fast.ai\u002F) models for image embedding.\n  - [ ] Support other embedding APIs.\n- **Cache Storage**:\n**Cache Storage** is where the response from LLMs, such as ChatGPT, is stored. Cached responses are retrieved to assist in evaluating similarity and are returned to the requester if there is a good semantic match. 
At present, GPTCache supports SQLite and offers a universally accessible interface for extension of this module.\n  - [x] Support [SQLite](https:\u002F\u002Fsqlite.org\u002Fdocs.html).\n  - [x] Support [DuckDB](https:\u002F\u002Fduckdb.org\u002F).\n  - [x] Support [PostgreSQL](https:\u002F\u002Fwww.postgresql.org\u002F).\n  - [x] Support [MySQL](https:\u002F\u002Fwww.mysql.com\u002F).\n  - [x] Support [MariaDB](https:\u002F\u002Fmariadb.org\u002F).\n  - [x] Support [SQL Server](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fsql-server\u002F).\n  - [x] Support [Oracle](https:\u002F\u002Fwww.oracle.com\u002F).\n  - [x] Support [DynamoDB](https:\u002F\u002Faws.amazon.com\u002Fdynamodb\u002F).\n  - [ ] Support [MongoDB](https:\u002F\u002Fwww.mongodb.com\u002F).\n  - [ ] Support [Redis](https:\u002F\u002Fredis.io\u002F).\n  - [ ] Support [Minio](https:\u002F\u002Fmin.io\u002F).\n  - [ ] Support [HBase](https:\u002F\u002Fhbase.apache.org\u002F).\n  - [ ] Support [ElasticSearch](https:\u002F\u002Fwww.elastic.co\u002F).\n  - [ ] Support other storages.\n- **Vector Store**:\nThe **Vector Store** module helps find the K most similar requests from the input request's extracted embedding. The results can help assess similarity. GPTCache provides a user-friendly interface that supports various vector stores, including Milvus, Zilliz Cloud, and FAISS. More options will be available in the future.\n  - [x] Support [Milvus](https:\u002F\u002Fmilvus.io\u002F), an open-source vector database for production-ready AI\u002FLLM applications. \n  - [x] Support [Zilliz Cloud](https:\u002F\u002Fcloud.zilliz.com\u002F), a fully-managed cloud vector database based on Milvus.\n  - [x] Support [Milvus Lite](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus-lite), a lightweight version of Milvus that can be embedded into your Python application.\n  - [x] Support [FAISS](https:\u002F\u002Ffaiss.ai\u002F), a library for efficient similarity search and clustering of dense vectors.\n  - [x] Support [Hnswlib](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fhnswlib), header-only C++\u002Fpython library for fast approximate nearest neighbors.\n  - [x] Support [PGVector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector), open-source vector similarity search for Postgres.\n  - [x] Support [Chroma](https:\u002F\u002Fgithub.com\u002Fchroma-core\u002Fchroma), the AI-native open-source embedding database.\n  - [x] Support [DocArray](https:\u002F\u002Fgithub.com\u002Fdocarray\u002Fdocarray), DocArray is a library for representing, sending and storing multi-modal data, perfect for Machine Learning applications.\n  - [x] Support qdrant\n  - [x] Support weaviate\n  - [ ] Support other vector databases.\n- **Cache Manager**:\nThe **Cache Manager** is responsible for controlling the operation of both the **Cache Storage** and **Vector Store**.\n  - **Eviction Policy**:\n  Cache eviction can be managed in memory using python's `cachetools` or in a distributed fashion using Redis as a key-value store. \n  - **In-Memory Caching**\n  \n  Currently, GPTCache makes decisions about evictions based solely on the number of lines. This approach can result in inaccurate resource evaluation and may cause out-of-memory (OOM) errors. 
We are actively investigating and developing a more sophisticated strategy.\n    - [x] Support LRU eviction policy.\n    - [x] Support FIFO eviction policy.\n    - [x] Support LFU eviction policy.\n    - [x] Support RR eviction policy.\n    - [ ] Support more complicated eviction policies.\n  - **Distributed Caching**\n  \n  Scaling a GPTCache deployment horizontally is not possible with in-memory caching alone, since the cached information would be limited to a single pod.\n  \n  To keep cache information consistent across all replicas, you can use a distributed cache store such as Redis. \n    - [x] Support Redis distributed cache\n    - [x] Support memcached distributed cache\n  \n- **Similarity Evaluator**: \nThis module collects data from both the **Cache Storage** and **Vector Store**, and uses various strategies to determine the similarity between the input request and the requests from the **Vector Store**. Based on this similarity, it determines whether a request matches the cache. GPTCache provides a standardized interface for integrating various strategies, along with a collection of implementations to use. The following similarity definitions are currently supported or will be supported in the future:\n  - [x] The distance we obtain from the **Vector Store**.\n  - [x] A model-based similarity determined using the GPTCache\u002Falbert-duplicate-onnx model from [ONNX](https:\u002F\u002Fonnx.ai\u002F).\n  - [x] Exact matches between the input request and the requests obtained from the **Vector Store**.\n  - [x] Distance represented by applying linalg.norm from numpy to the embeddings.\n  - [ ] BM25 and other similarity measurements.\n  - [ ] Support other model serving frameworks such as PyTorch.\n \n  \n  **Note**: Not all combinations of different modules may be compatible with each other. For instance, if we disable the **Embedding Extractor**, the **Vector Store** may not function as intended. We are currently working on implementing a combination sanity check for **GPTCache**.\n\n## 😇 Roadmap\nComing soon! 
[Stay tuned!](https:\u002F\u002Ftwitter.com\u002Fzilliz_universe)\n\n## 😍 Contributing\nWe are extremely open to contributions, be it through new features, enhanced infrastructure, or improved documentation.\n\nFor comprehensive instructions on how to contribute, please refer to our [contribution guide](docs\u002Fcontributing.md).\n","# GPTCache：用于为大语言模型查询创建语义缓存的库\n将您的 LLM API 成本降低 10 倍 💰，速度提升 100 倍 ⚡ \n\n[![发布版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fgptcache?label=Release&color&logo=Python)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgptcache\u002F)\n[![pip 下载量](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fgptcache.svg?color=bright-green&logo=Pypi)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgptcache\u002F)\n[![Codecov](https:\u002F\u002Fimg.shields.io\u002Fcodecov\u002Fc\u002Fgithub\u002Fzilliztech\u002FGPTCache\u002Fdev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fzilliztech\u002FGPTCache)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicense\u002Fmit\u002F)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002Fzilliz_universe.svg?style=social&label=Follow%20%40Zilliz)](https:\u002F\u002Ftwitter.com\u002Fzilliz_universe)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1092648432495251507?label=Discord&logo=discord)](https:\u002F\u002Fdiscord.gg\u002FQ8C6WEjSWV)\n\n🎉 GPTCache 已与 🦜️🔗[LangChain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain) 完全集成！以下是详细的 [使用说明](https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002Fmodules\u002Fmodel_io\u002Fmodels\u002Fllms\u002Fintegrations\u002Fllm_caching#gptcache)。\n\n🐳 [GPTCache 服务器 Docker 镜像](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fdocs\u002Fusage.md#Use-GPTCache-server) 已发布，这意味着 **任何编程语言** 都可以使用 GPTCache！\n\n📔 本项目正处于快速开发中，因此 API 可能会随时发生变化。有关最新信息，请参阅最新的 [文档]( https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002F) 和 [发布说明](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fdocs\u002Frelease_note.md)。\n\n**注意**：随着大型模型数量的激增以及其 API 形态的不断演变，我们不再新增对新 API 或模型的支持。我们鼓励使用 GPTCache 中的 get 和 set API，示例代码如下：https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fexamples\u002Fadapter\u002Fapi.py\n\n## 快速安装\n\n`pip install gptcache`\n\n## 🚀 什么是 GPTCache？\n\nChatGPT 和各种大型语言模型（LLMs）具有惊人的多功能性，能够支持多种应用的开发。然而，随着您的应用越来越受欢迎并面临更高的流量压力，与 LLM API 调用相关的成本可能会变得非常高昂。此外，LLM 服务在处理大量请求时可能会出现响应缓慢的情况。\n\n为了解决这一挑战，我们开发了 GPTCache，这是一个专门用于构建语义缓存以存储 LLM 响应的项目。\n\n## 😊 快速入门\n\n**注意**：\n\n- 您无需进行大量开发即可快速试用 GPTCache 并将其部署到生产环境。但请注意，该仓库目前仍在积极开发中。\n- 默认情况下，仅安装有限数量的库来支持基本的缓存功能。当您需要使用其他功能时，相关库将会 **自动安装**。\n- 请确保 Python 版本为 **3.8.1 或更高**，可通过运行 `python --version` 进行检查。\n- 如果由于 pip 版本过低而导致无法安装某些库，请运行：`python -m pip install --upgrade pip`。\n\n### 开发环境安装\n\n```bash\n# 克隆 GPTCache 仓库\ngit clone -b dev https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache.git\ncd GPTCache\n\n# 安装仓库\npip install -r requirements.txt\npython setup.py install\n```\n\n### 示例用法\n\n这些示例将帮助您理解如何使用精确匹配和相似匹配进行缓存。您也可以在 [Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1m1s-iTDfLDk-UwUAQ_L8j1C-gzkcr2Sk?usp=share_link) 上运行这些示例。更多示例请参考 [训练营](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fchat.html)。\n\n在运行示例之前，请 **确保** 已设置 OPENAI_API_KEY 环境变量，方法是执行 `echo $OPENAI_API_KEY`。如果尚未设置，可以在 Unix\u002FLinux\u002FMacOS 系统上使用 `export 
OPENAI_API_KEY=YOUR_API_KEY`，或在 Windows 系统上使用 `set OPENAI_API_KEY=YOUR_API_KEY` 来设置。\n\n> 请注意，此方法仅为临时生效，若需永久生效，则需修改环境变量配置文件。例如，在 Mac 上，您可以编辑位于 `\u002Fetc\u002Fprofile` 的文件。\n\n\u003Cdetails>\n\n\u003Csummary>点击以\u003Cstrong>显示\u003C\u002Fstrong>示例代码\u003C\u002Fsummary>\n\n#### OpenAI API 原始用法\n\n```python\nimport os\nimport time\n\nimport openai\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\n\nquestion = \"what's chatgpt\"\n\n# OpenAI API 原始用法\nopenai.api_key = os.getenv(\"OPENAI_API_KEY\")\nstart_time = time.time()\nresponse = openai.ChatCompletion.create(\n  model='gpt-3.5-turbo',\n  messages=[\n    {\n        'role': 'user',\n        'content': question\n    }\n  ],\n)\nprint(f'Question: {question}')\nprint(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\nprint(f'Answer: {response_text(response)}\\n')\n\n```\n\n#### OpenAI API + GPTCache，精确匹配缓存\n\n> 如果您向 ChatGPT 提出完全相同的两个问题，第二个问题的答案将直接从缓存中获取，而无需再次向 ChatGPT 发起请求。\n\n```python\nimport time\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\nprint(\"Cache loading.....\")\n\n# 使用 GPTCache，您只需要做这些\n# -------------------------------------------------\nfrom gptcache import cache\nfrom gptcache.adapter import openai\n\ncache.init()\ncache.set_openai_key()\n\n# -------------------------------------------------\n\nquestion = \"什么是GitHub\"\nfor _ in range(2):\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n      model='gpt-3.5-turbo',\n      messages=[\n        {\n            'role': 'user',\n            'content': question\n        }\n      ],\n    )\n    print(f'问题: {question}')\n    print(\"耗时: {:.2f}秒\".format(time.time() - start_time))\n    print(f'答案: {response_text(response)}\\n')\n```\n\n#### OpenAI API + GPTCache，相似搜索缓存\n\n> 在针对几个相似的问题从ChatGPT获取答案后，后续问题的答案可以直接从缓存中获取，而无需再次请求ChatGPT。\n\n```python\nimport time\n\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\nfrom gptcache import cache\nfrom gptcache.adapter import openai\nfrom gptcache.embedding import Onnx\nfrom gptcache.manager import CacheBase, VectorBase, get_data_manager\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n\nprint(\"缓存加载中.....\")\n\nonnx = Onnx()\ndata_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=onnx.dimension))\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n    )\ncache.set_openai_key()\n\nquestions = [\n    \"什么是GitHub\",\n    \"你能解释一下GitHub是什么吗\",\n    \"你能多告诉我一些关于GitHub的信息吗\",\n    \"GitHub的用途是什么\"\n]\n\nfor question in questions:\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n        model='gpt-3.5-turbo',\n        messages=[\n            {\n                'role': 'user',\n                'content': question\n            }\n        ],\n    )\n    print(f'问题: {question}')\n    print(\"耗时: {:.2f}秒\".format(time.time() - start_time))\n    print(f'答案: {response_text(response)}\\n')\n```\n\n
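相似匹配的命中行为还受相似度阈值影响。下面给出一个最小示意（沿用上文 SearchDistanceEvaluation 的配置；阈值的具体取值范围与效果取决于所选 similarity_evaluation 实现，请以 API 文档为准），展示如何通过 Config 调整判定的严格程度：\n\n```python\nfrom gptcache import cache, Config\n\n# 在 cache.init(...) 完成后调整相似度阈值（0.2 仅为示意值，需按实际效果调参）\ncache.config = Config(similarity_threshold=0.2)\n```\n\n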
#### OpenAI API + GPTCache，使用temperature参数\n\n> 在请求API服务或模型时，您可以始终传递temperature参数。\n> \n> temperature的取值范围是[0, 2]，默认值为0.0。\n> \n> 温度越高，越有可能跳过缓存搜索，直接请求大模型。当temperature为2时，一定会跳过缓存并直接向大模型发送请求。当temperature为0时，则会在请求大模型服务之前先搜索缓存。\n> \n> 默认的post_process_messages_func是temperature_softmax。在这种情况下，请参阅[API参考](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Freferences\u002Fprocessor.html#module-gptcache.processor.post)以了解temperature如何影响输出。\n\n```python\nimport time\n\nfrom gptcache import cache, Config\nfrom gptcache.manager import manager_factory\nfrom gptcache.embedding import Onnx\nfrom gptcache.processor.post import temperature_softmax\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\nfrom gptcache.adapter import openai\n\ncache.set_openai_key()\n\nonnx = Onnx()\ndata_manager = manager_factory(\"sqlite,faiss\", vector_params={\"dimension\": onnx.dimension})\n\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n    post_process_messages_func=temperature_softmax\n    )\n# cache.config = Config(similarity_threshold=0.2)\n\nquestion = \"什么是GitHub\"\n\nfor _ in range(3):\n    start = time.time()\n    response = openai.ChatCompletion.create(\n        model=\"gpt-3.5-turbo\",\n        temperature = 1.0,  # 在这里更改temperature\n        messages=[{\n            \"role\": \"user\",\n            \"content\": question\n        }],\n    )\n    print(\"用时:\", round(time.time() - start, 3))\n    print(\"答案:\", response[\"choices\"][0][\"message\"][\"content\"])\n```\n\n\u003C\u002Fdetails>\n\n要完全使用GPTCache，只需以下几行代码，无需修改任何现有代码。\n\n```python\nfrom gptcache import cache\nfrom gptcache.adapter import openai\n\ncache.init()\ncache.set_openai_key()\n```\n\n更多文档：\n\n- [使用指南，如何更好地使用GPTCache](docs\u002Fusage.md)\n- [功能介绍，缓存目前支持的所有功能](docs\u002Ffeature.md)\n- [示例，学习更好的自定义缓存](examples\u002FREADME.md)\n- [分布式缓存与水平扩展](docs\u002Fhorizontal-scaling-usage.md)\n\n## 🎓 培训营\n\n- GPTCache与**LangChain**\n  - [问答生成](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fqa_generation.html)\n  - [问题回答](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fquestion_answering.html)\n  - [SQL链](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fsqlite.html)\n  - [BabyAGI用户指南](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Flangchain\u002Fbaby_agi.html)\n- GPTCache与**Llama_index**\n  - [网页问答](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fllama_index\u002Fwebpage_qa.html)\n- GPTCache与**OpenAI**\n  - [聊天补全](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fchat.html)\n  - [语言翻译](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Flanguage_translate.html)\n  - [SQL翻译](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fsql_translate.html)\n  - [推文分类器](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Ftweet_classifier.html)\n  - [多模态：图像生成](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fimage_generation.html)\n  - [多模态：语音转文字](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fopenai\u002Fspeech_to_text.html)\n- GPTCache与**Replicate**\n  - [视觉问答](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Freplicate\u002Fvisual_question_answering.html)\n- GPTCache与**Temperature参数**\n  - 
[OpenAI聊天](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Ftemperature\u002Fchat.html)\n  - [OpenAI图像创作](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Ftemperature\u002Fcreate_image.html)\n\n## 😎 这能帮上什么忙？\nGPTCache 提供以下主要优势：\n\n- **降低成本**：大多数大模型服务会根据请求数量和[令牌计数](https:\u002F\u002Fopenai.com\u002Fpricing)来收取费用。GPTCache 通过缓存查询结果，有效减少发送到大模型服务的请求数量和令牌用量，从而显著降低使用成本。这样一来，您在使用该服务时将获得更具性价比的体验。\n- **提升性能**：大模型采用生成式 AI 算法实时生成响应，这一过程有时可能较为耗时。然而，当相似的查询已被缓存时，响应时间会大幅缩短，因为可以直接从缓存中获取结果，无需再与大模型服务交互。在大多数情况下，相比标准的大模型服务，GPTCache 还能提供更高的查询吞吐量。\n- **灵活的开发与测试环境**：作为开发大模型应用的工程师，您知道通常需要连接到大模型 API，并且在将应用部署到生产环境之前，进行全面测试至关重要。GPTCache 提供一个与大模型 API 兼容的接口，支持存储由大模型生成的数据以及模拟数据。这一特性使您能够轻松地开发和测试应用，而无需实际连接到大模型服务。\n- **增强可扩展性和可用性**：大模型服务经常实施[速率限制](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Frate-limits)，即 API 对用户或客户端在特定时间内访问服务器次数的约束。一旦达到速率限制，后续请求将被阻止，直到一定时间过去才会恢复，这可能导致服务中断。借助 GPTCache，您可以轻松扩展以应对不断增加的查询量，确保随着用户规模的增长，应用始终保持稳定的性能。\n\n## 🤔 它是如何工作的？\n在线服务通常具有数据局部性特征，用户往往会频繁访问热门或流行的内容。缓存系统正是利用这一特性，将常用数据存储起来，从而缩短数据检索时间、加快响应速度，并减轻后端服务器的压力。传统的缓存系统通常采用精确匹配的方式，即通过比对新查询与缓存中的查询是否完全一致，来判断所需内容是否已在缓存中，然后再进行数据获取。\n\n然而，对于大模型的缓存来说，采用精确匹配的方法效果并不理想，因为大模型的查询往往复杂多样，导致缓存命中率较低。为解决这一问题，GPTCache 采用了语义缓存等替代策略。语义缓存能够识别并存储相似或相关的查询，从而提高缓存命中概率，提升整体缓存效率。\n\nGPTCache 使用嵌入算法将查询转换为嵌入向量，并借助向量存储对这些嵌入向量进行相似度搜索。这一过程使 GPTCache 能够从缓存存储中识别并检索出相似或相关的查询，如[模块部分](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache#-modules)所示。\n\nGPTCache 采用模块化设计，方便用户自定义自己的语义缓存。系统为每个模块提供了多种实现方式，用户甚至可以根据自身需求开发专属的实现方案。\n\n在语义缓存中，可能会出现缓存命中时的误报以及缓存未命中时的漏报。为了帮助开发者优化缓存系统，GPTCache 提供了三项性能指标：\n\n- **命中率**：该指标用于衡量缓存成功满足内容请求的能力，相对于其收到的总请求数而言。命中率越高，说明缓存越有效。\n- **延迟**：该指标用于测量查询被处理并从缓存中检索到相应数据所需的时间。延迟越低，表明缓存系统越高效、响应越迅速。\n- **召回率**：该指标表示由缓存提供的查询数量占本应由缓存提供的总查询数量的比例。召回率越高，说明缓存能够有效地提供合适的内容。\n\n此外，还提供了一个[示例基准测试](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002Fgpt-cache\u002Fblob\u002Fmain\u002Fexamples\u002Fbenchmark\u002Fbenchmark_sqlite_faiss_onnx.py)，供用户开始评估其语义缓存的性能。\n\n## 🤗 模块\n\n![GPTCache 结构](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzilliztech_GPTCache_readme_35facb6a37e3.png)\n\n- **LLM 适配器**：\nLLM 适配器旨在通过统一不同 LLM 模型的 API 和请求协议来实现集成。GPTCache 为此提供了标准化接口，目前支持 ChatGPT 集成。\n  - [x] 支持 OpenAI ChatGPT API。\n  - [x] 支持 [langchain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain)。\n  - [x] 支持 [minigpt4](https:\u002F\u002Fgithub.com\u002FVision-CAIR\u002FMiniGPT-4.git)。\n  - [x] 支持 [Llamacpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp.git)。\n  - [x] 支持 [dolly](https:\u002F\u002Fgithub.com\u002Fdatabrickslabs\u002Fdolly.git)。\n  - [ ] 支持其他 LLM，例如 Hugging Face Hub、Bard、Anthropic。\n- **多模态适配器（实验性）**：\n多模态适配器旨在通过统一不同大型多模态模型的 API 和请求协议来实现集成。GPTCache 为此提供了标准化接口，目前支持图像生成、音频转录等功能的集成。\n  - [x] 支持 OpenAI Image Create API。\n  - [x] 支持 OpenAI Audio Transcribe API。\n  - [x] 支持 Replicate BLIP API。\n  - [x] 支持 Stability Inference API。\n  - [x] 支持 Hugging Face Stable Diffusion Pipeline（本地推理）。\n  - [ ] 支持其他多模态服务或自托管的大型多模态模型。\n- **嵌入生成器**：\n该模块用于从请求中提取嵌入向量，以进行相似度搜索。GPTCache 提供了一个通用接口，支持多种嵌入 API，并提供多种解决方案供选择。\n  - [x] 禁用嵌入功能。这会将 GPTCache 变为基于关键词匹配的缓存。\n  - [x] 支持 OpenAI 嵌入 API。\n  - [x] 支持 [ONNX](https:\u002F\u002Fonnx.ai\u002F)，使用 GPTCache\u002Fparaphrase-albert-onnx 模型。\n  - [x] 支持 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002F) 嵌入，包括 transformers、ViTModel、Data2VecAudio。\n  - [x] 支持 [Cohere](https:\u002F\u002Fdocs.cohere.ai\u002Freference\u002Fembed) 嵌入 API。\n  - [x] 支持 [fastText](https:\u002F\u002Ffasttext.cc) 嵌入。\n  - [x] 支持 [SentenceTransformers](https:\u002F\u002Fwww.sbert.net) 
嵌入。\n  - [x] 支持 [Timm](https:\u002F\u002Ftimm.fast.ai\u002F) 模型用于图像嵌入。\n  - [ ] 支持其他嵌入 API。\n- **缓存存储**：\n**缓存存储**是存放来自 LLM（如 ChatGPT）响应的地方。缓存的响应会被检索出来，用于评估相似性，并在语义匹配良好时返回给请求者。目前，GPTCache 支持 SQLite，并提供一个通用接口以便扩展此模块。\n  - [x] 支持 [SQLite](https:\u002F\u002Fsqlite.org\u002Fdocs.html)。\n  - [x] 支持 [DuckDB](https:\u002F\u002Fduckdb.org\u002F)。\n  - [x] 支持 [PostgreSQL](https:\u002F\u002Fwww.postgresql.org\u002F)。\n  - [x] 支持 [MySQL](https:\u002F\u002Fwww.mysql.com\u002F)。\n  - [x] 支持 [MariaDB](https:\u002F\u002Fmariadb.org\u002F)。\n  - [x] 支持 [SQL Server](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fsql-server\u002F)。\n  - [x] 支持 [Oracle](https:\u002F\u002Fwww.oracle.com\u002F)。\n  - [x] 支持 [DynamoDB](https:\u002F\u002Faws.amazon.com\u002Fdynamodb\u002F)。\n  - [ ] 支持 [MongoDB](https:\u002F\u002Fwww.mongodb.com\u002F)。\n  - [ ] 支持 [Redis](https:\u002F\u002Fredis.io\u002F)。\n  - [ ] 支持 [Minio](https:\u002F\u002Fmin.io\u002F)。\n  - [ ] 支持 [HBase](https:\u002F\u002Fhbase.apache.org\u002F)。\n  - [ ] 支持 [ElasticSearch](https:\u002F\u002Fwww.elastic.co\u002F)。\n  - [ ] 支持其他存储系统。\n- **向量存储**：\n**向量存储**模块根据输入请求提取的嵌入向量，帮助找到与之最相似的 K 个请求。其结果可用于评估相似性。GPTCache 提供了一个用户友好的接口，支持多种向量存储，包括 Milvus、Zilliz Cloud 和 FAISS。未来还将提供更多选项。\n  - [x] 支持 [Milvus](https:\u002F\u002Fmilvus.io\u002F)——一款面向生产级 AI\u002FLLM 应用的开源向量数据库。\n  - [x] 支持 [Zilliz Cloud](https:\u002F\u002Fcloud.zilliz.com\u002F)——基于 Milvus 的全托管云向量数据库。\n  - [x] 支持 [Milvus Lite](https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus-lite)——一款可嵌入 Python 应用的轻量级 Milvus 版本。\n  - [x] 支持 [FAISS](https:\u002F\u002Ffaiss.ai\u002F)——一个用于高效相似度搜索和稠密向量聚类的库。\n  - [x] 支持 [Hnswlib](https:\u002F\u002Fgithub.com\u002Fnmslib\u002Fhnswlib)——一个仅包含头文件的 C++\u002FPython 库，用于快速近似最近邻搜索。\n  - [x] 支持 [PGVector](https:\u002F\u002Fgithub.com\u002Fpgvector\u002Fpgvector)——一个面向 Postgres 的开源向量相似度搜索工具。\n  - [x] 支持 [Chroma](https:\u002F\u002Fgithub.com\u002Fchroma-core\u002Fchroma)——一款原生支持 AI 的开源嵌入式数据库。\n  - [x] 支持 [DocArray](https:\u002F\u002Fgithub.com\u002Fdocarray\u002Fdocarray)——一个用于表示、传输和存储多模态数据的库，非常适合机器学习应用。\n  - [x] 支持 qdrant\n  - [x] 支持 weaviate\n  - [ ] 支持其他向量数据库。\n- **缓存管理器**：\n**缓存管理器**负责控制 **缓存存储** 和 **向量存储** 的运行。\n  - **逐出策略**：\n缓存逐出可以在内存中使用 Python 的 `cachetools` 进行管理，也可以通过 Redis 作为键值存储实现分布式管理。\n  - **内存缓存**\n  \n  目前，GPTCache 的逐出决策仅基于条目数量。这种方法可能导致资源评估不准确，并引发内存不足（OOM）错误。我们正在积极研究并开发更复杂的策略。\n    - [x] 支持 LRU 逐出策略。\n    - [x] 支持 FIFO 逐出策略。\n    - [x] 支持 LFU 逐出策略。\n    - [x] 支持 RR 逐出策略。\n    - [ ] 支持更复杂的逐出策略。\n  - **分布式缓存**\n  \n  如果您尝试使用内存缓存对 GPTCache 部署进行水平扩展，则无法实现。因为缓存信息将仅限于单个 Pod。\n  \n  通过分布式缓存，可以在所有副本之间保持缓存信息的一致性，从而可以使用 Redis 等分布式缓存存储。\n    - [x] 支持 Redis 分布式缓存。\n    - [x] 支持 memcached 分布式缓存。\n  \n- **相似度评估器**：\n该模块从 **缓存存储** 和 **向量存储** 中收集数据，并使用多种策略来确定输入请求与 **向量存储** 中请求之间的相似度。基于此相似度，它决定请求是否与缓存匹配。GPTCache 提供了标准化接口，用于集成各种策略，并附带一系列实现方案可供使用。目前支持或未来将支持的相似度定义如下：\n  - [x] 由 **向量存储** 得到的距离。\n  - [x] 基于模型的相似度，使用来自 [ONNX](https:\u002F\u002Fonnx.ai\u002F) 的 GPTCache\u002Falbert-duplicate-onnx 模型计算。\n  - [x] 输入请求与 **向量存储** 中获取的请求之间的完全匹配。\n  - [x] 通过将 numpy 的 linalg.norm 应用于嵌入向量所表示的距离。\n  - [ ] BM25 和其他相似度测量方法。\n  - [ ] 支持 PyTorch 等其他模型推理框架。\n \n  \n  **注意**：并非所有模块组合都能相互兼容。例如，如果禁用 **嵌入提取器**，**向量存储** 可能无法正常工作。我们目前正在为 **GPTCache** 实现组合合理性检查。\n\n## 😇 路线图\n即将推出！[敬请关注！](https:\u002F\u002Ftwitter.com\u002Fzilliz_universe)\n\n## 😍 参与贡献\n我们非常欢迎各种形式的贡献，无论是新增功能、优化基础设施，还是改进文档。\n\n有关如何参与贡献的详细说明，请参阅我们的[贡献指南](docs\u002Fcontributing.md)。","# GPTCache 快速上手指南\n\nGPTCache 是一个专为大语言模型（LLM）查询设计的语义缓存库。它能将 LLM API 成本降低 10 倍，响应速度提升 100 倍，通过缓存相似问题的回答来避免重复调用昂贵的 LLM 服务。\n\n## 环境准备\n\n在开始之前，请确保满足以下系统要求：\n\n*   
**Python 版本**：3.8.1 或更高。\n    *   检查命令：`python --version`\n*   **pip 版本**：建议升级到最新版本以避免安装错误。\n    *   升级命令：`python -m pip install --upgrade pip`\n*   **API Key**：如果使用 OpenAI 等模型，需提前设置好环境变量 `OPENAI_API_KEY`。\n    *   Linux\u002FMac: `export OPENAI_API_KEY=YOUR_API_KEY`\n    *   Windows: `set OPENAI_API_KEY=YOUR_API_KEY`\n\n> **提示**：国内开发者若遇到网络安装问题，可使用清华源或阿里源加速安装（例如：`pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple gptcache`）。\n\n## 安装步骤\n\n### 方式一：通过 PyPI 安装（推荐）\n\n这是最快速的安装方式，仅安装核心功能。当需要使用特定功能（如向量数据库、嵌入模型）时，相关依赖库会自动按需安装。\n\n```bash\npip install gptcache\n```\n\n### 方式二：开发版安装\n\n如果你需要体验最新功能或贡献代码，可以从 GitHub 克隆源码安装：\n\n```bash\n# 克隆仓库\ngit clone -b dev https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache.git\ncd GPTCache\n\n# 安装依赖并构建\npip install -r requirements.txt\npython setup.py install\n```\n\n## 基本使用\n\nGPTCache 的设计目标是**零代码侵入**。你只需添加几行初始化代码，即可让现有的 OpenAI 调用自动具备缓存能力。\n\n### 1. 精确匹配缓存（Exact Match）\n\n适用于完全相同的问题直接返回缓存结果，无需再次请求 LLM。\n\n```python\nimport time\nfrom gptcache import cache\nfrom gptcache.adapter import openai\n\n# 初始化缓存并设置 Key\ncache.init()\ncache.set_openai_key()\n\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\nquestion = \"what's github\"\n\n# 第一次请求：调用 LLM 并缓存结果\n# 第二次请求：直接从缓存获取，速度极快且无 API 费用\nfor _ in range(2):\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n      model='gpt-3.5-turbo',\n      messages=[{'role': 'user', 'content': question}],\n    )\n    print(f'Question: {question}')\n    print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n    print(f'Answer: {response_text(response)}\\n')\n```\n\n### 2. 语义相似匹配缓存（Similarity Search）\n\n适用于问题表述不同但意图相似的场景（例如：\"GitHub 是什么\" 和 \"介绍一下 GitHub\"）。这需要配置嵌入模型（Embedding）和向量存储。\n\n```python\nimport time\nfrom gptcache import cache\nfrom gptcache.adapter import openai\nfrom gptcache.embedding import Onnx\nfrom gptcache.manager import CacheBase, VectorBase, get_data_manager\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n\n# 定义响应解析函数，使本示例可独立运行\ndef response_text(openai_resp):\n    return openai_resp['choices'][0]['message']['content']\n\n# 配置嵌入模型和向量数据库\nonnx = Onnx()\ndata_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=onnx.dimension))\n\n# 初始化缓存\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n)\ncache.set_openai_key()\n\nquestions = [\n    \"what's github\",\n    \"can you explain what GitHub is\",  # 语义相似，将命中缓存\n    \"can you tell me more about GitHub\", # 语义相似，将命中缓存\n]\n\nfor question in questions:\n    start_time = time.time()\n    response = openai.ChatCompletion.create(\n        model='gpt-3.5-turbo',\n        messages=[{'role': 'user', 'content': question}],\n    )\n    print(f'Question: {question}')\n    print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))\n    print(f'Answer: {response_text(response)}\\n')\n```\n\n### 进阶提示：温度参数（Temperature）\n\n你可以通过传递 `temperature` 参数来控制缓存行为：\n*   `temperature = 0`：优先搜索缓存，未命中才请求模型。\n*   `temperature = 2`：跳过缓存，直接请求模型。\n*   中间值：按概率决定是否跳过缓存搜索。\n\n```python\nresponse = openai.ChatCompletion.create(\n    model=\"gpt-3.5-turbo\",\n    temperature=1.0,  # 调整此参数控制缓存命中率\n    messages=[{\"role\": \"user\", \"content\": \"what's github\"}],\n)\n```
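\n\n### 进阶提示：缓存容量与逐出策略\n\n以下为补充示意（假设所用版本的 `get_data_manager` 支持 `max_size` 与 `eviction` 参数，实际签名请以官方 API 文档为准）：当缓存条目数达到上限后，会按所选策略（如 LRU、FIFO）逐出旧条目，从而避免缓存无限增长。\n\n```python\nfrom gptcache import cache\nfrom gptcache.embedding import Onnx\nfrom gptcache.manager import CacheBase, VectorBase, get_data_manager\nfrom gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n\nonnx = Onnx()\n# 示意：限制缓存条目数上限，并采用 LRU 逐出策略（参数为假设用法，需按实际版本确认）\ndata_manager = get_data_manager(\n    CacheBase(\"sqlite\"),\n    VectorBase(\"faiss\", dimension=onnx.dimension),\n    max_size=1000,\n    eviction=\"LRU\",\n)\ncache.init(\n    embedding_func=onnx.to_embeddings,\n    data_manager=data_manager,\n    similarity_evaluation=SearchDistanceEvaluation(),\n)\n```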
","某电商公司开发了一款基于大模型的智能客服系统，用于实时回答用户关于商品详情、物流状态及售后政策的咨询。\n\n### 没有 GPTCache 时\n- **运营成本高昂**：面对每日数万次的重复性提问（如“发货地在哪里”），系统每次都调用昂贵的 LLM API，导致月度账单激增。\n- **响应延迟明显**：在高并发时段，由于需等待外部模型生成回复，用户平均需等待 3-5 秒才能收到消息，体验流畅度差。\n- **服务稳定性风险**：一旦遇到网络波动或第三方 API 限流，整个客服系统将直接不可用，导致大量用户投诉。\n- **资源浪费严重**：计算资源被大量消耗在处理语义完全相同的请求上，无法将算力集中在处理复杂的个性化问题上。\n\n### 使用 GPTCache 后\n- **成本大幅降低**：GPTCache 通过语义匹配拦截了约 80% 的常见重复问题，直接返回缓存结果，使 API 调用成本降低了 10 倍。\n- **响应速度飞跃**：对于命中缓存的请求，系统无需联网等待生成，响应时间从秒级缩短至毫秒级，整体吞吐量提升 100 倍。\n- **系统高可用保障**：即使外部大模型服务暂时中断，GPTCache 仍能依靠本地缓存维持核心问答功能的正常运转，保障业务连续性。\n- **智能流量分流**：系统自动区分简单与复杂问题，仅将真正需要推理的新颖问题发送给大模型，显著优化了资源配置效率。\n\nGPTCache 通过构建高效的语义缓存层，在几乎不改变原有代码架构的前提下，实现了智能应用成本与性能的数量级优化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzilliztech_GPTCache_7eb67818.png","zilliztech","Zilliz","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fzilliztech_c42ecbb4.png","Vector Database for Enterprise-grade AI and LLM applications",null,"info@zilliz.com","zilliz_universe","https:\u002F\u002Fzilliz.com\u002Fcloud","https:\u002F\u002Fgithub.com\u002Fzilliztech",[83,87,91,95],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.7,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.2,{"name":92,"color":93,"percentage":94},"Makefile","#427819",0.1,{"name":96,"color":97,"percentage":98},"Dockerfile","#384d54",0,7991,575,"2026-04-19T12:06:25","MIT","Linux, macOS, Windows","未说明",{"notes":106,"python":107,"dependencies":108},"默认仅安装基础库，使用高级功能（如向量搜索）时会自动安装相关依赖（如 onnx、faiss）。支持通过 Docker 部署服务端以实现多语言调用。项目处于快速开发中，API 可能随时变更。使用前需配置 OPENAI_API_KEY 环境变量。建议将 pip 升级至最新版本以避免安装问题。","3.8.1+",[109,110,111,112,113,114],"openai","onnx","faiss","sqlite","langchain","llama_index",[13,36,14,27],[117,118,119,120,121,122,123,124,109,125,126,113,127,128,129,130,131,132,133],"chatbot","chatgpt","chatgpt-api","llm","milvus","similarity-search","vector-search","aigc","memcache","gpt","autogpt","redis","babyagi","llama-index","llama","dolly","semantic-search","2026-03-27T02:49:30.150509","2026-04-20T20:24:04.421102",[137,142,147,152,157,162],{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},45632,"在使用 LangChain 的 ConversationalRetrievalChain 时，为什么缓存没有生效或聊天历史无法保存？","这通常是因为相似度评估的距离阈值设置不当。解决方案是在初始化缓存时，显式设置 SearchDistanceEvaluation 的 max_distance 参数。例如：\ncache.init(\n    ...\n    similarity_evaluation=SearchDistanceEvaluation(max_distance=1.0)\n)\n如果不设置该值，可能导致匹配失败从而无法命中缓存。","https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fissues\u002F481",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},45633,"调用 OpenAI Moderation API 时报错 'NoneType' object is not subscriptable 怎么办？","该错误通常发生在旧版本中处理消息内容为空时。建议升级到最新版本（如 0.1.32 或更高），因为最新源码已修复此问题。如果问题依旧，请确保传入的数据格式正确，并检查 pre_embedding_func 是否能正确处理输入数据（例如使用 get_openai_moderation_input 作为预处理函数）。","https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fissues\u002F386",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},45634,"发送多条消息给 OpenAI API 时，为什么返回的不是最后一条消息的答案？","这可能是由于使用的嵌入模型不支持双语或多语言上下文导致的。如果是中英文混合场景，建议更换为支持双语的嵌入模型（bilingual embedding model）。默认模型可能在处理非纯英文内容时提取不到正确的最后一条消息内容。","https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fissues\u002F373",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},45635,"如何在 GPTCache 中处理过长的 Prompt 以避免错误或提高缓存命中率？","可以使用上下文处理器（Context Processor）来对长 Prompt 进行摘要处理。例如，使用 SummarizationContextProcess：\nfrom gptcache.processor.context.summarization_context import SummarizationContextProcess\nfrom gptcache import cache\n\ncontext_process = SummarizationContextProcess()\ncache.init(\n    ...\n    context_process=context_process\n)\n这样可以压缩长文本，使其更适合嵌入和相似度匹配。","https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fissues\u002F301",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},45636,"GPTCache 是否支持 Hugging Face Transformers 的 LLM 
模型？","是的，GPTCache 支持 Hugging Face 模型。你不需要特定的适配器，可以直接通过 GPTCache 的核心 API 进行集成。只需自定义 embedding_func 和 llm 调用逻辑，将 Hugging Face 模型生成的内容传入缓存系统即可。如果遇到具体问题，可提交详细复现步骤以便进一步协助。","https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fissues\u002F335",{"id":163,"question_zh":164,"answer_zh":165,"source_url":141},45637,"如何在 LangChain 中正确配置 GPTCache 以确保缓存正常工作？","确保按以下步骤配置：\n1. 初始化 CacheBase（如 sqlite）和 VectorBase（如 faiss）。\n2. 使用 get_data_manager 创建数据管理器。\n3. 设置 pre_embedding_func（如 get_messages_last_content 或 get_prompt）。\n4. 初始化 cache 并传入 embedding_func、data_manager 和 similarity_evaluation。\n5. 将 LangChain 的 llm_cache 设置为 GPTCache 实例。\n示例代码参考官方文档或 Issue #481 中的完整可运行示例。",[167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262],{"id":168,"version":169,"summary_zh":170,"released_at":171},360513,"0.1.44","## 变更内容\n* 由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F636 中修复了使用 init_similar_cache 方法时出现的空指针内存淘汰问题。\n* 由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F637 中将版本更新至 0.1.44。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.43...0.1.44","2024-08-01T11:25:55",{"id":173,"version":174,"summary_zh":175,"released_at":176},360514,"0.1.43","## 变更内容\n* 修复在安装 Pydantic v2 并使用 LangChain 时出现的元类冲突错误，由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F555 中完成\n* 更新 usage.md 文件，由 @jmahmood 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F565 中完成\n* 添加禁用报告配置，由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F564 中完成\n* 避免在未使用 Redis 的情况下安装 Redis 库，由 @leio10 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F574 中完成\n* 将版本更新至 `0.1.43`，由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F580 中完成\n\n## 新贡献者\n* @jmahmood 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F565 中完成了首次贡献\n* @leio10 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F574 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.42...0.1.43","2023-11-28T02:09:09",{"id":178,"version":179,"summary_zh":180,"released_at":181},360515,"0.1.42","## 变更内容\n* [mod] 更新了标量\u002F向量存储、分布式缓存等相关文档，由 @a9raag 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F539 中完成\n* 修复：修正 redis-om 的正确依赖名称，由 @KeshavSingh29 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F541 中完成\n* 添加“命中回调”函数，由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F542 中完成\n* 文档：修复 LangChainChat 示例，由 @tongtie 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F543 中完成\n* 将版本更新至 `0.1.42`，由 @SimFG 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F546 中完成\n\n## 新贡献者\n* @KeshavSingh29 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F541 中完成了首次贡献\n* @tongtie 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F543 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.41...0.1.42","2023-09-28T10:27:32",{"id":183,"version":184,"summary_zh":185,"released_at":186},360516,"0.1.41","## 变更内容\n* [修复] 由于缺少 Redis 依赖，导致无法导入 GPTCache 由 @a9raag 在 https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F522 中完成\n* 功能：为 DynamoDB 添加标量存储支持 由 @gauthamchandra 在 
[167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262],{"id":168,"version":169,"summary_zh":170,"released_at":171},360513,"0.1.44","## What's Changed\n* Fixed a null-pointer eviction issue when using the init_similar_cache method, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F636\n* Updated the version to 0.1.44, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F637\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.43...0.1.44","2024-08-01T11:25:55",{"id":173,"version":174,"summary_zh":175,"released_at":176},360514,"0.1.43","## What's Changed\n* Fix the metaclass conflict error that occurs when Pydantic v2 is installed and LangChain is used, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F555\n* Update the usage.md file, by @jmahmood in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F565\n* Add a configuration option to disable reporting, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F564\n* Avoid installing the Redis library when Redis is not used, by @leio10 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F574\n* Update the version to `0.1.43`, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F580\n\n## New Contributors\n* @jmahmood made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F565\n* @leio10 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F574\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.42...0.1.43","2023-11-28T02:09:09",{"id":178,"version":179,"summary_zh":180,"released_at":181},360515,"0.1.42","## What's Changed\n* [mod] Updated the docs for scalar\u002Fvector stores, distributed cache, etc., by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F539\n* Fix: correct the dependency name for redis-om, by @KeshavSingh29 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F541\n* Add a cache-hit callback function, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F542\n* Docs: fix the LangChainChat example, by @tongtie in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F543\n* Update the version to `0.1.42`, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F546\n\n## New Contributors\n* @KeshavSingh29 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F541\n* @tongtie made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F543\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.41...0.1.42","2023-09-28T10:27:32",{"id":183,"version":184,"summary_zh":185,"released_at":186},360516,"0.1.41","## What's Changed\n* [fix] GPTCache could not be imported due to a missing Redis dependency, by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F522\n* Feature: add scalar storage support for DynamoDB, by @gauthamchandra in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F531\n* Support caching for async completions and chat completions, by @Rested in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F513\n* Fix the LangChain Chat Pydantic bug, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F538\n\n## New Contributors\n* @gauthamchandra made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F531\n* @Rested made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F513\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.40...0.1.41","2023-09-14T06:44:04",
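The async item in 0.1.41 is worth a concrete illustration. Assuming the adapter mirrors the OpenAI SDK's `acreate` entry point, as the PR #513 line suggests, cached async usage would look roughly like the sketch below; verify the exact surface against that PR before relying on it.

```python
import asyncio

from gptcache.adapter import openai  # cached drop-in for the openai module
from gptcache.adapter.api import init_similar_cache

init_similar_cache()  # default similar-cache setup


async def ask(question: str):
    # Async chat completions flow through the same semantic cache as sync calls.
    return await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )


print(asyncio.run(ask("What is GitHub?")))
```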
{"id":188,"version":189,"summary_zh":190,"released_at":191},360517,"0.1.40","## What's Changed\n* Redis-based distributed cache, implemented by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F518\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.39.1...0.1.40","2023-08-23T03:19:44",{"id":193,"version":194,"summary_zh":195,"released_at":196},360518,"0.1.39.1","## What's Changed\n* Fixed the reversed search results in pgvector.py, by @RayceRossum in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F515\n* Updated the test cases for the pgvector store, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F516\n\n## New Contributors\n* @RayceRossum made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F515\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.39...0.1.39.1","2023-08-15T03:07:51",{"id":198,"version":199,"summary_zh":200,"released_at":201},360519,"0.1.39","## What's Changed\n* Fixed the NumPy error in NumpyNormEvaluation, by @keenborder786 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F502\n* Fixed a minor package-name issue in the test `requirements.txt`, by @keenborder786 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F505\n* Added a static method for Azure OpenAI, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F507\n* Updated the `cache_obj` of LangChainLLMs, by @keenborder786 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F508\n* Used a fixed version of the Usearch library, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F510\n* Fixed wrong search results in pgvector, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F514\n\n## New Contributors\n* @keenborder786 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F502\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.38...0.1.39","2023-08-12T08:58:24",{"id":203,"version":204,"summary_zh":205,"released_at":206},360520,"0.1.38","## What's Changed\n* Handle the change of OpenAI's embeddings-only API base URL, by @keeganmccallum in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F495\n* Add support for custom class schemas in the Weaviate vector store, by @pranaychandekar in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F500\n* Fix the error: 'SSDataManager' object has no attribute, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F501\n\n## New Contributors\n* @keeganmccallum made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F495\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.37...0.1.38","2023-07-31T11:51:41",{"id":208,"version":209,"summary_zh":210,"released_at":211},360521,"0.1.37","## 🎉 Introduction to new functions of GPTCache\n\n1. Support for the Weaviate vector database\n\n## What's Changed\n* Use an older version of ChromaDB, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F492\n* Add support for the Weaviate vector database, by @pranaychandekar in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F493\n* Update the version to `0.1.37`, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F494\n\n## New Contributors\n* @pranaychandekar made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F493\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.36...0.1.37","2023-07-23T12:14:49",{"id":213,"version":214,"summary_zh":215,"released_at":216},360522,"0.1.36","## 🎉 Introduction to new functions of GPTCache\n\n1. Fix the connection error for the remote Redis cache store\n2. Add an OpenAI proxy for the Chat Completions API\n\n## What's Changed\n* Added `redis` and `mongo` store examples to the reference docs, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F477\n* Implemented a listener acting as the OpenAI Completions API, by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F480\n* Development work by @BlackMagicKau in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F483\n* Bugfix\u002F#486: provided the `redis_connection` used for creating all object models, by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F487\n\n## New Contributors\n* @BlackMagicKau made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F483\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.35...0.1.36","2023-07-14T15:04:34",{"id":218,"version":219,"summary_zh":220,"released_at":221},360523,"0.1.35","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Support Redis as the cache store, usage example: [redis+onnx](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Ftests\u002Fintegration_tests\u002Ftest_redis_onnx.py)\r\n2. Add a report table for easy analysis of cache data\r\n\r\n## What's Changed\r\n* [add] support for redis cache storage by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F465\r\n* Improve the position of lint comment by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F466\r\n* Add redis integration test case by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F467\r\n* Upgrade the actions\u002Fsetup-python to v4 by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F471\r\n* Add the report table by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F472\r\n* Update the version to `0.1.35` by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F473\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.34...0.1.35","2023-07-07T12:17:23",
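Item 1 of 0.1.35 only links to an integration test. A minimal way to select Redis as the cache store is sketched below via `manager_factory`; the "redis,faiss" combination follows the "sqlite,faiss" pattern used elsewhere in these notes, but the connection parameter names are assumptions, so treat the linked redis+onnx test as the authoritative reference.

```python
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import manager_factory

onnx = Onnx()
# Scalar data in Redis, vectors in faiss -- naming patterned on "sqlite,faiss".
data_manager = manager_factory(
    "redis,faiss",
    scalar_params={"redis_host": "localhost", "redis_port": 6379},  # assumed parameter names
    vector_params={"dimension": onnx.dimension},
)
cache.init(embedding_func=onnx.to_embeddings, data_manager=data_manager)
```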
{"id":223,"version":224,"summary_zh":225,"released_at":226},360524,"0.1.34","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Add support for Qdrant Vector Store\r\n2. Add support for Mongodb Cache Store\r\n3. Fix bugs in the Redis vector store and the ONNX similarity evaluation\r\n\r\n## What's Changed\r\n* Correct the wrong search return value in the Redis vector store. by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F452\r\n* [Feature] Cache consistency check for Chroma & Milvus by @wybryan in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F448\r\n* Fix the pylint error and add the chromadb test by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F457\r\n* [add] support for mongodb storage by @a9raag in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F454\r\n* Fix the wrong return value of onnx similarity evaluation by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F460\r\n\r\n## New Contributors\r\n* @a9raag made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F454\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.33...0.1.34\r\n","2023-06-30T12:09:49",{"id":228,"version":229,"summary_zh":230,"released_at":231},360525,"0.1.33","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Make some improvements to the code by fixing a few bugs. For further information, please refer to the new pull request list.\r\n2. Add the [How to better configure your cache](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fconfigure_it.html) document\r\n\r\n## What's Changed\r\n* Updated link to langchain instructions by @technicalpickles in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F434\r\n* Fix the eviction error by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F440\r\n* [Feature] search only operation support by @wybryan in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F445\r\n* T10H-85 - VectorBase change for namespace allocation by @jacktempo7 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F449\r\n* Add `How to better configure your cache` document by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F450\r\n\r\n## New Contributors\r\n* @technicalpickles made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F434\r\n* @wybryan made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F445\r\n* @jacktempo7 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F449\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.32...0.1.33","2023-06-27T14:46:09",
{"id":233,"version":234,"summary_zh":235,"released_at":236},360526,"0.1.32","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Support Redis as a vector store\r\n\r\n```python\r\nfrom gptcache.manager import VectorBase\r\n\r\nvector_base = VectorBase(\"redis\", dimension=10)\r\n```\r\n\r\n2. Fix the context_len config bug\r\n\r\n## What's Changed\r\n* Fix context_len in config by @zc277584121 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F430\r\n* Fix sequence match example by @zc277584121 in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F431\r\n* Add the Redis vector store by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F432\r\n\r\n## New Contributors\r\n* @zc277584121 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F430\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.31...0.1.32","2023-06-15T14:49:47",{"id":238,"version":239,"summary_zh":240,"released_at":241},360527,"0.1.31","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. To improve the precision of cache hits, four similarity evaluation methods were added:\r\n\r\n- SBERT CrossEncoder Evaluation\r\n- Cohere rerank API (**free accounts can make up to 100 calls per minute**)\r\n- Multi-round dialog similarity weight matching\r\n- Time Evaluation: for cached answers, check the time dimension first, e.g. only reuse cache entries generated within the past day\r\n\r\n2. Fix some bugs\r\n\r\n- OpenAI exceptions type #416\r\n- LangChainChat does not work with the _agenerate function #400\r\n\r\nmore details: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fmain\u002Fdocs\u002Frelease_note.md\r\n\r\n## What's Changed\r\n* Raise the same type's error for the openai by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F421\r\n* Add sequence match evaluation. by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F420\r\n* Add the Time Evaluation by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F423\r\n* Improve SequenceMatchEvaluation for several cases. by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F424\r\n* Change the evaluation score of sequence evaluation to be larger as th… by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F425\r\n* LangchainChat support `_agenerate` function by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F426\r\n* Add SBERT CrossEncoder evaluation. by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F428\r\n* Update the version to `0.1.31` by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F429\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.30...0.1.31","2023-06-14T13:27:48",
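Of the four evaluation methods in 0.1.31, Time Evaluation is the only one without sample code anywhere in these notes. A sketch of a time-windowed setup follows; the `TimeEvaluation` name comes from PR #423, but the constructor arguments shown (delegating scoring to the distance evaluation, with a freshness window in seconds) are assumptions to check against the linked release_note.md.

```python
import datetime

from gptcache import cache
from gptcache.similarity_evaluation import TimeEvaluation

# Reject cached answers older than one day, then score the rest by search distance.
evaluation = TimeEvaluation(
    evaluation="distance",  # delegate scoring to the distance evaluation
    time_range=datetime.timedelta(days=1).total_seconds(),
)

# Plus embedding_func/data_manager as in the earlier init sketch.
cache.init(similarity_evaluation=evaluation)
```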
{"id":243,"version":244,"summary_zh":245,"released_at":246},360528,"0.1.30","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Support using the Cohere rerank API to evaluate similarity\r\n\r\n```python\r\nfrom gptcache.similarity_evaluation import CohereRerankEvaluation\r\n\r\nevaluation = CohereRerankEvaluation()\r\nscore = evaluation.evaluation(\r\n    {\r\n        'question': 'What is the color of sky?'\r\n    },\r\n    {\r\n        'answer': 'the color of sky is blue'\r\n    }\r\n)\r\n```\r\n\r\n2. Improve the GPTCache server API; refer to the \"\u002Fdocs\" path after starting the server\r\n3. Fix the bug in LangChain token-usage tracking\r\n\r\n## What's Changed\r\n* Add input summarization. by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F404\r\n* Langchain track token usage by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F409\r\n* Support to download the cache files by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F410\r\n* Support to use the cohere rerank api to evaluate the similarity by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F412\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.29...0.1.30","2023-06-07T14:11:29",{"id":248,"version":249,"summary_zh":250,"released_at":251},360529,"0.1.29","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Improve the GPTCache server by using FastAPI\r\n\r\n**NOTE**: The API structure has been optimized, details: [Use GPTCache server](https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fblob\u002Fdev\u002Fdocs\u002Fusage.md#use-gptcache-server)\r\n\r\n2. Add the usearch vector store\r\n\r\n```python\r\nfrom gptcache.manager import manager_factory\r\n\r\ndata_manager = manager_factory(\"sqlite,usearch\", vector_params={\"dimension\": 10})\r\n```\r\n\r\n## What's Changed\r\n* Improve the unit test flow by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F397\r\n* Add: USearch vector search engine by @VoVoR in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F399\r\n* Add the saved token report, auto flush data by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F401\r\n* Use the fastapi to improve the GPTCache server by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F405\r\n* Update the version to `0.1.29` by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F406\r\n\r\n## New Contributors\r\n* @VoVoR made their first contribution in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F399\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.28...0.1.29","2023-06-02T08:51:34",
{"id":253,"version":254,"summary_zh":255,"released_at":256},360530,"0.1.28","## 🎉 Introduction to new functions of GPTCache\r\n\r\nTo handle a large prompt, there are currently three options available:\r\n\r\n1. Increase the column size of CacheStorage.\r\n\r\n```python\r\nfrom gptcache.manager import manager_factory\r\n\r\ndata_manager = manager_factory(\r\n    \"sqlite,faiss\", scalar_params={\"table_len_config\": {\"question_question\": 5000}}\r\n)\r\n\r\n```\r\nMore Details:\r\n- 'question_question': the question column size in the question table, defaults to 3000.\r\n- 'answer_answer': the answer column size in the answer table, defaults to 3000.\r\n- 'session_id': the session id column size in the session table, defaults to 1000.\r\n- 'dep_name': the name column size in the dep table, defaults to 1000.\r\n- 'dep_data': the data column size in the dep table, defaults to 3000.\r\n\r\n2. When using a template, use the dynamic value in the template as the cache key instead of using the entire template as the key.\r\n\r\n- **str template**\r\n```python\r\nfrom gptcache import Config\r\nfrom gptcache.processor.pre import last_content_without_template\r\n\r\ntemplate_obj = \"tell me a joke about {subject}\"\r\nprompt = template_obj.format(subject=\"animal\")\r\nvalue = last_content_without_template(\r\n    data={\"messages\": [{\"content\": prompt}]}, cache_config=Config(template=template_obj)\r\n)\r\nprint(value)\r\n# ['animal']\r\n```\r\n\r\n- **langchain prompt template**\r\n\r\n```python\r\nfrom langchain import PromptTemplate\r\n\r\nfrom gptcache import Config\r\nfrom gptcache.processor.pre import last_content_without_template\r\n\r\ntemplate_obj = PromptTemplate.from_template(\"tell me a joke about {subject}\")\r\nprompt = template_obj.format(subject=\"animal\")\r\n\r\nvalue = last_content_without_template(\r\n    data={\"messages\": [{\"content\": prompt}]},\r\n    cache_config=Config(template=template_obj.template),\r\n)\r\nprint(value)\r\n# ['animal']\r\n```\r\n\r\n3. Wrap the openai object, reference: [BaseCacheLLM](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Fdev\u002Freferences\u002Fadapter.html#module-gptcache.adapter.base)\r\n\r\n```python\r\nimport random\r\n\r\nfrom gptcache import Cache\r\nfrom gptcache.adapter import openai\r\nfrom gptcache.adapter.api import init_similar_cache\r\nfrom gptcache.processor.pre import last_content\r\n\r\ncache_obj = Cache()\r\ninit_similar_cache(\r\n    data_dir=str(random.random()), pre_func=last_content, cache_obj=cache_obj\r\n)\r\n\r\n\r\ndef proxy_openai_chat_complete(*args, **kwargs):\r\n    # Forward the call to the real OpenAI client; the adapter caches the result.\r\n    import openai as real_openai\r\n\r\n    return real_openai.ChatCompletion.create(*args, **kwargs)\r\n\r\n\r\nopenai.ChatCompletion.llm = proxy_openai_chat_complete\r\nopenai.ChatCompletion.cache_args = {\"cache_obj\": cache_obj}\r\n\r\nopenai.ChatCompletion.create(\r\n    model=\"gpt-3.5-turbo\",\r\n    messages=[\r\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\r\n        {\"role\": \"user\", \"content\": \"What's GitHub\"},\r\n    ],\r\n)\r\n```\r\n\r\n## What's Changed\r\n* Add the BaseCacheLLM abstract class to wrap the llm by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F394\r\n* Add the pre-function of handling long prompt and Update context doc by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F395\r\n* Support to config the context pre-process by the yaml file by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F396\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.27...0.1.28","2023-05-29T16:07:56",
{"id":258,"version":259,"summary_zh":260,"released_at":261},360531,"0.1.27","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Support the UForm embedding, which can be used for **bilingual** (English + Chinese) text\r\n\r\nthanks to @ashvardanian's contribution\r\n\r\n```python\r\nfrom gptcache.embedding import UForm\r\n\r\ntest_sentence = 'Hello, world.'\r\nencoder = UForm(model='unum-cloud\u002Fuform-vl-english')\r\nembed = encoder.to_embeddings(test_sentence)\r\n\r\ntest_sentence = '什么是Github'\r\nencoder = UForm(model='unum-cloud\u002Fuform-vl-multilingual')\r\nembed = encoder.to_embeddings(test_sentence)\r\n```\r\n\r\n## What's Changed\r\n* Fix the wrong LangChainChat comment by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F381\r\n* Add UForm multi-modal embedding by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F382\r\n* Support to config the cache storage data size by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F383\r\n* Update the protobuf version in the doc by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F387\r\n* Update the version to `0.1.27` by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F389\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.26...0.1.27","2023-05-25T16:01:40",{"id":263,"version":264,"summary_zh":265,"released_at":266},360532,"0.1.26","## 🎉 Introduction to new functions of GPTCache\r\n\r\n1. Support the PaddleNLP embedding, thanks to @vax521\r\n\r\n```python\r\nfrom gptcache.embedding import PaddleNLP\r\n\r\ntest_sentence = 'Hello, world.'\r\nencoder = PaddleNLP(model='ernie-3.0-medium-zh')\r\nembed = encoder.to_embeddings(test_sentence)\r\n```\r\n\r\n2. Support [the OpenAI Moderation API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fapi-reference\u002Fmoderations)\r\n\r\n```python\r\nfrom gptcache.adapter import openai\r\nfrom gptcache.adapter.api import init_similar_cache\r\nfrom gptcache.processor.pre import get_openai_moderation_input\r\n\r\ninit_similar_cache(pre_func=get_openai_moderation_input)\r\nopenai.Moderation.create(\r\n    input=\"hello, world\",\r\n)\r\n```\r\n\r\n3. Add the llama_index bootcamp, through which you can learn how GPTCache works with llama_index\r\n\r\ndetails: [WebPage QA](https:\u002F\u002Fgptcache.readthedocs.io\u002Fen\u002Flatest\u002Fbootcamp\u002Fllama_index\u002Fwebpage_qa.html)\r\n\r\n## What's Changed\r\n* Replace summarization test model. by @wxywb in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F368\r\n* Add the llama index bootcamp by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F371\r\n* Update the llama index example url by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F372\r\n* Support the openai moderation adapter by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F376\r\n* Paddlenlp embedding support by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F377\r\n* Update the cache config template file and example directory by @SimFG in https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fpull\u002F380\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzilliztech\u002FGPTCache\u002Fcompare\u002F0.1.25...0.1.26","2023-05-23T13:34:00"]