[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ianarawjo--ChainForge":3,"tool-ianarawjo--ChainForge":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":76,"owner_twitter":76,"owner_website":81,"owner_url":82,"languages":83,"stars":108,"forks":109,"last_commit_at":110,"license":111,"difficulty_score":23,"env_os":112,"env_gpu":113,"env_ram":112,"env_deps":114,"category_tags":120,"github_topics":121,"view_count":23,"oss_zip_url":76,"oss_zip_packed_at":76,"status":16,"created_at":128,"updated_at":129,"faqs":130,"releases":161},3382,"ianarawjo\u002FChainForge","ChainForge","An open-source visual programming environment for battle-testing prompts to LLMs.","ChainForge 是一款开源的可视化编程环境，专为大规模测试和优化大语言模型（LLM）提示词而设计。它解决了传统“聊天式”调试效率低下的痛点，让用户无需编写繁琐代码，即可通过直观的数据流图快速对比不同提示词、模型及参数设置下的响应质量。\n\n这款工具特别适合 AI 开发者、研究人员以及需要系统化评估模型表现的产品团队使用。其核心亮点在于支持同时向多个主流模型（如 OpenAI、Anthropic、Google Gemini 及本地 Ollama 模型等）发送请求，并内置了灵活的评估指标系统，能即时将测试结果可视化。此外，ChainForge 还能利用生成式 AI 自动创建测试数据或辅助编写评估代码，大幅缩短实验周期。无论是进行严谨的基准测试，还是快速验证创意想法，ChainForge 都能帮助用户以更科学、高效的方式找到最佳模型配置方案。","# ⛓️🛠️ ChainForge \n\n**An open-source visual environment for battle-testing prompts to LLMs.** [![Mentioned in Awesome 
Chainforge](https:\u002F\u002Fawesome.re\u002Fmentioned-badge.svg)](https:\u002F\u002Fgithub.com\u002FloloMD\u002Fawesome_chainforge)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1405689845245804628?label=discord&logo=discord&color=7289da)](https:\u002F\u002Fdiscord.gg\u002FDNpuumFx)\n\n\u003Cimg width=\"1517\" alt=\"banner\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_64088f7eba5c.png\">\n\nChainForge is a data flow prompt engineering environment for analyzing and evaluating LLM responses. It enables rapid-fire, quick-and-dirty comparison of prompts, models, and response quality that goes beyond ad-hoc chatting with individual LLMs. With ChainForge, you can:\n\n- **Query multiple LLMs at once** to test prompt ideas and variations quickly and effectively.\n- **Compare response quality across prompt permutations, across models, and across model settings** to choose the best prompt and model for your use case.\n- **Set up evaluation metrics** (scoring functions) and immediately visualize results across prompts, prompt parameters, models, and model settings.\n- **Use AI to streamline this entire process**: Create synthetic tables and input examples with built-in genAI features, or supercharge writing evals by prompting a model to give you starter code.  
\n\n[Read the docs to learn more.](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002F) ChainForge comes with a number of example evaluation flows to give you a sense of what's possible, including 188 example flows generated from benchmarks in OpenAI evals.\n\nChainForge is built on [ReactFlow](https:\u002F\u002Freactflow.dev) and [Flask](https:\u002F\u002Fflask.palletsprojects.com\u002Fen\u002F2.3.x\u002F).\n\n**_For user-curated resources and learning materials, check out the [🌟Awesome ChainForge](https:\u002F\u002Fgithub.com\u002FloloMD\u002Fawesome_chainforge) repo!_** \n\n\n# Table of Contents\n\n- 👉 [Documentation](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002F) 📖\n- [Installation](#installation)\n- [Example Experiments](#example-experiments)\n- [Share with Others](#share-with-others)\n- [Features](#features) (see the [docs](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F) for more comprehensive info)\n- [Development and How to Cite](#development)\n\n# Installation\n\nYou can install ChainForge locally, or try it out on the web at **https:\u002F\u002Fchainforge.ai\u002Fplay\u002F**. The web version of ChainForge has a limited feature set. In a locally installed version you can load API keys automatically from environment variables, write Python code to evaluate LLM responses, or query locally-run models hosted via Ollama.\n\nTo install ChainForge on your machine, make sure you have Python 3.8 or higher, then run\n\n```bash\npip install chainforge\n```\n\nOnce installed, do\n\n```bash\nchainforge serve\n```\n\nOpen [localhost:8000](http:\u002F\u002Flocalhost:8000\u002F) in a Google Chrome, Firefox, Microsoft Edge, or Brave browser.\n\nYou can set your API keys by clicking the Settings icon in the top-right corner. If you prefer not to worry about this every time you open ChainForge, we **highly recommend** that you save your OpenAI, Anthropic, Google, etc. API keys and\u002For Amazon AWS credentials to your local environment. 
For more details, see the [How to Install](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fgetting_started\u002F).\n\n## Run using Docker\n\nYou can use our [Dockerfile](\u002FDockerfile) to run `ChainForge` locally using `Docker Desktop`:\n\n- Build the `Dockerfile`:\n  ```shell\n  docker build -t chainforge .\n  ```\n\n- Run the image:\n  ```shell\n  docker run -p 8000:8000 chainforge\n  ```\n\nNow you can open the browser of your choice and open `http:\u002F\u002F127.0.0.1:8000`.\n\n# Supported providers\n\n- OpenAI\n- Anthropic\n- Google Gemini\n- DeepSeek\n- HuggingFace (Inference and Endpoints)\n- Together.ai\n- [Ollama API](https:\u002F\u002Fgithub.com\u002Fjmorganca\u002Follama) (locally-hosted models)\n- Microsoft Azure OpenAI Endpoints\n- [Aleph Alpha](https:\u002F\u002Fdocs.aleph-alpha.com\u002Fdocs\u002Fintroduction)\n- Amazon Bedrock-hosted on-demand inference, including Anthropic Claude 3\n- ...and any other provider through [custom provider scripts](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcustom_providers\u002F)!\n\n# Example experiments\n\nWe've prepared many example flows to give you a sense of what's possible with Chainforge.\nClick the \"Example Flows\" button on the top-right corner and select one. Here is a basic comparison example, plotting the length of responses across different models and arguments for the prompt parameter `{game}`:\n\n\u003Cimg width=\"1593\" alt=\"basic-compare\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_438e0f279c8b.png\">\n\nYou can also conduct **ground truth evaluations** using Tabular Data nodes. 
For instance, we can compare each LLM's ability to answer math problems by comparing each response to the expected answer:\n\n\u003Cimg width=\"1775\" alt=\"Screen Shot 2023-07-04 at 9 21 50 AM\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_8551975bb39c.png\">\n\nJust import a dataset, hook it up to a template variable in a Prompt Node, and press run. \n\n# Compare responses across models and prompts\n\nCompare across models and prompt variables with an interactive response inspector, including a formatted table and exportable data:\n\n\u003Cimg width=\"1460\" alt=\"Screen Shot 2023-07-19 at 5 03 55 PM\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_9c19107e0ae2.png\">\n\nThe key power of ChainForge lies in **combinatorial power**: ChainForge takes the _cross product_ of inputs to prompt templates, meaning you can produce every combination of input values.\nThis is incredibly effective at sending off hundreds of queries at once to verify model behavior more robustly than one-off prompting. 
\n\nHere's [a tutorial to get started comparing across prompt templates](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcompare_prompts\u002F).\n\n# Share with others\n\nThe web version of ChainForge (https:\u002F\u002Fchainforge.ai\u002Fplay\u002F) includes a Share button.\n\nSimply click Share to generate a unique link for your flow and copy it to your clipboard:\n\n![ezgif-2-a4d8048bba](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_20a52824cee4.png)\n\nFor instance, here's an experiment I made that tries to get an LLM to reveal a secret key: https:\u002F\u002Fchainforge.ai\u002Fplay\u002F?f=28puvwc788bog\n\n> **Note**\n> To prevent abuse, you can only share up to 10 flows at a time, and each flow must be \u003C5MB after compression.\n> If you share more than 10 flows, the oldest link will break, so make sure to always Export important flows to `cforge` files,\n> and use Share to only pass data ephemerally.\n\nFor finer details about the features of specific nodes, check out the [List of Nodes](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F).\n\n# Features\n\nA key goal of ChainForge is facilitating **comparison** and **evaluation** of prompts and models. Overall, you can:\n\n- **Compare across prompts and prompt parameters**: Find the best set of prompts that maximizes your eval target metrics (e.g., lowest code error rate). Or, see how changing parameters in a prompt template affects the quality of responses.\n- **Compare across models**: Compare responses for every prompt across models and different model settings, to find the best model for your use case. \n\nThe features that enable this are:\n\n- **Prompt permutations**: Set up a prompt template and feed it variations of input variables. ChainForge will prompt all selected LLMs with all possible permutations of the input prompt, so that you can get a better sense of prompt quality. 
You can also chain prompt templates at arbitrary depth (e.g., to compare templates).\n- **Model settings**: Change the settings of supported models, and compare across settings. For instance, you can measure the impact of a system message on ChatGPT by adding several ChatGPT models, changing individual settings, and nicknaming each one. ChainForge will send out queries to each version of the model.\n- **Evaluation nodes**: Probe LLM responses in a chain and test them (classically) for some desired behavior. At a basic level, this is Python script based. We plan to add preset evaluator nodes for common use cases in the near future (e.g., name-entity recognition). Note that you can also chain LLM responses into prompt templates to help evaluate outputs cheaply before more extensive evaluation methods.\n- **Visualization nodes**: Visualize evaluation results on plots like grouped box-and-whisker (for numeric metrics) and histograms (for boolean metrics). Currently we only support numeric and boolean metrics. We aim to provide users more control and options for plotting in the future.\n- **Chat turns**: Go beyond prompts and template follow-up chat messages, just like prompts. You can test how the wording of the user's query might change an LLM's output, or compare quality of later responses across multiple chat models (or the same chat model with different settings!).\n\nAlongside built-in [gen AI features 🪄💫](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fgen_ai\u002F) like synthetic data generation, prompt engineering is accelerated: you can compare prompts and model performance sometimes **_without needing to write a single line of code_**, speeding up the process of iteration and discovery tenfold. \n\nWe've also found that some users simply want to use ChainForge to make tons of parametrized queries to LLMs (e.g., chaining prompt templates into prompt templates), possibly score them, and then output the results to a spreadsheet (Excel `xlsx`). 
To do this, attach an Inspect node to the output of a Prompt node and click `Export Data`.\n\nFor more specific details, see our [documentation](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F).\n\n---\n\n# Development\n\nChainForge was created by [Ian Arawjo](http:\u002F\u002Fianarawjo.com\u002Findex.html), a postdoctoral scholar in Harvard HCI's [Glassman Lab](http:\u002F\u002Fglassmanlab.seas.harvard.edu\u002F) with support from the Harvard HCI community. Collaborators include PhD students [Priyan Vaithilingam](https:\u002F\u002Fpriyan.info) and [Chelse Swoopes](https:\u002F\u002Fseas.harvard.edu\u002Fperson\u002Fchelse-swoopes), Harvard undergraduate [Sean Yang](https:\u002F\u002Fshawsean.com), and faculty members [Elena Glassman](http:\u002F\u002Fglassmanlab.seas.harvard.edu\u002Fglassman.html) and [Martin Wattenberg](https:\u002F\u002Fwww.bewitched.com\u002Fabout.html). Additional collaborators include UC Berkeley PhD student Shreya Shankar and Université de Montréal undergraduate Cassandre Hamel.\n\nThis work was partially funded by the NSF grants IIS-2107391, IIS-2040880, and IIS-1955699. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.\n\nWe provide ongoing releases of this tool in the hopes that others find it useful for their projects.\n\n## Inspiration and Links\n\nChainForge is meant to be general-purpose, and is not developed for a specific API or LLM back-end. Our ultimate goal is integration into other tools for the systematic evaluation and auditing of LLMs. We hope to help others who are developing prompt-analysis flows in LLMs, or otherwise auditing LLM outputs. 
This project was inspired by our own use case, but also shares some camaraderie with two related (closed-source) research projects, both led by [Sherry Wu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~sherryw\u002F):\n\n- \"PromptChainer: Chaining Large Language Model Prompts through Visual Programming\" (Wu et al., CHI ’22 LBW) [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p6MA8q19uo0)\n- \"AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts\" (Wu et al., CHI ’22)\n\nUnlike these projects, we are focusing on supporting evaluation across prompts, prompt parameters, and models.\n\n## How to collaborate?\n\nWe welcome open-source collaborators. If you want to report a bug or request a feature, open an [Issue](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues). We also encourage users to implement the requested feature \u002F bug fix and submit a Pull Request.\n\n---\n\n# Cite Us\n\nIf you use ChainForge for research purposes, whether by building upon the source code or investigating LLM behavior using the tool, we ask that you cite our [CHI research paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Ffull\u002F10.1145\u002F3613904.3642016) in any related publications. 
The BibTeX you can use is:\n\n```bibtex\n@inproceedings{arawjo2024chainforge,\n  title={ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing},\n  author={Arawjo, Ian and Swoopes, Chelse and Vaithilingam, Priyan and Wattenberg, Martin and Glassman, Elena L},\n  booktitle={Proceedings of the CHI Conference on Human Factors in Computing Systems},\n  pages={1--18},\n  year={2024}\n}\n```\n\n# License\n\nChainForge is released under the MIT License.\n","# ⛓️🛠️ ChainForge \n\n**一个用于对大语言模型提示进行实战测试的开源可视化环境。** [![在Awesome Chainforge中提及](https:\u002F\u002Fawesome.re\u002Fmentioned-badge.svg)](https:\u002F\u002Fgithub.com\u002FloloMD\u002Fawesome_chainforge)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1405689845245804628?label=discord&logo=discord&color=7289da)](https:\u002F\u002Fdiscord.gg\u002FDNpuumFx)\n\n\u003Cimg width=\"1517\" alt=\"banner\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_64088f7eba5c.png\">\n\nChainForge 是一个基于数据流的提示工程环境，用于分析和评估大语言模型的响应。它能够快速、简便地比较不同的提示、模型及响应质量，超越了与单个大语言模型进行临时性对话的方式。通过 ChainForge，您可以：\n\n- **同时查询多个大语言模型**，以快速有效地测试提示创意及其变体。\n- **跨提示排列、跨模型以及跨模型设置比较响应质量**，从而为您的应用场景选择最佳的提示和模型。\n- **设置评估指标**（评分函数），并立即可视化不同提示、提示参数、模型及模型设置下的结果。\n- **利用 AI 流程化整个过程**：借助内置的生成式 AI 功能创建合成表格和输入示例，或通过让模型为您提供初始代码来加速评估脚本的编写。\n\n[阅读文档以了解更多信息。](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002F) ChainForge 自带许多示例评估流程，帮助您了解其功能，其中包括从 OpenAI 评估基准中生成的 188 个示例流程。\n\nChainForge 基于 [ReactFlow](https:\u002F\u002Freactflow.dev) 和 [Flask](https:\u002F\u002Fflask.palletsprojects.com\u002Fen\u002F2.3.x\u002F) 构建。\n\n**_如需用户精选的资源和学习材料，请查看 [🌟Awesome ChainForge](https:\u002F\u002Fgithub.com\u002FloloMD\u002Fawesome_chainforge) 仓库！_**\n\n# 目录\n\n- 👉 [文档](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002F) 📖\n- [安装](#installation)\n- [示例实验](#example-experiments)\n- [与他人共享](#share-with-others)\n- [功能](#features)（更多详细信息请参阅 [文档](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F)）\n- 
[开发与引用方式](#development)\n\n# 安装\n\n您可以在本地安装 ChainForge，也可以在网页版上试用：**https:\u002F\u002Fchainforge.ai\u002Fplay\u002F**。网页版的功能较为有限。而在本地安装的版本中，您可以从环境变量中自动加载 API 密钥，编写 Python 代码来评估大语言模型的响应，或查询通过 Ollama 托管的本地运行模型。\n\n要在您的机器上安装 ChainForge，请确保已安装 Python 3.8 或更高版本，然后运行：\n\n```bash\npip install chainforge\n```\n\n安装完成后，执行以下命令：\n\n```bash\nchainforge serve\n```\n\n在 Google Chrome、Firefox、Microsoft Edge 或 Brave 浏览器中打开 [localhost:8000](http:\u002F\u002Flocalhost:8000\u002F)。\n\n您可以通过点击右上角的设置图标来配置 API 密钥。如果您不想每次打开 ChainForge 时都手动设置这些密钥，我们**强烈建议**将您的 OpenAI、Anthropic、Google 等 API 密钥和\u002F或 Amazon AWS 凭证保存到本地环境变量中。更多详情请参阅 [安装指南](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fgetting_started\u002F)。\n\n## 使用 Docker 运行\n\n您可以使用我们的 [Dockerfile](\u002FDockerfile) 在本地通过 `Docker Desktop` 运行 `ChainForge`：\n\n- 构建 Dockerfile：\n  ```shell\n  docker build -t chainforge .\n  ```\n\n- 运行镜像：\n  ```shell\n  docker run -p 8000:8000 chainforge\n  ```\n\n现在您可以打开任意浏览器，并访问 `http:\u002F\u002F127.0.0.1:8000`。\n\n# 支持的提供商\n\n- OpenAI\n- Anthropic\n- Google Gemini\n- DeepSeek\n- HuggingFace（推理与端点）\n- Together.ai\n- [Ollama API](https:\u002F\u002Fgithub.com\u002Fjmorganca\u002Follama)（本地托管模型）\n- Microsoft Azure OpenAI 端点\n- [Aleph Alpha](https:\u002F\u002Fdocs.aleph-alpha.com\u002Fdocs\u002Fintroduction)\n- Amazon Bedrock 托管的按需推理服务，包括 Anthropic Claude 3\n- …以及其他任何可通过 [自定义提供商脚本](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcustom_providers\u002F) 集成的提供商！\n\n# 示例实验\n\n我们准备了许多示例流程，帮助您了解 ChainForge 的强大功能。请点击右上角的“示例流程”按钮并选择一个。以下是一个基本的比较示例，展示了不同模型和提示参数 `{game}` 下的响应长度：\n\n\u003Cimg width=\"1593\" alt=\"basic-compare\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_438e0f279c8b.png\">\n\n您还可以使用表格数据节点进行**真值评估**。例如，我们可以通过将每个大语言模型的回答与预期答案进行比较，来评估它们解答数学问题的能力：\n\n\u003Cimg width=\"1775\" alt=\"Screen Shot 2023-07-04 at 9 21 50 AM\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_8551975bb39c.png\">\n\n只需导入一个数据集，将其连接到提示节点中的模板变量，然后点击运行即可。\n\n# 跨模型和提示比较响应\n\n通过交互式响应检查器，您可以跨模型和提示变量进行比较，该检查器包含格式化的表格和可导出的数据：\n\n\u003Cimg width=\"1460\" alt=\"Screen Shot 2023-07-19 at 5 03 55 PM\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_9c19107e0ae2.png\">\n\nChainForge 的核心优势在于其**组合能力**：它会对提示模板的输入进行**笛卡尔积**运算，这意味着您可以生成所有可能的输入值组合。这种方式非常高效，能够一次性发送数百个查询，从而比单次提示更稳健地验证模型的行为。\n\n以下是[开始比较不同提示模板的教程](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcompare_prompts\u002F)。\n\n# 与他人共享\n\nChainForge 的网页版（https:\u002F\u002Fchainforge.ai\u002Fplay\u002F）包含一个“分享”按钮。\n\n只需点击“分享”即可生成您流程的唯一链接，并将其复制到剪贴板：\n\n![ezgif-2-a4d8048bba](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_readme_20a52824cee4.png)\n\n例如，这是我制作的一个尝试让大语言模型泄露秘密密钥的实验：https:\u002F\u002Fchainforge.ai\u002Fplay\u002F?f=28puvwc788bog\n\n> **注意**\n> 为防止滥用，您一次最多只能分享 10 个流程，且每个流程压缩后不得超过 5MB。\n> 如果您分享超过 10 个流程，最早的链接将会失效，因此请务必始终将重要流程导出为 `cforge` 文件，\n> 并仅使用“分享”功能临时传递数据。\n\n有关特定节点功能的更多细节，请参阅 [节点列表](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F)。\n\n# 功能\n\nChainForge 的一个关键目标是促进提示和模型的**比较**与**评估**。总体而言，您可以：\n\n- **跨提示及提示参数进行比较**：找到能够最大化您的评估指标（例如最低代码错误率）的最佳提示组合。或者，查看更改提示模板中的参数如何影响响应质量。\n- **跨模型进行比较**：比较不同模型及其不同设置下对每个提示的响应，从而为您的用例选择最佳模型。\n\n实现这些功能的主要特性包括：\n\n- **提示排列组合**：设置一个提示模板，并为其输入变量提供多种变化。ChainForge 将使用所有可能的输入提示排列组合来调用所有选定的 LLM，以便您更好地了解提示的质量。您还可以以任意深度串联提示模板（例如用于比较不同的模板）。\n- **模型设置**：更改支持的模型设置，并在不同设置之间进行比较。例如，您可以通过添加多个 ChatGPT 模型、调整各自的不同设置并为每个模型命名，来衡量系统消息对 ChatGPT 的影响。ChainForge 会向每个版本的模型发送查询。\n- **评估节点**：在链中探测 LLM 的响应，并针对某些期望的行为对其进行测试（传统方式）。从基础层面来看，这基于 Python 脚本。我们计划在不久的将来为常见用例添加预设的评估节点（例如命名实体识别）。请注意，您也可以将 LLM 的响应串联到提示模板中，在采用更复杂的评估方法之前，以较低的成本对输出进行初步评估。\n- **可视化节点**：在分组箱线图（适用于数值指标）和直方图（适用于布尔指标）等图表上可视化评估结果。目前我们仅支持数值和布尔指标。未来我们将为用户提供更多绘图控制和选项。\n- **对话轮次**：不仅限于提示和模板，还可以像提示一样跟踪后续的聊天消息。您可以测试用户查询措辞的变化如何影响 LLM 的输出，或比较多个聊天模型（或同一聊天模型的不同设置）在后续回复中的质量。\n\n除了内置的 [生成式 
AI 功能 🪄💫](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fgen_ai\u002F)，如合成数据生成外，提示工程也得到了加速：您有时甚至可以在**无需编写任何代码的情况下**比较提示和模型性能，从而使迭代和发现过程提速十倍。\n\n我们还发现，有些用户只是想利用 ChainForge 向 LLM 发出大量带参数的查询（例如将提示模板串联到提示模板中），可能对其打分，然后将结果导出到电子表格（Excel `xlsx` 文件）。为此，只需将“检查”节点连接到“提示”节点的输出端，并点击“导出数据”。\n\n有关更详细的信息，请参阅我们的[文档](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fnodes\u002F)。\n\n---\n\n# 开发\n\nChainForge 由哈佛大学人机交互实验室 [Glassman 实验室](http:\u002F\u002Fglassmanlab.seas.harvard.edu\u002F) 的博士后学者 [Ian Arawjo](http:\u002F\u002Fianarawjo.com\u002Findex.html) 在哈佛 HCI 社区的支持下创建。合作者包括博士生 [Priyan Vaithilingam](https:\u002F\u002Fpriyan.info) 和 [Chelse Swoopes](https:\u002F\u002Fseas.harvard.edu\u002Fperson\u002Fchelse-swoopes)、哈佛大学本科生 [Sean Yang](https:\u002F\u002Fshawsean.com)，以及教职员工 [Elena Glassman](http:\u002F\u002Fglassmanlab.seas.harvard.edu\u002Fglassman.html) 和 [Martin Wattenberg](https:\u002F\u002Fwww.bewitched.com\u002Fabout.html)。其他合作者还包括加州大学伯克利分校博士生 Shreya Shankar 和蒙特利尔大学本科生 Cassandre Hamel。\n\n本项目部分由 NSF 资助，资助编号分别为 IIS-2107391、IIS-2040880 和 IIS-1955699。本文所表达的所有观点、发现、结论或建议均属作者个人观点，不一定反映美国国家科学基金会的观点。\n\n我们持续发布该工具的新版本，希望它能对其他人的项目有所帮助。\n\n## 灵感与链接\n\nChainForge 定位为通用工具，并非专为某个特定的 API 或 LLM 后端而开发。我们的最终目标是将其集成到其他工具中，用于 LLM 的系统性评估和审计。我们希望帮助那些正在开发 LLM 提示分析流程或进行 LLM 输出审计的人。该项目的灵感来源于我们自身的使用场景，同时也与两个相关的（闭源）研究项目有着一定的共鸣，这两个项目均由 [Sherry Wu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~sherryw\u002F) 领导：\n\n- “PromptChainer：通过可视化编程串联大型语言模型提示”（Wu 等人，CHI ’22 LBW）[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p6MA8q19uo0)\n- “AI 链：通过串联大型语言模型提示实现透明且可控的人工智能交互”（Wu 等人，CHI ’22）\n\n与这些项目不同的是，我们专注于支持跨提示、提示参数和模型的评估。\n\n## 如何参与合作？\n\n我们欢迎开源社区的贡献者。如果您想报告 bug 或请求新功能，请在 [GitHub](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues) 上提交一个 issue。我们也鼓励用户自行实现请求的功能或修复 bug，并提交 pull request。\n\n---\n\n# 引用我们\n\n如果您出于研究目的使用 ChainForge，无论是基于源代码进行二次开发，还是利用该工具研究 LLM 行为，我们都恳请您在相关出版物中引用我们的 [CHI 研究论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Ffull\u002F10.1145\u002F3613904.3642016)。您可以使用的 BibTeX 
格式如下：\n\n```bibtex\n@inproceedings{arawjo2024chainforge,\n  title={ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing},\n  author={Arawjo, Ian and Swoopes, Chelse and Vaithilingam, Priyan and Wattenberg, Martin and Glassman, Elena L},\n  booktitle={Proceedings of the CHI Conference on Human Factors in Computing Systems},\n  pages={1--18},\n  year={2024}\n}\n```\n\n# 许可证\n\nChainForge 采用 MIT 许可证发布。","# ChainForge 快速上手指南\n\nChainForge 是一个开源的可视化提示词工程环境，专为大规模测试、分析和评估大语言模型（LLM）响应而设计。它支持通过数据流图的方式，快速对比不同提示词、模型及参数设置的效果。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Windows, macOS 或 Linux\n*   **Python 版本**：Python 3.8 或更高版本\n*   **浏览器**：推荐使用 Google Chrome, Firefox, Microsoft Edge 或 Brave\n*   **API 密钥**：准备好所需 LLM 提供商的 API Key（如 OpenAI, Anthropic, Google Gemini, DeepSeek 等），或配置好本地运行的 Ollama 服务。\n\n> **提示**：为了免去每次启动都手动输入密钥的麻烦，建议将 API Key 配置到本地环境变量中。\n\n## 安装步骤\n\n您可以选择直接通过 pip 安装，或使用 Docker 运行。\n\n### 方式一：使用 pip 安装（推荐）\n\n1.  打开终端或命令行工具，运行以下命令安装 ChainForge：\n    ```bash\n    pip install chainforge\n    ```\n\n2.  安装完成后，启动服务：\n    ```bash\n    chainforge serve\n    ```\n\n3.  在浏览器中访问：\n    `http:\u002F\u002Flocalhost:8000\u002F`\n\n### 方式二：使用 Docker 运行\n\n如果您偏好容器化部署，可以使用 Docker Desktop：\n\n1.  构建镜像：\n    ```shell\n    docker build -t chainforge .\n    ```\n\n2.  运行容器：\n    ```shell\n    docker run -p 8000:8000 chainforge\n    ```\n\n3.  在浏览器中访问：\n    `http:\u002F\u002F127.0.0.1:8000`\n\n### 配置 API 密钥\n\n进入界面后，点击右上角的 **Settings (设置)** 图标输入 API Key。若已通过环境变量配置，系统将自动加载。\n\n## 基本使用\n\nChainForge 的核心优势在于**组合测试**（Combinatorial Testing），即一次性生成所有输入变量的排列组合并发送给多个模型。\n\n### 1. 创建基础实验流程\n\n1.  **添加节点**：从左侧面板拖入一个 **Prompt Node**（提示词节点）和一个 **LLM Node**（模型节点）。\n2.  **连接节点**：将 Prompt Node 的输出连接到 LLM Node 的输入。\n3.  **编写提示词模板**：在 Prompt Node 中使用花括号定义变量，例如：\n    `请解释以下概念：{concept}，并用{tone}的语气回答。`\n4.  **设置变量值**：在节点设置中为 `{concept}` 和 `{tone}` 提供多个值（例如 concept: [\"量子纠缠\", \"区块链\"], tone: [\"专业\", \"幽默\"]）。\n5.  **选择模型**：在 LLM Node 中选择您要测试的模型（如 GPT-4, Claude 3, DeepSeek 等）。\n\n### 2. 
运行与对比\n\n点击 **Run** 按钮。ChainForge 会自动计算输入值的笛卡尔积（本例中将生成 2x2=4 个请求），并向选定的模型发送所有组合的查询。\n\n### 3. 查看与评估结果\n\n*   **交互式检查**：使用 **Inspect Node** 查看生成的响应表格，支持排序和过滤。\n*   **可视化分析**：连接 **Visualization Node**，将响应质量（如长度、特定关键词出现率或通过 Python 脚本计算的得分）以箱线图或直方图形式展示。\n*   **导出数据**：点击 Inspect 节点中的 `Export Data` 按钮，可将结果导出为 Excel (`.xlsx`) 文件进行进一步分析。\n\n### 4. 尝试示例流程\n\n新手可以直接点击界面右上角的 **\"Example Flows\"** 按钮，加载官方预设的实验模板（如“多模型响应长度对比”或“基于真值的数学题评估”），快速理解工作流构建逻辑。","某电商公司的算法团队正在为智能客服系统优化“处理退货请求”的提示词，试图在礼貌性与问题解决率之间找到最佳平衡点。\n\n### 没有 ChainForge 时\n- 工程师只能手动在多个聊天窗口中分别测试不同模型（如 GPT-4、Claude 3），反复复制粘贴提示词变体，效率极低且容易出错。\n- 缺乏统一的对比视图，难以直观判断哪种提示词结构或模型参数组合能产生最稳定的回复质量。\n- 评估过程依赖人工主观打分，无法快速建立量化指标（如“是否包含退款链接”）来批量验证数百条生成结果。\n- 当需要构造多样化的用户退货场景（如“商品损坏”、“发错货”）作为测试集时，编写合成数据耗时费力。\n\n### 使用 ChainForge 后\n- 通过可视化节点流同时向 OpenAI、Anthropic 等多个模型发送请求，一键并行测试数十种提示词变体，将迭代周期从几天缩短至几分钟。\n- 利用内置的数据流图表直接并排对比不同模型和参数下的回复差异，迅速锁定表现最优的“模型 - 提示词”组合。\n- 自定义 Python 评分函数自动检测回复中是否包含关键信息（如退款政策、物流指引），即时生成可视化报表量化效果。\n- 调用内置生成式 AI 功能自动创建涵盖各种边缘案例的合成测试数据集，大幅提升了评估的全面性与鲁棒性。\n\nChainForge 将原本碎片化、靠直觉的提示词调试过程，转变为可量化、可视化的科学实验流程，显著提升了大模型应用落地的可靠性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fianarawjo_ChainForge_20a52824.gif","ianarawjo",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fianarawjo_7b95d4e7.png","Human-Computer Interaction @ Université de Montréal | Quebec | Boston | Ithaca | Bethlehem PA","Rotten Cartridge","Montréal, QC","https:\u002F\u002Ftwitter.com\u002FIanArawjo","https:\u002F\u002Fgithub.com\u002Fianarawjo",[84,88,92,96,100,104],{"name":85,"color":86,"percentage":87},"TypeScript","#3178c6",91.5,{"name":89,"color":90,"percentage":91},"Python","#3572A5",5.3,{"name":93,"color":94,"percentage":95},"CSS","#663399",2.6,{"name":97,"color":98,"percentage":99},"JavaScript","#f1e05a",0.4,{"name":101,"color":102,"percentage":103},"HTML","#e34c26",0.2,{"name":105,"color":106,"percentage":107},"Dockerfile","#384d54",0,2967,254,"2026-04-02T09:50:55","MIT","未说明","非必需（支持通过 Ollama 调用本地模型，也可完全使用云端 
API）",{"notes":115,"python":116,"dependencies":117},"该工具主要作为可视化提示工程环境，核心逻辑基于 Python Flask 后端和 ReactFlow 前端。安装后可通过浏览器访问本地服务 (localhost:8000)。支持多种大模型提供商的 API（如 OpenAI, Anthropic, Google 等），若需运行本地模型需额外安装并配置 Ollama。可通过 Docker 部署。","3.8+",[118,119],"Flask>=2.3","ReactFlow (前端依赖)",[26,14,54,15,13],[122,123,124,125,126,127],"ai","evaluation","large-language-models","llmops","llms","prompt-engineering","2026-03-27T02:49:30.150509","2026-04-06T05:17:39.131201",[131,136,141,146,151,156],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},15541,"在 v0.2 版本中使用 Azure OpenAI 模型时出现 \"Cannot read properties of undefined\" 错误，如何解决？","该问题通常是由于 API 密钥配置不正确导致的。请查阅项目根目录下的 INSTALL_GUIDE.md 文档，确认已正确设置 Azure OpenAI 所需的特定 API 密钥和 Endpoint。维护者已更新文档以反映不同模型提供商的具体配置要求。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F82",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},15542,"将 ChainForge 部署到服务器（如 Docker\u002FKubernetes）时，后端调用因硬编码的 host 和端口失败怎么办？","这是一个已知问题，已在后续版本中通过改用相对路径解决。如果您遇到此问题，请升级到最新版本。修复后，后端 API 调用将不再依赖硬编码的 `localhost:8000`，而是使用相对路径（如 `\u002Fapp\u002F${route}`），从而兼容代理和容器化部署环境。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F181",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},15543,"如何添加自定义模型或使用非内置的 LLM 提供商？","ChainForge 支持用户添加自定义模型。您可以创建一个自定义提供商节点，编写自己的 JSON API 请求来调用模型端点，并命名您的模型。系统会根据 API 参数自动生成表单供您配置。该功能已在实验性分支中实现并合并，允许用户集成如 Cohere 等外部模型。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F20",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},15544,"遇到大模型响应超时（默认 18 秒）导致测试失败，如何延长超时时间？","维护者已在新版本中移除了超时限制的上限。请通过 `pip` 升级到最新版本的 ChainForge 即可解决。此外，如果您修改了源代码但浏览器未生效，请注意浏览器缓存问题：尝试在开发者工具中手动清除缓存，或使用无痕模式打开页面。若需从源码修改 TypeScript 代码，请务必参考官方文档重新构建代码。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F308",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},15545,"如何在界面中复制一个节点（Node）？","在画布（\u002Fplay 
页面）上右键点击节点即可看到复制选项。如果本地运行时无效，请检查是否正在运行旧版本，建议升级到最新版本以获取完整功能。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F159",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},15546,"内置提供商（如 Gemini）显示的模型列表过时，如何更新默认模型列表？","对于内置提供商，模型列表通常随软件版本更新。如果界面上显示的模型较旧（如旧版 Gemini 模型），请首先尝试升级 ChainForge 到最新版本。如果仍需特定新模型且内置列表未更新，目前的解决方案是创建一个“自定义提供商”来手动指定最新的模型端点和名称。","https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F317",[162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257],{"id":163,"version":164,"summary_zh":165,"released_at":166},90180,"v0.3.6","## 向 OpenAI、Anthropic、Google 和 Ollama 提供商发送图像输入  \n现在您可以将图像上传到新的 **Media Node**，并使用图像输入进行查询：\n\n\u003Cimg width=\"1661\" alt=\"Screen Shot 2025-05-11 at 2 04 53 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F941c1f0f-1cbd-4b8a-a92f-523ae6e0ce42\" \u002F>\n\n目前此功能仅限于 OpenAI、Anthropic、Google 和 Ollama。如果您希望支持更多提供商，请考虑提交一个 PR。与 LLM 相关的调用位于 `utils.ts` 文件中。\n\n### 图像输出作为图像输入  \n\n此外，还支持使用一个模型生成图像，再通过提示链将其传递给另一个模型，例如：\n\u003Cimg width=\"1083\" alt=\"Screen Shot 2025-05-10 at 3 27 01 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3b653f66-b11d-4e65-babb-2b298a7405d6\" \u002F>\n\n### …还有许多其他内部改动  \n\n此次更新需要对 ChainForge 存储媒体的方式进行大量调整。\n\n图像在浏览器中会占用大量空间。为此，ChainForge 不再将图像直接存储在浏览器中。取而代之的是，我们为每张图像保留一个基于其 SHA-256 哈希值生成的唯一 ID，仅在必要时（例如在表格视图中查看图像时）才通过该 ID 查询后端并拉取图像。\n\n在本地运行时，ChainForge 现在会将您上传的媒体文件复制到与保存流程文件相同位置的 `media` 目录中。\n\n### 因此，ChainForge 现在可以导出和导入 `cfzip` **捆绑包**，即压缩后的 `.cforge`（JSON）文件以及流程中使用的媒体文件。\n\n（在 chainforge.ai\u002Fplay 上以浏览器模式运行时，ChainForge 仍将继续在前端缓存图像。）\n\n这些更改将使我们在不久的将来更容易添加数据密集型分析管道，例如将大量文档和分块加载到 RAG 流程中。\n\n如果您在此版本中遇到任何问题，请随时告知我们。\n\n# 现在，ChainForge 仅支持 Python 3.10 及以上版本。  \n我们已弃用对 3.8 和 3.9 的支持，因为 `markitdown` 包要求使用 3.10。这是一项面向未来的改进，因为许多其他与 RAG 相关的包现在也要求使用 3.10 左右的版本。\n\n## 开发者  \n\n### 此次更新由 @loloMD 和 @RoyHEyono 的工作推动，他们为 ChainForge 的查询基础设施添加了 Media Node 和图像输入功能。感谢 Roy 和 
Loic！🎉🙌📺","2025-05-11T19:44:26",{"id":168,"version":169,"summary_zh":170,"released_at":171},90181,"v0.3.5","本次更新为本地运行 ChainForge 增加了四项功能：\n\n- 现在可以将节点和模型（按其精确设置）设为“收藏”♥️，以便稍后重新创建。\n  - 要收藏某个节点，只需右键点击并选择“收藏”。\n  - 要收藏某个模型，请打开该模型的设置界面，然后点击右上角的“收藏”按钮。\n  - 收藏内容会在不同会话之间保持不变。\n  \n- 全局设置⚙️ 现在也会跨会话保存，通过将 `settings.json` 文件存储到与已保存流程相同的目录中来实现。\n  \n- 如果您正在本地运行 Ollama 🦙，菜单中的模型列表现在会自动填充，方便访问（如果未显示，请刷新页面）：\n\n![屏幕快照 2025-04-14 下午8:30:25](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F71e85327-e461-4ffd-a166-7a9cd1053c1f)\n\n- 如果需要，现在可以对本地保存的流程及设置配置文件进行密码加密🔐。只需在运行 `chainforge serve` 时添加 `--secure` 标志即可。有关选项，请参阅 `chainforge serve --help`。（“导出”和“导入”UI 按钮将继续仅处理未加密的流程。）\n  - 这对于希望使用新的全局设置存储功能在前端界面上存放 API 密钥，但又担心将此类信息保存为文本文件的情况非常有用。有一个“settings”选项专门用于加密此设置文件。\n\n- “添加节点”中的节点列表更加美观，并新增了嵌套菜单。\n\n选择您的收藏 ♥️：\n\n[图片链接](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8510c082-c8a3-4842-a9da-90700cfd30d1)\n\n并对您的流程进行加密 🔐：\n\n![屏幕快照 2025-04-14 下午3:46:35](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F256c260b-11ff-489b-926d-0a74e812416a)\n\n![屏幕快照 2025-04-14 下午3:44:39](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5c50b5b1-362a-4c4f-aed2-f8f9eb6d391a)\n\n这是使 ChainForge 更加可定制且更安全的一项举措的一部分。\n\n### 作为附加功能，本次更新还增加了对 OpenAI GPT-4.1 和 Google Gemini 2.5 Pro 模型的支持。\n\n此次更新还进行了一些代码结构调整，以提升代码质量。`App.tsx` 文件现在使用自定义的 `NestedMenu`，模型下拉菜单也是如此，因为 Mantine 的 ContextMenu 功能过于有限。`App.tsx` 中的节点菜单也更接近 JSON 规范，而非直接以 React 元素形式编写。\n\n> [!WARNING]\n> 本次更新对代码库进行了大量改动，包括多项重构。如果您遇到任何问题，请提交 Issue。","2025-04-15T03:10:54",{"id":173,"version":174,"summary_zh":175,"released_at":176},90182,"v0.3.4","本次更新带来了三项重要的体验优化：\n- 应用程序中的字符串现在会频繁地被驻留到全局的 `StringLookup` 表中，以提升性能并减少重复内存占用。当 LLM 响应的数量超过 1000 条时，这一改进将显著提升性能，同时也能减小导出的 `.cforge` 文件的内存占用。\n- 新增“已保存流程”侧边栏，用于在本地运行 ChainForge 时跟踪您的流程，并提供“保存”按钮。该功能与 Python 后端交互，数据会存储在由 `platformdirs` 推荐的目录中。具体路径会显示在侧边栏的页脚中（以便您自行管理）。\n- 响应检查器中的表格视图现使用 [Mantine React 
Table](https:\u002F\u002Fwww.mantine-react-table.com)。这带来了以下功能：\n  - 按值对列进行排序\n  - 可选择性地显示或隐藏列\n  - 列内筛选\n  - 在表格向下滚动时固定表头\n  - 分页功能，以提升大型表格的渲染性能\n\n\u003Cimg width=\"1202\" alt=\"Screen Shot 2025-03-02 at 4 44 46 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8736c61f-5a3d-4cef-a82f-20267020a654\" \u002F>\n\n\u003Cimg width=\"1352\" alt=\"Screen Shot 2025-02-28 at 2 58 04 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff5f3adee-cf0c-4d04-9f07-f5fe2b459ed0\" \u002F>\n\n此外，响应检查器在首次计算所选视图时，会尝试采用懒加载方式，并显示加载动画。\n\n此次更改是将 `StorageCache` 迁移到 Python 后端的第一步，当内存占用超过一定阈值时便会触发迁移。这将使前端在运行大规模实验时更加轻量和精简。\n\n其他小幅改动：\n* 移除了对 `anthropic` 和 `google` 包的不必要的依赖。其中后者在我机器上安装 `grpcio` 时曾导致程序卡死。\n\n> [!WARNING]\n> 此次更改涉及 ChainForge 的大量源代码文件，并改变了 `cforge` 文件的导入和导出方式。尽管这些改动应保持向后兼容且无 bug，但我们无法完全保证这一点。如果出现性能问题，请回滚到之前的版本。\n\n","2025-02-28T19:59:33",{"id":178,"version":179,"summary_zh":180,"released_at":181},90183,"v0.3.2.5","ChainForge 新版本 ✨现已发布：只需一个提示即可生成表格，轻松扩展行数、像魔法一样添加列！\n\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8c696a78-8a40-4896-854a-db6060762ad4\n\n","2024-12-19T20:51:44",{"id":183,"version":184,"summary_zh":185,"released_at":186},90184,"v0.3.1.5","这是首次在 ChainForge 主体中加入 `MultiEval` 节点的版本，同时包含以下改进：\n- 对响应检查器表格视图进行了优化，支持以列式布局展示多维度评分；\n- 当检测到多个评估器时，表格视图现为默认显示方式。\n\n效果如下：\n\n\u003Cimg width=\"1321\" alt=\"Screen Shot 2024-03-17 at 12 21 37 AM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F28dcd7e5-8214-4afc-8691-e7182f8ae2f0\">\n\n如你所见，Multi-Eval 允许在同一节点内定义多个针对单个响应的评估器。借助此功能，你可以基于多种标准对响应进行评估。评估器可以是代码评估器或 LLM 评估器的任意组合，具体配置由你决定；并且你还可以为每个评估器单独选择不同的 LLM 打分模型。\n\n### 这是 `MultiEval` 节点的“测试版”，原因有二：\n1. MultiEval 的输出句柄目前处于禁用状态，因为它尚无法与 VisNodes 配合使用，以跨多个标准绘制数据图表。这属于一个独立的问题，我不想因此拖慢本次发布进度。该功能正在开发中。\n2. 
MultiEval 尚未集成类似代码评估器节点中的生成式 AI 功能。我希望以更完善的方式实现这一功能（当然，这与 EvalGen 是另一回事）。设想用户只需在提示中描述评估标准，AI 就能根据每个标准自动推荐最合适的评估器并将其添加到列表中。作为临时解决方案，你仍可利用生成式 AI 功能在单个代码评估器中生成代码，并将这些代码移植到 MultiEval 中。\n\n此外，我们还计划推出 [`EvalGen` 向导](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.12272)，旨在帮助用户在人工监督下自动生成评估指标。目前，我们在 `multi-eval` 分支上已有相关实现（但由于前端采用了 TypeScript 重写，暂时无法直接合并到 `main` 分支），不过该实现尚未整合 Shreya 提供的修复内容。","2024-04-25T18:01:52",{"id":188,"version":189,"summary_zh":190,"released_at":191},90185,"v0.3.1","这一改动已筹备逾一个月。其中最重大的变化是将 ChainForge 的整个前端代码——总计数万行——重写为 TypeScript。更多详情如下。\n\n# 图像模型支持（首个支持 Dall-E OpenAI 模型）\n\n\u003Cimg width=\"714\" alt=\"Screen Shot 2024-03-30 at 7 03 45 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F25d3d977-d3b4-4172-ac81-3094dd842e69\">\n\n默认情况下，图像会使用 `compressorjs` 进行压缩，压缩后画质无明显下降，平均压缩至原文件大小的约 60%。用户可在设置窗口的“高级”选项卡中关闭压缩功能，但建议保持开启状态。\n\n#### 图像模型的自定义提供者\n您的自定义提供者现在可以返回图像数据，而不仅仅是文本。从提供者返回的数据应为 JSON 字典格式：`{ t: \"img\", d: \u003Cbase64_str_png_encoded_image> }`。（只需提供 Base64 编码的数据部分，无需包含 `\"data...image:png,\"` 等元数据前缀。）\n\n请注意，目前我们尚不支持：\n * 将图像导出到 Excel 表格的单元格中（如果检测到图像，导出到 Excel 功能将被禁用）。\n\n> [!WARNING] \n> 请务必注意，图像会迅速占用浏览器存储空间，并可能导致自动保存功能失效。\n\n# 前端重写为 TypeScript\n\n整个前端现已转换为 `tsx` 文件，并添加了相应的类型定义。在此过程中，还进行了其他优化重构。\n\n在排查潜在 Bug 和未执行代码的同时，我们致力于简化并统一 LLMResponse 在前后端的存储方式。尽管仍不完美，但如今通过 `LLMResponse` 和 `LLMResponseData`，开发者可以更清晰地了解存储响应的格式。\n\n此次改动使开发者能够更有信心地扩展 ChainForge，主要得益于 TypeScript 能够及时捕获核心代码变更带来的副作用。例如，借助 TypeScript，我们仅用 2 小时便实现了对图像模型的支持——只需将 `LLMResponseData` 的类型由字符串改为字符串 | 包含图像数据的对象即可。若没有 TypeScript 提供的对核心数据类型变更下游影响的清晰可见性，这样的改动将难以安全高效地完成。未来，这也将有助于我们进一步支持多模态视觉模型。\n\n# 节点自定义右键操作\n\n现在，右键单击节点可提供更多选项。\n- 文本框节点可转换为条目节点\n- 条目节点可转换为文本框节点\n- 可清除提示和对话轮次节点的 LLM 响应缓存：\n\n\u003Cimg width=\"420\" alt=\"Screen Shot 2024-03-30 at 7 41 37 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F5dae6ef0-c043-4d7f-9d00-66b5a67ebd9c\">\n\n# Amazon Bedrock 支持\n\n感谢 @massi-ang，ChainForge 
已新增对 Amazon Bedrock 托管模型的支持。我们刚刚添加了这些 API 端点，由于我本人无法直接测试（暂无访问权限），因此若您遇到任何问题，请…","2024-03-31T02:59:20",{"id":193,"version":194,"summary_zh":195,"released_at":196},90186,"v0.3","## 新增 Anthropic Claude 3 模型。\n\n* 后端现使用 `messages` API 来调用 Claude 2.1 及更高版本的模型。\n* 在 Claude 设置中新增了 `system` 消息参数。\n\n## 增加基于 [pyodide](https:\u002F\u002Fpyodide.org\u002Fen\u002Fstable\u002F) 的浏览器沙盒 Python 环境\n\n现在您可以在浏览器中完全安全地运行 Python 代码，前提是不需要导入第三方库。位于 [chainforge.ai\u002Fplay](https:\u002F\u002Fchainforge.ai\u002Fplay\u002F) 的**网页托管版本**已解锁 Python 评估器：\n\n\u003Cimg width=\"1661\" alt=\"Screen Shot 2024-03-05 at 11 08 46 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fa05ec44e-c99c-426e-a9e7-23b42017b359\">\n\nChainForge 的**本地版本**则提供了一个开关，用于启用或禁用沙盒模式：\n\n\u003Cimg width=\"402\" alt=\"Screen Shot 2024-03-03 at 9 23 18 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F1e2f6be3-2b63-4f57-9c0f-690a7fd62a4b\">\n\n如果您关闭沙盒模式，系统将回退到之前的 Python 评估器，该评估器通过 Flask 后端在您的本地机器上执行。在非沙盒模式的评估节点中，您可以导入 Python 环境中可用的任何库。\n\n## 为什么要使用沙盒？\n\n沙盒模式的优势在于，ChainForge 现在可以直接执行由大语言模型生成的 Python 代码，例如在您的评估函数中使用 `eval()` 或 `exec()`。虽然以前也可以做到这一点，但存在较大的安全隐患。对于那些不依赖第三方库的基准测试，比如 HumanEvals 的 pass@1 准确率，现在完全可以在浏览器中直接运行（如果有人想搭建这样的环境，请随时联系我！）。","2024-03-06T04:30:09",{"id":198,"version":199,"summary_zh":200,"released_at":201},90187,"add-prettier","大家好，\n\n感谢 @massi-ang 提交的 PR https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fpull\u002F223 和 https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fpull\u002F222，我们已将 Prettier 和 ESLint 添加到 ChainForge 的 `main` 分支中。\n\n现在，`npm run build` 会自动运行 `prettier` 和 `eslint`，我们也鼓励大家在向 ChainForge 的 `main` 分支提交 PR 之前先运行它们。\n\n我们知道，这对基于 ChainForge 进行开发的开发者来说可能会有些麻烦，因为这会让基于最新 `main` 更改进行变基变得更为繁琐。我自己也有同样的感受——我一直在开发的 `multi-eval` 分支中的更改，现在合并起来就更加困难了。不过，引入一致的代码格式化和 lint 检查，能够为开发者贡献提供更规范的标准，不再像以前那样采用随意的编码方式。\n\n最近，我用于维护项目代码整洁性的时间确实减少了。但我认为，下一步应该是 **将整个前端代码迁移到 
TypeScript**。这样不仅可以为开发者的贡献提供更多保障，还有望捕获现有的一些潜在 bug，并且能够在 ChainForge 中建立一个标准化、可扩展的 `ResponseObject` 格式。具体来说：\n- 对于那些希望添加自定义组件的开发者而言，这将明确响应数据的格式，从而提供可靠的保证；\n- 同时，这种格式也易于扩展，以支持更多类型的数据，比如作为 GPT-4 Vision 模型输入的图像，或者作为 DALL-E 等模型输出的图像。\n\n此外，我还设想：\n- 在 Inspector 中更好地封装响应的展示方式，例如创建一个专门的 React 组件（如 `ResponseBox`），以便在存在图像输出时也能轻松处理；\n- 改进内部响应（即包含“query”的那一部分）的存储机制，尽量减少 LLM 设置信息的重复，从而有效降低文件体积。目前，LLM 设置中的重复信息会导致文件迅速膨胀。因此，对于特定设置下的 LLM，应当使用唯一的 ID 来引用一个查找表；\n- 更新或新增更多示例流程，比如对比不同提示词、测试 JSON 格式、多轮评估等；\n- 编写关于如何创建新节点的开发文档。\n\n显然，LLM 并不会很快退出舞台，而对其输出质量的评估仍然面临着诸多挑战。如果我们齐心协力，就能把 ChainForge 打造成一个用于“快速尝试与验证”的图形化界面——帮助用户快速搭建提示词和链路原型，高效测试 LLM 的行为，而无需再依赖临时性的对话、命令行工具，或是手动编写代码。\n\nChainForge 的核心理念是透明性和完全可控性。我们始终致力于向开发者公开提示词内容（参见：https:\u002F\u002Fhamel.dev\u002Fblog\u002Fposts\u002Fprompt\u002F）。同时，开发者也应该能够访问模型所使用的精确设置。如果 ChainForge 未来加入提示词优化等功能，那么始终显示原始提示词也将至关重要。\n\n欢迎大家就这些改动发表意见，或者告诉我们你对未来还有什么期待。如果你是一名开发者，请务必考虑参与贡献哦！ :)","2024-02-24T16:50:10",{"id":203,"version":204,"summary_zh":205,"released_at":206},90188,"v0.2.9.5","此版本包含多项更改，其中一项是大大简化了跨系统消息的比较操作。[文档已更新](https:\u002F\u002Fchainforge.ai\u002Fdocs)，以反映这些变化（例如，请参阅[常见问题解答](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Ffaq\u002F)）。\n\n## 在表格数据节点中新增随机采样切换开关\n\n\u003Cimg width=\"836\" alt=\"random-sampling\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F00cd7aa0-9bee-4ee4-98d1-9b06ffe18017\">\n\n## 新增形式为 `{=setting_name}` 的[设置模板变量](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fprompt_templates\u002F#settings-variables)，允许用户像参数化提示词一样对设置进行参数化。\n\n例如，以下展示了如何跨系统消息进行比较：\n\n\u003Cimg width=\"1539\" alt=\"compare-sys-msgs\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fdd170cee-bd4b-4da7-9068-ab508f1a39ac\">\n\n再举一个例子，比较不同温度下的输出：\n\n\u003Cimg width=\"698\" alt=\"settings-vars\" 
src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F8bc185d9-de77-49d7-855b-ab49a5b8a000\">\n\n此外，[文档](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fprompt_templates\u002F#settings-variables)也已相应修订，以说明这些新功能。\n\n## 其他较小的改动、错误修复及用户体验优化\n\n- 移除了可能令人烦扰的红色通知点\n- 在加载新流程之前，完全清空 ReactFlow 的状态\n- 对提示节点中的模板钩子更新进行了防抖处理，以避免在用户编辑提示时频繁触发\n- 通过添加 `uid` 参数来追踪响应的来源信息。此举旨在特别记录当每个提示的生成数量 `n` 大于 1 时，某条响应属于哪个批次。这修复了评估器检查器中 `n` 大于 1 的输出被拆分的问题。","2024-01-20T01:49:21",{"id":208,"version":209,"summary_zh":210,"released_at":211},90189,"v0.2.8.9","对 Dalai 的支持已由 [","2024-01-08T23:53:59",{"id":213,"version":214,"summary_zh":215,"released_at":216},90190,"v0.2.8","Adds purple sparkly generative AI button to TextFields and Items Nodes, courtesy of @shawseanyang ! \r\n\r\nThis button gives you easy access to generating input values using LLM calls to OpenAI GPT-3.5 and GPT-4 models. You can access two features:\r\n- **Replace**: Replace the existing fields with new fields, given the prompt entered.\r\n- **Extend**: Given the existing fields, extrapolate the pattern and extend the list the best you can. \r\n\r\n![image](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Ffb92921f-2ef8-4e89-910b-575059f9ed13)\r\n\r\n## Try it out on [chainforge.ai\u002Fplay](https:\u002F\u002Fchainforge.ai\u002Fplay\u002F), or install locally via `pip install chainforge --upgrade` (BYO OpenAI API key!)\r\n\r\n_Note: You must have input an OpenAI API key (either directly in Settings, or via environment variables) to use generative AI features. In the future, we might support letting users change the model used for this feature, if there's interest (or if you submit a PR! :)_\r\n\r\n## Yes, prompts generate too... sort of! \r\n\r\nReplace will also consider if you ask for prompts and can generate prompt templates. Extend will also consider prompt templates and try to keep to your existing template variables. 
If you use prompt generation, you are probably best off Extending after manually entering 2-3 prompts\u002Fprompt templates as examples. \r\n\r\n### Note that prompt template generation is very experimental atm, and we're working on improving this aspect.\r\n\r\n\u003Cimg width=\"581\" alt=\"Screen Shot 2023-12-12 at 10 34 45 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F2874ae94-81c4-43b2-9c57-6f21b6810647\">\r\n\r\n## This feature is in BETA. There may be rough edges, mistakes in generation, etc. \r\nHowever, we've been enjoying it greatly and found it very helpful for speeding up input data generation. Play around and let us know what you think!\r\n\r\n_If you don't want to see the buttons, you can toggle off AI support in the Settings Window. Also, you can toggle on the Autocomplete feature on TextFields nodes, if you're feeling experimental! :)_ \r\n\r\n# This feature represents a semester's worth of work from @shawseanyang! Thank you Sean! 🎉🥳 \r\n\r\n---------------------\r\nOther small changes in this release:\r\n - the max `Num of responses per prompt` counter on Prompt Nodes has been increased to 999\r\n - In-browser autosaving now disables if you start working with lots of LLM responses (talking files upwards of 20MB or more). `localStorage` can only handle so much.\r\n - Relatedly, when you tab away from the ChainForge browser tab, autosaving will not occur in the background. This is to save you performance and help with the check above. \r\n - API keys are now loaded from environment variables only upon load of the application, rather than every call, for consistency. ","2023-12-13T19:03:09",{"id":218,"version":219,"summary_zh":220,"released_at":221},90191,"v0.2.6.5","We added a  🔗**Join Node**, our first Processor node, which lets you concatenate responses and\u002For input data, within or across LLMs. 
\r\n\r\n\u003Cimg width=\"673\" alt=\"Screen Shot 2023-10-23 at 3 10 49 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fee91cb2f-2b86-4a6e-8506-c9fe17ae8d81\">\r\n\r\nFor instance, consider:\r\n\u003Cimg width=\"1731\" alt=\"Screen Shot 2023-10-23 at 3 29 26 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fbbaa40c8-b0f0-4e93-b2c1-a3efcce38a6e\">\r\n\r\nWe translate words one-by-one in the first Prompt Node:\r\n\r\n\u003Cimg width=\"329\" alt=\"Screen Shot 2023-10-23 at 3 29 41 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fa135a00b-2ad8-4055-ab97-78227109936e\">\r\n\r\nThen we can join the responses by category, fruit or dessert. Here I've opted for \"double newline\" formatting:\r\n\r\n\u003Cimg width=\"665\" alt=\"Screen Shot 2023-10-23 at 3 29 46 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F72ee5a48-d287-4b4b-bb12-65f75aea2033\">\r\n\r\nFinally we chain these lists of items into another Prompt Node, to have an LLM tell us which one is the sweetest item of the list: \r\n\r\n\u003Cimg width=\"659\" alt=\"Screen Shot 2023-10-23 at 3 30 06 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F503f885e-51ad-4ad4-a63d-80864cf53416\">\r\n\r\n### Questions? Comments?\r\n\r\nThe Join Node is a bit of an experimental node. It does a few things, but, please let us know if it doesn't fit your use case or is too limited. And, as always, you can just implement the changes you want, and submit a Pull Request --this will be faster if the change is minor (e.g., adding another formatting option to the Join Node).","2023-10-23T20:55:41",{"id":223,"version":224,"summary_zh":225,"released_at":226},90192,"v0.2.6","For weeks, many of you have asked for the ability to query custom models or providers in ChainForge. 
Given how fast this space is evolving --and also how idiosyncratic some of these APIs are --we decided it best to make ChainForge extensible. \r\n\r\nYou can now [add custom providers](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcustom_providers\u002F) by writing simple completion functions in Python. Custom providers will be added to the list of providers in Prompt, Chat Turn, and LLM Scorer nodes. Added provider scripts are automatically cached, and persist across runs of ChainForge. \r\n\r\nHere's [an example script to add the Cohere API](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fblob\u002Fmain\u002Fchainforge\u002Fexamples\u002Fcustom_provider_cohere.py), complete with a JSON schema defining custom settings options. You add this script by simply dropping it into the new \"Custom Providers\" tab in the ChainForge Settings window:\r\n\r\n![custom-providers](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F70f363d0-1a59-47aa-bea9-650738c4e3e0)\r\n\r\nYou can then query the custom provider like normal:\r\n\r\n![custom-provider-query](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F0fc6e042-75e5-43c8-b7ac-6fd33b538217)\r\n\r\nNote that only the local version of ChainForge (via `pip install`) supports custom providers. \r\n\r\n# [For extensive information, see the new \"Adding a custom provider\" page in the docs.](https:\u002F\u002Fchainforge.ai\u002Fdocs\u002Fcustom_providers\u002F)\r\nAs always, let us know if you encounter any problems! :)\r\n","2023-08-27T19:43:31",{"id":228,"version":229,"summary_zh":230,"released_at":231},90193,"docs","ChainForge now has documentation! 
Go here:\r\n\r\n# https:\u002F\u002Fchainforge.ai\u002Fdocs\u002F\r\n\r\nLet us know what you think!\r\n\r\n\u003Cimg width=\"1592\" alt=\"Screen Shot 2023-08-05 at 11 30 59 AM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fbcf82540-e2c9-4c76-b22b-4f2157653f7a\">\r\n","2023-08-05T15:45:55",{"id":233,"version":234,"summary_zh":235,"released_at":236},90194,"v0.2.5","We're excited to release two new nodes: **Chat Turns** and **LLM Scorers**. These nodes came from feedback during user sessions:\r\n - Some users wanted to first tell chat models 'how to act', and then wanted to put their real prompt in the second turn.\r\n - Some users wanted a quicker, cheaper way to 'evaluate' responses and visualize results. \r\n\r\nWe describe these new nodes below, as well as a few quality-of-life improvements. \r\n\r\n# 🗣️ Chat Turn nodes\r\nChat models are all the rage (in fact, they are so important that [OpenAI announced it would no longer support plain-old text generation models going forward](https:\u002F\u002Fopenai.com\u002Fblog\u002Fgpt-4-api-general-availability).) Yet strikingly, very few prompt engineering tools let you evaluate LLM outputs beyond a prompt.\r\n\r\nNow with Chat Turn nodes, you can continue conversations beyond a single prompt. In fact, you can: \r\n\r\n## Continue multiple conversations simultaneously across multiple LLMs\r\n\r\nJust connect the Chat Turn to your initial Prompt Node, and voilà:\r\n\r\n\u003Cimg width=\"1421\" alt=\"Screen Shot 2023-07-25 at 6 39 45 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F9039ce6b-a16d-4694-89fa-47a22636cd8a\">\r\n\r\nHere, I've first prompted four chat models: GPT3.5, GPT-4, Claude-2, and PaLM with the question: \"What was the first {game} game?\". 
Then I ask a follow-up question, \"What was the second?\" By default, Chat Turns continue the conversation with all LLMs that were used before, allowing you to follow-up on LLM responses in parallel. (You can also toggle that off, if you want to query different models --more details below).\r\n\r\n## Template chat messages, just like prompts \r\n\r\nYou can do everything you can with Chat Turns that you could with Prompt Nodes, including prompt templating and adding input variables. For instance, here's a prompt template as a follow-up message:\r\n\r\n\u003Cimg width=\"1184\" alt=\"Screen Shot 2023-07-25 at 1 22 15 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F497b5c6d-a830-4af6-b7fe-f9c5b5b6a132\">\r\n\r\n> **Note**\r\n> In fact, Chat Turns are merely modified Prompt Nodes, and use the underlying `PromptNode` class.\r\n\r\n## Start a conversation with one LLM, and continue it with a different LLM\r\n\r\nChat Turns include a toggle of whether you'd like to continue chatting with the same LLMs, or query different ones, passing chat context to the new models. With this, you can start a conversation with one LLM and continue it with another (or several):\r\n\r\n\u003Cimg width=\"1146\" alt=\"Screen Shot 2023-07-25 at 12 46 52 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F17e96f80-3344-49ff-b236-5a2cea017efd\">\r\n\r\n## Supported chat models\r\n\r\nSimple in concept, chat turns were the result of 2 weeks' work, revising many parts of the ChainForge backend to store and carry chat context. Chat history is automatically translated to the appropriate format for a number of providers:\r\n - OpenAI chat models\r\n - Anthropic models (Claude)\r\n - Google PaLM2 chat\r\n - HuggingFace (you need to set 'Model Type' in Settings to 'chat', and choose a Conversation model or custom endpoint. 
Currently there's only one chat model listed in ChainForge dropdown: `microsoft\u002FDialoGPT`. Go to the HuggingFace site to find more!)\r\n\r\n> **Warning**\r\n> If you use a non-chat, text completions model like GPT-2, chat turns will still function, but the chat context won't be passed into the text completions model.\r\n\r\nLet us know what you think!\r\n\r\n# 🤖 LLM Scorer nodes\r\n\r\nMore commonly called \"LLM evaluators\", LLM scorer nodes allow you to use an LLM to 'grade'\u002Fscore outputs of other LLMs:\r\n\r\n\u003Cimg width=\"342\" alt=\"Screen Shot 2023-07-25 at 6 44 01 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fa48d458c-9383-4040-888d-24a7c37a8f47\">\r\n\r\nAlthough ChainForge supported this functionality before via prompt chaining, it was not straightforward and required an additional chain to a code evaluator node for postprocessing. You can now connect the output of the scorer directly to a Vis Node to plot outputs. For instance, here's GPT-4 scoring whether different LLM responses apologized for a mistake: \r\n\r\n\u003Cimg width=\"1640\" alt=\"Screen Shot 2023-07-25 at 12 31 52 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002Fda7acda9-1d26-4fbf-ad73-a3b422455876\">\r\n\r\nNote that LLM scores are finicky --if one score isn't in the right format (true\u002Ffalse), visualization nodes won't work properly, because they'll think the outputs are not of boolean type but categorical. We'll work on improving this, but, for now, enjoy LLM scorers!\r\n\r\n## ❗ Why we're not calling LLM scorers 'LLM evaluators'\r\nWe thought long and hard about what to call LLMs that score outputs of other LLMs. Ultimately, using LLMs to score outputs is helpful, and can save time when it's hard to write code to achieve the same effect. However, LLMs are imperfect. 
Although the AI community currently uses the term 'LLM evaluator,' we ultimately decided not to use that term, for a few reasons:\r\n 1. LLM scores should not ","2023-07-26T16:21:13",{"id":238,"version":239,"summary_zh":240,"released_at":241},90195,"v0.2.1.2","There's two minor, but important quality-of-life improvements in this release.\r\n\r\n# Table view\r\n\r\nNow in response inspectors, you can elect to see a table, rather than a hierarchical grouping of prompt variables:\r\n\r\n\u003Cimg width=\"1460\" alt=\"Screen Shot 2023-07-19 at 5 03 55 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F6aca2bd7-7820-4256-9e8b-3a87795f3e50\">\r\n\r\nColumns are prompt variables, followed by LLMs. We might add the ability to change columns in the future, if there's interest. \r\n\r\n# Persistent state in response inspectors\r\n\r\nResponse inspectors' state will, to an extent, persist across runs. For instance, say you were inspecting a specific response grouping:\r\n\r\n\u003Cimg width=\"901\" alt=\"Screen Shot 2023-07-19 at 5 04 21 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F3e9d8bb6-a0ea-4f21-bc91-c498879208f4\">\r\n\r\nImagine you now close the inspector window, delete one of the models and then increase num generations per prompt to 2. You will now see:\r\n\r\n\u003Cimg width=\"903\" alt=\"Screen Shot 2023-07-19 at 5 04 41 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F776f7e30-7437-4c5d-9378-f7803ec8caec\">\r\n\r\nRight where you left off, with the updated responses. It also keeps track if you've selected Table view, and retains the view you last selected.\r\n\r\n# Specify hostname and port (v0.2.1.3)\r\n\r\nI've added `--host` and `--port` flags when you're running ChainForge locally. 
You can specify what hostname and port to run it on like so:\r\n\r\n```\r\nchainforge serve --host 0.0.0.0 --port 3400\r\n```\r\n\r\nThe front-end app also knows you're running it from Flask (locally) regardless of what the hostname and port is. \r\n","2023-07-19T21:35:37",{"id":243,"version":244,"summary_zh":245,"released_at":246},90196,"v0.2.1","We've made several quality-of-life improvements from 0.2 to this release. \r\n\r\n# Prompt previews\r\n\r\nYou can now inspect what generated prompts will be sent off to LLMs. For a quick glance, simply hover over the 'list' icon on Prompt Nodes:\r\n\r\n![hover-over-prompt-preview](https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F32e47b32-38f0-4354-9c20-2f6f31c99806)\r\n\r\nFor full inspection, just click the button to bring up a popup inspector. \r\n\r\nThanks to Issue https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F90 raised by @profplum700 ! \r\n\r\n# Ability To Enable\u002FDisable Prompt Variables in Text Fields Without Deleting Them\r\n\r\nYou can now enable\u002Fdisable prompt variables selectively:\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F92f9c869-8201-43d0-a4a5-8aee7524319e\r\n\r\nThanks to Issue https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F93 raised by @profplum700 !\r\n\r\n# Anthropic model Claude-2\r\n\r\nWe've also added the newest Claude model, Claude-2. All prior models remain supported; however, strangely, Claude-1 and 100k context models have disappeared from the Anthropic API documentation. So, if you are using earlier Claude models, just know that they may stop working at some future point. 
\r\n\r\n## Bug fixes\r\n\r\nThere have also been numerous bug fixes, including:\r\n- braces { and } inside Tabular Data tables are now escaped by default when data is pulled from the nodes, so that they are never treated as prompt templates\r\n- escaping template braces \\{ and \\} now removes the escape slash when generating prompts for models\r\n- outputs of Prompt Nodes, when chained into other Prompt Nodes, now escape the braces in LLM responses by default. Note that whenever prompts are generated, the escaped braces are cleaned up to just { and }. In response inspectors, input variables will appear with escaped braces, as input variables in ChainForge may themselves be templates. \r\n\r\n# Future Goals\r\n\r\nWe've been running pilot studies internally at Harvard HCI and getting some informal feedback. \r\n - One point that keeps coming up echoes Issue https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F56 , raised by @jjordanbaird : the ability to keep chat context and evaluate multiple chatbot turns. We are thinking to implement this as a `Chat Turn Node`, where optionally, one can provide \"past conversation\" context as input. The overall structure will be similar to Prompt Nodes, except that only Chat Models will be available. See https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fissues\u002F56 for more details. \r\n - Another issue we're aware of is the need for better documentation on what you can do with ChainForge, particularly on the rather unique feature of chaining prompt templates together. \r\n\r\nAs always, if you have any feedback or comments, open an Issue or start a Discussion. ","2023-07-12T21:33:20",{"id":248,"version":249,"summary_zh":250,"released_at":251},90197,"v0.2"," > **Note**\r\n > This release includes a breaking change regarding cache'ing responses. 
If you are working on a current flow, export your ChainForge flow to a `cforge` file before installing the new version.\r\n\r\nWe're closer than ever to hosting ChainForge on [chainforge.ai](http:\u002F\u002Fchainforge.ai), so that no installation is required to try it out. Latest changes below.\r\n\r\n# The entire backend has been rewritten in TypeScript 🥷🧑‍💻️\r\n\r\nThousands of lines of Python code, comprising nearly the entire backend, have been rewritten in TypeScript. The mechanism for generating prompt permutations, querying LLMs and cache'ing responses is performed now in the front-end (entirely in the browser). Tests were added in `jest` to ensure the outputs of the TypeScript functions performed the same as their original Python versions. There are additional performance and maintainability benefits to adding static type checking. We've also added ample docstrings, which should help devs looking to get involved.\r\n\r\nFunctionally, you should not experience any difference (except maybe a slight speed boost).\r\n\r\n# Javascript Evaluator Nodes 🧩\r\n\r\nBecause the application logic has moved to the browser, we added JavaScript evaluator nodes. These let you write evaluation functions in JavaScript, and function the same as Python evaluators.\r\n\r\nHere is a side-by-side comparison of JavaScript and Python evaluator nodes, showing semantically equivalent code and the in-node support for displaying console.log and print output: \r\n\r\n\u003Cimg width=\"678\" alt=\"Screen Shot 2023-06-30 at 12 08 27 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fianarawjo\u002FChainForge\u002Fassets\u002F5251713\u002F09da964e-fd07-4cf2-a4c7-04fc0080b722\">\r\n\r\nWhen you are running ChainForge on `localhost`, you can still use Python evaluator nodes, which will execute on your local Flask server (the Python backend) as before. JavaScript evaluators run entirely in the browser (specifically, `eval` sandboxed inside an `iframe`). 
# HuggingFace Models 🤗

We added support for querying text-generation models hosted on the [HuggingFace Inference API](https://huggingface.co/inference-api). For instance, here is [falcon.7b.instruct](https://huggingface.co/tiiuae/falcon-7b-instruct), an open-source model:

<img width="1107" alt="Screen Shot 2023-06-30 at 2 15 46 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/344fbc65-f4a4-4b9f-9496-3ddb427db34c">

For HF models, there is a 250-token limit. This can sometimes be rather limiting, so we've added a "number of continuations" setting to help. Set it to a value > 0 to feed the response back into the API (for text-completion models), which will generate longer completions of up to 1500 tokens.

We also support [HF Inference Endpoints](https://huggingface.co/inference-endpoints) for text-generation models. Simply put the API call URL in the `custom_model` field of the settings window.

# Comment Nodes ✏️

You can write comments about your evaluation using a Comment Node:

<img width="306" alt="Screen Shot 2023-06-30 at 2 18 03 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e96df294-4b47-4575-9559-61883973d238">

# 'Browser unsupported' error 💢

If you load ChainForge on a mobile device or an unsupported browser, it will now display an error message:

<img width="500" alt="Screen Shot 2023-06-30 at 2 28 32 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/ecfc0b79-9859-4612-8ad2-f8f9bc459469">

This helps for our public release.
If you'd like ChainForge to support more browsers, open an Issue or (better yet) make a Pull Request.

# Fun example

Finally, I wanted to share a fun practical example: an evaluation to **check if the LLM reveals a secret key**. This evaluation, including all API calls and JavaScript evaluation code, was run entirely in the browser:

<img width="1788" alt="Screen Shot 2023-06-30 at 2 47 39 PM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/36cab316-419b-4257-980b-f6f6a3c82571">

# Questions, concerns?

Open an Issue or start a Discussion!

This was a major, serious change to ChainForge. Although we've written tests, it's possible we have missed something and there's a bug somewhere. **Note that, unfortunately, Azure OpenAI 🔷 support is again untested following the rewrite, as we don't have access to it. Someone in the community, let me know if it works for you! (Also, if you work at Microsoft and can give us access, let us know!)**

# A browser-based, hosted version of ChainForge will be publicly available July 5th (next Wednesday) on chainforge.ai 🌍🎉

_Released 2023-06-30_

---

# v0.1.7.2

This minor release includes two features:

# Autosaving

Now, ChainForge autosaves your work to `localStorage` every 60 seconds. This helps tremendously in case you accidentally close the window without exporting the flow, your system crashes, or you encounter a bug.
To create a new flow, just click the New Flow button to get a new canvas.

# Plots now have clear y-axis, x-axis, and groupBy selectors on the Vis Node

We've added a header bar to the Vis Node, clarifying what is plotted on each axis / dimension:

<img width="588" alt="Screen Shot 2023-06-23 at 9 32 58 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/74fc0f47-9390-4937-836d-77d24daad380">

In addition, as you can see above, y-labels can now span up to two lines (~40 chars long), making them easier to read.

Finally, when the number of generations per prompt is 1, we now output bar charts by default:

<img width="729" alt="Screen Shot 2023-06-23 at 9 35 06 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/7a1266b2-622a-480a-938d-889950c6e90e">

Box-and-whisker plots are still used whenever the number of generations n > 1.

Note that improving the Vis Node is a work in progress; functionally, everything is the same as before.

_Released 2023-06-23_

---

# v0.1.7

We've made a number of improvements to the inspector UI and beyond.

# Side-by-side comparison across LLM responses

Responses now appear side-by-side for up to five LLMs queried:

<img width="1387" alt="Screen Shot 2023-06-21 at 9 27 45 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/e739845c-cee5-422a-8567-505a195331dc">

# Collapsible response groups

You can also collapse LLM responses grouped by their prompt template variable, for easier selective inspection.
Just click on a response group header to show/hide it:

https://github.com/ianarawjo/ChainForge/assets/5251713/452ab3ae-7a74-4b6c-a568-a6f14351b93d

# Accuracy plots by default

Boolean (true/false) evaluation metrics now use accuracy plots by default. For instance, for ChainForge's prompt injection example:

<img width="602" alt="Screen Shot 2023-06-21 at 9 27 58 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/2509ca98-3b88-4b36-9e8c-8078b854871a">

This makes it extremely easy to see differences across models for the specified evaluation. Stacked bar charts are still used when a prompt variable is selected. For instance, here is a plot of a meta-variable, 'Domain', across two LLMs, testing whether or not the code outputs had an `import` statement (another new feature):

<img width="487" alt="Screen Shot 2023-06-21 at 10 22 51 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/41158437-ad54-4ba2-a989-a5fe071d6408">

# Added 'Inspect results' footer to both Prompt and Eval nodes

The tiny response-previews footer in the Prompt Node has been changed to an 'Inspect Responses' button that brings up a fullscreen response inspector.
In addition, evaluation results can be easily inspected by clicking 'Inspect results':

<img width="1560" alt="Screen Shot 2023-06-21 at 10 12 34 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/a3b642a7-ca34-475b-b8e7-a42b3f51d03c">

Evaluation scores appear in bold at the top of each response block:

<img width="1392" alt="Screen Shot 2023-06-21 at 10 13 54 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/af4c9e00-576c-4dfd-9308-42f985d46471">

In addition, both Prompt and Eval nodes now load cached results upon initialization. Simply load an example flow and click the respective Inspect button.

## Added `asMarkdownAST` to the `response` object in Evaluator nodes

Given how often developers wish to parse markdown, we've added a function `asMarkdownAST()` to the `ResponseInfo` class that uses the [`mistune` library](https://mistune.lepture.com/en/latest/) to parse markdown into an abstract syntax tree (AST).

For instance, here's code which detects if an 'import' statement appeared anywhere in the codeblocks of a chat response:

<img width="510" alt="Screen Shot 2023-06-21 at 10 19 51 AM" src="https://github.com/ianarawjo/ChainForge/assets/5251713/c12c46e3-3371-415b-8ae5-c5819a24fd6a">

_Released 2023-06-21_
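The kind of check shown in the screenshot can be sketched as a walk over a mistune-style AST, i.e. a list of token dicts. Note this is an illustrative sketch: the exact token key names vary between mistune versions, so the example below uses a hand-built token list standing in for what `response.asMarkdownAST()` might return, rather than calling the real method.

```python
def has_import_in_codeblocks(ast_tokens) -> bool:
    """Return True if any code block token contains an 'import' statement."""
    for tok in ast_tokens:
        # 'block_code' and 'text' follow the mistune v2 AST-token convention;
        # other versions may use different key names.
        if tok.get("type") == "block_code" and "import " in tok.get("text", ""):
            return True
    return False

# Hand-built stand-in for an AST parsed from a chat response:
example_ast = [
    {"type": "paragraph", "text": "Here is some code:"},
    {"type": "block_code", "info": "python",
     "text": "import os\nprint(os.getcwd())"},
]
print(has_import_in_codeblocks(example_ast))  # True
```

Inside a real evaluator node, the same loop would run over the AST returned by `asMarkdownAST()` instead of the hand-built list.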