[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-zhongyu09--openchatbi":3,"tool-zhongyu09--openchatbi":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":82,"owner_url":83,"languages":84,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":23,"env_os":101,"env_gpu":102,"env_ram":102,"env_deps":103,"category_tags":117,"github_topics":118,"view_count":10,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":129,"updated_at":130,"faqs":131,"releases":159},714,"zhongyu09\u002Fopenchatbi","openchatbi","OpenChatBI is an intelligent chat-based BI tool powered by large language models, designed to help users query, analyze, and visualize data through natural language conversations. 
It uses LangGraph and LangChain to build chat agents and workflows that support natural language to SQL conversion and data analysis.","OpenChatBI 是一款基于大语言模型的开源智能对话式 BI 工具，旨在通过自然语言对话帮助用户轻松查询、分析和可视化数据。它消除了传统数据分析中编写 SQL 代码的门槛，让业务人员也能直接获取数据洞察，同时为开发者提供了灵活可扩展的分析框架。\n\nOpenChatBI 特别适合数据分析师、后端开发人员以及对数据应用感兴趣的研究者。它基于 LangGraph 和 LangChain 生态构建，核心亮点包括将自然语言自动转换为 SQL 语句、生成直观的图表、执行 Python 代码进行深度分析，甚至支持时间序列预测。此外，OpenChatBI 还具备强大的知识库管理和持久化记忆能力，能结合外部知识回答复杂问题，并支持 MCP 工具集成。无论是搭建内部数据平台还是探索 AI 数据分析，OpenChatBI 都提供了一个开箱即用的强大起点。","# OpenChatBI\n\nOpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to help users \nquery, analyze, and visualize data through natural language conversations. Built on the LangGraph and LangChain ecosystem, \nit provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.\n\nJoin the Slack channel to discuss: https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopenchatbicommunity\u002Fshared_invite\u002Fzt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_3609c67829db.gif\" alt=\"Demo\" width=\"800\">\n\n## Core Features\n\n1. **Natural Language Interaction**: Get data analysis results by asking questions in natural language\n2. **Automatic SQL Generation**: Convert natural language queries into SQL statements using advanced text2sql workflows\n   with schema linking and well-organized prompt engineering\n3. **Data Visualization**: Generate intuitive data visualizations (via Plotly)\n4. **Data Catalog Management**: Automatically discovers and indexes database table structures, supports flexible catalog\n   storage backends with vector-based or BM25-based retrieval, and makes it easy to maintain business explanations for tables\n   and columns and to optimize prompts.\n5. **Time Series Forecasting**: Forecasting models deployed in-house that can be called as tools\n6. 
**Code Execution**: Execute Python code for data analysis and visualization\n7. **Interactive Problem-Solving**: Proactively ask users for more context when information is incomplete\n8. **Persistent Memory**: Conversation management and user characteristic memory based on LangGraph checkpointing\n9. **MCP Support**: Integration with MCP tools by configuration\n10. **Knowledge Base Integration**: Answer complex questions by combining catalog-based knowledge retrieval and external\n   knowledge base retrieval (via MCP tools)\n11. **Web UI Interface**: Provides 2 sample UIs: simple and streaming web interfaces using Gradio and Streamlit, easy to\n   integrate with other web applications\n\n## Roadmap\n\n1. **Anomaly Detection Algorithm**: Time series anomaly detection\n2. **Root Cause Analysis Algorithm**: Multi-dimensional drill-down capabilities for anomaly investigation\n\n# Getting started\n\n## Installation & Setup\n\n### Prerequisites\n\n- Python 3.11 or higher\n- Access to a supported LLM provider (OpenAI, Anthropic, etc.)\n- Data Warehouse (Database) credentials (like Presto, PostgreSQL, MySQL, etc.)\n- (Optional) Embedding model for vector-based retrieval - if not available, BM25-based retrieval will be used\n- (Optional) Docker - required only for `docker` executor mode\n\n**Note on Chinese Text Segmentation**: For better Chinese text retrieval, `jieba` is used for word segmentation. However, `jieba` is not compatible with Python 3.12+. On Python 3.12 and higher, the system automatically falls back to simple punctuation-based segmentation for Chinese text.\n\n### Installation\n\n1. **Using uv (recommended):**\n\n```bash\ngit clone git@github.com:zhongyu09\u002Fopenchatbi\ncd openchatbi\nuv sync\n```\n\n2. **Using pip:**\n\n```bash\npip install openchatbi\n```\n\n3. **For development:**\n\n```bash\ngit clone git@github.com:zhongyu09\u002Fopenchatbi\ncd openchatbi\nuv sync --group dev\n```\n\nOptional: If you want to use `pysqlite3` (newer SQLite builds), you can install it manually. 
If the build fails, install SQLite first:\n\nOn macOS, install sqlite using Homebrew:\n```bash\nbrew install sqlite\nbrew info sqlite\nexport LDFLAGS=\"-L\u002Fopt\u002Fhomebrew\u002Fopt\u002Fsqlite\u002Flib\"\nexport CPPFLAGS=\"-I\u002Fopt\u002Fhomebrew\u002Fopt\u002Fsqlite\u002Finclude\"\n```\nOn Amazon Linux \u002F RHEL \u002F CentOS:\n```bash\nsudo yum install sqlite-devel\n```\nOn Ubuntu \u002F Debian:\n```bash\nsudo apt-get update\nsudo apt-get install libsqlite3-dev\n```\n\n### Run Demo\n\nRun the demo using the **example dataset** from the Spider dataset. You need to provide \"YOUR OPENAI API KEY\" or change the config to use another LLM provider.\n\n**Note**: The demo example includes an embedding model configuration. If you want to run without an embedding model, you can remove the `embedding_model` section in the config - BM25 retrieval will be used automatically.\n\n```bash\ncp example\u002Fconfig.yaml openchatbi\u002Fconfig.yaml\nsed -i 's\u002FYOUR_API_KEY_HERE\u002F[YOUR OPENAI API KEY]\u002Fg' openchatbi\u002Fconfig.yaml\npython run_streamlit_ui.py\n```\n\n### Configuration\n\n1. **Create configuration file**\n\nCopy the configuration template:\n```bash\ncp openchatbi\u002Fconfig.yaml.template openchatbi\u002Fconfig.yaml\n```\nOr create an empty YAML file.\n\n2. **Configure your LLMs:**\n\n```yaml\n# Select which provider to use\ndefault_llm: openai\n\n# Define one or more providers\nllm_providers:\n  openai:\n    default_llm:\n      class: langchain_openai.ChatOpenAI\n      params:\n        api_key: YOUR_API_KEY_HERE\n        model: gpt-4.1\n        temperature: 0.02\n        max_tokens: 8192\n\n    # Optional: Embedding model for vector-based retrieval and memory tools\n    # If not configured, BM25-based retrieval will be used, and the memory tools will not work\n    embedding_model:\n      class: langchain_openai.OpenAIEmbeddings\n      params:\n        api_key: YOUR_API_KEY_HERE\n        model: text-embedding-3-large\n        chunk_size: 1024\n```\n\n3. 
**Configure your data warehouse:**\n\n```yaml\norganization: Your Company\ndialect: presto\ndata_warehouse_config:\n  uri: \"presto:\u002F\u002Fuser@host:8080\u002Fcatalog\u002Fschema\"\n  include_tables:\n    - your_table_name\n  database_name: \"catalog.schema\"\n```\n\n### Running the Application\n\n1. **Invoking LangGraph:**\n\n```bash\nexport CONFIG_FILE=YOUR_CONFIG_FILE_PATH\n```\n\n```python\nfrom openchatbi import get_default_graph\n\ngraph = get_default_graph()\ngraph.invoke({\"messages\": [{\"role\": \"user\", \"content\": \"Show me CTR trends for the past 7 days\"}]},\n    config={\"configurable\": {\"thread_id\": \"1\"}})\n```\n\n```\n# System-generated SQL\nSELECT date, SUM(clicks)\u002FSUM(impression) AS ctr\nFROM ad_performance\nWHERE date >= CURRENT_DATE - INTERVAL '7' DAY\nGROUP BY date\nORDER BY date;\n```\n\n2. **Sample Web UI:**\n\nRun the Streamlit-based UI:\n```bash\nstreamlit run sample_ui\u002Fstreamlit_ui.py\n```\n\nRun the Gradio-based UI:\n```bash\npython sample_ui\u002Fstreaming_ui.py\n```\n\n## Configuration Instructions\n\nThe configuration template is provided at `config.yaml.template`. 
Key configuration sections include:\n\n### Basic Settings\n\n- `organization`: Organization name (e.g., \"Your Company\")\n- `dialect`: Database dialect (e.g., \"presto\")\n- `bi_config_file`: Path to BI configuration file (e.g., \"example\u002Fbi.yaml\")\n\n### Catalog Store Configuration\n\n- `catalog_store`: Configuration for data catalog storage\n    - `store_type`: Storage type (e.g., \"file_system\")\n    - `data_path`: Path to catalog data stored by file system (e.g., \".\u002Fexample\")\n\n### Data Warehouse Configuration\n\n- `data_warehouse_config`: Database connection settings\n    - `uri`: Connection string for your database\n    - `include_tables`: List of tables to include in the catalog; leave empty to include all tables\n    - `database_name`: Database name for catalog\n    - `token_service`: Token service URL (for data warehouses that need token authentication, like Presto)\n    - `user_name` \u002F `password`: Token service credentials\n\n### LLM Configuration\n\nVarious LLMs are supported via LangChain; see the LangChain API\nreference (https:\u002F\u002Fpython.langchain.com\u002Fapi_reference\u002Freference.html#integrations) for the full list of integrations that support\n`chat_models`. You can configure different LLMs for different tasks:\n\n- `default_llm`: Primary language model for general tasks\n- `embedding_model`: (Optional) Model for embedding generation. If not configured, BM25-based text retrieval will be used as a fallback, and the memory tools will not work\n- `text2sql_llm`: (Optional) Specialized model for SQL generation. 
If not configured, uses `default_llm`\n\nMultiple providers (optional):\n\n- Configure multiple providers under `llm_providers` and select with `default_llm: \u003Cprovider_name>`.\n- In `sample_ui\u002Fstreamlit_ui.py`, a provider dropdown appears when `llm_providers` is configured.\n- In `sample_api\u002Fasync_api.py`, pass `provider` in the `\u002Fchat\u002Fstream` request body.\n\nCommonly used LLM providers and their corresponding classes and installation commands:\n\n- **Anthropic**: `langchain_anthropic.ChatAnthropic`, `pip install langchain-anthropic`\n- **OpenAI**: `langchain_openai.ChatOpenAI`, `pip install langchain-openai`\n- **Azure OpenAI**: `langchain_openai.AzureChatOpenAI`, `pip install langchain-openai`\n- **Google Vertex AI**: `langchain_google_vertexai.ChatVertexAI`, `pip install langchain-google-vertexai`\n- **Bedrock**: `langchain_aws.ChatBedrock`, `pip install langchain-aws`\n- **Huggingface**: `langchain_huggingface.ChatHuggingFace`, `pip install langchain-huggingface`\n- **Deepseek**: `langchain_deepseek.ChatDeepSeek`, `pip install langchain-deepseek`\n- **Ollama**: `langchain_ollama.ChatOllama`, `pip install langchain-ollama`\n\n### Advanced Configuration\n\nOpenChatBI supports sophisticated customization through prompt engineering and catalog management features:\n\n- **Prompt Engineering Configuration**: Customize system prompts, business glossaries, and data warehouse introductions\n- **Data Catalog Management**: Configure table metadata, column descriptions, and SQL generation rules\n- **Business Rules**: Define table selection criteria and domain-specific SQL constraints\n- **Forecasting Service**: Configure the forecasting service url and prompt based on your own deployment \n\nFor detailed configuration options and examples, see the [Advanced Features](#advanced-features) section.\n\n## Architecture Overview\n\nOpenChatBI is built using a modular architecture with clear separation of concerns:\n\n1. 
**LangGraph Workflows**: Core orchestration using state machines for complex multi-step processes\n2. **Catalog Management**: Flexible data catalog system with intelligent retrieval (vector-based or BM25 fallback)\n3. **Text2SQL Pipeline**: Advanced natural language to SQL conversion with schema linking\n4. **Code Execution**: Sandboxed Python execution environment for data analysis\n5. **Tool Integration**: Extensible tool system for human interaction and knowledge search\n6. **Persistent Memory**: SQLite-based conversation state management\n\n## Technology Stack\n\n- **Frameworks**: LangGraph, LangChain, FastAPI, Gradio\u002FStreamlit\n- **Large Language Models**: Azure OpenAI (GPT-4), Anthropic Claude, OpenAI GPT models\n- **Text Retrieval**: Vector-based (with embedding models) or BM25-based (fallback without embeddings)\n- **Databases**: Presto, Trino, MySQL with SQLAlchemy support\n- **Code Execution**: Local Python, RestrictedPython, Docker containerization\n- **Development**: Python 3.11+, with modern tooling (Black, Ruff, MyPy, Pytest)\n- **Storage**: SQLite for conversation checkpointing, file system catalog storage\n\n### Agent Graph\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_1e1b25442bc2.png\" alt=\"Agent Graph\" width=\"800\">\n\n### Text2SQL Graph\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_0a140bfee623.png\" alt=\"Text2SQL Graph\" width=\"800\">\n\n## Project Structure\n\n```\nopenchatbi\u002F\n├── README.md                    # Project documentation\n├── pyproject.toml               # Modern Python project configuration\n├── Dockerfile.python-executor  # Docker image for isolated code execution\n├── run_tests.py                # Test runner script\n├── run_streamlit_ui.py         # Streamlit UI launcher\n├── openchatbi\u002F                 # Core application code\n│   ├── __init__.py             # Package initialization\n│   ├── 
config.yaml.template    # Configuration template\n│   ├── config_loader.py        # Configuration management\n│   ├── constants.py            # Application constants\n│   ├── agent_graph.py          # Main LangGraph workflow\n│   ├── graph_state.py          # State definition for workflows\n│   ├── context_config.py       # Context management configuration\n│   ├── context_manager.py      # Context window and token management\n│   ├── text_segmenter.py       # Text segmentation with jieba support\n│   ├── utils.py                # Utility functions and SimpleStore (BM25-based retrieval)\n│   ├── catalog\u002F                # Data catalog management\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── catalog_loader.py   # Catalog loading logic\n│   │   ├── catalog_store.py    # Catalog storage interface\n│   │   ├── factory.py          # Catalog factory patterns\n│   │   ├── helper.py           # Catalog helper functions\n│   │   ├── retrival_helper.py  # Retrieval helper utilities\n│   │   ├── schema_retrival.py  # Schema retrieval logic\n│   │   ├── token_service.py    # Token service integration\n│   │   └── store\u002F              # Catalog storage implementations\n│   │       └── file_system.py  # File system-based catalog storage\n│   ├── code\u002F                   # Code execution framework\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── executor_base.py    # Base executor interface\n│   │   ├── local_executor.py   # Local Python execution\n│   │   ├── restricted_local_executor.py # RestrictedPython execution\n│   │   └── docker_executor.py  # Docker-based isolated execution\n│   ├── llm\u002F                    # LLM integration layer\n│   │   ├── __init__.py         # Package initialization\n│   │   └── llm.py              # LLM management and retry logic\n│   ├── prompts\u002F                # Prompt templates and engineering\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── agent_prompt.md 
    # Main agent prompts\n│   │   ├── extraction_prompt.md # Information extraction prompts\n│   │   ├── system_prompt.py    # System prompt management\n│   │   ├── summary_prompt.md   # Summary conversation prompts\n│   │   ├── table_selection_prompt.md # Table selection prompts\n│   │   ├── text2sql_prompt.md  # Text-to-SQL prompts\n│   │   └── sql_dialect\u002F        # SQL dialect-specific prompts\n│   ├── text2sql\u002F               # Text-to-SQL conversion pipeline\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── data.py             # Data and retriever for Text-to-SQL\n│   │   ├── extraction.py       # Information extraction\n│   │   ├── generate_sql.py     # SQL generation and execution logic\n│   │   ├── schema_linking.py   # Schema linking process\n│   │   ├── sql_graph.py        # SQL generation LangGraph workflow\n│   │   ├── text2sql_utils.py   # Text2SQL utilities\n│   │   └── visualization.py    # Data visualization functions\n│   └── tool\u002F                   # LangGraph tools and functions\n│       ├── ask_human.py        # Human-in-the-loop interactions\n│       ├── memory.py           # Memory management tool\n│       ├── mcp_tools.py        # MCP (Model Context Protocol) integration\n│       ├── run_python_code.py  # Configurable Python code execution\n│       ├── save_report.py      # Report saving functionality\n│       ├── search_knowledge.py # Knowledge base search\n│       └── timeseries_forecast.py # Time series forecasting tool\n├── sample_api\u002F                 # API implementations\n│   └── async_api.py            # Asynchronous FastAPI example\n├── sample_ui\u002F                  # Web interface implementations\n│   ├── memory_ui.py            # Memory-enhanced UI interface\n│   ├── plotly_utils.py         # Plotly utilities and helpers\n│   ├── simple_ui.py            # Simple non-streaming Gradio UI\n│   ├── streaming_ui.py         # Streaming Gradio UI with real-time updates\n│   ├── streamlit_ui.py     
    # Streaming Streamlit UI with enhanced features\n│   └── style.py                # UI styling and CSS\n├── example\u002F                    # Example configurations and data\n│   ├── bi.yaml                 # BI configuration example\n│   ├── config.yaml             # Application config example\n│   ├── table_info.yaml         # Table information\n│   ├── table_columns.csv       # Table column registry\n│   ├── common_columns.csv      # Common column definitions\n│   ├── sql_example.yaml        # SQL examples for retrieval\n│   ├── table_selection_example.csv # Table selection examples\n│   └── tracking_orders.sqlite  # Sample SQLite database\n├── timeseries_forecasting\u002F     # Time series forecasting service\n│   ├── README.md               # Forecasting service documentation\n│   └── ...                     # Forecasting service implementation\n├── tests\u002F                      # Test suite\n│   ├── __init__.py             # Package initialization\n│   ├── conftest.py             # Test configuration\n│   ├── test_*.py               # Test modules for various components\n│   └── README.md               # Testing documentation\n├── docs\u002F                       # Documentation\n│   ├── source\u002F                 # Sphinx documentation source\n│   ├── build\u002F                  # Built documentation\n│   ├── Makefile                # Documentation build scripts\n│   └── make.bat                # Windows build script\n└── .github\u002F                    # GitHub workflows and templates\n    └── workflows\u002F              # CI\u002FCD workflows\n```\n\n## Advanced Features\n\n### Visualization configuration\nYou can choose rule-based or llm-based visualization or disable visualization.\n```yaml\n# Options: \"rule\" (rule-based), \"llm\" (LLM-based), or null (skip visualization)\nvisualization_mode: llm\n```\n\n### Prompt Engineering\n#### Basic Knowledge & Glossary\n\nYou can define basic knowledge and glossary in `example\u002Fbi.yaml`, for 
example:\n\n```yaml\nbasic_knowledge_glossary: |\n  # Basic Knowledge Introduction\n    The basic knowledge about your company and its business, including key concepts, metrics, and processes.\n  # Glossary\n    Common terms and their definitions used in your business context.\n```\n\n#### Data Warehouse Introduction\n\nYou can provide a brief introduction to your data warehouse in `example\u002Fbi.yaml`, for example:\n\n```yaml\ndata_warehouse_introduction: |\n  # Data Warehouse Introduction\n    This data warehouse is built on Presto and contains various tables related to XXXXX.\n    The main fact tables include XXXX metrics, while dimension tables include XXXXX.\n    The data is updated hourly and is used for reporting and analysis purposes.\n```\n\n#### Table Selection Rules\n\nYou can configure table selection rules in `example\u002Fbi.yaml`, for example:\n\n```yaml\ntable_selection_extra_rule: |\n  - All tables with is_valid can support both valid and invalid traffic\n```\n\n#### Custom SQL Rules\n\nYou can define additional SQL generation rules for tables in `example\u002Ftable_info.yaml`, for example:\n\n```yaml\nsql_rule: |\n  ### SQL Rules\n  - All event_date values in the table are stored in **UTC**. If the user specifies a timezone (e.g., CET, PST), convert between timezones accordingly.\n```\n\n### Catalog Management\n\n#### Introduction\n\nHigh-quality catalog data is essential for accurate Text2SQL generation and data analysis. 
OpenChatBI automatically \ndiscovers and indexes data warehouse table structures while providing flexible management for business metadata, column \ndescriptions, and query optimization rules.\n\n#### Catalog Structure\n\nThe catalog system organizes metadata in a hierarchical structure:\n\n**Database Level**\n- Top-level container for all tables and schemas\n\n**Table Level**\n- `description`: Business functionality and purpose of the table\n- `selection_rule`: Guidelines for when and how to use this table in queries\n- `sql_rule`: Specific SQL generation rules and constraints for this table\n\n**Column Level**\n- **Required Fields**: Essential metadata for each column to enable effective Text2SQL generation\n  - `column_name`: Technical database column name\n  - `display_name`: Human-readable name for business users\n  - `alias`: Alternative names or abbreviations\n  - `type`: Data type (string, integer, date, etc.)\n  - `category`: Business category, dimension or metric\n  - `tag`: Additional labels for filtering and organization\n  - `description`: Detailed explanation of column purpose and usage\n- **Two Types** of Columns\n  - **Common Columns**: Columns with standardized business meanings shared across tables\n  - **Table-Specific Columns**: Columns with context-dependent meanings that vary between tables\n- **Derived Metrics**: Virtual metrics calculated from existing columns using SQL formulas\n  - Computed dynamically during query execution rather than stored as physical columns\n  - Examples: CTR (clicks\u002Fimpressions), conversion rates, profit margins\n  - Enable complex business calculations without pre-computing values\n  \n#### Loading Catalog from Database\n\nOpenChatBI can automatically discover and load table structures from your data warehouse:\n\n1. **Automatic Discovery**: Connects to your configured data warehouse and scans table schemas\n2. **Metadata Extraction**: Extracts column names, data types, and basic structural information\n3. 
**Incremental Updates**: Supports updating catalog data as your database schema evolves\n\nConfigure automatic catalog loading in your `config.yaml`:\n\n```yaml\ncatalog_store:\n  store_type: file_system\n  data_path: .\u002Fcatalog_data\ndata_warehouse_config:\n  include_tables:\n    - your_table_pattern\n  # Leave empty to include all accessible tables\n```\n\n#### File System Catalog Store\n\nThe file system catalog store organizes metadata across multiple files for maintainability and version control:\n\n**Core Table Information**\n- `table_info.yaml`: Comprehensive table metadata organized hierarchically (database → table → information)\n  - `type`: Table classification (e.g., \"fact\" for Fact Tables, \"dimension\" for Dimension Tables)\n  - `description`: Business functionality and purpose\n  - `selection_rule`: Usage guidelines in markdown list format (each line starts with `-`)\n  - `sql_rule`: SQL generation rules in markdown header format (each rule starts with `####`)\n  - `derived_metric`: Virtual metrics with calculation formulas, organized by groups:\n    ```md\n    #### Derived Ratio Metrics\n    Click-through Rate (alias CTR): SUM(clicks) \u002F SUM(impression)\n    Conversion Rate (alias CVR): SUM(conversions) \u002F SUM(clicks)\n    ```\n\n**Column Management**\n- `table_columns.csv`: Basic column registry with schema `db_name,table_name,column_name`\n- `table_spec_columns.csv`: Table-specific column metadata with full schema:\n  `db_name,table_name,column_name,display_name,alias,type,category,tag,description`\n- `common_columns.csv`: Shared column definitions across tables with schema:\n  `column_name,display_name,alias,type,category,tag,description`\n\n**Query Examples and Training Data**\n- `table_selection_example.csv`: Table selection training examples with schema `question,selected_tables`\n- `sql_example.yaml`: Query examples organized by database and table structure:\n  ```yaml\n  your_database:\n    ad_performance: |\n      Q: Show me 
CTR trends for the past 7 days\n      A: SELECT date, SUM(clicks)\u002FSUM(impressions) AS ctr\n         FROM ad_performance\n         WHERE date >= CURRENT_DATE - INTERVAL 7 DAY\n         GROUP BY date\n         ORDER BY date;\n  ```\n\n### Time Series Forecasting Service Setup\n\nOpenChatBI can integrate with a time series forecasting service for advanced predictive analytics. Follow these steps to set up the service:\n\n#### 1. Build and Run the Forecasting Service\n\nSee detailed instructions in [timeseries_forecasting\u002FREADME.md](timeseries_forecasting\u002FREADME.md)\n\nQuick start:\n```bash\ncd timeseries_forecasting\n.\u002Fbuild_and_run.sh\n```\n\n#### 2. Configure Tool Usage Rules\n\nIn your `bi.yaml`, add constraints for the timeseries_forecast tool, e.g. if you are using the `timer-base-84m` model:\n```yaml\nextra_tool_use_rule: |\n  - The timeseries_forecast tool requires at least 96 time points in the input data. If there is not enough input data, set input_len to 96 to pad with zeros.\n```\n\n#### 3. Configure Service URL\n\nIn your `config.yaml`:\n```yaml\n# Time Series Forecasting Service Configuration\ntimeseries_forecasting_service_url: \"http:\u002F\u002Flocalhost:8765\"\n```\n\n**Important**: Adjust the URL based on your deployment scenario:\n- **Local development** (OpenChatBI on host, forecasting service in Docker): `http:\u002F\u002Flocalhost:8765`\n- **Remote service**: `http:\u002F\u002Fyour-service-host:8765`\n\n#### 4. 
Verify Service Health\n\nTest the service is accessible:\n```bash\ncurl http:\u002F\u002Flocalhost:8765\u002Fhealth\n```\n\nExpected response:\n```json\n{\n  \"status\": \"healthy\",\n  \"model_initialized\": true,\n  \"uptime_seconds\": 123.45\n}\n``` \n\n### Python Code Execution Configuration\n\nOpenChatBI supports multiple execution environments for running Python code with different security and performance characteristics:\n\n```yaml\n# Python Code Execution Configuration\npython_executor: local  # Options: \"local\", \"restricted_local\", \"docker\"\n```\n\n#### Executor Types\n\n- **`local`** (Default)\n  - **Performance**: Fastest execution\n  - **Security**: Least secure (code runs in current Python process)\n  - **Capabilities**: Full Python capabilities and library access\n  - **Use Case**: Development environments, trusted code execution\n\n- **`restricted_local`**\n  - **Performance**: Moderate execution speed\n  - **Security**: Moderate security with RestrictedPython sandboxing\n  - **Capabilities**: Limited Python features (no imports, file access, etc.)\n  - **Use Case**: Semi-trusted environments with controlled execution\n\n- **`docker`**\n  - **Performance**: Slower due to container overhead\n  - **Security**: Highest security with complete process isolation\n  - **Capabilities**: Full Python capabilities within isolated container\n  - **Use Case**: Production environments, untrusted code execution\n  - **Requirements**: Docker must be installed and running\n\n#### Docker Executor Setup\n\nFor production deployments or when running untrusted code, the Docker executor provides complete isolation:\n\n1. **Install Docker**: Download and install Docker Desktop or Docker Engine\n2. **Configure executor**: Set `python_executor: docker` in your config\n3. **Automatic setup**: OpenChatBI will automatically build the required Docker image\n4. 
**Fallback behavior**: If Docker is unavailable, automatically falls back to local executor\n\n**Docker Executor Features**:\n- Pre-installed data science libraries (pandas, numpy, matplotlib, seaborn)\n- Network isolation for security\n- Automatic container cleanup\n- Resource isolation from host system\n\n## Development & Testing\n\n### Code Quality Tools\n\nThe project uses modern Python tooling for code quality:\n\n```bash\n# Format code\nuv run black .\n\n# Lint code  \nuv run ruff check .\n\n# Type checking\nuv run mypy openchatbi\u002F\n\n# Security scanning\nuv run bandit -r openchatbi\u002F\n```\n\n### Testing\n\nRun the test suite:\n\n```bash\n# Run all tests\nuv run pytest\n\n# Run with coverage\nuv run pytest --cov=openchatbi --cov-report=html\n\n# Run specific test files\nuv run pytest test\u002Ftest_generate_sql.py\nuv run pytest test\u002Ftest_agent_graph.py\n```\n\n### Pre-commit Hooks\n\nInstall pre-commit hooks for automatic code quality checks:\n\n```bash\nuv run pre-commit install\n```\n\n## Contribution Guidelines\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature\u002FfooBar`)\n3. Commit your changes (`git commit -am 'Add some fooBar'`)\n4. Push to the branch (`git push origin feature\u002FfooBar`)\n5. 
Create a new Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details\n\n## Contact & Support\n\n- **Author**: Yu Zhong ([zhongyu8@gmail.com](mailto:zhongyu8@gmail.com))\n- **Repository**: [github.com\u002Fzhongyu09\u002Fopenchatbi](https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi)\n- **Issues**: [Report bugs and feature requests](https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fissues)\n","# OpenChatBI\n\nOpenChatBI 是一个开源的、基于聊天的智能商业智能（BI）工具，由大语言模型（LLM）驱动，旨在帮助用户通过自然语言对话来查询、分析和可视化数据。它构建在 LangGraph 和 LangChain 生态系统之上，提供聊天代理和工作流，支持自然语言到 SQL 的转换以及简化的数据分析。\n\n加入 Slack 频道进行讨论：https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopenchatbicommunity\u002Fshared_invite\u002Fzt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_3609c67829db.gif\" alt=\"Demo\" width=\"800\">\n\n## 核心功能\n\n1. **自然语言交互**：通过自然语言提问获取数据分析结果\n2. **自动 SQL 生成**：利用高级 text2sql 工作流，结合 Schema 链接和精心组织的提示工程（Prompt Engineering），将自然语言查询转换为 SQL 语句\n3. **数据可视化**：生成直观的数据可视化图表（通过 Plotly）\n4. **数据目录管理**：自动发现并索引数据库表结构，支持灵活的目录存储后端，具备基于向量或 BM25 的检索能力，并可轻松维护表和列的业务解释以及优化提示词（Prompts）。\n5. **时间序列预测**：内部部署的预测模型可作为工具被调用\n6. **代码执行**：执行 Python 代码以进行数据分析和可视化\n7. **交互式问题解决**：当信息不完整时，主动询问用户获取更多上下文\n8. **持久化记忆**：基于 LangGraph 检查点机制（Checkpointing）的对话管理和用户特征记忆\n9. **MCP 支持**：通过配置集成 MCP 工具\n10. **知识库集成**：结合基于目录的知识检索和外部知识库检索（通过 MCP 工具）来回答复杂问题\n11. **Web UI 界面**：提供 2 个示例 UI：使用 Gradio 和 Streamlit 构建的简单和流式 Web 界面，易于与其他 Web 应用集成\n\n## 路线图\n\n1. **异常检测算法**：时间序列异常检测\n2. **根本原因分析算法**：用于异常调查的多维下钻能力\n\n# 开始使用\n\n## 安装与设置\n\n### 前置条件\n\n- Python 3.11 或更高版本\n- 访问支持的 LLM 提供商（OpenAI, Anthropic 等）\n- 数据仓库（数据库）凭证（如 Presto, PostgreSQL, MySQL 等）\n- （可选）用于基于向量检索的嵌入模型（Embedding model）- 如果不可用，将使用基于 BM25 的检索\n- （可选）Docker - 仅在 `docker` 执行器模式下需要\n\n**中文文本分词说明**：为了更好的中文文本检索，系统使用 `jieba` 进行分词。然而，`jieba` 与 Python 3.12+ 不兼容。在 Python 3.12 及更高版本上，系统会自动回退到基于简单标点的中文文本分词。\n\n### 安装\n\n1. 
**使用 uv（推荐）：**\n\n```bash\ngit clone git@github.com:zhongyu09\u002Fopenchatbi\ncd openchatbi\nuv sync\n```\n\n2. **使用 pip：**\n\n```bash\npip install openchatbi\n```\n\n3. **用于开发：**\n\n```bash\ngit clone git@github.com:zhongyu09\u002Fopenchatbi\ncd openchatbi\nuv sync --group dev\n```\n\n可选：如果你想使用 `pysqlite3`（较新的 SQLite 构建版本），可以手动安装它。如果构建失败，请先安装 SQLite：\n\n在 macOS 上，尝试使用 Homebrew 安装 sqlite：\n```bash\nbrew install sqlite\nbrew info sqlite\nexport LDFLAGS=\"-L\u002Fopt\u002Fhomebrew\u002Fopt\u002Fsqlite\u002Flib\"\nexport CPPFLAGS=\"-I\u002Fopt\u002Fhomebrew\u002Fopt\u002Fsqlite\u002Finclude\"\n```\n在 Amazon Linux \u002F RHEL \u002F CentOS 上：\n```bash\nsudo yum install sqlite-devel\n```\n在 Ubuntu \u002F Debian 上：\n```bash\nsudo apt-get update\nsudo apt-get install libsqlite3-dev\n```\n\n### 运行演示\n\n使用来自 spider 数据集的**示例数据**运行演示。你需要提供 \"YOUR OPENAI API KEY\" 或更改配置以使用其他 LLM 提供商。\n\n**注意**：演示示例包含嵌入模型配置。如果你想在不使用嵌入模型的情况下运行，可以从配置中删除 `embedding_model` 部分 - 系统将自动使用 BM25 检索。\n\n```bash\ncp example\u002Fconfig.yaml openchatbi\u002Fconfig.yaml\nsed -i 's\u002FYOUR_API_KEY_HERE\u002F[YOUR OPENAI API KEY]\u002Fg' openchatbi\u002Fconfig.yaml\npython run_streamlit_ui.py\n```\n\n### 配置\n\n1. **创建配置文件**\n\n复制配置模板：\n```bash\ncp openchatbi\u002Fconfig.yaml.template openchatbi\u002Fconfig.yaml\n```\n或者创建一个空的 YAML 文件。\n\n2. **配置你的 LLMs：**\n\n```yaml\n# Select which provider to use\ndefault_llm: openai\n\n# Define one or more providers\nllm_providers:\n  openai:\n    default_llm:\n      class: langchain_openai.ChatOpenAI\n      params:\n        api_key: YOUR_API_KEY_HERE\n        model: gpt-4.1\n        temperature: 0.02\n        max_tokens: 8192\n\n    # Optional: Embedding model for vector-based retrieval and memory tools\n    # If not configured, BM25-based retrieval will be used, and the memory tools will not work\n    embedding_model:\n      class: langchain_openai.OpenAIEmbeddings\n      params:\n        api_key: YOUR_API_KEY_HERE\n        model: text-embedding-3-large\n        chunk_size: 1024\n```\n\n3. 
**配置你的数据仓库：**\n\n```yaml\norganization: Your Company\ndialect: presto\ndata_warehouse_config:\n  uri: \"presto:\u002F\u002Fuser@host:8080\u002Fcatalog\u002Fschema\"\n  include_tables:\n    - your_table_name\n  database_name: \"catalog.schema\"\n```\n\n### 运行应用程序\n\n1. **调用 LangGraph：**\n\n```bash\nexport CONFIG_FILE=YOUR_CONFIG_FILE_PATH\n```\n\n```python\nfrom openchatbi import get_default_graph\n\ngraph = get_default_graph()\ngraph.invoke({\"messages\": [{\"role\": \"user\", \"content\": \"Show me ctr trends for the past 7 days\"}]},\n    config={\"configurable\": {\"thread_id\": \"1\"}})\n```\n\n```\n# System-generated SQL\nSELECT date, SUM(clicks)\u002FSUM(impressions) AS ctr\nFROM ad_performance\nWHERE date >= CURRENT_DATE - INTERVAL 7 DAY\nGROUP BY date\nORDER BY date;\n```\n\n2. **示例 Web UI：**\n\n运行基于 Streamlit 的 UI：\n```bash\nstreamlit run sample_ui\u002Fstreamlit_ui.py\n```\n\n运行基于 Gradio 的 UI：\n```bash\npython sample_ui\u002Fstreaming_ui.py\n```\n\n## 配置说明\n\n配置模板位于 `config.yaml.template`。主要配置部分包括：\n\n### 基本设置\n\n- `organization`：组织名称（例如：\"Your Company\"）\n- `dialect`：数据库方言（例如：\"presto\"）\n- `bi_config_file`：BI 配置文件路径（例如：\"example\u002Fbi.yaml\"）\n\n### 目录存储配置\n\n- `catalog_store`：数据目录存储配置\n    - `store_type`：存储类型（例如：\"file_system\"）\n    - `data_path`：文件系统存储的目录数据路径（例如：\".\u002Fexample\"）\n\n### 数据仓库配置\n\n- `data_warehouse_config`: 数据库连接设置\n    - `uri`: 数据库的连接字符串\n    - `include_tables`: 要包含在目录中的表列表，留空则包含所有表\n    - `database_name`: 目录的数据库名称\n    - `token_service`: Token 服务 URL（适用于需要 Token 认证的数据仓库，如 Presto）\n    - `user_name` \u002F `password`: Token 服务凭证\n\n### LLM (大型语言模型) 配置\n\n基于 LangChain，支持多种 LLM，详见 LangChain API 文档 (https:\u002F\u002Fpython.langchain.com\u002Fapi_reference\u002Freference.html#integrations) 以获取受支持的 `chat_models` 完整列表。您可以为不同任务配置不同的 LLM：\n\n- `default_llm`: 用于一般任务的主要语言模型\n- `embedding_model`: （可选）用于生成嵌入 (embedding) 的模型。如果未配置，将使用基于 BM25 的文本检索作为回退，且记忆工具将无法工作\n- `text2sql_llm`: （可选）用于 SQL 生成的专用模型。如果未配置，则使用 `default_llm`\n\n多提供商配置（可选）：\n\n- 在 
`llm_providers` 下配置多个提供商，并通过 `default_llm: \u003Cprovider_name>` 进行选择。\n- 在 `sample_ui\u002Fstreamlit_ui.py` 中，当配置了 `llm_providers` 时会出现提供商下拉菜单。\n- 在 `sample_api\u002Fasync_api.py` 中，在 `\u002Fchat\u002Fstream` 请求体中传递 `provider`。\n\n常用 LLM 提供商及其对应的类和安装命令：\n\n- **Anthropic**: `langchain_anthropic.ChatAnthropic`, `pip install langchain-anthropic`\n- **OpenAI**: `langchain_openai.ChatOpenAI`, `pip install langchain-openai`\n- **Azure OpenAI**: `langchain_openai.AzureChatOpenAI`, `pip install langchain-openai`\n- **Google Vertex AI**: `langchain_google_vertexai.ChatVertexAI`, `pip install langchain-google-vertexai`\n- **Bedrock**: `langchain_aws.ChatBedrock`, `pip install langchain-aws`\n- **Huggingface**: `langchain_huggingface.ChatHuggingFace`, `pip install langchain-huggingface`\n- **Deepseek**: `langchain_deepseek.ChatDeepSeek`, `pip install langchain-deepseek`\n- **Ollama**: `langchain_ollama.ChatOllama`, `pip install langchain-ollama`\n\n### 高级配置\n\nOpenChatBI 通过提示工程 (Prompt Engineering) 和目录管理功能支持复杂的自定义：\n\n- **提示工程配置**：自定义系统提示词、业务术语表和数据库介绍\n- **数据目录管理**：配置表元数据、列描述和 SQL 生成规则\n- **业务规则**：定义表选择标准和特定领域的 SQL 约束\n- **预测服务**：根据您的部署配置预测服务 URL 和提示词\n\n有关详细的配置选项和示例，请参阅 [高级功能](#advanced-features) 部分。\n\n## 架构概览\n\nOpenChatBI 采用模块化架构构建，职责分离清晰：\n\n1. **LangGraph 工作流**：使用状态机 (State Machines) 进行核心编排，处理复杂的多步骤流程\n2. **目录管理**：灵活的数据目录系统，支持智能检索（基于向量或 BM25 回退）\n3. **Text2SQL 管道**：带有模式链接 (Schema Linking) 的高级自然语言到 SQL 转换\n4. **代码执行**：用于数据分析的沙箱 (Sandboxed) Python 执行环境\n5. **工具集成**：可扩展的工具系统，用于人机交互和知识搜索\n6. 
**持久化记忆**：基于 SQLite 的对话状态检查点 (Checkpointing) 管理\n\n## 技术栈\n\n- **框架**：LangGraph, LangChain, FastAPI, Gradio\u002FStreamlit\n- **大语言模型**：Azure OpenAI (GPT-4), Anthropic Claude, OpenAI GPT 模型\n- **文本检索**：基于向量 (Vector-based)（带嵌入模型）或基于 BM25（无嵌入时的回退）\n- **数据库**：Presto, Trino, MySQL，支持 SQLAlchemy\n- **代码执行**：本地 Python, RestrictedPython, Docker 容器化\n- **开发**：Python 3.11+，使用现代工具链（Black, Ruff, MyPy, Pytest）\n- **存储**：SQLite 用于对话检查点，文件系统目录存储\n\n### 代理图\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_1e1b25442bc2.png\" alt=\"代理图\" width=\"800\">\n\n### Text2SQL 图\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_readme_0a140bfee623.png\" alt=\"Text2SQL 图\" width=\"800\">\n\n## 项目结构\n\n```\nopenchatbi\u002F\n├── README.md                    # Project documentation\n├── pyproject.toml               # Modern Python project configuration\n├── Dockerfile.python-executor  # Docker image for isolated code execution\n├── run_tests.py                # Test runner script\n├── run_streamlit_ui.py         # Streamlit UI launcher\n├── openchatbi\u002F                 # Core application code\n│   ├── __init__.py             # Package initialization\n│   ├── config.yaml.template    # Configuration template\n│   ├── config_loader.py        # Configuration management\n│   ├── constants.py            # Application constants\n│   ├── agent_graph.py          # Main LangGraph workflow\n│   ├── graph_state.py          # State definition for workflows\n│   ├── context_config.py       # Context management configuration\n│   ├── context_manager.py      # Context window and token management\n│   ├── text_segmenter.py       # Text segmentation with jieba support\n│   ├── utils.py                # Utility functions and SimpleStore (BM25-based retrieval)\n│   ├── catalog\u002F                # Data catalog management\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── catalog_loader.py   # 
Catalog loading logic\n│   │   ├── catalog_store.py    # Catalog storage interface\n│   │   ├── factory.py          # Catalog factory patterns\n│   │   ├── helper.py           # Catalog helper functions\n│   │   ├── retrival_helper.py  # Retrieval helper utilities\n│   │   ├── schema_retrival.py  # Schema retrieval logic\n│   │   ├── token_service.py    # Token service integration\n│   │   └── store\u002F              # Catalog storage implementations\n│   │       └── file_system.py  # File system-based catalog storage\n│   ├── code\u002F                   # Code execution framework\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── executor_base.py    # Base executor interface\n│   │   ├── local_executor.py   # Local Python execution\n│   │   ├── restricted_local_executor.py # RestrictedPython execution\n│   │   └── docker_executor.py  # Docker-based isolated execution\n│   ├── llm\u002F                    # LLM integration layer\n│   │   ├── __init__.py         # Package initialization\n│   │   └── llm.py              # LLM management and retry logic\n│   ├── prompts\u002F                # Prompt templates and engineering\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── agent_prompt.md     # Main agent prompts\n│   │   ├── extraction_prompt.md # Information extraction prompts\n│   │   ├── system_prompt.py    # System prompt management\n│   │   ├── summary_prompt.md   # Summary conversation prompts\n│   │   ├── table_selection_prompt.md # Table selection prompts\n│   │   ├── text2sql_prompt.md  # Text-to-SQL prompts\n│   │   └── sql_dialect\u002F        # SQL dialect-specific prompts\n│   ├── text2sql\u002F               # Text-to-SQL conversion pipeline\n│   │   ├── __init__.py         # Package initialization\n│   │   ├── data.py             # Data and retriever for Text-to-SQL\n│   │   ├── extraction.py       # Information extraction\n│   │   ├── generate_sql.py     # SQL generation and execution logic\n│   │   ├── 
schema_linking.py   # Schema linking process\n│   │   ├── sql_graph.py        # SQL generation LangGraph workflow\n│   │   ├── text2sql_utils.py   # Text2SQL utilities\n│   │   └── visualization.py    # Data visualization functions\n│   └── tool\u002F                   # LangGraph tools and functions\n│       ├── ask_human.py        # Human-in-the-loop interactions\n│       ├── memory.py           # Memory management tool\n│       ├── mcp_tools.py        # MCP (Model Context Protocol) integration\n│       ├── run_python_code.py  # Configurable Python code execution\n│       ├── save_report.py      # Report saving functionality\n│       ├── search_knowledge.py # Knowledge base search\n│       └── timeseries_forecast.py # Time series forecasting tool\n├── sample_api\u002F                 # API implementations\n│   └── async_api.py            # Asynchronous FastAPI example\n├── sample_ui\u002F                  # Web interface implementations\n│   ├── memory_ui.py            # Memory-enhanced UI interface\n│   ├── plotly_utils.py         # Plotly utilities and helpers\n│   ├── simple_ui.py            # Simple non-streaming Gradio UI\n│   ├── streaming_ui.py         # Streaming Gradio UI with real-time updates\n│   ├── streamlit_ui.py         # Streaming Streamlit UI with enhanced features\n│   └── style.py                # UI styling and CSS\n├── example\u002F                    # Example configurations and data\n│   ├── bi.yaml                 # BI configuration example\n│   ├── config.yaml             # Application config example\n│   ├── table_info.yaml         # Table information\n│   ├── table_columns.csv       # Table column registry\n│   ├── common_columns.csv      # Common column definitions\n│   ├── sql_example.yaml        # SQL examples for retrieval\n│   ├── table_selection_example.csv # Table selection examples\n│   └── tracking_orders.sqlite  # Sample SQLite database\n├── timeseries_forecasting\u002F     # Time series forecasting service\n│   ├── README.md 
              # Forecasting service documentation\n│   └── ...                     # Forecasting service implementation\n├── tests\u002F                      # Test suite\n│   ├── __init__.py             # Package initialization\n│   ├── conftest.py             # Test configuration\n│   ├── test_*.py               # Test modules for various components\n│   └── README.md               # Testing documentation\n├── docs\u002F                       # Documentation\n│   ├── source\u002F                 # Sphinx documentation source\n│   ├── build\u002F                  # Built documentation\n│   ├── Makefile                # Documentation build scripts\n│   └── make.bat                # Windows build script\n└── .github\u002F                    # GitHub workflows and templates\n    └── workflows\u002F              # CI\u002FCD workflows\n```\n\n## 高级功能\n\n### 可视化配置\n您可以选择基于规则的、基于 LLM (大语言模型) 的可视化，或者禁用可视化。\n```yaml\n# Options: \"rule\" (rule-based), \"llm\" (LLM-based), or null (skip visualization)\nvisualization_mode: llm\n```\n\n### 提示工程\n#### 基础知识与术语表\n\n您可以在 `example\u002Fbi.yaml` 中定义基础知识和术语表，例如：\n\n```yaml\nbasic_knowledge_glossary: |\n  # Basic Knowledge Introduction\n    The basic knowledge about your company and its business, including key concepts, metrics, and processes.\n  # Glossary\n    Common terms and their definitions used in your business context.\n```\n\n#### 数据仓库介绍\n\n您可以在 `example\u002Fbi.yaml` 中提供数据仓库的简要介绍，例如：\n\n```yaml\ndata_warehouse_introduction: |\n  # Data Warehouse Introduction\n    This data warehouse is built on Presto and contains various tables related to XXXXX.\n    The main fact tables include XXXX metrics, while dimension tables include XXXXX.\n    The data is updated hourly and is used for reporting and analysis purposes.\n```\n\n#### 表选择规则\n\n您可以在 `example\u002Fbi.yaml` 中配置表选择规则，例如：\n\n```yaml\ntable_selection_extra_rule: |\n  - All tables with is_valid can support both valid and invalid traffics\n```\n\n#### 自定义 SQL 规则\n\n您可以在 
`example\u002Ftable_info.yaml` 中定义针对表的额外 SQL 生成规则，例如：\n\n```yaml\nsql_rule: |\n  ### SQL Rules\n  - All event_date in the table are stored in **UTC**. If the user specifies a timezone (e.g., CET, PST), convert between timezones accordingly.\n\n```\n\n\n### 目录管理\n\n#### 简介\n\n高质量的目录数据对于准确的 Text2SQL（文本转 SQL）生成和数据分析至关重要。OpenChatBI 自动发现并索引数据仓库表结构，同时为业务元数据、列描述和查询优化规则提供灵活的管理。\n\n#### 目录结构\n\n目录系统以分层结构组织元数据：\n\n**数据库级别**\n- 所有表和模式的顶层容器\n\n**表级别**\n- `description`: 表的功能和业务目的\n- `selection_rule`: 在查询中使用此表的时机和方法指南\n- `sql_rule`: 此表特定的 SQL 生成规则和约束\n\n**列级别**\n- **必填字段**：启用有效 Text2SQL 生成所需的每列关键元数据\n  - `column_name`: 技术数据库列名\n  - `display_name`: 供业务用户阅读的易读名称\n  - `alias`: 替代名称或缩写\n  - `type`: 数据类型（字符串、整数、日期等）\n  - `category`: 业务类别、维度或指标\n  - `tag`: 用于过滤和组织的附加标签\n  - `description`: 列用途和用法的详细解释\n- **两种类型**的列\n  - **通用列**：跨表共享标准化业务含义的列\n  - **特定于表的列**：具有上下文相关含义且在不同表之间变化的列\n- **衍生指标**：使用 SQL 公式从现有列计算的虚拟指标\n  - 在查询执行期间动态计算，而不是作为物理列存储\n  - 示例：CTR（点击量\u002F展示量）、转化率、利润率\n  - 无需预计算值即可实现复杂的业务计算\n  \n#### 从数据库加载目录\n\nOpenChatBI 可以自动发现并从您的数据仓库加载表结构：\n\n1. **自动发现**：连接到您配置的数据仓库并扫描表模式 (Schema)\n2. **元数据提取**：提取列名、数据类型和基本结构信息\n3. 
**增量更新**：支持随着数据库模式演变而更新目录数据\n\n在您的 `config.yaml` 中配置自动目录加载：\n\n```yaml\ncatalog_store:\n  store_type: file_system\n  data_path: .\u002Fcatalog_data\ndata_warehouse_config:\n  include_tables:\n    - your_table_pattern\n  # Leave empty to include all accessible tables\n```\n\n#### 文件系统目录存储\n\n文件系统目录存储将元数据组织在多个文件中，以便于维护和版本控制：\n\n**核心表信息**\n- `table_info.yaml`: 分层组织的综合表元数据（数据库 → 表 → 信息）\n  - `type`: 表分类（例如：\"fact\" 表示事实表，\"dimension\" 表示维度表）\n  - `description`: 业务功能和目的\n  - `selection_rule`: 使用指南，采用 markdown 列表格式（每行以 `-` 开头）\n  - `sql_rule`: SQL 生成规则，采用 markdown 标题格式（每条规则以 `####` 开头）\n  - `derived_metric`: 带有计算公式的虚拟指标，按组组织：\n    ```md\n    #### Derived Ratio Metrics\n    Click-through Rate (alias CTR): SUM(clicks) \u002F SUM(impression)\n    Conversion Rate (alias CVR): SUM(conversions) \u002F SUM(clicks)\n    ```\n\n**列管理**\n- `table_columns.csv`: 基本列注册表，模式为 `db_name,table_name,column_name`\n- `table_spec_columns.csv`: 特定于表的列元数据，完整模式为：\n  `db_name,table_name,column_name,display_name,alias,type,category,tag,description`\n- `common_columns.csv`: 跨表共享的列定义，模式为：\n  `column_name,display_name,alias,type,category,tag,description`\n\n**查询示例和训练数据**\n- `table_selection_example.csv`: 表选择训练示例，模式为 `question,selected_tables`\n- `sql_example.yaml`: 按数据库和表结构组织的查询示例：\n  ```yaml\n  your_database:\n    ad_performance: |\n      Q: Show me CTR trends for the past 7 days\n      A: SELECT date, SUM(clicks)\u002FSUM(impressions) AS ctr\n         FROM ad_performance\n         WHERE date >= CURRENT_DATE - INTERVAL 7 DAY\n         GROUP BY date\n         ORDER BY date;\n  ```\n\n### 时间序列预测服务设置\n\nOpenChatBI 可以与时间序列预测服务集成，以实现高级预测分析。请按照以下步骤设置服务：\n\n#### 1. 构建并运行预测服务\n\n详见 [timeseries_forecasting\u002FREADME.md](timeseries_forecasting\u002FREADME.md) 中的详细说明\n\n快速开始：\n```bash\ncd timeseries_forecasting\n.\u002Fbuild_and_run.sh\n```\n\n#### 2. 
配置工具使用规则\n\n在您的 `bi.yaml` 中，为 timeseries_forecast 工具添加约束，例如，如果您使用的是 `timer-base-84m` 模型：\n```yaml\nextra_tool_use_rule: |\n  - timeseries_forecast tool requires at least 96 time points in input data. If there is not enough input data, set input_len to 96 to pad with zeros.\n```\n\n#### 3. 配置服务 URL\n\n在您的 `config.yaml` 中：\n```yaml\n# Time Series Forecasting Service Configuration\ntimeseries_forecasting_service_url: \"http:\u002F\u002Flocalhost:8765\"\n```\n\n**重要**：根据您的部署场景调整 URL：\n- **本地开发**（OpenChatBI 在主机上，预测服务在 Docker 中）：`http:\u002F\u002Flocalhost:8765`\n- **远程服务**：`http:\u002F\u002Fyour-service-host:8765`\n\n\n#### 4. 验证服务健康状态\n\n测试服务是否可访问：\n```bash\ncurl http:\u002F\u002Flocalhost:8765\u002Fhealth\n```\n\n预期响应：\n```json\n{\n  \"status\": \"healthy\",\n  \"model_initialized\": true,\n  \"uptime_seconds\": 123.45\n}\n```\n\n### Python 代码执行配置\n\nOpenChatBI 支持多种执行环境 (Execution Environments)，用于运行具有不同安全性和性能特征的 Python 代码：\n\n```yaml\n# Python Code Execution Configuration\npython_executor: local  # Options: \"local\", \"restricted_local\", \"docker\"\n```\n\n#### 执行器类型 (Executor Types)\n\n- **`local`** (默认)\n  - **性能**：执行速度最快\n  - **安全性**：安全性最低（代码在当前 Python 进程中运行）\n  - **能力**：完整的 Python 功能及库访问权限\n  - **适用场景**：开发环境、可信代码执行\n\n- **`restricted_local`**\n  - **性能**：中等执行速度\n  - **安全性**：中等安全性，使用 RestrictedPython 沙箱隔离 (sandboxing)\n  - **能力**：有限的 Python 功能（无导入、文件访问等）\n  - **适用场景**：半可信环境下的受控执行\n\n- **`docker`**\n  - **性能**：由于容器开销导致较慢\n  - **安全性**：最高安全性，具备完整的进程隔离\n  - **能力**：在隔离容器内拥有完整的 Python 功能\n  - **适用场景**：生产环境、不可信代码执行\n  - **要求**：必须安装并运行 Docker (容器引擎)\n\n#### Docker 执行器设置\n\n对于生产部署或运行不可信代码时，Docker 执行器提供完全隔离：\n\n1. **安装 Docker**：下载并安装 Docker Desktop 或 Docker Engine\n2. **配置执行器**：在配置中设置 `python_executor: docker`\n3. **自动设置**：OpenChatBI 将自动构建所需的 Docker 镜像\n4. 
**回退行为**：如果 Docker 不可用，则自动回退到本地执行器\n\n**Docker 执行器特性**：\n- 预装数据科学库（pandas, numpy, matplotlib, seaborn）\n- 网络隔离以保障安全\n- 自动清理容器\n- 与主机系统的资源隔离\n\n## 开发与测试\n\n### 代码质量工具\n\n本项目使用现代 Python 工具链进行代码质量控制（例如 uv 包管理器、black 代码格式化工具、ruff 代码检查工具等）：\n\n```bash\n# Format code\nuv run black .\n\n# Lint code  \nuv run ruff check .\n\n# Type checking\nuv run mypy openchatbi\u002F\n\n# Security scanning\nuv run bandit -r openchatbi\u002F\n```\n\n### 测试\n\n运行测试套件（pytest 测试框架）：\n\n```bash\n# Run all tests\nuv run pytest\n\n# Run with coverage\nuv run pytest --cov=openchatbi --cov-report=html\n\n# Run specific test files\nuv run pytest test\u002Ftest_generate_sql.py\nuv run pytest test\u002Ftest_agent_graph.py\n```\n\n### Pre-commit 钩子 (pre-commit hooks)\n\n安装 pre-commit 钩子以进行自动代码质量检查：\n\n```bash\nuv run pre-commit install\n```\n\n## 贡献指南\n\n1. Fork (仓库分叉) 该仓库\n2. 创建功能分支 (`git checkout -b feature\u002FfooBar`)\n3. 提交你的更改 (`git commit -am 'Add some fooBar'`)\n4. 推送到分支 (`git push origin feature\u002FfooBar`)\n5. 创建新的拉取请求 (Pull Request)\n\n## 许可证\n\n本项目采用 MIT 许可证授权 - 详见 [LICENSE](LICENSE) 文件\n\n## 联系与支持\n\n- **作者**：Yu Zhong ([zhongyu8@gmail.com](mailto:zhongyu8@gmail.com))\n- **仓库**：[github.com\u002Fzhongyu09\u002Fopenchatbi](https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi)\n- **问题反馈**：[报告错误和功能请求](https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fissues)","# OpenChatBI 快速上手指南\n\nOpenChatBI 是一款基于大语言模型的开源智能 BI 工具，支持通过自然语言对话查询、分析和可视化数据。本指南将帮助您快速搭建并运行该工具。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n- **Python 版本**: 3.11 或更高版本\n- **大模型服务**: 拥有支持的 LLM 提供商访问权限（如 OpenAI, Anthropic 等）及 API Key\n- **数据仓库**: 准备好数据库凭证（如 Presto, PostgreSQL, MySQL 等）\n- **可选组件**:\n  - 向量检索模型（Embedding Model），若未配置则自动回退至 BM25 检索\n  - Docker（仅在使用 Docker 执行模式时需要）\n\n> **注意**: 关于中文分词，系统默认使用 `jieba`。但在 Python 3.12+ 环境下，`jieba` 不兼容，系统将自动回退到基于标点的简单分词。\n\n## 2. 
安装步骤\n\n推荐使用 `uv` 进行依赖管理，也可使用 `pip`。\n\n### 方式一：源码运行（推荐用于演示和开发）\n\n```bash\ngit clone git@github.com:zhongyu09\u002Fopenchatbi\ncd openchatbi\nuv sync\n```\n\n### 方式二：直接安装\n\n```bash\npip install openchatbi\n```\n\n### 配置初始化\n\n复制配置文件模板并修改：\n\n```bash\ncp openchatbi\u002Fconfig.yaml.template openchatbi\u002Fconfig.yaml\n```\n\n编辑 `config.yaml`，主要配置以下两项：\n\n1. **LLM 配置**: 填入您的 API Key 和模型信息。\n   ```yaml\n   llm_providers:\n     openai:\n       default_llm:\n         class: langchain_openai.ChatOpenAI\n         params:\n           api_key: YOUR_API_KEY_HERE\n           model: gpt-4.1\n   ```\n2. **数据仓库配置**: 填入数据库连接 URI 和表名。\n   ```yaml\n   dialect: presto\n   data_warehouse_config:\n     uri: \"presto:\u002F\u002Fuser@host:8080\u002Fcatalog\u002Fschema\"\n     include_tables:\n       - your_table_name\n   ```\n\n## 3. 基本使用\n\n### 启动 Web 界面\n\n运行 Streamlit 界面即可开始对话分析：\n\n```bash\npython run_streamlit_ui.py\n```\n\n或者使用 Gradio 界面：\n\n```bash\npython sample_ui\u002Fstreaming_ui.py\n```\n\n### 代码调用示例\n\n如果您需要在代码中集成 OpenChatBI，可通过 LangGraph 直接调用：\n\n```python\nfrom openchatbi import get_default_graph\n\ngraph = get_default_graph()\nresult = graph.invoke(\n    {\"messages\": [{\"role\": \"user\", \"content\": \"Show me ctr trends for the past 7 days\"}]},\n    config={\"configurable\": {\"thread_id\": \"1\"}}\n)\n```\n\n系统将自动生成对应的 SQL 语句并返回分析结果。","某电商公司运营经理需要在早会前快速复盘昨日各区域的销售表现及用户活跃度，以往高度依赖数据分析师协助取数。\n\n### 没有 openchatbi 时\n- 每次提取销售明细都需要向数据团队提交工单，响应周期通常超过 4 小时，延误决策时机。\n- 业务人员不懂 SQL 语法，面对复杂的表结构关联往往无从下手，只能等待技术人员处理。\n- 获取原始数据后需手动导入 Excel 制作图表，耗时且容易在复制粘贴过程中出现数据错误。\n- 缺乏上下文记忆，无法基于上一轮分析结果进行连续的深度追问，分析过程碎片化。\n\n### 使用 openchatbi 后\n- 直接在界面输入“展示昨日华东区 Top10 商品销量”，openchatbi 自动完成 Text2SQL 转换并执行查询。\n- openchatbi 根据数据特征自动调用 Plotly 生成可视化报表，直观呈现趋势变化，无需手动绘图。\n- 利用 openchatbi 内置数据目录管理功能，自动识别业务术语与数据库字段的映射关系，降低理解门槛。\n- 基于 openchatbi 的持久化记忆支持多轮对话，可随时追问“剔除促销因素后的真实增长率”进行归因分析。\n\nopenchatbi 
通过自然语言交互打通了业务需求与底层数据之间的壁垒，让非技术人员也能独立高效地完成复杂的数据分析任务。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fzhongyu09_openchatbi_3609c678.gif","zhongyu09","Yu Zhong","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fzhongyu09_1e16ceae.png","Machine Learning Engineer building LLM-based apps (ChatBI, Agents) & data intelligent systems like anomaly detection, RCA, and attribution.","Fintopia","Beijing","zhongyu8@gmail.com",null,"https:\u002F\u002Fgithub.com\u002Fzhongyu09",[85,89,93],{"name":86,"color":87,"percentage":88},"Python","#3572A5",99.5,{"name":90,"color":91,"percentage":92},"Shell","#89e051",0.4,{"name":94,"color":95,"percentage":96},"Dockerfile","#384d54",0.1,543,70,"2026-04-05T00:13:40","MIT","Linux, macOS","未说明",{"notes":104,"python":105,"dependencies":106},"1. Python 3.12+ 环境下 jieba 分词不兼容，系统将自动降级为标点分割；2. 需准备外部大模型 API Key（如 OpenAI、Anthropic 等）；3. 可选使用 Docker 模式运行代码执行器；4. 部分 Linux 发行版安装 pysqlite3 前需先安装 sqlite 开发库（如 libsqlite3-dev）。","3.11+",[107,108,109,110,111,112,113,114,115,116],"langchain","langgraph","fastapi","streamlit","gradio","sqlalchemy","plotly","jieba","pyyaml","langchain-openai",[14,13,15,51,26],[119,120,121,122,123,124,125,107,108,126,127,128],"agent","ai","analytics","bi","database","datawarehouse","gpt","llm","nlp","text2sql","2026-03-27T02:49:30.150509","2026-04-06T07:13:35.113339",[132,137,141,145,150,154],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},3011,"部署后首页总是卡在初始化 text2sql 客户端循环，日志显示无限等待，如何解决？","在最新版本中，如果您没有使用自定义嵌入模型，可以在 config.yaml 配置文件中移除 embedding_llm 相关配置。这通常能解决因配置残留导致的初始化循环问题。同时确保本地模型（如 Ollama）已正确安装并处于运行状态。","https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fissues\u002F5",{"id":138,"question_zh":139,"answer_zh":140,"source_url":136},3012,"部署时因 Embedding 模型配置默认值导致启动失败或卡死怎么办？","检查 config.yaml 中的 langchain_ollama.OllamaEmbeddings 类配置。如果使用的是本地 Ollama 模型，不要保留默认的 OpenAI 配置。请将 Embedding 部分修改为指向本地 Ollama 服务，替换掉默认的 openai 
类，保存后重新进入页面测试。",{"id":142,"question_zh":143,"answer_zh":144,"source_url":136},3013,"启动过程中出现 openai.APITimeoutError: Request timed out 错误是什么原因？","这通常是因为示例配置文件仍默认连接 OpenAI 且网络不通或 API Key 未配置。请检查 config.yaml，确认是否需要连接 OpenAI。如果需要，请替换 YOUR_API_KEY_HERE 为您的有效密钥并确保网络可达；如果不需要，请移除相关配置以避免尝试连接外部 API。",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},3014,"如果发现项目存在安全漏洞（如路径遍历），应该如何处理和修复？","维护者建议通过 GitHub Security Advisories 功能提交报告。例如针对 save_report 工具的路径遍历漏洞，可通过创建安全公告链接（如 GHSA-vmwq-8g8c-jm79）来追踪和修复。用户应避免使用旧版本（\u003C=0.2.1），并及时更新以应用修复补丁。","https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fissues\u002F10",{"id":151,"question_zh":152,"answer_zh":153,"source_url":149},3015,"如何协助维护者为发现的安全漏洞申请 CVE 编号？","可以通过 GitHub 的 Security 功能协助申请 CVE。在提交安全报告后，可以联系维护者（如 @zhongyu09）询问是否可以使用 GitHub 的安全特性为其申请 CVE 编号。维护者通常会创建安全公告（Security Advisory）来处理此类问题。",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},3016,"如何在 CI 流程中集成确定性的回归检查以防止工具调用回归？","可以使用 RunLedger 等工具进行记录\u002F回放测试。建议在 evals\u002Frunledger\u002F 目录下添加测试套件和 Schema，并将工作流连接到 openchatbi\u002Fagent_graph.py 作为入口点。GitHub Actions 工作流应配置为仅在 PR 修改 openchatbi\u002F 目录下的文件时运行，以确保稳定性且不依赖实时 API 调用。","https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fissues\u002F7",[160,165,170,175,180,185],{"id":161,"version":162,"summary_zh":163,"released_at":164},102530,"v0.2.2","## What's Changed\r\n* feat: implement multi-provider support for LLM configuration and usage by @OrMullerHahitti in https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F6\r\n* Add optional RunLedger replay suite by @ZackMitchell910 in https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F8\r\n* Add cache validation in create_vector_db by @zhongyu09 in https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F9\r\n* fix: restrict file formats in save_report tool to enhance security by @zhongyu09 in 
https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F12\r\n\r\n## New Contributors\r\n* @OrMullerHahitti made their first contribution in https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F6\r\n* @ZackMitchell910 made their first contribution in https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fpull\u002F8\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fcompare\u002Fv0.2.1...v0.2.2","2026-03-02T03:39:42",{"id":166,"version":167,"summary_zh":168,"released_at":169},102531,"v0.2.1","Support launch without embedding_model\r\n  - Use BM25-based retrieval as fallback\r\n  - Do not provide memory tools\r\n  - Add Ollama config example\r\n \r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fcompare\u002Fv0.2.0...v0.2.1","2025-12-03T09:02:24",{"id":171,"version":172,"summary_zh":173,"released_at":174},102532,"v0.2.0","**Time Series Forecasting**: Integrated Transformer-based forecasting service with Docker deployment\r\n  - New `timeseries_forecast` tool for OpenChatBI agent\r\n  - A new FastAPI based forecasting service supporting time series predictions\r\n  - Automatic GPU detection and health monitoring\r\n  - See [timeseries_forecasting\u002FREADME.md](timeseries_forecasting\u002FREADME.md) for setup guide\r\n \r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fcompare\u002Fv0.1.1...v0.2.0","2025-11-21T16:00:38",{"id":176,"version":177,"summary_zh":178,"released_at":179},102533,"v0.1.1","The installation for `pkuseg` is complicated, this release removes it to simplify the installation.","2025-10-09T10:22:37",{"id":181,"version":182,"summary_zh":183,"released_at":184},102534,"v0.1.0","## 🚀 New Features\r\nThis release introduces enhanced context management for LLM. 
It truncates old tool messages and summarizes history to maintain relevant context, improving multi-turn responses while saving tokens and reducing redundant information.\r\n\r\n## What's Changed\r\n\r\n- fix: add tool message (if missing) right after AI message with tool call to prevent LLM error\r\n- feat: support context management(truncate old tool message, summarize history)\r\n- test: add and fix some unit test cases\t\r\n- fix: use pkuseg instead of jieba to fix the python3.12 compatible issue\r\n","2025-10-09T05:49:27",{"id":186,"version":187,"summary_zh":188,"released_at":189},102535,"v0.0.1","**This is the first release of OpenChatBI beta version 0.0.1!**\r\n\r\n## Introduction\r\n\r\nOpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to help users \r\nquery, analyze, and visualize data through natural language conversations. Built on LangGraph and LangChain ecosystem, \r\nit provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.\r\n\r\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fraw\u002Fmain\u002Fexample\u002Fdemo.gif\" alt=\"Demo\" width=\"800\">\r\n\r\n## Core Features\r\n\r\n1. **Natural Language Interaction**: Get data analysis results by asking questions in natural language\r\n2. **Automatic SQL Generation**: Convert natural language queries into SQL statements using advanced text2sql workflows\r\n   with schema linking and well organized prompt engineering\r\n3. **Data Visualization**: Generate intuitive data visualizations (via plotly)\r\n4. **Data Catalog Management**: Automatically discovers and indexes database table structures, supports flexible catalog \r\n   storage backends, and easily maintains business explanations for tables and columns as well as optimizes Prompts.\r\n5. 
**Knowledge Base Integration**: Answer complex questions by combining catalog-based knowledge retrieval and external\r\n   knowledge base retrieval (via MCP tools)\r\n6. **Code Execution**: Execute Python code for data analysis and visualization\r\n7. **Interactive Problem-Solving**: Proactively ask users for more context when information is incomplete\r\n8. **Persistent Memory**: Conversation management and user characteristic memory based on LangGraph checkpointing\r\n9. **MCP Support**: Integration with MCP tools by configuration\r\n10. **Web UI Interface**: Provides 2 sample UIs: simple and streaming web interfaces using Gradio and Streamlit, easy to\r\n   integrate with other web applications\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fzhongyu09\u002Fopenchatbi\u002Fcommits\u002Fv0.0.1","2025-09-25T04:09:51"]