[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-neo4j-labs--llm-graph-builder":3,"tool-neo4j-labs--llm-graph-builder":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":76,"owner_url":77,"languages":78,"stars":116,"forks":117,"last_commit_at":118,"license":119,"difficulty_score":120,"env_os":121,"env_gpu":122,"env_ram":122,"env_deps":123,"category_tags":132,"github_topics":133,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":147,"updated_at":148,"faqs":149,"releases":183},4795,"neo4j-labs\u002Fllm-graph-builder","llm-graph-builder","Neo4j graph construction from unstructured data using LLMs","llm-graph-builder 是一款利用大语言模型（LLM）将非结构化数据转化为结构化知识图谱的开源工具。它能轻松处理 PDF、文档、文本、YouTube 视频及网页等多种来源的信息，自动提取其中的实体节点、关系及属性，并存储至 Neo4j 数据库中。\n\n该工具主要解决了从杂乱无章的原始资料中高效构建知识体系的难题，让机器能够“理解”并关联分散的信息，从而支持更智能的数据检索与分析。用户不仅可以直观地可视化生成的图谱，还能通过自然对话的方式直接“与数据聊天”，快速获取答案并追溯信息来源。\n\n它非常适合开发者、数据科学家及研究人员使用，尤其是那些希望构建企业知识库、进行复杂关系挖掘或探索 RAG（检索增强生成）应用的团队。技术亮点方面，llm-graph-builder 基于 LangChain 
框架开发，兼容 OpenAI、Gemini、Anthropic 等十余种主流大模型，并提供灵活的嵌入模型选择。此外，它还内置了详细的 Token 用量追踪功能，帮助用户有效管理成本。无论是本地部署还是云端集成，它都能为用户提供流畅的知识图谱构建体验。","# Knowledge Graph Builder\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-yellow)\n![FastAPI](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFastAPI-green)\n![React](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReact-blue)\n\n\nTransform unstructured data (PDFs, DOCs, TXTs, YouTube videos, web pages, etc.) into a structured Knowledge Graph stored in Neo4j using the power of Large Language Models (LLMs) and the LangChain framework.\n\nThis application allows you to upload files from various sources (local machine, GCS, S3 bucket, or web sources), choose your preferred LLM model, and generate a Knowledge Graph.\n\n## Getting Started\n\n### **Prerequisites**\n- **Python 3.12 or higher** (for local\u002Fseparate backend deployment)\n- Neo4j Database **5.23 or later** with APOC installed.\n  - **Neo4j Aura** databases (including the free tier) are supported.\n  - If using **Neo4j Desktop**, you will need to deploy the backend and frontend separately (docker-compose is not supported).\n\n#### **Backend Setup**\n1. Create a `.env` file in the `backend` folder by copying `backend\u002Fexample.env`.\n2. Pre-configure user credentials in the `.env` file to bypass the login dialog:\n   ```bash\n   NEO4J_URI=\u003Cyour-neo4j-uri>\n   NEO4J_USERNAME=\u003Cyour-username>\n   NEO4J_PASSWORD=\u003Cyour-password>\n   NEO4J_DATABASE=\u003Cyour-database-name>\n   ```\n3. 
Run:\n   ```bash\n   cd backend\n   python3.12 -m venv venv\n   source venv\u002Fbin\u002Factivate  # On Windows: venv\\Scripts\\activate\n   pip install -r requirements.txt -c constraints.txt\n   uvicorn score:app --reload\n   ```\n\n\n## Key Features\n\n### **Knowledge Graph Creation**\n- Seamlessly transform unstructured data into structured Knowledge Graphs using advanced LLMs.\n- Extract nodes, relationships, and their properties to create structured graphs.\n\n### **Schema Support**\n- Use a custom schema or existing schemas configured in the settings to generate graphs.\n\n### **Graph Visualization**\n- View graphs for specific or multiple data sources simultaneously in **Neo4j Bloom**.\n\n### **Chat with Data**\n- Interact with your data in the Neo4j database through conversational queries.\n- Retrieve metadata about the source of responses to your queries.\n- For a dedicated chat interface, use the standalone chat application with the **[\u002Fchat-only](\u002Fchat-only) route.**\n\n### **LLMs Supported**\n1. OpenAI\n2. Gemini\n3. Diffbot\n4. Azure OpenAI (dev deployed version)\n5. Anthropic (dev deployed version)\n6. Fireworks (dev deployed version)\n7. Groq (dev deployed version)\n8. Amazon Bedrock (dev deployed version)\n9. Ollama (dev deployed version)\n10. Deepseek (dev deployed version)\n11. Other OpenAI-compatible base URL models (dev deployed version)\n\n\n### **Token Usage Tracking**\n- Easily monitor and track your LLM token usage for each user and database connection.\n- Enable this feature by setting the `TRACK_USER_USAGE` environment variable to `true` in your backend configuration.\n- View your daily and monthly token consumption and limits, helping you manage usage and avoid overages.\n- You can check your remaining token limits at any time using the provided API endpoint.\n\n### **Embedding Model Selection**\n- Choose from a variety of embedding models to generate vector embeddings for your data. 
This can be configured from the frontend in **Graph Settings > Processing Configuration > Select Embedding Model**.\n- Supported model providers include OpenAI, Gemini, Amazon Titan, and Sentence Transformers.\n- Your selected embedding model is saved to your user profile when `TRACK_USER_USAGE` is enabled.\n\n#### **Local Configuration**\nYou have two ways to configure the embedding model locally:\n\n1.  **With User Tracking (`TRACK_USER_USAGE=true`):**\n    - Set `TRACK_USER_USAGE` to `true` in your backend `.env` file.\n    - Provide your token tracking database credentials (`TOKEN_TRACKER_DB_URI`, `TOKEN_TRACKER_DB_USERNAME`, etc.).\n    - Select your desired embedding model from the frontend. Your selection will be saved and automatically used in subsequent sessions.\n\n2.  **Without User Tracking (`TRACK_USER_USAGE=false`):**\n    - Set `TRACK_USER_USAGE` to `false`.\n    - Specify the embedding model and provider directly in your backend `.env` file using `EMBEDDING_MODEL` and `EMBEDDING_PROVIDER`.\n    - If these variables are not set, the application defaults to a Sentence Transformer model.\n    - In this mode, the embedding model cannot be changed from the frontend.\n\n\n---\n\n## Deployment Options\n\n### **Local Deployment**\n\n#### Using Docker-Compose\nRun the application using the default `docker-compose` configuration.\n\n1. **Supported LLM Models:**  \n   By default, only OpenAI and Diffbot are enabled. Gemini requires additional GCP configurations.  \n   Use the `VITE_LLM_MODELS_PROD` variable to configure the models you need. 
Example:\n   ```bash\n   VITE_LLM_MODELS_PROD=\"gemini_2.5_flash,openai_gpt_5_mini,diffbot,anthropic_claude_4.5_haiku\"\n   ```\n\n2. **Input Sources:**  \n   By default, the following sources are enabled: `local`, `YouTube`, `Wikipedia`, `AWS S3`, and `web`.  \n   To add Google Cloud Storage (GCS) integration, include `gcs` and your Google client ID:\n   ```bash\n   VITE_REACT_APP_SOURCES=\"local,youtube,wiki,s3,gcs,web\"\n   VITE_GOOGLE_CLIENT_ID=\"your-google-client-id\"\n   ```\n\n#### Chat Modes\nConfigure chat modes using the `VITE_CHAT_MODES` variable:\n- By default, all modes are enabled: `vector`, `graph_vector`, `graph`, `fulltext`, `graph_vector_fulltext`, `entity_vector`, and `global_vector`.\n- To enable only specific modes, update the variable. For example:\n  ```bash\n  VITE_CHAT_MODES=\"vector,graph\"\n  ```\n\n---\n\n### **Running Backend and Frontend Separately**\n\nFor development, you can run the backend and frontend independently.\n\n#### **Frontend Setup**\n1. Create a `.env` file in the `frontend` folder by copying `frontend\u002Fexample.env`.\n2. Update environment variables as needed.\n3. Run:\n   ```bash\n   cd frontend\n   yarn\n   yarn run dev\n   ```\n\n#### **Backend Setup**\n1. Create a `.env` file in the `backend` folder by copying `backend\u002Fexample.env`.\n2. Pre-configure user credentials in the `.env` file to bypass the login dialog:\n   ```bash\n   NEO4J_URI=\u003Cyour-neo4j-uri>\n   NEO4J_USERNAME=\u003Cyour-username>\n   NEO4J_PASSWORD=\u003Cyour-password>\n   NEO4J_DATABASE=\u003Cyour-database-name>\n   ```\n3. Run:\n   ```bash\n   cd backend\n   python -m venv envName\n   source envName\u002Fbin\u002Factivate\n   pip install -r requirements.txt\n   uvicorn score:app --reload\n   ```\n\n---\n\n### **Cloud Deployment**\n\nDeploy the application on **Google Cloud Platform** using the following commands:\n\n#### **Frontend Deployment**\n```bash\ngcloud run deploy dev-frontend \\\n  --source . 
\\\n  --region us-central1 \\\n  --allow-unauthenticated\n```\n\n#### **Backend Deployment**\n```bash\ngcloud run deploy dev-backend \\\n  --set-env-vars \"OPENAI_API_KEY=\u003Cyour-openai-api-key>\" \\\n  --set-env-vars \"DIFFBOT_API_KEY=\u003Cyour-diffbot-api-key>\" \\\n  --set-env-vars \"NEO4J_URI=\u003Cyour-neo4j-uri>\" \\\n  --set-env-vars \"NEO4J_USERNAME=\u003Cyour-username>\" \\\n  --set-env-vars \"NEO4J_PASSWORD=\u003Cyour-password>\" \\\n  --source . \\\n  --region us-central1 \\\n  --allow-unauthenticated\n```\n\n---\n\n## For local llms (Ollama)\n1. Pull the docker image of ollama\n   ```bash\n   docker pull ollama\u002Follama\n   ```\n2. Run the ollama docker image\n   ```bash\n   docker run -d -v ollama:\u002Froot\u002F.ollama -p 11434:11434 --name ollama ollama\u002Follama\n   ```\n3. Execute any llm model, e.g., llama3\n   ```bash\n   docker exec -it ollama ollama run llama3\n   ```\n4. Configure env variable in docker compose.\n   ```env\n   LLM_MODEL_CONFIG_ollama_\u003Cmodel_name>\n   # example\n   LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3,http:\u002F\u002Fhost.docker.internal:11434}\n   ```\n5. Configure the backend API url\n   ```env\n   VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}\n   ```\n6. Open the application in browser and select the ollama model for the extraction.\n7. Enjoy Graph Building.\n---\n\n## Usage\n1. Connect to a Neo4j Aura Instance, which can be either AURA DS or AURA DB, by passing the URI and password through the backend environment, filling in the login dialog, or dragging and dropping the Neo4j credentials file.\n2. To differentiate, we have added different icons. For AURA DB, there is a database icon, and for AURA DS, there is a scientific molecule icon right under the Neo4j Connection details label.\n3. Choose your source from a list of unstructured sources to create a graph.\n4. Change the LLM (if required) from the dropdown, which will be used to generate the graph.\n5. 
Optionally, define the schema (nodes and relationship labels) in the entity graph extraction settings.\n6. Either select multiple files to 'Generate Graph', or all the files in 'New' status will be processed for graph creation.\n7. View the graph for individual files using 'View' in the grid, or select one or more files and 'Preview Graph'.\n8. Ask questions related to the processed\u002Fcompleted sources to the chatbot. Also, get detailed information about your answers generated by the LLM.\n\n---\n\n## [ENV][env-sheet]\n| Env Variable Name       | Mandatory\u002FOptional | Default Value | Description                                                                                      |\n|------------------------ |-------------------|---------------|--------------------------------------------------------------------------------------------------|\n|                        |                   |               |                                                                                                  |\n| **BACKEND ENV**         |                   |               |                                                                                                  |\n| OPENAI_API_KEY          | Optional          |               | An OpenAI Key is required to use OpenAI LLM model to authenticate and track requests             |\n| DIFFBOT_API_KEY         | Mandatory         |               | API key is required to use Diffbot's NLP service to extract entities and relationships from unstructured data |\n| BUCKET_UPLOAD_FILE      | Optional          |               | Bucket name to store uploaded file on GCS                                                        |\n| BUCKET_FAILED_FILE      | Optional          |               | Bucket name to store failed file on GCS while extraction                                         |\n| NEO4J_USER_AGENT        | Optional          | llm-graph-builder | Name of the user agent to track Neo4j database activity                   
                   |\n| ENABLE_USER_AGENT       | Optional          | true          | Boolean value to enable\u002Fdisable Neo4j user agent                                                 |\n| DUPLICATE_TEXT_DISTANCE | Optional          | 5             | This value is used to find distance for all node pairs in the graph and is calculated based on node properties |\n| DUPLICATE_SCORE_VALUE   | Optional          | 0.97          | Node score value to match duplicate nodes                                                        |\n| EFFECTIVE_SEARCH_RATIO  | Optional          | 1             | Ratio used for effective search calculations                                                     |\n| GRAPH_CLEANUP_MODEL     | Optional          | openai_gpt_5_mini | Model name to clean up graph in post processing                                            |\n| MAX_TOKEN_CHUNK_SIZE    | Optional          | 10000         | Maximum token size to process file content                                                       |\n| YOUTUBE_TRANSCRIPT_PROXY| Mandatory         |               | Proxy key to process YouTube videos for getting transcripts                                      |\n| IS_EMBEDDING           | Optional           | true          | Flag to enable text embedding                                                                    |\n| KNN_MIN_SCORE          | Optional           | 0.8           | Minimum score for KNN algorithm                                                                  |\n| GCP_LOG_METRICS_ENABLED| Optional           | False         | Flag to enable Google Cloud logs                                                                 |\n| NEO4J_URI              | Optional           | neo4j:\u002F\u002Fdatabase:7687 | URI for Neo4j database                                                                  |\n| NEO4J_USERNAME         | Optional           | neo4j         | Username for Neo4j database                                                         
             |\n| NEO4J_PASSWORD         | Optional           | password      | Password for Neo4j database                                                                      |\n| GCS_FILE_CACHE         | Optional           | False         | If set to True, will save files to process into GCS. If False, will save files locally           |\n| ENTITY_EMBEDDING       | Optional           | False         | If set to True, it will add embeddings for each entity in the database                           |\n| LLM_MODEL_CONFIG_ollama_\u003Cmodel_name> | Optional |           | Set ollama config as model_name,model_local_url for local deployments                            |\n|                        |                   |               |                                                                                                  |\n| **FRONTEND ENV**        |                   |               |                                                                                                  |\n| VITE_BLOOM_URL         | Mandatory          | [Bloom URL][bloom-url] | URL for Bloom visualization                                |\n| VITE_REACT_APP_SOURCES | Mandatory          | local,youtube,wiki,s3 | List of input sources that will be available                                 |\n| VITE_CHAT_MODES        | Mandatory          | vector,graph+vector,graph,hybrid | Chat modes available for Q&A                                |\n| VITE_ENV               | Mandatory          | DEV or PROD   | Environment variable for the app                                                                 |\n| VITE_LLM_MODELS        | Optional           | openai_gpt_5_mini,gemini_2.5_flash,anthropic_claude_4.5_haiku | Supported models for the application |\n| VITE_BACKEND_API_URL   | Optional           | [localhost][backend-url] | URL for backend API                                        |\n| VITE_TIME_PER_PAGE     | Optional       
    | 50            | Time per page for processing                                                                     |\n| VITE_CHUNK_SIZE        | Optional           | 5242880       | Size of each chunk of file for upload                                                            |\n| VITE_GOOGLE_CLIENT_ID  | Optional           |               | Client ID for Google authentication                                                              |\n| VITE_LLM_MODELS_PROD   | Optional           | openai_gpt_5_mini,gemini_2.5_flash,anthropic_claude_4.5_haiku | To distinguish models based on environment (PROD or DEV) |\n| VITE_AUTH0_CLIENT_ID   | Mandatory if you are enabling Authentication otherwise it is optional |  | Okta OAuth Client ID for authentication |\n| VITE_AUTH0_DOMAIN      | Mandatory if you are enabling Authentication otherwise it is optional |  | Okta OAuth Client Domain                                  |\n| VITE_SKIP_AUTH         | Optional           | true          | Flag to skip authentication                                                                      |\n| VITE_CHUNK_OVERLAP     | Optional           | 20            | Variable to configure chunk overlap                                                              |\n| VITE_TOKENS_PER_CHUNK  | Optional           | 100           | Variable to configure tokens count per chunk. 
This gives flexibility for users who may require different chunk sizes for various tokenization tasks |\n| VITE_CHUNK_TO_COMBINE  | Optional           | 1             | Variable to configure number of chunks to combine for parallel processing                        |\n\n### Example Environment Files\n\nRefer to the example environment files for additional variables and configuration:\n\n- [Backend example.env](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fblob\u002Fmain\u002Fbackend\u002Fexample.env)\n- [Frontend example.env](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fblob\u002Fmain\u002Ffrontend\u002Fexample.env)\n\n---\n\n## Cloud Build Deployment\n\nYou can deploy the backend and the frontend to Google Cloud Run using Cloud Build, either manually or via automated triggers.\n\n### **Automated Deployment (Recommended)**\n1. **Connect your repository to Google Cloud Build:**\n   - In the Google Cloud Console, go to Cloud Build > Triggers.\n   - Create a new trigger and select your repository.\n   - Set the trigger to run on push to your desired branch (`main`, `staging`, or `dev`).\n   - Cloud Build will automatically use the `cloudbuild.yaml` file in the root of your repository.\n\n2. **Configure Substitutions and Secrets:**\n   - In the trigger settings, add required substitutions (e.g., `_OPENAI_API_KEY`, `_DIFFBOT_API_KEY`, etc.) as environment variables or use Secret Manager for sensitive data.\n\n3. **Push your code:**\n   - When you push to the configured branch, Cloud Build will build and deploy your backend (and optionally frontend) to Cloud Run using the steps defined in `cloudbuild.yaml`.\n\n### **Manual Deployment**\n1. **Set up Google Cloud SDK and authenticate:**\n   ```bash\n   gcloud auth login\n   gcloud config set project \u003CYOUR_PROJECT_ID>\n   ```\n\n2. 
**Run Cloud Build manually:**\n   ```bash\n   gcloud builds submit --config cloudbuild.yaml \\\n     --substitutions=_REGION=us-central1,_REPO=cloud-run-repo,_OPENAI_API_KEY=\u003Cyour-openai-key>,_DIFFBOT_API_KEY=\u003Cyour-diffbot-key>,_BUCKET_UPLOAD_FILE=\u003Cyour-bucket>,_BUCKET_FAILED_FILE=\u003Cyour-bucket>,_PROJECT_ID=\u003Cyour-project-id>,_GCS_FILE_CACHE=False,_TRACK_USER_USAGE=False,_TOKEN_TRACKER_DB_URI=...,_TOKEN_TRACKER_DB_USERNAME=...,_TOKEN_TRACKER_DB_PASSWORD=...,_TOKEN_TRACKER_DB_DATABASE=...,_DEFAULT_DIFFBOT_CHAT_MODEL=...,_YOUTUBE_TRANSCRIPT_PROXY=...,_EMBEDDING_MODEL=...,\n     _EMBEDDING_PROVIDER=...,_BEDROCK_EMBEDDING_MODEL_KEY=...,_LLM_MODEL_CONFIG_OPENAI_GPT_5_2=...,_LLM_MODEL_CONFIG_OPENAI_GPT_5_MINI=...,_LLM_MODEL_CONFIG_GEMINI_2_5_FLASH=...,_LLM_MODEL_CONFIG_GEMINI_2_5_PRO=...,_LLM_MODEL_CONFIG_DIFFBOT=...,_LLM_MODEL_CONFIG_GROQ_LLAMA3_1_8B=...,_LLM_MODEL_CONFIG_ANTHROPIC_CLAUDE_4_5_SONNET=...,_LLM_MODEL_CONFIG_ANTHROPIC_CLAUDE_4_5_HAIKU=...,_LLM_MODEL_CONFIG_LLAMA4_MAVERICK=...,_LLM_MODEL_CONFIG_FIREWORKS_QWEN3_30B=...,_LLM_MODEL_CONFIG_FIREWORKS_GPT_OSS=...,_LLM_MODEL_CONFIG_FIREWORKS_DEEPSEEK_V3=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_MICRO_V1=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_LITE_V1=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_PRO_V1=...,_LLM_MODEL_CONFIG_OLLAMA_LLAMA3=...\n   ```\n   - Replace the values in angle brackets with your actual configuration and secrets.\n   - You can omit or add substitutions as needed for your deployment.\n\n3. **Monitor the build:**\n   - The build and deployment process will be visible in the Cloud Build console.\n\n4. **Access your deployed service:**\n   - After deployment, your backend will be available at the Cloud Run service URL shown in the Cloud Console.\n\n---\n\n**Note:**  \n- The `cloudbuild.yaml` file supports multiple environments (`main`, `staging`, `dev`) based on the branch name.\n- The frontend build and deployment steps are commented out by default. 
Uncomment them in `cloudbuild.yaml` if you wish to deploy the frontend as well.\n\nFor more details, see the comments in [`cloudbuild.yaml`](cloudbuild.yaml).\n\n---\n\n## Links\n\n[LLM Knowledge Graph Builder Application][app-link]\n\n[Neo4j Workspace][neo4j-workspace]\n\n## Reference\n\n[Demo of application][demo-video]\n\n## Contact\nFor any inquiries or support, feel free to raise [GitHub Issues][github-issues]\n\n[backend-url]: http:\u002F\u002Flocalhost:8000\n[env-sheet]: https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1DBg3m3hz0PCZNqIjyYJsYALzdWwMlLah706Xvxt62Tk\u002Fedit?gid=184339012#gid=184339012\n[env-vars]: https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1DBg3m3hz0PCZNqIjyYJsYALzdWwMlLah706Xvxt62Tk\u002Fedit?gid=0#gid=0\n[app-link]: https:\u002F\u002Fllm-graph-builder.neo4jlabs.com\u002F\n[neo4j-workspace]: https:\u002F\u002Fworkspace-preview.neo4j.io\u002Fworkspace\u002Fquery\n[demo-video]: https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LlNy5VmV290\n[github-issues]: https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\n[bloom-url]: https:\u002F\u002Fworkspace-preview.neo4j.io\u002Fworkspace\u002Fexplore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true\n[langchain-endpoint]: https:\u002F\u002Fapi.smith.langchain.com\n\n## Happy Graph Building!\n","# 知识图谱构建器\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-yellow)\n![FastAPI](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFastAPI-green)\n![React](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReact-blue)\n\n\n利用大型语言模型（LLMs）和LangChain框架的强大功能，将非结构化数据（PDF、DOC、TXT、YouTube视频、网页等）转换为存储在Neo4j中的结构化知识图谱。\n\n此应用程序允许您从各种来源（本地机器、GCS、S3存储桶或网络资源）上传文件，选择您偏好的LLM模型，并生成知识图谱。\n\n## 快速入门\n\n### **先决条件**\n- **Python 3.12 或更高版本**（用于本地\u002F独立后端部署）\n- Neo4j 数据库 **5.23 或更高版本**，并已安装 APOC。\n  - 支持 **Neo4j Aura** 数据库（包括免费层）。\n  - 如果使用 **Neo4j Desktop**，则需要分别部署后端和前端（不支持 
docker-compose）。\n\n#### **后端设置**\n1. 在 `backend` 文件夹中复制 `backend\u002Fexample.env` 创建一个 `.env` 文件。\n2. 在 `.env` 文件中预先配置用户凭据以跳过登录对话框：\n   ```bash\n   NEO4J_URI=\u003Cyour-neo4j-uri>\n   NEO4J_USERNAME=\u003Cyour-username>\n   NEO4J_PASSWORD=\u003Cyour-password>\n   NEO4J_DATABASE=\u003Cyour-database-name>\n   ```\n3. 运行：\n   ```bash\n   cd backend\n   python3.12 -m venv venv\n   source venv\u002Fbin\u002Factivate  # 在 Windows 上：venv\\Scripts\\activate\n   pip install -r requirements.txt -c constraints.txt\n   uvicorn score:app --reload\n   ```\n\n\n## 主要特性\n\n### **知识图谱创建**\n- 利用先进的 LLM 将非结构化数据无缝转换为结构化的知识图谱。\n- 提取节点、关系及其属性以创建结构化图谱。\n\n### **模式支持**\n- 使用自定义模式或在设置中配置的现有模式来生成图谱。\n\n### **图谱可视化**\n- 在 **Neo4j Bloom** 中同时查看特定或多个数据源的图谱。\n\n### **与数据聊天**\n- 通过会话式查询与 Neo4j 数据库中的数据互动。\n- 检索关于查询响应来源的元数据。\n- 对于专用的聊天界面，请使用带有 **[\u002Fchat-only](\u002Fchat-only)** 路由的独立聊天应用程序。\n\n### **支持的 LLMs**\n1. OpenAI\n2. Gemini\n3. Diffbot\n4. Azure OpenAI（开发部署版本）\n5. Anthropic（开发部署版本）\n6. Fireworks（开发部署版本）\n7. Groq（开发部署版本）\n8. Amazon Bedrock（开发部署版本）\n9. Ollama（开发部署版本）\n10. Deepseek（开发部署版本）\n11. 其他兼容 OpenAI 基础 URL 的模型（开发部署版本）\n\n\n### **Token 使用跟踪**\n- 轻松监控和跟踪每个用户及数据库连接的 LLM token 使用情况。\n- 通过在后端配置中将 `TRACK_USER_USAGE` 环境变量设置为 `true` 来启用此功能。\n- 查看每日和每月的 token 消耗量及限额，帮助您管理使用情况并避免超额。\n- 您可以随时使用提供的 API 端点检查剩余的 token 限额。\n\n### **嵌入模型选择**\n- 从多种嵌入模型中选择，为您的数据生成向量嵌入。这可以在前端的 **图谱设置 > 处理配置 > 选择嵌入模型** 中进行配置。\n- 支持的模型提供商包括 OpenAI、Gemini、Amazon Titan 和 Sentence Transformers。\n- 当启用 `TRACK_USER_USAGE` 时，您选择的嵌入模型将保存到您的用户资料中。\n\n#### **本地配置**\n您可以通过两种方式在本地配置嵌入模型：\n\n1.  **启用用户跟踪（`TRACK_USER_USAGE=true`）：**\n    - 在后端 `.env` 文件中将 `TRACK_USER_USAGE` 设置为 `true`。\n    - 提供您的 token 跟踪数据库凭据（`TOKEN_TRACKER_DB_URI`、`TOKEN_TRACKER_DB_USERNAME` 等）。\n    - 从前端选择您想要的嵌入模型。您的选择将被保存并在后续会话中自动使用。\n\n2.  
**禁用用户跟踪（`TRACK_USER_USAGE=false`）：**\n    - 将 `TRACK_USER_USAGE` 设置为 `false`。\n    - 直接在后端 `.env` 文件中使用 `EMBEDDING_MODEL` 和 `EMBEDDING_PROVIDER` 指定嵌入模型和提供商。\n    - 如果未设置这些变量，则应用程序默认使用 Sentence Transformer 模型。\n    - 在此模式下，无法从前端更改嵌入模型。\n\n\n---\n\n## 部署选项\n\n### **本地部署**\n\n#### 使用 Docker-Compose\n使用默认的 `docker-compose` 配置运行应用程序。\n\n1. **支持的 LLM 模型：**  \n   默认情况下，仅启用 OpenAI 和 Diffbot。Gemini 需要额外的 GCP 配置。  \n   使用 `VITE_LLM_MODELS_PROD` 变量配置所需的模型。例如：\n   ```bash\n   VITE_LLM_MODELS_PROD=\"gemini_2.5_flash,openai_gpt_5_mini,diffbot,anthropic_claude_4.5_haiku\"\n   ```\n\n2. **输入源：**  \n   默认启用以下源：`local`、`YouTube`、`Wikipedia`、`AWS S3` 和 `web`。  \n   若要添加 Google Cloud Storage (GCS) 集成，请包含 `gcs` 和您的 Google 客户端 ID：\n   ```bash\n   VITE_REACT_APP_SOURCES=\"local,youtube,wiki,s3,gcs,web\"\n   VITE_GOOGLE_CLIENT_ID=\"your-google-client-id\"\n   ```\n\n#### 聊天模式\n使用 `VITE_CHAT_MODES` 变量配置聊天模式：\n- 默认情况下，所有模式均启用：`vector`、`graph_vector`、`graph`、`fulltext`、`graph_vector_fulltext`、`entity_vector` 和 `global_vector`。\n- 若要仅启用特定模式，请更新该变量。例如：\n  ```bash\n  VITE_CHAT_MODES=\"vector,graph\"\n  ```\n\n---\n\n### **分别运行后端和前端**\n\n在开发过程中，您可以独立运行后端和前端。\n\n#### **前端设置**\n1. 在 `frontend` 文件夹中复制 `frontend\u002Fexample.env` 创建一个 `.env` 文件。\n2. 根据需要更新环境变量。\n3. 运行：\n   ```bash\n   cd frontend\n   yarn\n   yarn run dev\n   ```\n\n#### **后端设置**\n1. 在 `backend` 文件夹中复制 `backend\u002Fexample.env` 创建一个 `.env` 文件。\n2. 在 `.env` 文件中预先配置用户凭据以跳过登录对话框：\n   ```bash\n   NEO4J_URI=\u003Cyour-neo4j-uri>\n   NEO4J_USERNAME=\u003Cyour-username>\n   NEO4J_PASSWORD=\u003Cyour-password>\n   NEO4J_DATABASE=\u003Cyour-database-name>\n   ```\n3. 
运行：\n   ```bash\n   cd backend\n  python -m venv envName\n  source envName\u002Fbin\u002Factivate\n  pip install -r requirements.txt\n  uvicorn score:app --reload\n   ``` \n\n---\n\n### **云部署**\n\n使用以下命令在 **Google Cloud Platform** 上部署应用程序：\n\n#### **前端部署**\n```bash\ngcloud run deploy dev-frontend \\\n  --source . \\\n  --region us-central1 \\\n  --allow-unauthenticated\n```\n\n#### **后端部署**\n```bash\ngcloud run deploy dev-backend \\\n  --set-env-vars \"OPENAI_API_KEY=\u003Cyour-openai-api-key>\" \\\n  --set-env-vars \"DIFFBOT_API_KEY=\u003Cyour-diffbot-api-key>\" \\\n  --set-env-vars \"NEO4J_URI=\u003Cyour-neo4j-uri>\" \\\n  --set-env-vars \"NEO4J_USERNAME=\u003Cyour-username>\" \\\n  --set-env-vars \"NEO4J_PASSWORD=\u003Cyour-password>\" \\\n  --source . \\\n  --region us-central1 \\\n  --allow-unauthenticated\n```\n\n---\n\n## 对于本地大模型（Ollama）\n1. 拉取 Ollama 的 Docker 镜像：\n   ```bash\n   docker pull ollama\u002Follama\n   ```\n2. 运行 Ollama 的 Docker 容器：\n   ```bash\n   docker run -d -v ollama:\u002Froot\u002F.ollama -p 11434:11434 --name ollama ollama\u002Follama\n   ```\n3. 执行任意大模型，例如 llama3：\n   ```bash\n   docker exec -it ollama ollama run llama3\n   ```\n4. 在 `docker-compose` 中配置环境变量：\n   ```env\n   LLM_MODEL_CONFIG_ollama_\u003Cmodel_name>\n   # 示例\n   LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3,http:\u002F\u002Fhost.docker.internal:11434}\n   ```\n5. 配置后端 API 地址：\n   ```env\n   VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}\n   ```\n6. 在浏览器中打开应用，并选择 Ollama 模型进行信息抽取。\n7. 享受图谱构建的乐趣。\n\n---\n\n## 使用说明\n1. 通过后端环境传递 URI 和密码、填写登录对话框，或直接拖放 Neo4j 凭证文件，连接到 Neo4j Aura 实例，包括 AURA DS 或 AURA DB。\n2. 为便于区分，我们添加了不同图标：AURA DB 显示数据库图标，而 AURA DS 则在“Neo4j 连接详情”标签下方显示科学分子图标。\n3. 从非结构化数据源列表中选择您的数据源，以创建图谱。\n4. 如需更换大模型，可在下拉菜单中选择，该模型将用于生成图谱。\n5. 您还可以在实体图谱提取设置中自定义模式（节点和关系标签）。\n6. 您可以选择多个文件来“生成图谱”，或者对所有状态为“新建”的文件进行图谱创建处理。\n7. 在网格中使用“查看”功能查看单个文件的图谱，或选择一个或多个文件并点击“预览图谱”。\n8. 
向聊天机器人提问与已处理\u002F已完成数据源相关的问题，同时获取由大模型生成的答案的详细信息。\n\n---\n\n## [环境变量][env-sheet]\n| 环境变量名称       | 必填\u002F可选 | 默认值 | 描述                                                                                      |\n|------------------------ |-------------------|---------------|--------------------------------------------------------------------------------------------------|\n|                        |                   |               |                                                                                                  |\n| **后端环境**         |                   |               |                                                                                                  |\n| OPENAI_API_KEY          | 可选          |               | 使用 OpenAI LLM 模型时，需要提供 OpenAI API 密钥以进行身份验证和请求跟踪             |\n| DIFFBOT_API_KEY         | 必填          |               | 使用 Diffbot 的 NLP 服务从非结构化数据中提取实体和关系时，需要提供 API 密钥           |\n| BUCKET_UPLOAD_FILE      | 可选          |               | 用于在 GCS 上存储上传文件的存储桶名称                                                        |\n| BUCKET_FAILED_FILE      | 可选          |               | 用于在 GCS 上存储提取失败文件的存储桶名称                                                    |\n| NEO4J_USER_AGENT        | 可选          | llm-graph-builder | 用于跟踪 Neo4j 数据库活动的用户代理名称                                      |\n| ENABLE_USER_AGENT       | 可选          | true          | 用于启用或禁用 Neo4j 用户代理的布尔值                                                 |\n| DUPLICATE_TEXT_DISTANCE | 可选          | 5             | 此值用于计算图中所有节点对之间的距离，并基于节点属性进行计算                             |\n| DUPLICATE_SCORE_VALUE   | 可选          | 0.97          | 用于匹配重复节点的节点得分值                                                               |\n| EFFECTIVE_SEARCH_RATIO  | 可选          | 1             | 用于有效搜索计算的比率                                                                       |\n| GRAPH_CLEANUP_MODEL     | 可选          | openai_gpt_5_mini | 用于后处理阶段清理图的模型名称                                                        
    |\n| MAX_TOKEN_CHUNK_SIZE    | 可选          | 10000         | 处理文件内容时的最大令牌大小 |\n| YOUTUBE_TRANSCRIPT_PROXY| 必填          |               | 用于处理 YouTube 视频以获取字幕的代理密钥 |\n| IS_EMBEDDING            | 可选          | true          | 用于启用文本嵌入的标志 |\n| KNN_MIN_SCORE           | 可选          | 0.8           | KNN 算法的最小得分阈值 |\n| GCP_LOG_METRICS_ENABLED | 可选          | False         | 用于启用 Google Cloud 日志记录的标志 |\n| NEO4J_URI               | 可选          | neo4j:\u002F\u002Fdatabase:7687 | Neo4j 数据库的 URI |\n| NEO4J_USERNAME          | 可选          | neo4j         | Neo4j 数据库的用户名 |\n| NEO4J_PASSWORD          | 可选          | password      | Neo4j 数据库的密码 |\n| GCS_FILE_CACHE          | 可选          | False         | 如果设置为 True，将要处理的文件保存到 GCS；如果为 False，则将文件保存在本地 |\n| ENTITY_EMBEDDING        | 可选          | False         | 如果设置为 True，将在数据库中为每个实体添加嵌入向量 |\n| LLM_MODEL_CONFIG_ollama_\u003Cmodel_name> | 可选 |     | 用于本地部署时设置 Ollama 配置：model_name, model_local_url |\n|                         |               |               |                                                              |\n| **前端环境**            |               |               |                                                              |\n| VITE_BLOOM_URL          | 必填          | [Bloom 
URL][bloom-url] | Bloom 可视化界面的 URL                                |\n| VITE_REACT_APP_SOURCES | 必填          | local,youtube,wiki,s3 | 可用的输入源列表                                                             |\n| VITE_CHAT_MODES        | 必填          | vector,graph+vector,graph,hybrid | 可供问答使用的聊天模式                                |\n| VITE_ENV               | 必填          | DEV 或 PROD   | 应用程序的环境变量                                                           |\n| VITE_LLM_MODELS        | 可选           | openai_gpt_5_mini,gemini_2.5_flash,anthropic_claude_4.5_haiku | 应用程序支持的模型 |\n| VITE_BACKEND_API_URL   | 可选           | [localhost][backend-url] | 后端 API 的 URL                                        |\n| VITE_TIME_PER_PAGE     | 可选           | 50            | 每页处理所需的时间                                                                           |\n| VITE_CHUNK_SIZE        | 可选           | 5242880       | 上传文件时每个分块的大小                                                                    |\n| VITE_GOOGLE_CLIENT_ID  | 可选           |               | 用于 Google 身份验证的客户端 ID                                                              |\n| VITE_LLM_MODELS_PROD   | 可选           | openai_gpt_5_mini,gemini_2.5_flash,anthropic_claude_4.5_haiku | 根据环境（PROD 或 DEV）区分模型 |\n| VITE_AUTH0_CLIENT_ID   | 必填，若启用身份验证；否则可选 |  | Okta OAuth 客户端 ID，用于身份验证 |\n| VITE_AUTH0_DOMAIN      | 必填，若启用身份验证；否则可选 |  | Okta OAuth 客户端域名                                  |\n| VITE_SKIP_AUTH         | 可选           | true          | 用于跳过身份验证的标志                                                                       |\n| VITE_CHUNK_OVERLAP     | 可选           | 20            | 用于配置分块重叠的变量                                                                     |\n| VITE_TOKENS_PER_CHUNK  | 可选           | 100           | 用于配置每个分块中的令牌数量。这为用户提供了灵活性，可根据不同的分词任务需求调整分块大小 |\n| VITE_CHUNK_TO_COMBINE  | 可选           | 1             | 用于配置并行处理时要合并的分块数量                        |\n\n### 示例环境文件\n\n请参阅示例环境文件，以获取更多变量和配置：\n\n- [后端示例 
.env](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fblob\u002Fmain\u002Fbackend\u002Fexample.env)\n- [前端示例 .env](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fblob\u002Fmain\u002Ffrontend\u002Fexample.env)\n\n---\n\n## 云构建部署\n\n您可以使用 Cloud Build 将后端和前端部署到 Google Cloud Run，既可以手动操作，也可以通过自动化触发器来实现。\n\n### **自动化部署（推荐）**\n1. **将您的仓库连接到 Google Cloud Build：**\n   - 在 Google Cloud 控制台中，转到 Cloud Build > 触发器。\n   - 创建一个新的触发器并选择您的仓库。\n   - 设置触发器在推送到您希望的分支（`main`、`staging` 或 `dev`）时运行。\n   - Cloud Build 将自动使用您仓库根目录下的 `cloudbuild.yaml` 文件。\n\n2. **配置替换参数和密钥：**\n   - 在触发器设置中，添加所需的替换参数（例如 `_OPENAI_API_KEY`、`_DIFFBOT_API_KEY` 等），作为环境变量，或使用 Secret Manager 来管理敏感数据。\n\n3. **推送代码：**\n   - 当您推送到配置的分支时，Cloud Build 将根据 `cloudbuild.yaml` 中定义的步骤，构建并将您的后端（以及可选的前端）部署到 Cloud Run。\n\n### **手动部署**\n1. **设置 Google Cloud SDK 并进行身份验证：**\n   ```bash\n   gcloud auth login\n   gcloud config set project \u003CYOUR_PROJECT_ID>\n   ```\n\n2. **手动运行 Cloud Build：**\n   ```bash\n   gcloud builds submit --config cloudbuild.yaml \\\n     --substitutions=_REGION=us-central1,_REPO=cloud-run-repo,_OPENAI_API_KEY=\u003Cyour-openai-key>,_DIFFBOT_API_KEY=\u003Cyour-diffbot-key>,_BUCKET_UPLOAD_FILE=\u003Cyour-bucket>,_BUCKET_FAILED_FILE=\u003Cyour-bucket>,_PROJECT_ID=\u003Cyour-project-id>,_GCS_FILE_CACHE=False,_TRACK_USER_USAGE=False,_TOKEN_TRACKER_DB_URI=...,_TOKEN_TRACKER_DB_USERNAME=...,_TOKEN_TRACKER_DB_PASSWORD=...,_TOKEN_TRACKER_DB_DATABASE=...,_DEFAULT_DIFFBOT_CHAT_MODEL=...,_YOUTUBE_TRANSCRIPT_PROXY=...,_EMBEDDING_MODEL=...,\n     
_EMBEDDING_PROVIDER=...,_BEDROCK_EMBEDDING_MODEL_KEY=...,_LLM_MODEL_CONFIG_OPENAI_GPT_5_2=...,_LLM_MODEL_CONFIG_OPENAI_GPT_5_MINI=...,_LLM_MODEL_CONFIG_GEMINI_2_5_FLASH=...,_LLM_MODEL_CONFIG_GEMINI_2_5_PRO=...,_LLM_MODEL_CONFIG_DIFFBOT=...,_LLM_MODEL_CONFIG_GROQ_LLAMA3_1_8B=...,_LLM_MODEL_CONFIG_ANTHROPIC_CLAUDE_4_5_SONNET=...,_LLM_MODEL_CONFIG_ANTHROPIC_CLAUDE_4_5_HAIKU=...,_LLM_MODEL_CONFIG_LLAMA4_MAVERICK=...,_LLM_MODEL_CONFIG_FIREWORKS_QWEN3_30B=...,_LLM_MODEL_CONFIG_FIREWORKS_GPT_OSS=...,_LLM_MODEL_CONFIG_FIREWORKS_DEEPSEEK_V3=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_MICRO_V1=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_LITE_V1=...,_LLM_MODEL_CONFIG_BEDROCK_NOVA_PRO_V1=...,_LLM_MODEL_CONFIG_OLLAMA_LLAMA3=...\n   ```\n   - 将尖括号中的值替换为您实际的配置和密钥。\n   - 您可以根据需要省略或添加替换参数。\n\n3. **监控构建过程：**\n   - 构建和部署过程将在 Cloud Build 控制台中显示。\n\n4. **访问已部署的服务：**\n   - 部署完成后，您的后端服务将可通过 Cloud Run 服务 URL 访问，该 URL 将显示在 Cloud 控制台中。\n\n---\n\n**注意：**  \n- `cloudbuild.yaml` 文件支持基于分支名称的多个环境（`main`、`staging`、`dev`）。\n- 前端的构建和部署步骤默认被注释掉。如果您也想部署前端，请在 `cloudbuild.yaml` 中取消注释相关部分。\n\n更多详细信息，请参阅 `cloudbuild.yaml` 中的注释。\n\n---\n\n## 链接\n\n[LLM 知识图谱构建应用][app-link]\n\n[Neo4j Workspace][neo4j-workspace]\n\n## 参考\n\n[应用演示视频][demo-video]\n\n## 联系方式\n如有任何疑问或需要支持，请随时提交 [GitHub Issues][github-issues]。\n\n[backend-url]: http:\u002F\u002Flocalhost:8000\n[env-sheet]: https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1DBg3m3hz0PCZNqIjyYJsYALzdWwMlLah706Xvxt62Tk\u002Fedit?gid=184339012#gid=184339012\n[env-vars]: https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1DBg3m3hz0PCZNqIjyYJsYALzdWwMlLah706Xvxt62Tk\u002Fedit?gid=0#gid=0\n[app-link]: https:\u002F\u002Fllm-graph-builder.neo4jlabs.com\u002F\n[neo4j-workspace]: https:\u002F\u002Fworkspace-preview.neo4j.io\u002Fworkspace\u002Fquery\n[demo-video]: https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=LlNy5VmV290\n[github-issues]: https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\n[bloom-url]: 
https:\u002F\u002Fworkspace-preview.neo4j.io\u002Fworkspace\u002Fexplore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true\n[langchain-endpoint]: https:\u002F\u002Fapi.smith.langchain.com\n\n## 祝您构建愉快！","# llm-graph-builder 快速上手指南\n\nllm-graph-builder 是一个基于 LLM 和 LangChain 的工具，可将非结构化数据（PDF、文档、YouTube 视频、网页等）转换为存储在 Neo4j 中的结构化知识图谱。\n\n## 环境准备\n\n### 系统要求\n- **Python**: 3.12 或更高版本\n- **Node.js & Yarn**: 用于前端构建（若单独部署前端）\n- **Docker & Docker Compose**: 推荐用于快速部署\n\n### 前置依赖\n- **Neo4j 数据库**: 版本 5.23 或更高，且必须安装 **APOC** 插件。\n  - 支持 **Neo4j Aura** (含免费版)。\n  - 若使用 **Neo4j Desktop**，需分别部署前后端（不支持 docker-compose）。\n- **API Keys**: 根据选择的模型准备相应的密钥（如 OpenAI API Key, Diffbot API Key 等）。\n\n## 安装步骤\n\n推荐使用 **Docker Compose** 进行一键部署，这是最简便的方式。\n\n### 方式一：使用 Docker Compose (推荐)\n\n1. **配置环境变量**\n   在项目根目录创建 `.env` 文件（可参考 `example.env`），配置必要的参数。\n   \n   *默认仅启用 OpenAI 和 Diffbot，如需启用其他模型或数据源，请参考以下配置示例：*\n   ```bash\n   # 启用多种 LLM 模型 (例如：Gemini, OpenAI, Diffbot, Anthropic)\n   VITE_LLM_MODELS_PROD=\"gemini_2.5_flash,openai_gpt_5_mini,diffbot,anthropic_claude_4.5_haiku\"\n\n   # 启用数据源 (本地、YouTube、Wikipedia、S3、GCS、Web)\n   # 若需 GCS 支持，请填入 Google Client ID\n   VITE_REACT_APP_SOURCES=\"local,youtube,wiki,s3,gcs,web\"\n   VITE_GOOGLE_CLIENT_ID=\"your-google-client-id\"\n\n   # 配置聊天模式 (默认全开，可按需精简)\n   VITE_CHAT_MODES=\"vector,graph\"\n   ```\n\n2. **启动服务**\n   在终端执行以下命令启动后端和前端：\n   ```bash\n   docker-compose up -d\n   ```\n\n### 方式二：本地分离部署 (开发模式)\n\n若需修改代码或调试，可分别启动前后端。\n\n#### 1. 后端设置\n```bash\ncd backend\n# 复制环境变量模板\ncp example.env .env\n# 编辑 .env 文件，填入 Neo4j 连接信息及 API Keys\n\n# 创建虚拟环境\npython3.12 -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows: venv\\Scripts\\activate\n\n# 安装依赖\npip install -r requirements.txt -c constraints.txt\n\n# 启动后端服务\nuvicorn score:app --reload\n```\n\n#### 2. 
前端设置\n```bash\ncd frontend\n# 复制环境变量模板\ncp example.env .env\n# 根据需要修改 VITE_BACKEND_API_URL 等配置\n\n# 安装依赖并启动\nyarn\nyarn run dev\n```\n\n#### 3. 本地运行 Ollama (可选)\n若需使用本地大模型（如 Llama3）：\n```bash\n# 拉取并运行 Ollama 容器\ndocker pull ollama\u002Follama\ndocker run -d -v ollama:\u002Froot\u002F.ollama -p 11434:11434 --name ollama ollama\u002Follama\n\n# 下载模型\ndocker exec -it ollama ollama run llama3\n\n# 在 docker-compose 或 .env 中配置模型地址\n# 示例：LLM_MODEL_CONFIG_ollama_llama3=llama3,http:\u002F\u002Fhost.docker.internal:11434\n```\n\n## 基本使用\n\n1. **连接数据库**\n   - 打开浏览器访问应用地址（Docker 默认为 `http:\u002F\u002Flocalhost:8080` 或前端配置的端口）。\n   - 在登录界面输入 **Neo4j URI**、**用户名**、**密码** 和 **数据库名称**。\n   - 或者直接拖拽 Neo4j 凭证文件进行连接。\n   - *注：界面会通过图标区分 Aura DB（数据库图标）和 Aura DS（分子图标）。*\n\n2. **上传数据源**\n   - 选择数据来源：支持本地文件上传、YouTube 链接、Wikipedia、AWS S3、GCS 或网页 URL。\n   - 上传后文件状态显示为 \"New\"。\n\n3. **配置生成参数**\n   - **选择 LLM**: 从下拉菜单中选择用于提取知识的模型（如 OpenAI GPT-4, Gemini 等）。\n   - **定义 Schema (可选)**: 在设置中预定义节点和关系标签，以规范图谱结构。\n   - **嵌入模型**: 在 `Graph Settings > Processing Configuration` 中选择向量嵌入模型（如 OpenAI, Sentence Transformers）。\n\n4. **生成知识图谱**\n   - 选中一个或多个文件，点击 **\"Generate Graph\"**。\n   - 系统将自动提取实体、关系及属性，并写入 Neo4j。\n\n5. **可视化与问答**\n   - **查看图谱**: 点击文件列表中的 \"View\" 或选中多个文件点击 \"Preview Graph\"，将在 Neo4j Bloom 中展示图谱。\n   - **对话查询**: 在聊天框中输入问题（例如：“总结这些文档的核心观点”），系统将基于图谱内容回答，并提供来源溯源。\n   - 若需独立聊天界面，可访问 `\u002Fchat-only` 路由。\n\n6. 
**监控用量 (可选)**\n   - 若在 `.env` 中设置 `TRACK_USER_USAGE=true` 并配置 Token 追踪数据库，可在界面查看每日\u002F每月的 Token 消耗及剩余限额。","某金融合规团队需要从数百份非结构化的监管政策 PDF、新闻报告和内部会议纪要中，快速梳理出实体间的复杂关联以应对突发审计。\n\n### 没有 llm-graph-builder 时\n- 分析师需人工阅读海量文档并手动摘录实体关系，耗时数周且极易遗漏关键隐性连接。\n- 提取的数据分散在 Excel 或笔记中，缺乏统一的结构化存储，难以进行跨文档的关联查询。\n- 面对“某高管与特定风险事件的所有间接关联”这类复杂问题，传统关键词搜索完全无法胜任。\n- 每次政策更新都意味着要重新投入大量人力进行重复整理，知识沉淀成本极高。\n\n### 使用 llm-graph-builder 后\n- 利用 LLM 自动解析上传的 PDF 和网页内容，几分钟内即可将非结构化文本转化为标准的 Neo4j 知识图谱。\n- 系统自动抽取节点（如公司、人物、事件）及属性关系，构建出可视化的全局关联网络，数据结构清晰统一。\n- 通过内置的“对话式查询”功能，直接提问即可获得包含来源依据的复杂路径分析，瞬间定位风险传导链条。\n- 新增文档只需重新上传处理，llm-graph-builder 会自动增量更新图谱，确保持续的知识迭代零摩擦。\n\nllm-graph-builder 将原本需要数周的人工情报整理工作压缩至分钟级，让沉睡的非结构化数据瞬间变为可交互、可推理的战略资产。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fneo4j-labs_llm-graph-builder_910a1efc.png","neo4j-labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fneo4j-labs_7ca4cf52.png","",null,"https:\u002F\u002Fneo4j.com\u002Flabs","https:\u002F\u002Fgithub.com\u002Fneo4j-labs",[79,83,87,91,95,99,103,106,110,113],{"name":80,"color":81,"percentage":82},"Jupyter Notebook","#DA5B0B",59.9,{"name":84,"color":85,"percentage":86},"TypeScript","#3178c6",24.8,{"name":88,"color":89,"percentage":90},"Python","#3572A5",14.3,{"name":92,"color":93,"percentage":94},"CSS","#663399",0.4,{"name":96,"color":97,"percentage":98},"PowerShell","#012456",0.3,{"name":100,"color":101,"percentage":102},"Dockerfile","#384d54",0.1,{"name":104,"color":105,"percentage":102},"Shell","#89e051",{"name":107,"color":108,"percentage":109},"HTML","#e34c26",0,{"name":111,"color":112,"percentage":109},"JavaScript","#f1e05a",{"name":114,"color":115,"percentage":109},"Procfile","#3B2F63",4582,787,"2026-04-06T15:06:15","Apache-2.0",4,"Linux, macOS, Windows","未说明",{"notes":124,"python":125,"dependencies":126},"需要 Neo4j 数据库版本 5.23 或更高并安装 APOC 插件。支持多种部署方式：Docker Compose、前后端分离运行或云端部署。若使用本地大模型（如 Ollama），需额外配置 Docker 环境。默认仅启用 OpenAI 和 Diffbot 
模型，其他模型需通过环境变量配置。","3.12+",[127,128,129,130,131],"FastAPI","React","LangChain","Neo4j Driver","Uvicorn",[16,14],[134,135,136,137,138,139,140,141,142,143,144,145,146],"data-import","genai","graph","graph-rag","graph-search","graphdb","graphrag","knowledge-graph","langchain","neo4j","rag","unstructured-data","vectordb","2026-03-27T02:49:30.150509","2026-04-07T10:52:01.944117",[150,155,160,165,170,174,178],{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},21781,"如何配置 Ollama 本地模型（如 llama3）以在系统中运行？","需要在后端和前端分别配置 .env 文件。\n\n后端 .env 配置：\nLLM_MODEL_CONFIG_ollama_llama3=\"llama3.2:latest,http:\u002F\u002F127.0.0.1:11434\"\n注意：只需在此处填写版本，不需要其他环境变量。目前尚不支持 Ollama 嵌入模型，因此 EMBEDDING_MODEL 应保持空白（系统将使用 HuggingFace 的 sentence transformer 模型）。\n\n前端 .env 配置：\nVITE_LLM_MODELS=\"ollama_llama3\"\n\n启动后，可以在运行 uvicorn score:app --reload 的后端终端查看日志。如果仍有问题，请分享日志以便进一步排查。","https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\u002F730",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},21782,"上传 TXT 文件生成图谱时显示\"Failed\"状态且报错\"Chunks are not created\"，如何解决？","该问题通常与 Python 版本兼容性有关。维护者确认 Python 3.12 版本与其他相关库存在兼容性问题，导致文件处理失败。建议检查您的 Python 版本，尝试使用推荐的 Python 版本（通常是 3.10 或 3.11），或者等待项目在 README 中更新具体的版本支持说明。如果使用的是受支持的版本仍出现此问题，请尝试重新上传文件。","https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\u002F977",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},21783,"前端启动时报错\"Google OAuth components must be used within GoogleOAuthProvider\"，如何解决？","这是因为默认配置中包含了已移除或不支持的 Google Cloud Storage (GCS) 源。解决方法是修改前端 .env 文件中的 VITE_REACT_APP_SOURCES 变量，从中移除 'gcs' 项，但保留其他数据源项。例如，如果原配置包含 'gcs'，请将其删除。维护者已在开发分支修复此问题并更新了 README，从默认源中移除了 GCS。","https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\u002F1107",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},21784,"使用 Ollama 和 deepseek-r1 等模型生成图谱时，报错\"Input should be a valid string [type=string_type, input_value=...dict]\"，原因是什么？","该错误表明 LLM 
返回的关系类型（type）格式不正确，返回的是字典（dict）而不是字符串（string），导致 Pydantic 验证失败。这通常是因为使用的模型（如 deepseek-r1:1.5b 或其他小参数模型）未能严格按照要求的格式输出结构化数据。建议尝试使用更强大的模型（如 llama3 系列），或者在提示词中增加对输出格式的严格约束。如果必须使用该模型，可能需要调整代码中的解析逻辑以容错处理非标准输出。","https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\u002F1123",{"id":171,"question_zh":172,"answer_zh":173,"source_url":154},21785,"在不使用 Docker 的情况下（直接在 Windows 本地）运行后端和前端，Ollama 模型配置有何不同？","在 Windows 本地直接运行（后端使用 uvicorn，前端使用 yarn run dev）时，配置要点如下：\n1. 确保 Ollama 服务正在运行（ollama list 能看到模型）。\n2. 后端 .env 中，LLM_MODEL_CONFIG_ollama_llama3 应指向本地地址，如 \"llama3,http:\u002F\u002Flocalhost:11434\" 或 \"llama3.2:latest,http:\u002F\u002F127.0.0.1:11434\"。\n3. 前端 .env 中设置 VITE_LLM_MODELS=\"ollama_llama3\"。\n4. 如果生成图谱失败，请检查后端终端日志。如果是内存（RAM）限制导致的问题，尝试上传较小的文件进行测试。",{"id":175,"question_zh":176,"answer_zh":177,"source_url":169},21786,"如何在配置文件中正确指定 Ollama 模型的名称和版本（例如 llama3 还是 llama3:latest）？","在 .env 文件中配置 LLM_MODEL_CONFIG_ollama_llama3 时，格式应为 \"模型名称,URL\"。您可以使用 ollama list 命令查看本地安装的模型名称。\n推荐格式：LLM_MODEL_CONFIG_ollama_llama3=\"llama3:latest,http:\u002F\u002Flocalhost:11434\" 或 LLM_MODEL_CONFIG_ollama_llama3=\"llama3,http:\u002F\u002Flocalhost:11434\"。\n关键在于模型名称必须与 ollama list 显示的 NAME 列一致（包括标签，如 :latest）。如果不确定，可以使用完整名称（如 llama3:latest）。所有可用模型通常会自动渲染到下拉菜单中供选择。",{"id":179,"question_zh":180,"answer_zh":181,"source_url":182},21787,"连接 S3 存储桶失败，无法访问上传的文件，可能是什么原因？","虽然具体案例中用户未提供详细错误日志，但连接 S3 失败通常涉及以下原因：\n1. AWS Access Key 和 Secret Key 配置错误或权限不足（确保 IAM 用户有读取 S3 桶的权限）。\n2. S3 桶的区域（Region）配置不正确。\n3. 网络防火墙或安全组阻止了访问。\n4. 
桶名称或路径拼写错误。\n建议检查 AWS 凭证是否正确填入环境变量，并确认 S3 桶策略允许该凭证访问。如果问题依旧，请尝试上传 PDF 文件测试流程，并查看后端日志获取具体错误信息。","https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fissues\u002F294",[184,189,194,199,204,209,214,219,224,229,234,239,244,249,254,259],{"id":185,"version":186,"summary_zh":187,"released_at":188},127801,"v0.8.5","发布说明\n本次发布重点在于嵌入模型的支持以及改进的令牌使用跟踪功能。\n\n**嵌入模型**\n\n- 用户可以选择多种嵌入模型来为您的数据生成向量嵌入。您可以在前端的 **图设置 > 处理配置 > 选择嵌入模型** 中进行配置。\n- 支持的模型提供商包括 OpenAI、Gemini、Amazon Titan 和 Sentence Transformers。\n- 当 `TRACK_USER_USAGE` 被启用时，您所选择的嵌入模型将保存到您的用户个人资料中。\n\n**修复的令牌使用跟踪问题**\n\n- 部分用户的电子邮件为空，导致出现令牌限额用尽的错误 #1463\n- 采用模块化方式，可通过环境变量 `LIMIT_TOKEN_USAGE_PER_USER` 禁用令牌跟踪功能\n- 在令牌限额用尽时，新增团队联系人选项\n","2026-02-11T05:52:38",{"id":190,"version":191,"summary_zh":192,"released_at":193},127802,"v0.8.4","### 发布说明\n\n本次发布主要聚焦于依赖项升级、模型更新、更清晰的架构设计以及改进的使用量跟踪功能。\n\n**依赖项**\n\n- 前端和后端的所有依赖项均已更新至最新稳定版本。\n\n**认证**\n\n- 生产环境现已启用登录功能。\n\n**LLM 模型更新**\n\n- 新增并升级至最新支持的 LLM，包括 Gemini 2.5、OpenAI GPT-5.x、Anthropic Claude 4.5、Bedrock Nova 系列、Groq、Fireworks 模型等。\n- 默认模型已切换为 Gemini 2.5 Flash。\n\n**代码库改进**\n\n- 重构代码库，移除不必要的逻辑。\n- 引入依赖注入，并减少 API 函数的参数数量。\n- 简化 API 流程，统一输入参数，以提高一致性和可维护性。\n\n**文档**\n\n- 更新了 README 文件，并刷新了前后端文档。\n- 更新了示例环境变量配置文件，以便更好地理解哪些是可选、哪些是必填的环境变量。\n\n**Token 使用量跟踪**\n\n- 现在会按用户分别跟踪提取、聊天机器人和后处理 API 的 Token 使用量。\n- 非 Neo4j 用户每日限额为 25 万 Token，每月限额为 100 万 Token。\n- 限额可通过 Cron 任务重置。\n- 仓库用户可以通过设置 TRACK_USER_USAGE=true 来启用跟踪功能。\n- 当前使用量可在 UI 的登录用户专区查看。","2026-01-15T13:04:24",{"id":195,"version":196,"summary_zh":197,"released_at":198},127803,"v0.8.3","🚀 功能\n• 新增数据导入器支持——可将图模型从 [Neo4j Console Preview](https:\u002F\u002Fconsole-preview.neo4j.io\u002Ftools\u002Fimport\u002Fmodels) 导入 LLM 图构建器。(#1301)\n• Claude 4 Sonnet (Anthropic) 现已在生产环境中可用。(#1299)\n🛠️ 更新\n• 后端依赖项已升级至最新版本。(#1321)\n🐞 修复\n• 解决了 Protobuf 文件重复错误 (#1323)\n• 修复了 YouTube 字幕缺失的问题。(#1321)","2025-06-24T09:00:34",{"id":200,"version":201,"summary_zh":202,"released_at":203},127804,"v0.8.2","## 变更内容\n错误修复\n\n- 由于竞态条件导致连接到错误的数据库 
#1283\n\n- chatollama 无法正常工作 #1286\n\n- 添加了元组模式验证 #1289 #1290\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fcompare\u002Fv0.8.1...v0.8.2","2025-05-19T13:23:21",{"id":205,"version":206,"summary_zh":207,"released_at":208},127805,"v0.8.1","## 变更内容\n#1246 使用 PyTorch CPU 版本以减小 Docker 镜像大小\n#1275 下拉菜单根据用户选择的预定义模式、加载现有模式或文本模式，更新所有必需的源、类型和目标值。\n#1248 对元组模式进行验证，以避免在源、类型和目标中输入多个值。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fcompare\u002Fv0.8...v0.8.1","2025-05-12T10:19:40",{"id":210,"version":211,"summary_zh":212,"released_at":213},127806,"v0.8","🚀 新功能\n\n- 模式可视化工具：推出功能强大的模式可视化界面，提供多种选项：\n  - 从文本生成：用户现在可以输入纯文本，以三元组格式（源 → 关系 → 目标）提取模式。\n  - 从数据库加载：直接从数据库中以三元组格式可视化现有模式。\n  - 预定义模式：从领域特定的模式库中选择（例如零售、医疗），快速启动项目。\n  - 用户自定义模式：用户可以通过下拉菜单选择或直接以三元组格式（源 → 关系 → 目标）输入来定义自己的模式。\n#1235 #1230。\n\n- 旧格式迁移：轻松将先前存储的模式格式转换为新的基于三元组的格式，以实现一致性并改善可视化效果。\n\n⚙️ 功能增强\n\n- LLM 升级（生产环境）：将 Gemini 1.5 Flash 替换为性能更强的 Gemini 2.0 Flash，以提升生产工作流中的性能。[#[1233](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1233)]\n- LLM 增加（开发环境）：在开发环境中集成了对 LLaMA4 Maverick、LLaMA4 Scout 和 OpenAI GPT-4.1 的支持。[#[1233](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1233)]\n\n🛠 错误修复与改进\n\n- 文件上传状态：修复了文件上传失败时，数据库状态未能正确反映的问题。[#[1222](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1222)]\n\n- 准确的节点与关系计数：更新逻辑，在抽取过程中正确反映节点和关系的数量。[#[1191](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1191)]\n\n- 死锁处理：通过为受影响的 Cypher 查询实施最多 3 次自动重试，解决了瞬时死锁错误（Neo.TransientError.Transaction.DeadlockDetected）。[#[1187](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1187)]\n\n📦 依赖项更新\n\n- 
将所有后端包升级到最新版本，以提高兼容性和性能。[#[1189](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F1189)]","2025-04-21T11:03:28",{"id":215,"version":216,"summary_zh":217,"released_at":218},127807,"v0.7.2","**🔒 安全性增强**\n• 安全相关 API 方法变更：将 sourcelist API 由 GET 改为 POST，以防止密码泄露。（#1102）\n🛠 错误修复与改进\n• 后端修复（#1097）：\n  - 修复了列表索引超出范围的错误。\n  - 解决了 GCS 文件未找到的问题。\n  - 增加了从维基百科相关链接中获取数据的功能 #1151。\n  - 针对某些模式下无法找到顶级实体的问题进行了修复 #1150、#1154。\n\n**提取流程修复：**\n• DOCX 文件提取问题：通过添加 pypandoc-binary 解决了缺少 pandoc 依赖的问题。（#1124）\n• UnicodeDecodeError 修复：解决了部分文本文件的 gb2312 编码问题。（#1126）\n• GraphDB 访问错误：修复了提取过程中出现的 UnboundLocalError。（#1129）\n• 连接问题修复：\n  - 修复了访问数据资源时出现的“连接不存在”错误。（#1131）\n\n**📝 代码与配置清理**\n• 从根目录移除了 example.env 文件，以避免混淆。（#1099）\n• 清理了附加说明：防止因大括号格式及潜在的提示注入问题引发的故障。（#1130）\n\n**🎨 UI 修复与增强**\n• 当数据库未连接时，禁用模式按钮。\n• 在小屏幕上，若数据库未连接，则隐藏数据源。\n• 当可视化包含超过 50 个块的图时，新增一条信息提示。（#1097）","2025-03-11T09:59:34",{"id":220,"version":221,"summary_zh":222,"released_at":223},127808,"v0.7.1","## 变更内容\n\n### 新功能\n\n- 模式可视化 - #1035\n\n- 包更新：langchain neo4j 及其他 langchain 相关包 - #1048\n\n- 在卸载阶段取消 API 调用 - #1068 \n\n \n### Bug 修复\n\n- UI 修复 - #1091 \n\n- 后端连接配置已修复 - #1060 \n\n- 从获取现有模式中移除了与聊天机器人相关的实体 - #1061 \n\n- 解决了由于文档名称前后存在空格而导致部分 URL 的网页来源提取失败的问题。- #1064 \n\n- 增加了对新模型的支持 - #1069 \n1. gemini 2.0 flash\n2. GPT o3 mini（开发中）\n3. 
deepseek r1 & v3（开发中）\n\n\n \n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fcompare\u002Fv0.7...v0.7.1","2025-02-18T05:01:20",{"id":225,"version":226,"summary_zh":227,"released_at":228},127809,"v0.7","### 体验与洞察优化\n• **文件过期提醒：**\n用户现在会在本地文件过期时收到通知，确保及时采取行动。#953  \n• **维基百科标题缺失时的回退机制：**\n当元数据标题不可用时，系统会根据 URL 自动为页面分配标题，从而提升数据清晰度。#982  \n• **新增说明字段：**\n引入了一个新选项卡，允许用户为实体抽取提供具体指示，例如聚焦关键主题。#1013、#1014  \n\n### 知识图谱构建与检索能力提升\n• **限制处理的文本块数量：**\n对生成和处理的文本块数量进行了限制，以提升性能和可扩展性。#1000  \n• **图谱合并逻辑优化：**\n通过新的逻辑将大型图模式归并为更少但更相关的节点标签和关系类型，从而提高图谱质量。#1013、#1014  \n• **集成新模型：**\n集成了 Amazon Nova 系列模型（Micro、Lite、Pro v1）用于图谱生成和聊天机器人问答，并支持 Titan 嵌入模型。#1006  \n• **有效搜索比率参数：**\n引入了 effective_search_ratio 参数，通过扩大 Lucene 索引候选集来提高查询准确性，该参数可通过后端环境变量进行配置。#981  \n• **API 自定义错误处理：**\n新增 LLMGraphBuilderException 异常类，用于处理 extract 和 url_scan API 中面向用户的错误，以提供更清晰的反馈信息。#989  \n\n### 代码重构\n• **代码清理：**\n移除了未使用的库及注释掉的代码，以提升代码可维护性。#973  \n• **后处理流程更新：**\n在后处理任务中移除了对 graphType 的 isSchema 检查，简化了逻辑流程。  \n• **文档更新：**\n更新了 README.md 和前端文档，以提高清晰度。#974  \n• **驱动程序优化：**\n确保在获取文本块详情后正确关闭驱动程序，避免资源泄漏。#938  \n\n### Bug 修复\n• **指标表格修复：**\n解决了指标表格中的各类 UI 问题，确保界面运行更加流畅。#921  \n• **数据库连接问题修复：**\n修复了切换数据库实例时出现的问题，包括：  \n- 切换时刷新聊天机器人界面；  \n- 解决 atob 控制台报错及重复键警告；  \n- 移除文档名称中的 strip() 函数。#966  \n• **前端改进：**\n修复了 .env 模型格式错误，将模糊不清的提示信息替换为更具描述性的错误信息。#946  \n• **模式校验修复：**\n针对 EquivalentSchemaRuleAlreadyExist 错误，优化了模式校验逻辑。#949  \n• **日志记录器错误修复：**\n解决了因 JSON 解析导致的日志记录器错误。#994","2025-01-27T09:19:56",{"id":230,"version":231,"summary_zh":232,"released_at":233},127810,"v0.6","### 体验与洞察力提升：\n* UI 上的分块详情——用户现在可以直接在 UI 中查看从每个文档源提取的文本分块。这些分块为知识图谱提供支持，并可借助多种策略实现精准的信息检索。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F850\n* 节点与关系数量统计——UI 现在针对每个文档源，提供了分块、实体和社区节点数量的详细拆分，从而更深入地洞察知识图谱的结构。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F881, https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F890\n* 
私有实例安全性增强——后端服务不再向前端暴露 Neo4j 凭证，从而提升了私有部署环境的安全性。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F896\n* 专用聊天界面——现已提供精简的 URL，可直接访问聊天功能。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F870\n\n### 知识图谱构建与检索优化\n* 提取连通性——优化了分块提取逻辑，提升了知识图谱的连通性和整体质量。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F852\n* 单一分块提取——改用异步方式执行单一分块提取，简化了知识图谱的构建流程。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F901\n* 新版 Neo4j Langchain 包——推出了全新的顶级 Neo4j Langchain 包，并将其他相关包升级至最新版本，从而简化集成与使用流程。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F891\n\n### 高级评估与指标\n* 扩展聊天机器人评估指标——引入了用于评估聊天机器人回答的新指标，包括 ROUGE 分数、语义相似度分数以及上下文实体召回率分数。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F855\u002F\n\n### 错误修复\n* 检查 GCS 存储桶中的文件是否存在，更新了从上次处理位置重新处理的条件。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F917\n* 在后处理阶段更新了节点查询的重复问题，并根据嵌入模型动态调整向量索引维度。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F929\n* 从 UI 中增加了社区创建检查功能。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F916\n* 添加数据库图标以区分不同的 Neo4j 图数据库实例。https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F924\n\n","2024-12-11T11:01:39",{"id":235,"version":236,"summary_zh":237,"released_at":238},127811,"v0.5.1","This release includes several improvements to security, performance, and user experience.\r\n\r\n### Security Enhancements\r\n* Strengthened Frontend Security: Enhanced the security of the frontend by adding several key security headers to our Nginx configuration. This strengthens the application's defenses against common web vulnerabilities.\r\n\r\n### Performance Improvements\r\n* Removed a parameter that unnecessarily included chunk nodes as an entity source, eliminating the creation of an extra Document node. 
This change improves performance.\r\n\r\n* Updated several frontend libraries to their latest versions. This brings performance enhancements, bug fixes, and potential new features to the frontend application.\r\n\r\n### Usability Enhancements\r\n* The \"Reprocess\" label has been changed to \"Ready to Reprocess\" for greater clarity and accuracy.\r\n\r\n* The ENABLE_COMMUNITIES flag has been removed from the backend. Users now have the option to create communities directly within the Aura DS frontend, streamlining the process and providing more control.\r\n\r\n### Bug fixes\r\nUpdated the queries used for document and entity deletion.","2024-11-29T10:33:27",{"id":240,"version":241,"summary_zh":242,"released_at":243},127812,"v0.5","## What's Changed\r\n### New Features\r\n* Graph communities- - In a Neo4j Aura DS instance, graph communities (also called clusters, modules, or subgraphs) can be created. These are groups of interconnected nodes with denser internal connections than external connections. These communities are created to improve chatbot performance by organizing users into relevant groups. Users have the option to enable community creation across entities as part of the post processing job in the graph enhancement tab on UI.[ #721](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F721) ,[ #728](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F728)\r\n* Global search full text - This new chat mode includes communities, full-text indexing, and global search. Full-text indexing speeds up searches by creating a searchable index of words and their locations within documents, eliminating the need to scan entire documents. This improves search speed, storage efficiency, and allows searching across multiple database columns. Global search leverages broader contextual understanding across the entire system for less specific queries, benefiting from connections between different data points. 
-[ #699](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F699)\r\n* Ragas Evaluation Metrics - RAGAs (Retrieval-Augmented Generation Assessment) is a framework that automatically evaluates RAG pipelines. Instead of relying on human-labeled data, it uses LLMs to assess component performance, providing scores for faithfulness (accuracy to source information) and answer relevancy (how well the answer addresses the question). RAGAs evaluation metrics are now provided as chatbot retrieval information. These scores help improve chatbot accuracy and helpfulness.[ #787](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F787),[ #806](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F806)\r\n* Read only user support - The application provides access to the read-only Neo4j database users. Users have the option to view graph data, access chat bot and chat history.[#766](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F766)\r\n* Get neighborhood nodes -To dive deeper into the structure of a graph, get Neighbors retrieves nodes directly connected to a specified element. This can be particularly useful for exploring connections around a given node, allowing for focused data inspection within the larger graph. By providing the element ID, users can pinpoint a node and extract its neighboring nodes and relationships. Implemented text link (View Graph)  for Retrieval information modal that holds information for Chunk and entities and in Graph enhancement model for Disconnected nodes and De duplication of nodes we are adding links to ID column. #796 \r\n* Download chat conversation - Export the current chat history in JSON format. 
This includes all messages, timestamps, and ideally, identifying information for each user (e.g., neo4j usernames or IDs).[ #800](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F800)\r\n* Updated Lang chain versions in https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F565\r\n### Enhancements\r\n* Retry Processing - Users can reprocess canceled, failed, or processed files. Options include starting from the beginning, deleting existing entities and starting over, or resuming from the last successful point (not available for completed files). Clicking \"Generate graph\" after selecting \"Reprocess\" begins processing.[ #698](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F698)\r\n* Post processing call after file completion -[ https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F716\u002Ffiles#diff-b31aa7db705259a32218d462af23ecdc6369dbc7b67ff46e67204f696925c884](https:\u002F\u002Fgithub.com\u002Fneo4j-labs\u002Fllm-graph-builder\u002Fpull\u002F716\u002Ffiles#diff-b31aa7db705259a32218d462af23ecdc6369dbc7b67ff46e67204f696925c884)\r\nChanges to post processing call that will take place only if there are no files in the table with status processing or waiting or if all files are completed\r\n* Multiple chat mode selection - The chat interface offers users a choice of several powerful search modes, allowing for highly customizable retrieval of information. 
Users can select one or more of the following modes:\r\n  - Vector Search: A search based on semantic similarity, finding results conceptually similar to the query.\r\n  - Full Text Search: A traditional keyword-based search, returning results containing the exact words or phrases in the query.\r\n  - Graph + Vector + Full Text Search: A combined approach leveraging graph connections, semantic similarity (vector), and keyword matching (full text) for comprehensive results.\r\n  - Entity Search + Vector Search: Focuses on identifying and retrieving specific entities (people, places, things) combined with semantic similarity.\r\n  - Global Search + Vector + Full Text Search: A broad search encompassing all available data, utilizing communities, semantic similarity and keyword matching. T","2024-10-30T06:24:30",{"id":245,"version":246,"summary_zh":247,"released_at":248},127813,"v0.4","**New Features**\r\n\r\n- Graph Enhancements -\r\n  - De-duplication of nodes: Users are given the option to select and merge nodes to eliminate redundancy and improve accuracy. #566 \r\n  - Post-processing jobs: After extracting the knowledge graph, users can select from a range of post-processing jobs to fine-tune it. These options include: KNN Semantic Similarity Graph of chunks, Full-Text Index on Database Labels, and Entity Embeddings Generation. #627\r\n\r\n- Interactive graph visualization: Users can highlight nodes\u002Frelations and also search node properties. #696\r\n\r\n- Data table filtering - Users can refine the data displayed in the UI table by filtering by file status\u002Fsource\u002Ftype\u002Fmodel or by sorting by column. #589, #664\r\n\t\t\r\n- Vector dimensions mismatch - The application checks for compatible vector indexes in the user's graph database. If an incompatible index exists, an error message will appear and users can choose to create a new, compatible vector index. 
#594\r\n\r\n- Hybrid search for chunks - The Graph+Vector+FullText retriever indexes nodes and relationships by their string properties using a FullText Index. This allows for efficient retrieval of graph data, specifically from chunk nodes. The 'keyword' index is automatically created if it doesn't exist when this mode is selected. #611, #670\r\n\r\n**Enhancements**\r\n\r\n- Graph Visualization for each document based on lexical graph or full knowledge graph with entities. #575\r\n- OpenAI's LLMs as the default option. For users seeking to run OpenAI-compatible LLMs locally, simply provide the API key, base URL, and model name. The code will seamlessly transition to using ChatOpenAI from langchain. #588\r\n- Concurrent File Processing Enhancements - \r\n\r\n  - Batch Processing: Files are now processed in batches for efficient performance, regardless of whether the user selects a specific number of files or uses the \"Select All\" checkbox.\r\n  - State Persistence: File processing status is maintained even after page refreshes. Waiting files will continue processing once ongoing tasks are completed.\r\n  - Dynamic Batching: If the number of files to be processed is less than the batch size, the remaining capacity will be filled with additional files, ensuring optimal processing.  #665\r\n\r\n**Bug Fixes** \r\n\r\n- User Interface Improvements - \r\n  - Chat Detail Accuracy: Corrected an issue where incorrect retrieval information was displayed when viewing details of previous chat answers. \r\n  - Enhanced UI Clarity: Improved UI elements for better clarity, including tooltips for the model dropdown, enhancements to the graph visualization, and a \"Clear Schema\" button. #572\r\n\r\n- Data Integrity and Functionality -\r\n  - Label and Relationship Sanitization: Improved data integrity by removing backticks from labels and relationship types. 
#547 \r\n  - Visualization Query Accuracy: Fixed an issue where the visualization query was pulling data from unintended documents, ensuring accurate results. #590 \r\n\r\n- General Bug Fixes -\r\n  - Addressed several other existing bugs to enhance overall application stability and functionality. #584, #684\r\n\t\t\r\n","2024-08-27T13:53:09",{"id":250,"version":251,"summary_zh":252,"released_at":253},127814,"v0.3","### New Features\r\n\r\n- Local Filtering for Graph Viz, Lexical Graph, Entity Graph, Refresh - View the graph for a particular file based on a dropdown selection, and refresh the graph while the file is processing. #497 \r\n- Graph Enhancements UI - Set schema for KG creation  \r\n    - Entity Extraction Settings - Option to provide a predefined graph schema, or a schema derived from input text, for building a knowledge graph from multiple sources. #501\r\n    - Delete Disconnected Nodes - Identify and remove entities that are not connected to any other information, creating a cleaner and more efficient knowledge graph that leads to more relevant and informative responses. #493 \r\n- Support for local Ollama models - Try your own Ollama models when the application is deployed locally, by setting the config. #508 \r\n- Chatbot enhancements - \r\n   - Option to chat with the graph using Vector only or Graph + Vector #485  \r\n   - Option to select specific files for QA (all processed sources are considered for QA by default) #514 \r\n- Entity Embeddings - Create embeddings for entities extracted by LLMs (configurable) #505 \r\n- Web Pages as Source - Provide web URLs as a source for creating the knowledge graph. 
#475 \r\n\r\n### UI\u002FUX Bug fixes\r\n#527, #490 \r\n\r\n","2024-07-12T08:58:59",{"id":255,"version":256,"summary_zh":257,"released_at":258},127815,"v.02","Deletion of Orphan Nodes\r\nFull text Index\r\nChat Modes\r\nSetting Schema Modal\r\nWeb pages as Source.","2024-07-02T15:59:31",{"id":260,"version":261,"summary_zh":75,"released_at":262},127816,"v0.1","2024-05-16T12:14:54"]
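The combined chat modes in the notes above (Graph + Vector + Full Text Search and its variants) merge ranked results from several retrievers. The release notes do not say how the lists are fused; one common, generic option is reciprocal rank fusion. The sketch below is illustrative only and is not taken from the project's code:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked result lists into one ranking, scoring each
    document by the sum of 1 / (k + rank) over every list it appears in.
    A generic fusion sketch, not the project's actual merge strategy."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical chunk IDs returned by a vector and a full-text retriever
vector_hits = ["chunk-3", "chunk-7", "chunk-1"]
fulltext_hits = ["chunk-7", "chunk-2", "chunk-3"]
merged = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

A document appearing near the top of both lists (here `chunk-7`) outranks one that is top-ranked in only a single list, which is why this fusion works without comparable raw scores.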
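The v0.4 "Dynamic Batching" note says that when the number of files to process is less than the batch size, the remaining capacity is filled with additional files. A minimal Python sketch of that idea follows; the function name, the default batch size, and the waiting-queue top-up rule are assumptions for illustration, not the project's implementation:

```python
def build_batches(selected, waiting, batch_size=4):
    """Chunk the selected files into batches of batch_size; if the final
    batch has spare capacity, top it up from the waiting queue (a sketch
    of the dynamic-batching idea, not the project's actual code)."""
    queue = list(selected)
    spare = (-len(queue)) % batch_size          # free slots in the last batch
    extras = [f for f in waiting if f not in queue]
    queue += extras[:spare]                     # fill remaining capacity
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]

# three selected files leave one spare slot, filled from the waiting queue
batches = build_batches(["a.pdf", "b.pdf", "c.pdf"], ["d.pdf", "e.pdf"], batch_size=4)
```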