[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-freedmand--semantra":3,"tool-freedmand--semantra":64},[4,17,27,35,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[13,14,15,43],"视频",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":23,"last_commit_at":50,"category_tags":51,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 
提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":23,"last_commit_at":58,"category_tags":59,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,60,43,61,15,62,26,13,63],"数据工具","插件","其他","音频",{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":110,"forks":111,"last_commit_at":112,"license":113,"difficulty_score":23,"env_os":114,"env_gpu":115,"env_ram":116,"env_deps":117,"category_tags":125,"github_topics":126,"view_count":23,"oss_zip_url":130,"oss_zip_packed_at":130,"status":16,"created_at":131,"updated_at":132,"faqs":133,"releases":162},4129,"freedmand\u002Fsemantra","semantra","Multi-tool for semantic search","Semantra 是一款专为本地文档设计的语义搜索工具，旨在帮助用户突破传统关键词匹配的局限，实现“按含义”而非“按字面”查找信息。无论是面对海量泄露文件、学术论文还是历史档案，用户只需输入自然语言描述，Semantra 就能精准定位到内容相关但措辞不同的段落，轻松解决在大量文本中“大海捞针”的难题。\n\n这款工具特别适合记者、研究人员、学生及历史学者等需要深度处理文档的专业人士，同时也欢迎注重数据隐私的普通用户尝试。Semantra 的最大亮点在于其私密性与便捷性：它完全在本地命令行运行，分析指定的文本或 PDF 文件后，会自动启动一个本地网页服务供交互式查询，确保敏感数据无需上传云端即可被高效检索。首次运行时，它会自动下载轻量级的机器学习模型进行本地化处理，既保证了搜索的智能程度，又兼顾了运行速度与存储效率。通过简单的命令即可构建专属的个人知识库，让文档探索变得直观且高效。","# 
Semantra\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F306095\u002F233867821-601db8b0-19c6-4bae-8e93-720b324dc199.mov\n\nSemantra is a multipurpose tool for semantically searching documents. Query by meaning rather than just by matching text.\n\nThe tool, made to run on the command line, analyzes specified text and PDF files on your computer and launches a local web search application for interactively querying them. The purpose of Semantra is to make running a specialized semantic search engine easy, friendly, configurable, and private\u002Fsecure.\n\nSemantra is built for individuals seeking needles in haystacks — journalists sifting through leaked documents on deadline, researchers seeking insights within papers, students engaging with literature by querying themes, historians connecting events across books, and so forth.\n\n## Resources\n\n- [Tutorial](docs\u002Ftutorial.md): a gentle introduction to getting started with Semantra — everything from installing the tool to hands-on examples of analyzing documents with it\n- [Guides](docs\u002Fguides.md): practical guides on how to do more with Semantra\n- [Concepts](docs\u002Fconcepts.md): Explainers on some concepts to better understand how Semantra works\n- [Using the web interface](docs\u002Fhelp.md): A reference on how to use the Semantra web app\n\nThis page gives a high-level overview of Semantra and a reference of its features. It's also available in other languages: [Semantra en español](docs\u002FREADME_es.md), [Semantra 中文说明](docs\u002FREADME_zh-CN.md)\n\n## Installation\n\nEnsure you have [Python >= 3.9](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F).\n\nThe easiest way to install Semantra is via `pipx`. 
If you do not have `pipx` installed, run:\n\n```sh\npython3 -m pip install --user pipx\n```\n\nOr, if you have [Homebrew](https:\u002F\u002Fbrew.sh\u002F) installed, you can run `brew install pipx`.\n\nOnce `pipx` is installed, run:\n\n```sh\npython3 -m pipx ensurepath\n```\n\nOpen a new terminal window for the new path settings `pipx` sets to go into effect. Then run:\n\n```sh\npython3 -m pipx install semantra\n```\n\nThis will install Semantra on your path. You should be able to run `semantra` in the terminal and see output.\n\nNote: if the above steps don't work or you'd like a more granular installation, you can install Semantra in a virtual environment (though note it will only be accessible while the virtual environment is activated):\n\n```sh\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate\npip install semantra\n```\n\n## Usage\n\nSemantra operates on collections of documents — text or PDF files — stored on your local computer.\n\nAt its simplest, you can run Semantra over a single document by running:\n\n```sh\nsemantra doc.pdf\n```\n\nYou can run Semantra over multiple documents, too:\n\n```sh\nsemantra report.pdf book.txt\n```\n\nSemantra will take some time to process the input documents. This is a one-time operation per document (subsequent runs over the same document collection will be near instantaneous).\n\nOnce processing is complete, Semantra will launch a local webserver, by default at [localhost:8080](http:\u002F\u002Flocalhost:8080). On this web page, you can interactively query the passed in documents semantically.\n\n**Quick notes:**\n\nWhen you first run Semantra, it may take several minutes and several hundred megabytes of hard disk space to download a local machine learning model that can process the document you passed in. 
[The model used can be customized](docs\u002Fguide_models.md), but the default one is a great mix of being fast, lean, and effective.\n\nIf you want to process documents quickly without using your own computational resources and don't mind paying or sharing data with external services, you can use [OpenAI's embedding model](docs\u002Fguide_openai.md).\n\n## Quick tour of the web app\n\nWhen you first navigate to the Semantra web interface, you will see a screen like this:\n\n![Semantra web interface](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_114b2a2d1695.png)\n\nType in something in the search box to start querying semantically. Hit \u003Ckbd>Enter\u003C\u002Fkbd> or click the search icon to execute the query.\n\nSearch results will appear in the left pane ordered by most relevant documents:\n\n![Semantra search results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_0d0e2c5f5d20.png)\n\nThe yellow scores show relevance from 0-1.00. Anything in the 0.50 range indicates a strong match. Lighter brown highlights will stream in over the search results explaining the most relevant portions to your query.\n\nClicking on a search result's text will navigate to the relevant section of the associated document.\n\n![Highlighted search result in document](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_fc7f9918ca53.png)\n\nClicking on the plus\u002Fminus buttons associated with a search result will positively\u002Fnegatively tag those results. 
Re-running the query will cause these additional query parameters to go into effect.\n\n![Positively\u002Fnegatively tagging search results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_0e2c38a616eb.png)\n\nFinally, text queries can be added and subtracted with plus\u002Fminus signs in the query text to sculpt a precise semantic meaning.\n\n![Adding and subtracting text queries](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_e7f22950c9e9.png)\n\nFor a more in-depth walkthrough of the web app, check out the [tutorial](docs\u002Ftutorial.md) or [the web app reference](docs\u002Fhelp.md).\n\n## Quick concepts\n\nUsing a semantic search engine is fundamentally different than an exact text matching algorithm.\n\nFor starters, there will _always_ be search results for a given query, no matter how irrelevant it is. The scores may be really low, but the results will never disappear entirely. This is because semantic searching with query arithmetic often reveals useful results amid very minor score differences. The results will always be sorted by relevance and only the top 10 results per document are shown so the lower scoring results are cut off automatically.\n\nAnother difference is that Semantra will not necessarily find exact text matches if you query something that directly appears in the document. At a high level, this is because words can mean different things in different contexts, e.g. the word \"leaves\" can refer to the leaves on trees or to someone _leaving_. The embedding models that Semantra uses convert all the text and queries you enter into long sequences of numbers that can be mathematically compared, and an exact substring match is not always significant in this sense. 
See [the embeddings concept doc](docs\u002Fconcept_embeddings.md) for more information on embeddings.\n\n## Command-line reference\n\n```sh\nsemantra [OPTIONS] [FILENAME(S)]...\n```\n\n## Options\n\n- `--model [openai|minilm|mpnet|sgpt|sgpt-1.3B]`: Preset model to use for embedding. See [the models guide](docs\u002Fguide_models.md) for more info (default: mpnet)\n- `--transformer-model TEXT`: Custom Huggingface transformers model name to use for embedding (only one of `--model` and `--transformer-model` should be specified). See [the models guide](docs\u002Fguide_models.md) for more info\n- `--windows TEXT`: Embedding windows to extract. A comma-separated list of the format \"size[\\_offset=0][_rewind=0]\". A window with size 128, offset 0, and rewind of 16 (128_0_16) will embed the document in chunks of 128 tokens which partially overlap by 16. Only the first window is used for search. See the [windows concept doc](docs\u002Fconcept_windows.md) for more information (default: 128_0_16)\n- `--encoding`: Encoding to use for reading text files [default: utf-8]\n- `--no-server`: Do not start the UI server (only process)\n- `--port INTEGER`: Port to use for embedding server (default: 8080)\n- `--host TEXT`: Host to use for embedding server (default: 127.0.0.1)\n- `--pool-size INTEGER`: Max number of embedding tokens to pool together in requests\n- `--pool-count INTEGER`: Max number of embeddings to pool together in requests\n- `--doc-token-pre TEXT`: Token to prepend to each document in transformer models (default: None)\n- `--doc-token-post TEXT`: Token to append to each document in transformer models (default: None)\n- `--query-token-pre TEXT`: Token to prepend to each query in transformer models (default: None)\n- `--query-token-post TEXT`: Token to append to each query in transformer models (default: None)\n- `--num-results INTEGER`: Number of results (neighbors) to retrieve per file for queries (default: 10)\n- `--annoy`: Use approximate kNN via Annoy for queries 
(faster querying at a slight cost of accuracy); if false, use exact exhaustive kNN (default: True)\n- `--num-annoy-trees INTEGER`: Number of trees to use for approximate kNN via Annoy (default: 100)\n- `--svm`: Use SVM instead of any kind of kNN for queries (slower and only works on symmetric models)\n- `--svm-c FLOAT`: SVM regularization parameter; higher values penalize mispredictions more (default: 1.0)\n- `--explain-split-count INTEGER`: Number of splits on a given window to use for explaining a query (default: 9)\n- `--explain-split-divide INTEGER`: Factor to divide the window size by to get each split length for explaining a query (default: 6)\n- `--num-explain-highlights INTEGER`: Number of split results to highlight for explaining a query (default: 2)\n- `--force`: Force process even if cached\n- `--silent`: Do not print progress information\n- `--no-confirm`: Do not show cost and ask for confirmation before processing with OpenAI\n- `--version`: Print version and exit\n- `--list-models`: List preset models and exit\n- `--show-semantra-dir`: Print the directory semantra will use to store processed files and exit\n- `--semantra-dir PATH`: Directory to store semantra files in\n- `--help`: Show this message and exit\n\n## Frequently asked questions\n\n### Can it use ChatGPT?\n\nNo, and this is by design.\n\nSemantra does not use any generative models like ChatGPT. It is built only to query text semantically without any layers on top to attempt explaining, summarizing, or synthesizing results. Generative language models occasionally produce outwardly plausible but ultimately incorrect information, placing the burden of verification on the user. 
Semantra treats primary source material as the only source of truth and endeavors to show that a human-in-the-loop search experience on top of simpler embedding models is more serviceable to users.\n\n## Development\n\nThe Python app is in `src\u002Fsemantra\u002Fsemantra.py` and is managed as a standard Python command-line project with `pyproject.toml`.\n\nThe local web app is written in [Svelte](https:\u002F\u002Fsvelte.dev\u002F) and managed as a standard npm application.\n\nTo develop for the web app `cd` into `client` and then run `npm install`.\n\nTo build the web app, run `npm run build`. To build the web app in watch mode and rebuild when there's changes, run `npm run build:watch`.\n\n## Contributions\n\nThe app is still in early stages, but contributions are welcome. Please feel free to submit an issue for any bugs or feature requests.\n","# Semantra\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F306095\u002F233867821-601db8b0-19c6-4bae-8e93-720b324dc199.mov\n\nSemantra 是一款用于语义化文档搜索的多功能工具。它通过理解查询的含义来进行搜索，而不仅仅是基于文本匹配。\n\n该工具运行在命令行界面，能够分析您计算机上指定的文本和 PDF 文件，并启动一个本地的 Web 搜索应用，供您交互式地查询这些文件。Semantra 的目标是让运行一个专门的语义搜索引擎变得简单、友好、可配置，并且私密安全。\n\nSemantra 专为那些需要在海量信息中寻找关键线索的人设计——例如：记者在截止日期前梳理泄露文件、研究人员在论文中寻找洞见、学生通过主题查询来深入阅读文献、历史学家跨书籍关联事件等。\n\n## 资源\n\n- [教程](docs\u002Ftutorial.md)：轻松入门 Semantra 的指南——从安装工具到实际操作示例，帮助您快速上手。\n- [指南](docs\u002Fguides.md)：关于如何更高效使用 Semantra 的实用指导。\n- [概念](docs\u002Fconcepts.md)：解释 Semantra 工作原理的一些核心概念。\n- [使用 Web 界面](docs\u002Fhelp.md)：Semantra Web 应用的使用参考。\n\n本页面提供了 Semantra 的概览及功能参考。此外，还提供其他语言版本：[Semantra en español](docs\u002FREADME_es.md)、[Semantra 中文说明](docs\u002FREADME_zh-CN.md)。\n\n## 安装\n\n请确保已安装 [Python >= 3.9](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)。\n\n安装 Semantra 最简便的方式是使用 `pipx`。如果您尚未安装 `pipx`，请运行以下命令：\n\n```sh\npython3 -m pip install --user pipx\n```\n\n或者，如果您已安装 [Homebrew](https:\u002F\u002Fbrew.sh\u002F)，可以直接运行：\n\n```sh\nbrew install pipx\n```\n\n安装完 `pipx` 后，请执行：\n\n```sh\npython3 -m pipx 
ensurepath\n```\n\n打开一个新的终端窗口，以使 `pipx` 设置的新路径生效。然后运行：\n\n```sh\npython3 -m pipx install semantra\n```\n\n这将把 Semantra 安装到您的系统路径中。您应该能够在终端中运行 `semantra` 并看到输出。\n\n注意：如果上述步骤无法正常工作，或者您希望进行更精细的安装，也可以在虚拟环境中安装 Semantra（但请注意，只有在激活虚拟环境时才能使用）：\n\n```sh\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate\npip install semantra\n```\n\n## 使用\n\nSemantra 处理的是存储在您本地计算机上的文档集合——可以是文本文件或 PDF 文件。\n\n最简单的用法是针对单个文档运行 Semantra：\n\n```sh\nsemantra doc.pdf\n```\n\n您也可以同时处理多个文档：\n\n```sh\nsemantra report.pdf book.txt\n```\n\nSemantra 需要一些时间来处理输入的文档。这一过程仅需对每个文档执行一次，后续再次运行时几乎无需额外时间。\n\n处理完成后，Semantra 将启动一个本地 Web 服务器，默认地址为 [localhost:8080](http:\u002F\u002Flocalhost:8080)。在该网页上，您可以交互式地对传入的文档进行语义查询。\n\n**快速提示：**\n\n首次运行 Semantra 时，可能需要下载一个本地机器学习模型，这通常会耗费几分钟时间以及数百 MB 的硬盘空间。该模型可用于处理您提供的文档。您可以[自定义使用的模型](docs\u002Fguide_models.md)，但默认模型在速度、轻量级和效果之间取得了很好的平衡。\n\n如果您希望快速处理文档，而不占用本地计算资源，并且不介意付费或将数据共享给外部服务，可以使用 [OpenAI 的嵌入模型](docs\u002Fguide_openai.md)。\n\n## Web 应用快速导览\n\n首次进入 Semantra 的 Web 界面时，您将看到如下界面：\n\n![Semantra Web 界面](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_114b2a2d1695.png)\n\n在搜索框中输入内容即可开始语义查询。按下 \u003Ckbd>Enter\u003C\u002Fkbd> 键或点击搜索图标以执行查询。\n\n搜索结果将按相关性顺序显示在左侧窗格中：\n\n![Semantra 搜索结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_0d0e2c5f5d20.png)\n\n黄色评分表示相关性，范围为 0 到 1.00。分数在 0.50 左右表示高度匹配。较浅的棕色高亮区域会覆盖在搜索结果上，突出显示与您的查询最相关的部分。\n\n点击某个搜索结果的文本，即可跳转到对应文档中的相关内容：\n\n![文档中的高亮搜索结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_fc7f9918ca53.png)\n\n点击搜索结果旁边的加号\u002F减号按钮，可以对该结果进行正向或负向标记。重新运行查询时，这些附加的查询参数将会生效：\n\n![对搜索结果进行正向\u002F负向标记](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_0e2c38a616eb.png)\n\n最后，您还可以在查询文本中使用加号\u002F减号来添加或移除关键词，从而精确塑造语义含义：\n\n![添加和删除文本查询](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_readme_e7f22950c9e9.png)\n\n如需深入了解 Web 应用的操作流程，请参阅 [教程](docs\u002Ftutorial.md) 或 [Web 应用参考](docs\u002Fhelp.md)。\n\n## 
核心概念\n\n使用语义搜索引擎与传统的精确文本匹配算法有着本质的不同。\n\n首先，无论查询多么无关紧要，总会返回搜索结果。尽管得分可能很低，但结果永远不会完全消失。这是因为语义搜索结合查询算术，往往能在细微的分数差异中发现有用的信息。搜索结果始终按相关性排序，每份文档仅显示前 10 条结果，因此低分结果会被自动过滤掉。\n\n另一个区别在于，即使您查询的内容直接出现在文档中，Semantra 也不一定会找到精确的文本匹配。从根本上讲，这是因为同一个词语在不同上下文中可能具有不同的含义，例如“leaves”既可以指树叶，也可以表示“离开”。Semantra 所使用的嵌入模型会将您输入的所有文本和查询转换为一串数字序列，以便进行数学比较，而在这种情况下，精确的子字符串匹配并不总是重要的。更多关于嵌入技术的信息，请参阅 [嵌入概念文档](docs\u002Fconcept_embeddings.md)。\n\n## 命令行参考\n\n```sh\nsemantra [OPTIONS] [FILENAME(S)]...\n```\n\n## 选项\n\n- `--model [openai|minilm|mpnet|sgpt|sgpt-1.3B]`: 预设用于嵌入的模型。更多信息请参阅[模型指南](docs\u002Fguide_models.md)（默认：mpnet）\n- `--transformer-model TEXT`: 自定义 Hugging Face 转换器模型名称，用于嵌入（`--model` 和 `--transformer-model` 只能指定其中一个）。更多信息请参阅[模型指南](docs\u002Fguide_models.md)\n- `--windows TEXT`: 要提取的嵌入窗口。以逗号分隔的列表格式为“大小[\\_偏移量=0][_回溯=0]”。例如，窗口大小为 128、偏移量为 0、回溯为 16（128_0_16）会将文档以 128 个标记的块进行嵌入，这些块之间有 16 个标记的重叠部分。搜索时仅使用第一个窗口。更多信息请参阅[窗口概念文档](docs\u002Fconcept_windows.md)（默认：128_0_16）\n- `--encoding`: 用于读取文本文件的编码方式 [默认：utf-8]\n- `--no-server`: 不启动 UI 服务器（仅执行处理）\n- `--port INTEGER`: 嵌入服务器使用的端口（默认：8080）\n- `--host TEXT`: 嵌入服务器使用的主机地址（默认：127.0.0.1）\n- `--pool-size INTEGER`: 请求中最多可聚合的嵌入标记数\n- `--pool-count INTEGER`: 请求中最多可聚合的嵌入数量\n- `--doc-token-pre TEXT`: 在转换器模型中每个文档前添加的标记（默认：无）\n- `--doc-token-post TEXT`: 在转换器模型中每个文档后添加的标记（默认：无）\n- `--query-token-pre TEXT`: 在转换器模型中每个查询前添加的标记（默认：无）\n- `--query-token-post TEXT`: 在转换器模型中每个查询后添加的标记（默认：无）\n- `--num-results INTEGER`: 每个文件的查询结果（邻居）数量（默认：10）\n- `--annoy`: 使用 Annoy 进行近似 kNN 查询（查询速度更快，但准确性略有降低）；若设置为 false，则使用精确的穷举 kNN（默认：True）\n- `--num-annoy-trees INTEGER`: 使用 Annoy 进行近似 kNN 时的树的数量（默认：100）\n- `--svm`: 使用 SVM 而不是任何类型的 kNN 进行查询（速度较慢，且仅适用于对称模型）\n- `--svm-c FLOAT`: SVM 正则化参数；值越高，对错误预测的惩罚越大（默认：1.0）\n- `--explain-split-count INTEGER`: 用于解释查询时，给定窗口上要进行的分割次数（默认：9）\n- `--explain-split-divide INTEGER`: 将窗口大小除以该因子，以得到每次分割的长度，用于解释查询（默认：6）\n- `--num-explain-highlights INTEGER`: 解释查询时要高亮显示的分割结果数量（默认：2）\n- `--force`: 即使已缓存也强制执行\n- `--silent`: 不打印进度信息\n- `--no-confirm`: 处理 OpenAI 
数据时，不显示费用并请求确认\n- `--version`: 打印版本并退出\n- `--list-models`: 列出预设模型并退出\n- `--show-semantra-dir`: 打印 Semantra 将用于存储处理文件的目录并退出\n- `--semantra-dir PATH`: 存储 Semantra 文件的目录\n- `--help`: 显示此消息并退出\n\n## 常见问题\n\n### 它可以使用 ChatGPT 吗？\n\n不可以，这也是设计上的考量。\n\nSemantra 不使用任何生成式模型，如 ChatGPT。它仅用于语义查询文本，不会在基础上添加任何层来尝试解释、总结或合成结果。生成式语言模型有时会产生表面上看似合理但实际上不准确的信息，这会将验证的责任转嫁给用户。Semantra 将原始资料视为唯一的真理来源，并致力于证明，在更简单的嵌入模型之上结合人工参与的搜索体验，对用户来说更为实用。\n\n## 开发\n\nPython 应用程序位于 `src\u002Fsemantra\u002Fsemantra.py`，并作为标准的 Python 命令行项目，通过 `pyproject.toml` 进行管理。\n\n本地 Web 应用程序使用 [Svelte](https:\u002F\u002Fsvelte.dev\u002F) 编写，并作为标准的 npm 应用程序进行管理。\n\n要开发 Web 应用程序，请进入 `client` 目录并运行 `npm install`。\n\n要构建 Web 应用程序，运行 `npm run build`。要在监听模式下构建 Web 应用程序并在发生更改时自动重新构建，运行 `npm run build:watch`。\n\n## 贡献\n\n该应用程序目前仍处于早期阶段，但我们欢迎贡献。如有任何 bug 或功能请求，请随时提交 issue。","# Semantra 快速上手指南\n\nSemantra 是一款命令行工具，用于对本地文档（文本或 PDF）进行**语义搜索**。它不依赖简单的关键词匹配，而是通过理解含义来检索内容，非常适合需要从大量资料中快速定位信息的开发者、研究人员和记者。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows (需配置 Python 环境)\n*   **Python 版本**：>= 3.9\n*   **包管理工具**：推荐安装 `pipx` 以隔离应用环境（可选，但推荐）\n\n> **国内加速建议**：\n> 如果直接安装较慢，建议配置国内 PyPI 镜像源（如清华源或阿里源）。\n> ```bash\n> pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n### 方法一：使用 pipx 安装（推荐）\n这种方式会将 Semantra 安装在独立的虚拟环境中，避免污染全局 Python 环境。\n\n1.  **安装 pipx** (如果尚未安装)：\n    ```sh\n    python3 -m pip install --user pipx\n    python3 -m pipx ensurepath\n    ```\n    *注：执行完 `ensurepath` 后，可能需要重启终端或新开一个终端窗口使路径生效。*\n\n2.  **安装 Semantra**：\n    ```sh\n    python3 -m pipx install semantra\n    ```\n\n### 方法二：使用虚拟环境安装\n如果您更喜欢手动管理虚拟环境：\n\n```sh\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\npip install semantra\n```\n\n安装完成后，在终端输入 `semantra --help` 验证是否安装成功。\n\n## 基本使用\n\nSemantra 的工作流程是：分析本地文件 -> 启动本地 Web 服务 -> 在浏览器中进行交互式语义搜索。\n\n### 1. 搜索单个文档\n对单个 PDF 或文本文件进行语义分析并启动搜索界面：\n\n```sh\nsemantra doc.pdf\n```\n\n### 2. 
搜索多个文档\n同时分析多个文件（支持混合格式）：\n\n```sh\nsemantra report.pdf book.txt notes.md\n```\n\n### 3. 开始搜索\n*   **等待处理**：首次运行时，Semantra 会自动下载默认的机器学习模型（约几百 MB），并对文档进行向量化处理。此过程仅需执行一次，后续搜索将是瞬时的。\n*   **访问界面**：处理完成后，工具会自动在浏览器打开 `http:\u002F\u002Flocalhost:8080`（或手动访问该地址）。\n*   **执行查询**：\n    *   在搜索框输入自然语言问题（例如：“关于气候变化的主要观点”）。\n    *   系统会按相关性排序显示结果，并用高亮标出文档中最相关的片段。\n    *   点击结果可直接跳转到原文对应位置。\n\n### 高级技巧提示\n*   **正负反馈**：在搜索结果旁点击 `+` 或 `-` 按钮，可以告诉系统哪些结果是相关或不相关的，从而优化后续搜索。\n*   **语义运算**：在查询词前加 `+` 或 `-` 可以微调搜索意图（例如：`+经济 -战争` 表示查找与经济相关但与战争无关的内容）。\n\n---\n*更多详细配置（如更换 OpenAI 模型、调整分词窗口等）请参考官方文档中的 Guides 部分。*","一位调查记者需要在截稿前从数百页泄露的 PDF 报告和文本档案中，快速找出所有关于“资金非法转移”的线索，而不仅仅是包含特定关键词的段落。\n\n### 没有 semantra 时\n- **关键词匹配局限**：只能搜索\"transfer\"或\"money\"等确切词汇，若文档使用“拨款”、“汇款”或隐喻表达，极易漏掉关键信息。\n- **人工阅读效率低**：面对海量文档，必须逐页打开 PDF 肉眼筛选，耗时数小时甚至数天，难以应对紧急截稿压力。\n- **上下文断裂**：即使找到关键词，也往往需要手动翻阅前后文才能理解语境，难以快速判断该段落是否真正相关。\n- **隐私风险高**：若使用在线搜索工具处理敏感泄露文件，存在数据上传至第三方服务器的安全隐患。\n\n### 使用 semantra 后\n- **语义理解精准**：semantra 能理解“资金非法转移”的含义，即使文档中写的是“隐蔽的资本流动”或未直接提及金钱，也能准确召回相关段落。\n- **秒级响应检索**：只需在命令行运行一次索引，随后通过本地网页界面输入自然语言问题，即可在毫秒间遍历所有文档并返回结果。\n- **智能片段定位**：搜索结果直接高亮显示最相关的语义片段及其出处，记者可立即确认上下文，无需反复翻页验证。\n- **本地私有部署**：所有数据处理和模型推理均在本地完成，确保敏感的调查资料绝不离开记者的电脑，保障信息安全。\n\nsemantra 将原本需要数天的地毯式阅读工作压缩为几分钟的语义对话，让创作者真正从海量文本的“大海捞针”中解放出来。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffreedmand_semantra_114b2a2d.png","freedmand","Dylan Freedman","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffreedmand_0deda0e8.jpg","A.I. 
Initiatives at The New York Times","The New York Times","Washington, D.C.","freedmand@gmail.com","dylfreed","https:\u002F\u002Fdylanfreedman.com","https:\u002F\u002Fgithub.com\u002Ffreedmand",[86,90,94,98,102,106],{"name":87,"color":88,"percentage":89},"Python","#3572A5",46.5,{"name":91,"color":92,"percentage":93},"Svelte","#ff3e00",38.5,{"name":95,"color":96,"percentage":97},"TypeScript","#3178c6",11.2,{"name":99,"color":100,"percentage":101},"JavaScript","#f1e05a",2.5,{"name":103,"color":104,"percentage":105},"CSS","#663399",0.8,{"name":107,"color":108,"percentage":109},"HTML","#e34c26",0.4,2706,158,"2026-04-02T05:27:43","MIT","Linux, macOS, Windows","未说明（默认使用本地 CPU 运行机器学习模型，也可配置使用 OpenAI 外部服务）","未说明（首次运行需下载数百 MB 的模型文件）",{"notes":118,"python":119,"dependencies":120},"推荐使用 pipx 进行安装以实现全局可用；首次运行时会自动下载默认的本地机器学习模型（如 mpnet），耗时数分钟并占用数百 MB 磁盘空间；支持自定义 Huggingface 模型或使用 OpenAI 嵌入模型（需联网及付费）；工具启动后会在本地 localhost:8080 运行 Web 服务；不支持生成式模型（如 ChatGPT），仅用于语义搜索。","3.9+",[121,122,123,124],"pipx","Huggingface transformers (隐含)","Annoy (隐含)","Svelte (前端)",[61,13],[127,128,129],"cli","machine-learning","semantic-search",null,"2026-03-27T02:49:30.150509","2026-04-06T12:04:30.971012",[134,139,144,149,153,158],{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},18803,"在 Windows 上安装 semantra 时遇到 'Microsoft Visual C++ 14.0 or greater is required' 错误怎么办？","该错误通常是因为缺少编译依赖。解决方法如下：\n1. 安装 Microsoft C++ Build Tools（下载地址：https:\u002F\u002Fvisualstudio.microsoft.com\u002Fvisual-cpp-build-tools\u002F）。\n2. 如果已安装但仍报错，或者在 WSL (Linux 子系统) 中遇到 'gcc' 或 'g++' 缺失错误，请运行以下命令安装编译器：\n   sudo apt-get install gcc g++\n3. 
重新运行安装命令：pipx install semantra","https:\u002F\u002Fgithub.com\u002Ffreedmand\u002Fsemantra\u002Fissues\u002F1",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},18804,"如何使用 Hugging Face Hub 上的自定义模型？命令中的 [files] 参数应该填什么？","使用自定义模型的命令格式为：semantra --transformer-model \u003C模型名称> \u003C文件路径>。\n其中 [files] 需要替换为您本地要搜索的文件路径，支持单个文件、多个文件或通配符。\n示例：\n- 搜索单个 PDF：semantra --transformer-model intfloat\u002Fmultilingual-e5-base \u002Fusers\u002Fname\u002Fdesktop\u002Fdoc.pdf\n- 搜索目录下所有 TXT 文件：semantra --transformer-model intfloat\u002Fmultilingual-e5-base \u002Fusers\u002Fname\u002Fdocuments\u002F*.txt\n注意：--transformer-model 仅用于指定模型，必须在其后提供实际的文件路径才能运行工具。","https:\u002F\u002Fgithub.com\u002Ffreedmand\u002Fsemantra\u002Fissues\u002F44",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},18805,"使用某些模型（如 all-mpnet-base-v2）时出现 404 Repository Not Found 错误如何解决？","这通常是由于 Hugging Face 服务暂时不可用或模型仓库名称变更导致的，并非 Semantra 本身的故障。\n解决方案：\n1. 等待 Hugging Face 或 GitHub 服务恢复后重试。\n2. 使用替代模型镜像。例如，将默认模型替换为镜像版本，可以通过命令行参数直接指定，无需修改代码：\n   semantra --transformer-model obrizum\u002Fall-MiniLM-L6-v2 \u003C文件路径>\n3. 如果必须修改代码，可在 src\u002Fmodels.py 中将模型名称 \"sentence-transformers\u002Fall-MiniLM-L6-v2\" 替换为 \"obrizum\u002Fall-MiniLM-L6-v2\"。","https:\u002F\u002Fgithub.com\u002Ffreedmand\u002Fsemantra\u002Fissues\u002F31",{"id":150,"question_zh":151,"answer_zh":152,"source_url":143},18806,"Semantra 支持哪些文件格式？如何指定多个文件进行搜索？","Semantra 支持文本文件 (.txt) 和 PDF 文件 (.pdf)。\n您可以通过以下方式指定文件：\n1. 单个文件：semantra doc.pdf\n2. 多个不同路径的文件：semantra \u002Fpath1\u002Ffile.pdf \u002Fpath2\u002Ffile.txt\n3. 
使用通配符搜索整个文件夹：semantra \u002Fpath\u002Fto\u002Ffolder\u002F*.pdf\n文件路径可以是 macOS\u002FLinux 风格（如 \u002Fusers\u002Fname\u002Fdoc.pdf）或 Windows 风格。",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},18807,"如何增强对中文语义检索的支持？","可以通过指定支持多语言的 Transformer 模型来实现更好的中文支持。\n推荐使用以下命令运行：\nsemantra --transformer-model intfloat\u002Fmultilingual-e5-base \u003C文件路径>\n该模型（multilingual-e5-base）专门针对多语言语义检索进行了优化，能显著提升中文内容的检索效果。","https:\u002F\u002Fgithub.com\u002Ffreedmand\u002Fsemantra\u002Fissues\u002F30",{"id":159,"question_zh":160,"answer_zh":161,"source_url":143},18808,"在哪里可以找到关于模型配置和高级用法的详细指南？","关于如何使用自定义模型及详细配置指南，请参考项目文档：\nhttps:\u002F\u002Fgithub.com\u002Ffreedmand\u002Fsemantra\u002Fblob\u002Fmain\u002Fdocs\u002Fguide_models.md#using-custom-models\n该文档详细介绍了 --transformer-model 参数的用法以及推荐的模型设置。",[]]