[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-elfvingralf--macOSpilot-ai-assistant":3,"tool-elfvingralf--macOSpilot-ai-assistant":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":80,"owner_twitter":82,"owner_website":80,"owner_url":83,"languages":84,"stars":93,"forks":94,"last_commit_at":95,"license":80,"difficulty_score":96,"env_os":97,"env_gpu":98,"env_ram":98,"env_deps":99,"category_tags":105,"github_topics":80,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":106,"updated_at":107,"faqs":108,"releases":109},2117,"elfvingralf\u002FmacOSpilot-ai-assistant","macOSpilot-ai-assistant","Voice + Vision powered AI assistant that answers questions about any application, in context and in audio.","macOSpilot 是一款专为 macOS 打造的智能语音视觉助手，旨在让你在不切换窗口的情况下，即时获取当前应用中任何内容的解答。无论是面对复杂的代码编辑器、设计软件还是文档工具，只需按下快捷键，对着麦克风提问或直接输入文字，它就能结合屏幕截图上下文，迅速以文字和语音形式反馈答案。\n\n这款工具主要解决了用户在多任务处理时频繁切换窗口、打断工作流的痛点。通过“所见即所问”的交互模式，它将人工智能的视觉理解能力无缝融入操作系统，让用户能专注于手头任务，无需分心查找资料或复制粘贴内容。\n\nmacOSpilot 非常适合希望提升工作效率的开发者、设计师、研究人员以及各类 macOS 重度用户。对于需要频繁查阅文档、调试代码或分析界面信息的专业人士而言，它就像一位随时待命的贴身专家。\n\n其核心技术亮点在于融合了 OpenAI 的多项前沿能力：利用 GPT-4 Vision 模型“看懂”当前屏幕内容，通过 Whisper API 将语音精准转为文字，再借助 TTS 技术将回答转化为自然流畅的语音播报。整个流程基于 Electron 构建，配置灵活，让本地操作与云端智能完美结合，为用户带来高效、","macOSpilot 是一款专为 macOS 
打造的智能语音视觉助手，旨在让你在不切换窗口的情况下，即时获取当前应用中任何内容的解答。无论是面对复杂的代码编辑器、设计软件还是文档工具，只需按下快捷键，对着麦克风提问或直接输入文字，它就能结合屏幕截图上下文，迅速以文字和语音形式反馈答案。\n\n这款工具主要解决了用户在多任务处理时频繁切换窗口、打断工作流的痛点。通过“所见即所问”的交互模式，它将人工智能的视觉理解能力无缝融入操作系统，让用户能专注于手头任务，无需分心查找资料或复制粘贴内容。\n\nmacOSpilot 非常适合希望提升工作效率的开发者、设计师、研究人员以及各类 macOS 重度用户。对于需要频繁查阅文档、调试代码或分析界面信息的专业人士而言，它就像一位随时待命的贴身专家。\n\n其核心技术亮点在于融合了 OpenAI 的多项前沿能力：利用 GPT-4 Vision 模型“看懂”当前屏幕内容，通过 Whisper API 将语音精准转为文字，再借助 TTS 技术将回答转化为自然流畅的语音播报。整个流程基于 Electron 构建，配置灵活，让本地操作与云端智能完美结合，为用户带来高效、自然的沉浸式辅助体验。","# macOSpilot: your personal macOS AI assistant\n\nmacOSpilot answers your questions about anything, in any application. No need to reach for another window. Simply use a keyboard shortcut to trigger the assistant, speak or type your question, and it will give the answer in context and in audio within seconds. Behind the scenes macOSpilot takes a screenshot of your active window when triggered, and sends it to OpenAI GPT Vision along with a transcript of your question. It's answer will be displayed in text, and converted into audio using OpenAI TTS (text to speech).\n\nhttps:\u002F\u002Fgithub.com\u002Felfvingralf\u002FmacOSpilot-ai-assistant\u002Fassets\u002F94417497\u002F5a9e9288-0479-4def-9a87-451dddd783af\n\n- **Works with any application in macOS:** macOSpilot is application agnostic, and simply takes a screenshot of the currently active window when you trigger the assistant.\n- **Trigger with keyboard shortcut, speak your question:** No need to juggle windows, just press the keyboard shortcut and speak your question. If you prefer to type it, that's possible too.\n- **Answers in-context and in audio:** The answer to your question is provided in an small window overlayed on top of your active window, and in audio (using text-to-speech).\n\n## How it works\n\n1. macOSpilot runs NodeJS\u002FElectron. Simply install the NodeJS project and dependencies (see below) and make the necessary configurations in `index.js`. 
Then choose to run `yarn start` from the terminal, or package it with Electron with the instructions below, add your OpenAI API key and let the application run in the background.\n2. When you need to use macOSpilot, press the keyboard shortcut you've configured (default is Command+Shift+'). macOSpilot will take a screenshot of your currently active macOS application window and activate the microphone.\n3. Speak your question into your microphone and then press the same keyboard shortcut to end the microphone recording. If you've enabled text input, you'll get to type your question and press enter instead of speaking.\n4. macOSpilot will send your question to OpenAI's Whisper API, and the transcription will be sent to OpenAI's Vision API along with the screenshot.\n5. The Vision API response will be displayed in a small notification window on top of your active macOS application window, and read aloud once it's been processed by OpenAI's TTS (text to speech) API.\n6. A simple history of answers to your questions in the current session is available in another window that you can hide\u002Fminimize.\n\nThe most recent screenshot, audio recording, and TTS response will be stored on your machine in part for debugging purposes. The same filenames are used every time, so the files are overwritten, but they are not automatically deleted when you close or delete the application.\n\n## Getting Started\n\n### Video walk-through\n\nPrefer a video? Head on over to YouTube to watch the walk-through of how to get started, how the application works, and a brief explanation of how it works under the hood.\n\n[![YouTube walk-through and tutorial](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Felfvingralf_macOSpilot-ai-assistant_readme_ff334908213e.png)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1IdCWqTZLyA)\n\n### Install\n\nMake sure you have NodeJS installed on your machine. 
Then clone the repo and follow the steps below.\n\n```bash\n\ngit  clone  https:\u002F\u002Fgithub.com\u002Felfvingralf\u002FmacOSpilot-ai-assistant.git\n\n```\n\nNavigate to the folder and run `yarn install` or `npm install` in your folder. This should install all dependencies.\n\nRun `yarn start` or `npm start`. Because the application needs access to read your screen, microphone, read\u002Fwrite files etc, you will need to go through the steps of granting it access and possibly restarting your terminal.\n\n### Configurations\n\nMake sure to add your OpenAI API key by clicking the settings icon in the top right-hand corner of the main window. (it's not stored encrypted!)\n\nIf you want to change the default values here's a few things that might be worth changing, all in `index.js`:\n\n- **Keyboard shortcut:** The default keyboard shortcut `keyboardShortcut` is set to \"CommandOrControl+Shift+'\" (because it seemed like it was rarely used by other applications)\n\n- **OpenAI Vision prompt:** The OpenAI Vision API system prompt in `conversationHistory`, currently just set to \"You are helping users with questions about their macOS applications based on screenshots, always answer in at most one sentence.\"\n\n- **VisionAPI image size:** Image resize params to save some money, I left an example of how in `callVisionAPI()` (I found that I had much poorer results when using it)\n\n- **Application window sizes and settings:** The size of the main window: `mainWindowWidth` and `mainWindowHeight`. The size of the notification window, which always remains on top: `notificationWidth` and `notificationHeight`.\n\n- **More notification window settings:** The level of opacity of the notification window: `notificationOpacity`. 
Where the notification window moves to on activation, relative to the active window: inside `positionNotificationAtTopRight()` (terrible naming, I know)\n\n### Turn it into an .app with Electron\n\nWant to create an .app executable instead of running this from your terminal?\n\nFirst go to `index.js` and change `const useElectronPackager` from `false` to `true`.\n\nRun one of these in your terminal, depending on which platform you're on.\n\n```bash\nnpm run package-mac\nnpm run package-win\nnpm run package-linux\n```\n\nNote I have only tested this on Mac (Apple silicon and Intel).\n\nGo to `\u002Frelease-builds\u002F` in your project folder, and choose the folder of your platform. In there is an executable, `.app` if you're on Mac. Double-click it to open the app. Note that it may take a few seconds the first time, so be patient.\n\nOnce the app is opened, trigger your keyboard shortcut. You'll be asked to grant Privacy & Security permissions. You may need to repeat this another one or two times for all permissions to work properly, and to restart the app.\n\n## Improvements:\n\nSome improvements I'd like to make, in no particular order:\n\n- Enable optional conversation state between sessions (open\u002Fclose application)\n- Use buffers instead of writing\u002Freading screenshot and audio files to disk\n- Make assistant audio configurable in UI (e.g. speed, make playback optional)\n- Make always-on-top window configurable in UI (e.g. toggle sticky position, enable\u002Fdisable)\n- Make screenshot settings configurable in UI (e.g. select area, entire screen)\n- ~Fix microphone issue not working as .app~ Fixed thanks to [@claar](https:\u002F\u002Fwww.github.com\u002Fclaar).\n- ~Enable text-based input instead of voice~\n\n## About \u002F contact\n\nI'm self-taught and really like scrapping together fun projects. 
I write functional code that probably isn't beautiful nor efficient, and share it with the hope that someone else might find it useful.\n\nYou can find me as [@ralfelfving](https:\u002F\u002Ftwitter.com\u002Fralfelfving) on Twitter\u002FX. If you liked this project, consider checking my tutorials on my YouTube channel [@ralfelfving](https:\u002F\u002Fwww.youtube.com\u002F@ralfelfving).\n","# macOSpilot：您的个人 macOS AI 助手\n\nmacOSpilot 可以在任何应用程序中回答您提出的任何问题。无需切换到其他窗口。只需使用键盘快捷键触发助手，说出或输入您的问题，它就会在几秒钟内以情境化的方式并以语音形式给出答案。在后台，macOSpilot 会在被触发时截取当前活动窗口的屏幕截图，并将其连同您问题的转录文本一起发送给 OpenAI GPT Vision。随后，答案将以文本形式显示，并通过 OpenAI TTS（文本转语音）技术转换为语音。\n\nhttps:\u002F\u002Fgithub.com\u002Felfvingralf\u002FmacOSpilot-ai-assistant\u002Fassets\u002F94417497\u002F5a9e9288-0479-4def-9a87-451dddd783af\n\n- **适用于 macOS 中的任何应用程序：** macOSpilot 不依赖于特定的应用程序，当您触发助手时，它只会截取当前活动窗口的屏幕截图。\n- **通过键盘快捷键触发，直接提问：** 无需在多个窗口之间切换，只需按下快捷键并说出您的问题即可。如果您更喜欢打字，也可以直接输入。\n- **情境化且语音化的回答：** 您的问题答案会以一个小窗口的形式叠加在您当前活动的窗口之上显示，并通过文本转语音功能以语音形式播报。\n\n## 工作原理\n\n1. macOSpilot 基于 NodeJS 和 Electron 构建。只需安装 NodeJS 项目及其依赖项（见下文），并在 `index.js` 中进行必要的配置。然后您可以选择在终端中运行 `yarn start`，或者按照以下说明使用 Electron 打包应用，添加您的 OpenAI API 密钥，并让应用程序在后台运行。\n2. 当您需要使用 macOSpilot 时，按下您已配置的键盘快捷键（默认为 Command+Shift+'）。macOSpilot 将截取您当前 macOS 应用程序窗口的屏幕截图，并激活麦克风。\n3. 向麦克风说出您的问题，然后再次按下相同的键盘快捷键以结束录音。如果您启用了文本输入功能，则可以直接输入问题并按 Enter 键，而无需说话。\n4. macOSpilot 会将您的问题发送至 OpenAI 的 Whisper API 进行转录，并将转录结果与屏幕截图一同发送至 OpenAI 的 Vision API。\n5. Vision API 返回的答案将在您当前 macOS 应用程序窗口上方显示在一个小通知窗口中，并由 OpenAI 的 TTS（文本转语音）API 处理后朗读出来。\n6. 
当前会话中您提出的所有问题及其答案的历史记录都会保存在另一个窗口中，您可以将其隐藏或最小化。\n\n最近一次的屏幕截图、音频录音和 TTS 回答会被存储在您的设备上，部分用于调试目的。每次都会使用相同的文件名覆盖这些文件，但关闭或删除应用程序并不会自动删除它们。\n\n## 快速入门\n\n### 视频教程\n\n如果您更喜欢视频，请前往 YouTube 观看关于如何开始使用、应用程序的工作原理以及其底层机制的简要说明。\n\n[![YouTube 教程](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Felfvingralf_macOSpilot-ai-assistant_readme_ff334908213e.png)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=1IdCWqTZLyA)\n\n### 安装\n\n请确保您的设备上已安装 NodeJS。然后克隆仓库并按照以下步骤操作。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Felfvingralf\u002FmacOSpilot-ai-assistant.git\n```\n\n进入该文件夹，在终端中运行 `yarn install` 或 `npm install`。这将安装所有依赖项。\n接着运行 `yarn start` 或 `npm start`。由于应用程序需要访问您的屏幕、麦克风、读写文件等权限，您可能需要授予这些权限，并重新启动终端。\n\n### 配置\n\n请务必在主窗口右上角的设置图标中添加您的 OpenAI API 密钥。（请注意，该密钥并未加密存储！）\n\n如果您想更改默认值，可以考虑修改以下内容，所有更改均需在 `index.js` 中进行：\n\n- **键盘快捷键：** 默认的键盘快捷键 `keyboardShortcut` 设置为 “CommandOrControl+Shift+'”（因为这个组合很少被其他应用程序使用）。\n- **OpenAI Vision 提示词：** `conversationHistory` 中的 OpenAI Vision API 系统提示词目前仅设置为 “您正在根据屏幕截图帮助用户解答关于 macOS 应用程序的问题，回答应尽量控制在一句话以内。”\n- **VisionAPI 图像尺寸：** 为了节省成本，我在 `callVisionAPI()` 中提供了一个调整图像大小的示例（我发现使用该功能后效果反而变差）。\n- **应用程序窗口尺寸及设置：** 主窗口的宽度和高度：`mainWindowWidth` 和 `mainWindowHeight`。始终置顶的通知窗口的宽度和高度：`notificationWidth` 和 `notificationHeight`。\n- **更多通知窗口设置：** 通知窗口的透明度：`notificationOpacity`。通知窗口在激活时相对于当前窗口的位置：位于 `positionNotificationAtTopRight()` 函数中（命名确实不太理想，我知道）。\n\n### 使用 Electron 打包成 .app 文件\n\n想要创建一个可执行的 .app 文件，而不是从终端运行吗？\n\n首先打开 `index.js`，将 `const useElectronPackager` 由 `false` 改为 `true`。\n\n根据您使用的平台，在终端中运行以下命令之一：\n\n```bash\nnpm run package-mac\nnpm run package-win\nnpm run package-linux\n```\n\n请注意，我目前只在 Mac（Apple Silicon 和 Intel 芯片）上进行了测试。\n进入项目文件夹中的 `\u002Frelease-builds\u002F`，选择对应您平台的文件夹。其中包含一个可执行文件，如果是 Mac 用户，则是一个 `.app` 文件。双击即可打开应用，首次启动可能需要几秒钟，请耐心等待。\n应用打开后，按下您设置的键盘快捷键。系统会提示您授予隐私和安全权限。您可能需要重复一两次此操作，以确保所有权限正常生效，并重启应用。\n\n## 改进建议：\n\n以下是一些我希望改进的功能，不分先后顺序：\n\n- 实现会话之间的对话状态保存功能（即打开\u002F关闭应用程序时仍能保持上下文）。\n- 使用缓冲区代替将屏幕截图和音频文件写入\u002F读取磁盘的操作。\n- 在 UI 
中增加对助手语音的自定义选项（例如语速调节、是否启用播放等功能）。\n- 在 UI 中增加对始终置顶窗口的自定义选项（例如切换固定位置、启用\u002F禁用功能）。\n- 在 UI 中增加对截图区域的自定义选项（例如选择特定区域或截取整个屏幕）。\n- ~修复作为 .app 文件时麦克风无法正常工作的问题~ 已经由 [@claar](https:\u002F\u002Fwww.github.com\u002Fclaar) 解决。\n- ~支持文本输入而非仅限语音输入~\n\n## 关于\u002F联系\n\n我是一名自学成才的开发者，非常喜欢动手拼凑各种有趣的项目。我编写的代码功能实用，但可能既不美观也不高效，我还是愿意分享出来，希望能对其他人有所帮助。\n\n你可以在 Twitter\u002FX 上找到我：[@ralfelfving](https:\u002F\u002Ftwitter.com\u002Fralfelfving)。如果你喜欢这个项目，不妨去看看我在 YouTube 频道 [@ralfelfving](https:\u002F\u002Fwww.youtube.com\u002F@ralfelfving) 上的教程吧。","# macOSpilot AI 助手快速上手指南\n\nmacOSpilot 是一款运行在 macOS 上的个人 AI 助手。它能在任何应用程序中通过快捷键唤醒，截取当前屏幕画面并结合你的语音或文字提问，利用 OpenAI GPT-4 Vision 和 TTS 技术，在几秒钟内以图文和音频形式提供上下文相关的解答。\n\n## 环境准备\n\n*   **操作系统**：macOS（已在 Apple Silicon 和 Intel 芯片上测试）。\n*   **前置依赖**：\n    *   已安装 [Node.js](https:\u002F\u002Fnodejs.org\u002F)。\n    *   已安装 `yarn` 或 `npm` 包管理器。\n    *   **OpenAI API Key**：你需要拥有可用的 OpenAI 账户及 API 密钥（需支持 Whisper、GPT-4 Vision 和 TTS 接口）。\n    *   **网络环境**：由于需要连接 OpenAI 服务，请确保你的网络环境能够稳定访问相关 API。\n\n## 安装步骤\n\n1.  **克隆项目仓库**\n    在终端中执行以下命令下载源码：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Felfvingralf\u002FmacOSpilot-ai-assistant.git\n    ```\n\n2.  **进入目录并安装依赖**\n    进入项目文件夹并安装所需依赖包：\n    ```bash\n    cd macOSpilot-ai-assistant\n    yarn install\n    # 或者使用 npm\n    # npm install\n    ```\n\n3.  **配置 API 密钥**\n    *   运行应用后，在主窗口右上角点击设置图标。\n    *   输入你的 OpenAI API Key。\n    *   *注意：根据开发者说明，该密钥未加密存储，请妥善保管。*\n\n4.  **授予系统权限**\n    首次运行时，macOS 会提示你授予以下权限，请务必在“系统设置”->“隐私与安全性”中允许：\n    *   **屏幕录制**（用于截取当前窗口）\n    *   **麦克风**（用于语音输入）\n    *   **辅助功能**（用于模拟键盘操作等）\n    *   *提示：可能需要重启终端或应用多次以生效所有权限。*\n\n## 基本使用\n\n### 1. 启动应用\n在终端中运行以下命令启动开发版：\n```bash\nyarn start\n# 或者\n# npm start\n```\n*(可选：若想打包成 `.app` 独立应用，请参考原文 \"Turn it into an .app with Electron\" 章节修改 `index.js` 并运行 `npm run package-mac`)*\n\n### 2. 唤起助手\n确保当前处于任意你想要询问的应用窗口中，按下默认快捷键：\n> **Command + Shift + '** (单引号)\n\n### 3. 提问与获取答案\n*   **语音模式（默认）**：\n    1.  
按下快捷键后，对着麦克风说出你的问题（例如：“这个报错是什么意思？”或“如何在这个界面导出数据？”）。\n    2.  说完后，再次按下相同的快捷键 (**Command + Shift + '**) 结束录音。\n*   **文本模式**：\n    如果在配置中启用了文本输入，按下快捷键后可直接打字提问并按回车。\n\n### 4. 查看结果\n*   **视觉反馈**：一个小型的通知窗口将悬浮在当前应用上方，显示 AI 的文字回答。\n*   **听觉反馈**：系统将自动朗读回答内容（基于 OpenAI TTS）。\n*   **历史记录**：你可以最小化主窗口，其中保留了当前会话的问答历史。\n\n---\n*提示：默认的截图和提示词配置位于 `index.js` 文件中，高级用户可根据需要修改快捷键、Prompt 内容或窗口尺寸。*","资深数据分析师正在使用复杂的 Excel 宏处理财务报表，同时需要参考 Safari 浏览器中的最新会计准则文档，却因不熟悉某个特定函数的参数而卡壳。\n\n### 没有 macOSpilot-ai-assistant 时\n- **频繁切换窗口打断心流**：必须手动最小化 Excel，切换到浏览器搜索函数用法，再切回表格，反复操作严重破坏专注度。\n- **视觉对照繁琐低效**：需要一边盯着屏幕上的报错单元格，一边在另一个窗口核对文档说明，肉眼来回比对极易出错。\n- **双手占用无法记录**：双手正忙于键盘输入和数据调整，难以腾出手来打字查询或复制粘贴帮助信息。\n- **阅读解释增加认知负荷**：在高度紧张的报表截止日前，还要费力阅读枯燥的文字教程，增加了额外的脑力负担。\n\n### 使用 macOSpilot-ai-assistant 后\n- **原地唤醒无需切换**：只需按下快捷键，macOSpilot-ai-assistant 直接截取当前 Excel 界面，无需离开当前工作窗口即可发起提问。\n- **视觉上下文智能理解**：工具自动将屏幕截图与语音问题发送给 AI，它能“看见”具体的报错单元格和公式，提供针对性的修正建议。\n- **纯语音交互解放双手**：直接口述“这个 VLOOKUP 为什么返回错误”，说完再次按键即可，全程无需敲击键盘或移动鼠标。\n- **音频播报即时反馈**：解决方案不仅显示在浮窗中，还会通过 TTS 语音直接读出来，让用户边听边改，大幅降低阅读压力。\n\nmacOSpilot-ai-assistant 通过“所见即所问”的语音视觉融合能力，将跨应用的知识查询转化为零中断的即时辅助，极大提升了复杂任务下的工作流效率。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Felfvingralf_macOSpilot-ai-assistant_ff334908.jpg","elfvingralf","Ralf Elfving","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Felfvingralf_b7b643ca.png","Learning to code, one project at a time. Formerly data\u002Fproduct at @klarna, @Shopify and @gadget-inc. 
\r\n\r\nLost access to my previous GH account :(",null,"Canada","ralfelfving","https:\u002F\u002Fgithub.com\u002Felfvingralf",[85,89],{"name":86,"color":87,"percentage":88},"JavaScript","#f1e05a",73.3,{"name":90,"color":91,"percentage":92},"HTML","#e34c26",26.7,1157,48,"2026-03-26T15:58:51",4,"macOS, Windows, Linux","未说明",{"notes":100,"python":98,"dependencies":101},"该工具基于 NodeJS\u002FElectron 开发，非本地运行大模型，无需 GPU。主要依赖 OpenAI API（Vision, Whisper, TTS），需在配置文件中填入 API Key。在 macOS 上运行时需授予屏幕录制、麦克风及文件读写权限，首次运行可能需要重启终端或应用。虽然提供了 Windows 和 Linux 的打包命令，但作者明确表示仅在 Mac (Apple Silicon 和 Intel) 上进行了测试。",[102,103,104],"NodeJS","Electron","yarn 或 npm",[26,14,55,15,54],"2026-03-27T02:49:30.150509","2026-04-06T08:39:58.207643",[],[]]