[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HKUDS--OpenPhone":3,"tool-HKUDS--OpenPhone":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,44,45,15,46,26,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70612,"2026-04-05T11:12:22",[26,15,13,45],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":98,"forks":99,"last_commit_at":100,"license":101,"difficulty_score":102,"env_os":103,"env_gpu":104,"env_ram":105,"env_deps":106,"category_tags":113,"github_topics":114,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":118,"updated_at":119,"faqs":120,"releases":126},919,"HKUDS\u002FOpenPhone","OpenPhone","\"OpenPhone: Mobile Agentic Foundation Models for AI Phone\"","OpenPhone 是一个面向智能手机的智能体基础模型开源项目，旨在让 AI 能够像真人一样理解和操作手机界面，完成复杂的多步骤任务。它通过模拟人类与手机的交互过程，将用户的高层指令（如“帮我订一张明天去北京的机票”）自动分解为一系列具体的屏幕操作（如点击、滑动、输入），并自主执行。\n\n它主要解决了当前大语言模型在移动设备上“能说不能做”的问题，即模型虽然能理解任务，却无法直接操控手机应用来完成任务。OpenPhone 通过其核心的“Ralph 循环”（执行→评估→修复→重复）机制，让 AI 能够自主尝试、检查结果并在失败时调整策略，从而实现真正端到端的自动化操作。\n\n这个工具非常适合AI 研究人员、移动应用开发者以及对智能体技术和自动化感兴趣的技术爱好者。研究人员可以利用其开源模型和数据集探索移动智能体的能力边界；开发者可以基于它构建更智能的手机助手或自动化测试工具。\n\n其技术亮点在于构建了一个大规模、高质量的智能手机交互数据集，并训练了专门的视觉-语言-动作基础模型。最新发布的 **PhoneClaw** 功能尤为突出，它像一个不知疲倦的 AI 手机管家，专门针对 iOS 设备，具","OpenPhone 是一个面向智能手机的智能体基础模型开源项目，旨在让 AI 
能够像真人一样理解和操作手机界面，完成复杂的多步骤任务。它通过模拟人类与手机的交互过程，将用户的高层指令（如“帮我订一张明天去北京的机票”）自动分解为一系列具体的屏幕操作（如点击、滑动、输入），并自主执行。\n\n它主要解决了当前大语言模型在移动设备上“能说不能做”的问题，即模型虽然能理解任务，却无法直接操控手机应用来完成任务。OpenPhone 通过其核心的“Ralph 循环”（执行→评估→修复→重复）机制，让 AI 能够自主尝试、检查结果并在失败时调整策略，从而实现真正端到端的自动化操作。\n\n这个工具非常适合AI 研究人员、移动应用开发者以及对智能体技术和自动化感兴趣的技术爱好者。研究人员可以利用其开源模型和数据集探索移动智能体的能力边界；开发者可以基于它构建更智能的手机助手或自动化测试工具。\n\n其技术亮点在于构建了一个大规模、高质量的智能手机交互数据集，并训练了专门的视觉-语言-动作基础模型。最新发布的 **PhoneClaw** 功能尤为突出，它像一个不知疲倦的 AI 手机管家，专门针对 iOS 设备，具备**用户记忆**能力，能够学习并记住用户的个人习惯与历史信息，从而提供更个性化、更连贯的服务体验。","\u003Cdiv align=\"center\">\n  \u003Cpicture>\n      \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_715cf2917644.png\" width=\"20%\" style=\"border: none; box-shadow: none;\">\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv >\n\n\u003Cdiv align=\"center\">\n\n# ✨OpenPhone✨: Mobile Agentic Foundation Models for AI Phone\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_516452bfad77.png\" alt=\"Typing Animation\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_4bcfa1207a4c.gif\" width=\"800\" height=\"400\" alt=\"演示动画\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cdiv style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 25px; text-align: center;\">\n    \u003Cp>\n      \u003Ca href='https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🔥Project-Page-00d9ff?style=for-the-badge&logo=github&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n      \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhkuds\u002FOpenPhone_dataset\">\u003Cimg alt=\"Hugging Face\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Dataset-ffc107?style=for-the-badge&color=ffc107&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fhkuds\u002FOpenPhone_model\">\u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-ffc107?style=for-the-badge&color=ffc107&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href='https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F⚡Based%20on-AndroidLab-4ecdc4?style=for-the-badge&logo=lightning&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n    \u003C\u002Fp>\n    \u003Cp>\n      \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone\u002Fstargazers\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHKUDS\u002FOpenPhone?color=00d9ff&style=for-the-badge&logo=star&logoColor=white&labelColor=1a1a2e' \u002F>\u003C\u002Fa>\n      \u003Ca href=\".\u002FCommunication.md\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F💬Feishu-Group-07c160?style=for-the-badge&logoColor=white&labelColor=1a1a2e\">\u003C\u002Fa>\n      \u003Ca href=\".\u002FCommunication.md\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-Group-07c160?style=for-the-badge&logo=wechat&logoColor=white&labelColor=1a1a2e\">\u003C\u002Fa>\n      \u003Ca href=\"\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlatform-Android%20%7C%20iOS-d3d3d3?style=for-the-badge&logo=android&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22009'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄arXiv-2510.22009-ff6b6b?style=for-the-badge&logo=arxiv&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n    \u003C\u002Fp>\n  
\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cdiv style=\"width: 100%; height: 2px; margin: 20px 0; background: linear-gradient(90deg, transparent, #00d9ff, transparent);\">\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\u003Cdiv>\n  \u003Cdiv style=\"background: linear-gradient(135deg, #f953c6 0%, #b91d73 100%); border-radius: 15px; padding: 28px; margin: 20px 0; border: 2px solid #f953c6; box-shadow: 0 4px 24px rgba(249,83,198,0.25); text-align: left;\">\n    \u003Ch2 style=\"color: white; margin: 0 0 14px 0; font-size: 22px; text-align: left;\">🦾 New Release: PhoneClaw — Your Autonomous AI Butler for iPhone\u003C\u002Fh2>\n    \u003Cp style=\"color: rgba(255,255,255,0.95); margin: 0 0 16px 0; font-size: 15px; line-height: 1.7;\">\n      \u003Cstrong>PhoneClaw\u003C\u002Fstrong> is a tireless AI phone butler that handles any iOS task for you — and \u003Cem>gets smarter with every session\u003C\u002Fem>. Powered by the \u003Cstrong>Ralph Loop\u003C\u002Fstrong> (\u003Ccode style=\"background: rgba(255,255,255,0.18); padding: 2px 7px; border-radius: 4px;\">EXECUTE → EVALUATE → FIX → REPEAT\u003C\u002Fcode>), it breaks your request into subtasks, acts on your phone, checks whether each step succeeded, and automatically retries with the failure context — until the job is done.\n    \u003C\u002Fp>\n    \u003Cul style=\"color: rgba(255,255,255,0.95); margin: 0 0 18px 0; font-size: 15px; line-height: 1.7; padding-left: 20px;\">\n      \u003Cli>🧠 \u003Cstrong>UserMemory\u003C\u002Fstrong> — builds a persistent profile of who you are (name, city, habits, history) and injects it into every plan, so the butler truly knows its owner\u003C\u002Fli>\n      \u003Cli>📚 \u003Cstrong>ExperienceLog\u003C\u002Fstrong> — records app-specific navigation know-how (tap coords, failure patterns, timing) across sessions, auto-compacted into a lean, high-confidence knowledge base\u003C\u002Fli>\n      \u003Cli>⚡ \u003Cstrong>Memory-first 
answers\u003C\u002Fstrong> — repeated questions are answered instantly from the profile with zero device interactions\u003C\u002Fli>\n      \u003Cli>🤖 \u003Cstrong>Interactive daemon mode\u003C\u002Fstrong> — connect once, accept unlimited tasks back-to-back; the screen stays on automatically\u003C\u002Fli>\n      \u003Cli>🎓 \u003Cstrong>Learning mode\u003C\u002Fstrong> — just operate your phone as usual while PhoneClaw watches; it captures screenshots at ~8 fps, detects your taps via computer vision, and distils your actions into reusable navigation lessons that are added to the ExperienceLog immediately\u003C\u002Fli>\n    \u003C\u002Ful>\n    \u003Cp style=\"margin: 0; text-align: center;\">\n      \u003Ca href=\".\u002FPhoneClaw\u002FREADME.md\" style=\"color: #1a1a2e; background: white; padding: 8px 20px; border-radius: 8px; text-decoration: none; font-weight: bold; font-size: 14px; display: inline-block;\">📖 PhoneClaw Full Documentation →\u003C\u002Fa>\n      &nbsp;&nbsp;\n      \u003Ca href=\".\u002Fios_agent\u002FREADME.md\" style=\"color: white; background: rgba(255,255,255,0.18); padding: 8px 20px; border-radius: 8px; text-decoration: none; font-weight: bold; font-size: 14px; border: 1px solid rgba(255,255,255,0.4); display: inline-block;\">iOS Agent README →\u003C\u002Fa>\n    \u003C\u002Fp>\n  \u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n## 🎯 What is OpenPhone?\n\n**The Problem**: Most AI agents rely on expensive cloud APIs and large models that are impractical for real-world on-device deployment. Users face **Privacy Concerns**, **Latency Issues**, and **High Costs** when their phone needs to call external services for every interaction.\n\n**Our Solution**: OpenPhone introduces the first **Open-Source, 3B-parameter Agentic Foundation Model** designed specifically for on-device smartphone interaction. 
This compact vision-language model runs entirely locally — meaning **No Privacy Concerns**, **No Cloud Dependence**, and **Zero API Costs**.\n\n## 🤔 Why 3B Parameters?\nWe believe the future of mobile AI lies not only in making models larger, but in making them smarter and more efficient for real-world constraints. Our 3B model is:\n- ⚡ **Edge-Optimized**: Efficient enough for commodity GPUs and next-generation mobile NPUs.\n- 🔒 **Privacy-First**: All computation stays on your device.\n- 💰 **Cost-Free**: No cloud inference and no ongoing API fees.\n- 🎯 **High-Performance**: Achieves performance comparable to 7B–9B models through advanced training.\n\n---\n\n## 💡 Research Highlights\n\n### 🔍 OpenPhone‑3B: Lightweight Agentic Model\nConsidering the compute limitations of today’s edge devices, models with **≤3B parameters** strike a practical balance between capability and deployability. Based on this insight, we introduce **OpenPhone‑3B**, a lightweight yet powerful on‑device agent model.\n\n- **Model Size & Architecture**: Vision-language model engineered for efficient on-device reasoning under tight mobile compute constraints.\n- **Edge-Native Design**: Primary local agent compatible with consumer GPUs and mobile NPUs, eliminating continuous cloud dependency.\n- **GUI‑Aware Action Capabilities**: Trained for visual interpretation, instruction following, and structured action generation across real mobile tasks.\n- **Open‑Source Release**: Full model weights, configurations, and inference stack enabling community deployment and development.\n- **Practical Sweet Spot**: 3B scale delivers optimal balance—significantly stronger than tiny models while remaining deployable where larger models fail.\n\n### Why 3B is the Sweet Spot for Phone Agents\n- **Hardware Fit**: 3B parameters align perfectly with consumer GPU memory (8-12GB) and emerging mobile NPU computational budgets.\n- **Speed Advantage**: 3B models deliver 3-5x faster inference than 7B alternatives while 
maintaining competitive accuracy for sub-second GUI responses.\n- **Power Efficiency**: Smaller footprint extends battery life - essential for mobile deployment where power consumption affects user experience.\n- **Privacy-First**: Enables phone tasks to run entirely on-device, preserving user privacy while eliminating network dependencies.\n- **Cost Savings**: Local processing eliminates expensive cloud APIs and per-request charges for sustainable operation.\n\n### 🦾 PhoneClaw: Your Autonomous AI Butler for iPhone\nAn autonomous iOS phone butler built on the **Ralph Loop** — a closed-loop execution methodology that runs until every subtask passes its success criteria. The key differentiator is a **two-layer self-learning memory** that makes the butler measurably smarter after each session:\n\n- **UserMemory** — Maintains a persistent user profile (inferred name, city, app habits, task history) injected into every planning prompt, so the agent makes contextually intelligent decisions from the very first step. Repeated questions are answered directly from memory with **zero device interactions**.\n- **ExperienceLog** — Records app-specific navigation knowledge per session: successful tap coordinates, failure patterns, UI timing quirks. Lessons are semantically deduplicated, reinforced on confirmation, and automatically compacted when an app accumulates ≥ 20 entries — keeping the knowledge base lean and high-quality.\n- **Intelligent Planning**: VLM decomposes each task into subtasks with explicit success criteria, enabling precise per-step evaluation and targeted retries rather than blind repetition.\n- **Interactive Daemon Mode**: Connect once, accept unlimited tasks indefinitely — the device screen stays on automatically throughout the session.\n- **Learning Mode**: Just use your phone normally while PhoneClaw watches. 
It captures screenshots at ~8 fps, detects tap positions via computer vision (`HoughCircles` + pixel-diff fallback), annotates each frame, and distils your actions into reusable navigation lessons added directly to the ExperienceLog — no manual annotation required.\n\n➜ [Full PhoneClaw documentation](.\u002FPhoneClaw\u002FREADME.md)\n\n---\n\n## 🚀 Model Release & Resources\n\n### 📦 Ready-to-Deploy Model\n\n- **Model Weights**: OpenPhone-3B is available on Hugging Face with full licensing for research and commercial use.\n- **Production-Ready Serving**: Pre-configured vLLM inference scripts enable efficient deployment with optimized throughput and memory usage.\n\n### 🛠️ Complete Training Pipeline\n- **Reproducible Recipe**: Full training implementation including our novel two-stage approach (SFT + GRPO-style RL with synthetic GUI data).\n- **Customization Support**: Detailed documentation in model_training\u002F allows researchers to adapt the model for domain-specific phone tasks or extend to new mobile platforms.\n- **Data Generation Paradigm**: Scripts and methodologies for creating high-quality training data at scale.\n\n---\n\n## 📖 Table of Contents\n- [✨OpenPhone✨: Mobile Agentic Foundation Models for AI Phone](#openphone-mobile-agentic-foundation-models-for-ai-phone)\n  - [🎯 What is OpenPhone?](#-what-is-openphone)\n  - [🤔 Why 3B Parameters?](#-why-3b-parameters)\n  - [💡 Research Highlights](#-research-highlights)\n    - [🔍 OpenPhone‑3B: Lightweight Agentic Model](#-openphone3b-lightweight-agentic-model)\n    - [Why 3B is the Sweet Spot for Phone Agents](#why-3b-is-the-sweet-spot-for-phone-agents)\n    - [🦾 PhoneClaw: Your Autonomous AI Butler for iPhone](#-phoneclaw-your-autonomous-ai-butler-for-iphone)\n  - [🚀 Model Release \& Resources](#-model-release--resources)\n    - [📦 Ready-to-Deploy Model](#-ready-to-deploy-model)\n    - [🛠️ Complete Training Pipeline](#️-complete-training-pipeline)\n  - [📖 Table of Contents](#-table-of-contents)\n  - [🚀 Quick 
Start](#-quick-start)\n    - [📱 AndroidLab Benchmark Setup](#-androidlab-benchmark-setup)\n    - [🚀 Model Deployment \\& Inference](#-model-deployment--inference)\n    - [⚙️ Pre-Testing Configuration](#️-pre-testing-configuration)\n  - [🌟 Key Features of OpenPhone](#-key-features-of-openphone)\n    - [🤖 Lightweight Agentic Foundation Models](#-lightweight-agentic-foundation-models)\n    - [☁️ Device-Cloud Collaboration Framework](#️-device-cloud-collaboration-framework)\n    - [🎯 Comprehensive Mobile Agent Evaluation Playground](#-comprehensive-mobile-agent-evaluation-playground)\n  - [🌟 Technical Innovation \\& Implementation](#-technical-innovation--implementation)\n    - [🧠 Model Training: SFT+RL](#-model-training-sftrl)\n    - [☁️ Device-Cloud Collaboration Framework](#️-device-cloud-collaboration-framework-1)\n    - [💾 Efficient Memory Mechanism for Mobile Agents](#-efficient-memory-mechanism-for-mobile-agents)\n  - [🧪 Testing \\& Evaluation](#-testing--evaluation)\n    - [Single Task Testing](#single-task-testing)\n    - [Batch Evaluation Scripts](#batch-evaluation-scripts)\n    - [Additional App Documentation](#additional-app-documentation)\n  - [📊 Result Generation](#-result-generation)\n    - [LLM Evaluator Setup](#llm-evaluator-setup)\n    - [Generate Evaluation Results](#generate-evaluation-results)\n    - [Batch Testing File Management](#batch-testing-file-management)\n  - [🎯 📊 Key Evaluation Findings for OpenPhone](#--key-evaluation-findings-for-openphone)\n    - [🏆 Small Model, Big Performance](#-small-model-big-performance)\n    - [🥊 Competitive Performance](#-competitive-performance)\n    - [🔄 Device-Cloud Framework Works](#-device-cloud-framework-works)\n    - [🧠 Longer Prompts Don't Always Help](#-longer-prompts-dont-always-help)\n  - [📈 Device-Cloud Distribution Analysis for Phone Agents](#-device-cloud-distribution-analysis-for-phone-agents)\n    - [📊 Workload Distribution](#-workload-distribution)\n    - [💰 Efficiency 
Gains](#-efficiency-gains)\n    - [🎯 Model Capability Impact](#-model-capability-impact)\n  - [⚡ Inference Speed Comparison](#-inference-speed-comparison)\n    - [🎯 Speed Advantage](#-speed-advantage)\n    - [📊 Quantified Comparison](#-quantified-comparison)\n    - [💡 Practical Implications](#-practical-implications)\n  - [🌟 Citation](#-citation)\n  - [🔗 Related Projects](#-related-projects)\n  - [📜 License](#-license)\n\n---\n\n## 🚀 Quick Start\nThis project comprises three core components designed for comprehensive mobile agent development and evaluation:\n\n- ⚡ For **model training**, please refer to the training guide [README](.\u002Fmodel_training\u002FREADME.md) for comprehensive setup and execution instructions.\n- 🔧 For the **data generation pipeline**, please refer to the data preparation guide [README](.\u002Fprepare_data\u002FREADME.md) for detailed implementation steps.\n\nBelow, we focus on evaluation using the AndroidLab benchmark framework.\n\n### 📱 AndroidLab Benchmark Setup\nInstallation: Follow the official AndroidLab documentation [AndroidLab](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab) for complete setup instructions.\u003Cbr>\n\n**Environment Configuration**:\n- Recommended Mode: AVD on Mac (arm64) - validated in our experiments.\u003Cbr>\n- App Setup: Manual installation and task-specific configuration required.\u003Cbr>\n- Compatibility Note: Original Docker images are not compatible with AVD environments.\u003Cbr>\n\n### 🚀 Model Deployment & Inference\n**vLLM Integration**:\n- Inference scripts available in .\u002Fvllm_script\u002F directory\u003Cbr>\n- Optimized for efficient small model serving\u003Cbr>\n\n**Model Access**:\n- OpenPhone Weights: 3B parameter model hosted on HuggingFace\u003Cbr>\n- Deployment Process: Download weights → Deploy via vLLM → Configure inference service\u003Cbr>\n- Service Ready: Seamless integration with evaluation pipeline\u003Cbr>\n\n### ⚙️ Pre-Testing Configuration\n- API Setup Required: 
Configure cloud model credentials in .\u002Fevaluation\u002Fevaluation.py: Line 63, Line 75, Line 81\u003Cbr>\n- Coming Soon: Streamlined configuration interface in development\u003Cbr>\n\n---\n\n## 🌟 Key Features of OpenPhone\n\n### 🤖 Lightweight Agentic Foundation Models\n• **Compact Architecture**: Specialized **3B-scale** Vision-Language Models optimized for mobile GUI tasks with minimal computational footprint.\u003Cbr>\n• **On-Device Deployment**: True smartphone-compatible models that maintain competitive performance while running locally without cloud dependency.\n\n### ☁️ Device-Cloud Collaboration Framework\n• **Dynamic Orchestration**: Real-time task complexity assessment that intelligently switches between device and cloud models based on execution requirements. \u003Cbr>\n• **Cost-Performance Optimization**: Strategic resource allocation that leverages cost-efficient on-device models while compensating limitations through selective cloud model usage.\n\n### 🎯 Comprehensive Mobile Agent Evaluation Playground\n• **Extended Benchmark Suite**: Beyond AndroidLab, incorporating 25+ additional tasks across popular mobile applications for real-world validation. \u003Cbr>\n• **Multi-Dimensional Assessment**: Comprehensive evaluation covering performance metrics, computational efficiency, and practical deployment scenarios.\n\n---\n\n## 🌟 Technical Innovation & Implementation\n\n### 🧠 Model Training: SFT+RL\n• **Synthetic Data Generation**: Leverages advanced MLLMs to create high-quality reasoning chain training data, addressing the scarcity of manual annotations. \u003Cbr>\n• **Two-Stage Training**: SFT injects GUI foundational knowledge, while GRPO reinforcement learning optimizes task completion accuracy. \u003Cbr>\n• **Small Model Enhancement**: Enables 3B models to achieve performance comparable to 7B-9B models on GUI tasks through structured training. 
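The two-stage recipe above pairs SFT with GRPO-style reinforcement learning. As an illustrative sketch only (the function name and reward values below are hypothetical, not taken from this repository's training code), a GRPO-style update scores a group of sampled rollouts for the same GUI task and normalizes each rollout's reward against the group's own statistics:

```python
# Illustrative GRPO-style advantage computation (hypothetical names,
# not this repository's actual training code): each sampled rollout's
# reward is normalized against its group's mean and standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Return per-rollout advantages relative to the sampled group."""
    mu = mean(rewards)       # group baseline (no separate value network)
    sigma = pstdev(rewards)  # population std of the group's rewards
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts of one GUI task, rewarded by task-completion score:
# rollouts above the group mean get positive advantage, below get negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.5])
```

Rollouts that beat their group's mean receive positive advantages and are reinforced; using the group itself as the baseline avoids training a separate value network, which is part of why this style of RL suits small-model training.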
\n\n### ☁️ Device-Cloud Collaboration Framework\n• **Dynamic Task Assessment**: Real-time complexity evaluation determines when and how frequently to monitor device model performance. \u003Cbr>\n• **Intelligent Orchestration**: Seamlessly switches between device and cloud models based on execution progress and failure patterns. \u003Cbr>\n• **Cost-Performance Optimization**: Reduces cloud invocations by ~10% while maintaining high task success rates through strategic resource allocation.\n\n### 💾 Efficient Memory Mechanism for Mobile Agents\n• **Long-Horizon Reasoning**: Multi-step chain-of-thought reasoning with reflective error correction to enhance decision-making capabilities. \u003Cbr>\n• **Text-Based Summarization**: Compresses high-resolution screenshots into compact textual representations for efficient memory management. \u003Cbr>\n• **Structured Context Retention**: Maintains 10-20 steps of historical context in resource-constrained environments through optimized token usage.\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_7e8a0179144c.png\" style=\"zoom:100%;\" \u002F>\n\n---\n\n## 🧪 Testing & Evaluation\n\n### Single Task Testing\nTest individual tasks using the following command structure:\n\n```bash\npython eval.py -n test_name -c your path to config.yaml --task_id task_id\n```\n\nExample Usage:\n\n```bash\npython eval.py -n all_cloud_v1_hyper -c .\u002Fconfigs\u002Fexample_xml_cloud_hyper.yaml --task_id zoom_1\n```\n\n### Batch Evaluation Scripts\nConvenient batch testing scripts are available in `.\u002Ftest_script`:\n\n• `all_test_cloud_v1_hyper.sh`: Evaluates all 138 AndroidLab benchmark tasks\u003Cbr>\n• `all_test_cloud_v1_hyper_add.sh`: Evaluates tasks for four additional mobile apps\u003Cbr>\n\n### Additional App Documentation\nFor comprehensive details about the four additional app tasks, refer to the documentation: [Additional Apps Documentation](.\u002Fdocs\u002Fnew_apps.md)\n\n---\n\n## 
📊 Result Generation\n\n### LLM Evaluator Setup\nRequired Configuration: Set up LLM service credentials in .\u002Fevaluation\u002Ftasks\u002Fllm_evaluator.py:\n\n• Line 10: API configuration\u003Cbr>\n• Line 12: Service URL\u003Cbr>\n\n💡 Enhancement: Our implementation replaces AndroidLab's rule-based evaluation with LLM-powered assessment, providing more nuanced and accurate task completion evaluation.\n\n### Generate Evaluation Results\nExecute result generation with the following command:\n\n```bash\npython generate_result.py --input_folder .\u002Flogs\u002Fevaluation\u002F --output_folder .\u002Flogs\u002Fevaluation\u002F --output_excel .\u002Flogs\u002Fevaluation\u002Ftest_name.xlsx\n```\n\n### Batch Testing File Management\n⚠️ Important: When using batch scripts from .\u002Ftest_script\u002F:\u003Cbr>\n• Manual Transfer Required: Move generated evaluation files from the script directory to .\u002Flogs\u002F\u003Cbr>\n• Then Execute: Run the result generation command above\u003Cbr>\n• Error Prevention: This step prevents file path conflicts and ensures proper result compilation\u003Cbr>\n\n---\n\n## 🎯 📊 Key Evaluation Findings for OpenPhone\n\n### 🏆 Small Model, Big Performance\n- **Size vs Performance**: OpenPhone-3B achieves performance comparable to 9B models while maintaining the deployment advantages of a compact architecture.\n- **Efficiency Champion**: Establishes itself as a genuine \"small powerhouse\" that challenges the bigger-is-better assumption in mobile AI.\n\n### 🥊 Competitive Performance\n- **Against Proprietary Models**: OpenPhone-3B shows respectable performance compared to lightweight versions of proprietary models when evaluated on standard benchmarks.\n- **Potential of Small Models**: Demonstrates promising results that validate the viability of compact open-source approaches in mobile agent development.\n\n### 🔄 Device-Cloud Framework Works\n- **Performance with Efficiency**: OpenPhone's hybrid architecture delivers near-optimal performance 
while dramatically reducing cloud model usage.\n- **Intelligent Routing**: Proves that smart task routing creates practical efficiency gains without sacrificing capability.\n\n### 🧠 Longer Prompts Don't Always Help\n- **Context Matters**: Extended prompting strategies only improve performance when paired with sufficiently capable cloud models.\n- **Smart Matching**: Highlights the importance of matching reasoning complexity to model capability rather than assuming longer prompts always help.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_1c723ec5f1a4.png\" width=\"90%\"\u002F>\n\u003C\u002Fp>\n\n## 📈 Device-Cloud Distribution Analysis for Phone Agents\n\nTo evaluate the practical efficiency of our hybrid approach, we measured key metrics across different MLLMs: average total steps per task, the proportion of steps handled by on-device versus cloud models, and cloud call reduction compared to cloud-only baselines.\n\n### 📊 Workload Distribution\nCloud models still handle approximately 65% of execution steps, reflecting the computational limitations of smaller on-device models for complex reasoning tasks.\n\n### 💰 Efficiency Gains\nIntroducing on-device processing achieves roughly 10% reduction in cloud API calls, translating to direct cost savings and reduced latency.\n\n### 🎯 Model Capability Impact\nAdvanced cloud models like GLM-4.5V show smaller reductions in cloud dependency, as their superior capabilities enable more independent task completion without requiring on-device assistance.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_379a33d6e2b5.png\" width=\"49%\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_b2097f49145f.png\" width=\"47%\"\u002F>\n\u003C\u002Fp>\n\n## ⚡ Inference Speed Comparison\nWe evaluated average inference time per step using 
vLLM across different GPU configurations to assess real-world deployment feasibility. Note that GLM-4.1V-9B-Thinking could not operate on a single 3090 GPU due to context length constraints.\n\n\u003Cdiv align=\"center\">\n\n| Model                  | GPUs        | Size | SR   | Time Cost \u002F Step |\n| ---------------------- | ----------- | ---- | ---- | ---------------- |\n| Qwen2.5-VL-7B-Instruct | Single 3090 | 7B   | 10.1 | 6289.15 ms       |\n| OpenPhone              | Single 3090 | 3B   | 15.2 | 4170.63 ms       |\n| GLM-4.1V-9B-Thinking   | Two 3090s   | 9B   | 24.6 | 14584.89 ms      |\n| Qwen2.5-VL-7B-Instruct | Two 3090s   | 7B   | 10.1 | 4587.79 ms       |\n| OpenPhone              | Two 3090s   | 3B   | 15.2 | 3524.25 ms       |\n\n\u003C\u002Fdiv>\n\n### 🎯 Speed Advantage\n- **Clear Winner**: OpenPhone demonstrates significant inference speed advantages thanks to its lightweight 3B architecture.\n- **Real-World Ready**: Speed benefits become increasingly pronounced under constrained computational resources, matching typical edge deployment scenarios.\n\n### 📊 Quantified Comparison\n- **3.5x Faster**: OpenPhone on a single 3090 vs GLM-4.1V-9B-Thinking on dual 3090s.\n- **4x Faster**: OpenPhone on dual 3090s vs GLM-4.1V-9B-Thinking on dual 3090s.\n- **Deployment Flexibility**: GLM-4.1V-9B-Thinking's inability to run on a single 3090 severely limits its edge deployment options.\n\n### 💡 Practical Implications\nThe trade-off is clear: while larger models like GLM-4.1V-9B-Thinking achieve higher task performance, OpenPhone's speed advantages make it far more suitable for real-world on-device scenarios where response time and hardware constraints matter.\n\n---\n\n## 🌟 Citation\n\nIf you find this work helpful to your research, please kindly consider citing our paper.\n\n```\n@article{jiang2025lightagent,\n  title={LightAgent: Mobile Agentic Foundation Models},\n  author={Jiang, Yangqin and Huang, Chao},\n  journal={arXiv preprint 
arXiv:2510.22009},\n  year={2025}\n}\n```\n\n## 🔗 Related Projects\n\nOpenPhone builds upon excellent open-source projects. We sincerely thank their authors and contributors:\n\n- [AndroidLab](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab) - The benchmark framework.\n- [R1-V](https:\u002F\u002Fgithub.com\u002FStarsfieldAI\u002FR1-V) - Implementation details for the GRPO training methodology.\n- [LLaMA Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory) - The unified training framework enabling efficient model fine-tuning.\n\n## 📜 License\n\nThis project is released under the [MIT License](.\u002FLICENSE).\n\n\u003Cdiv align=\"center\">\n\n**If this project helps you, please give us a Star🌟**\n\n**🤖 Empower AI Phone with Agents!**\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cem> ❤️ Thanks for visiting ✨ OpenPhone!\u003C\u002Fem>\u003Cbr>\u003Cbr>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_b18673397fde.png\" alt=\"Views\">\n\u003C\u002Fp>\n\n\n","\u003Cdiv align=\"center\">\n  \u003Cpicture>\n      \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_715cf2917644.png\" width=\"20%\" style=\"border: none; box-shadow: none;\">\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n# ✨OpenPhone✨：面向 AI 手机的移动智能体基础模型\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_516452bfad77.png\" alt=\"Typing Animation\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_4bcfa1207a4c.gif\" width=\"800\" height=\"400\" alt=\"演示动画\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cdiv style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 25px; text-align: center;\">\n    \u003Cp>\n      
\u003Ca href='https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🔥Project-Page-00d9ff?style=for-the-badge&logo=github&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n      \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhkuds\u002FOpenPhone_dataset\">\u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Dataset-ffc107?style=for-the-badge&color=ffc107&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fhkuds\u002FOpenPhone_model\">\u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-ffc107?style=for-the-badge&color=ffc107&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href='https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F⚡Based%20on-AndroidLab-4ecdc4?style=for-the-badge&logo=lightning&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n    \u003C\u002Fp>\n    \u003Cp>\n      \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone\u002Fstargazers\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgithub-stars\u002FHKUDS\u002FOpenPhone?color=00d9ff&style=for-the-badge&logo=star&logoColor=white&labelColor=1a1a2e' \u002F>\u003C\u002Fa>\n      \u003Ca href=\".\u002FCommunication.md\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F💬Feishu-Group-07c160?style=for-the-badge&logoColor=white&labelColor=1a1a2e\">\u003C\u002Fa>\n      \u003Ca href=\".\u002FCommunication.md\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-Group-07c160?style=for-the-badge&logo=wechat&logoColor=white&labelColor=1a1a2e\">\u003C\u002Fa>\n      \u003Ca href=\"\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlatform-Android%20%7C%20iOS-d3d3d3?style=for-the-badge&logo=android&logoColor=white&labelColor=1a1a2e\"\u002F>\u003C\u002Fa>\n      \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22009'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄arXiv-2510.22009-ff6b6b?style=for-the-badge&logo=arxiv&logoColor=white&labelColor=1a1a2e'>\u003C\u002Fa>\n    \u003C\u002Fp>\n  \u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cdiv style=\"width: 100%; height: 2px; margin: 20px 0; background: linear-gradient(90deg, transparent, #00d9ff, transparent);\">\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\u003Cdiv>\n  \u003Cdiv style=\"background: linear-gradient(135deg, #f953c6 0%, #b91d73 100%); border-radius: 15px; padding: 28px; margin: 20px 0; border: 2px solid #f953c6; box-shadow: 0 4px 24px rgba(249,83,198,0.25); text-align: left;\">\n    \u003Ch2 style=\"color: white; margin: 0 0 14px 0; font-size: 22px; text-align: left;\">🦾 新发布：PhoneClaw —— 你的 iPhone 自主 AI 管家\u003C\u002Fh2>\n    \u003Cp style=\"color: rgba(255,255,255,0.95); margin: 0 0 16px 0; font-size: 15px; line-height: 1.7;\">\n      \u003Cstrong>PhoneClaw\u003C\u002Fstrong> 是一个不知疲倦的 AI 手机管家，为你处理任何 iOS 任务 —— 并且 \u003Cem>每次会话都变得更聪明\u003C\u002Fem>。它由 \u003Cstrong>Ralph 循环\u003C\u002Fstrong> (\u003Ccode style=\"background: rgba(255,255,255,0.18); padding: 2px 7px; border-radius: 4px;\">执行 → 评估 → 修复 → 重复\u003C\u002Fcode>) 驱动，将你的请求分解为子任务，在手机上执行，检查每一步是否成功，并自动利用失败上下文重试 —— 直到任务完成。\n    \u003C\u002Fp>\n    \u003Cul style=\"color: rgba(255,255,255,0.95); margin: 0 0 18px 0; font-size: 15px; line-height: 1.7; padding-left: 20px;\">\n      \u003Cli>🧠 \u003Cstrong>用户记忆\u003C\u002Fstrong> —— 构建关于你是谁（姓名、城市、习惯、历史）的持久档案，并将其注入到每个计划中，让管家真正了解它的主人\u003C\u002Fli>\n      \u003Cli>📚 \u003Cstrong>经验日志\u003C\u002Fstrong> —— 记录跨会话的应用特定导航知识（点击坐标、失败模式、时机），自动压缩成一个精简、高置信度的知识库\u003C\u002Fli>\n      \u003Cli>⚡ 
\u003Cstrong>记忆优先的答案\u003C\u002Fstrong> —— 重复的问题可以立即从用户档案中得到回答，无需任何设备交互\u003C\u002Fli>\n      \u003Cli>🤖 \u003Cstrong>交互式守护进程模式\u003C\u002Fstrong> —— 连接一次，即可连续接受无限任务；屏幕自动保持开启\u003C\u002Fli>\n      \u003Cli>🎓 \u003Cstrong>学习模式\u003C\u002Fstrong> —— 你只需像往常一样操作手机，PhoneClaw 会在一旁观察；它以约 8 fps 的速度捕获屏幕截图，通过计算机视觉检测你的点击，并将你的操作提炼成可复用的导航经验，立即添加到经验日志中\u003C\u002Fli>\n    \u003C\u002Ful>\n    \u003Cp style=\"margin: 0; text-align: center;\">\n      \u003Ca href=\".\u002FPhoneClaw\u002FREADME.md\" style=\"color: #1a1a2e; background: white; padding: 8px 20px; border-radius: 8px; text-decoration: none; font-weight: bold; font-size: 14px; display: inline-block;\">📖 PhoneClaw 完整文档 →\u003C\u002Fa>\n      &nbsp;&nbsp;\n      \u003Ca href=\".\u002Fios_agent\u002FREADME.md\" style=\"color: white; background: rgba(255,255,255,0.18); padding: 8px 20px; border-radius: 8px; text-decoration: none; font-weight: bold; font-size: 14px; border: 1px solid rgba(255,255,255,0.4); display: inline-block;\">iOS 智能体 README →\u003C\u002Fa>\n    \u003C\u002Fp>\n  \u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n## 🎯 什么是 OpenPhone？\n\n**问题所在**：大多数 AI 智能体依赖于昂贵的云端 API 和大模型，这对于现实世界的设备端部署来说是不切实际的。当用户的手机需要为每次交互调用外部服务时，他们面临着**隐私问题**、**延迟问题**和**高昂成本**。\n\n**我们的解决方案**：OpenPhone 推出了首个专为设备端智能手机交互设计的**开源、30亿参数智能体基础模型**。这个紧凑的视觉语言模型完全在本地运行 —— 这意味着**没有隐私顾虑**、**不依赖云端**，以及**零 API 成本**。\n\n## 🤔 为什么是 30亿参数？\n我们相信移动 AI 的未来不仅在于让模型变得更大，更在于让它们在现实世界的限制下变得更智能、更高效。我们的 30亿参数模型是：\n- ⚡ **边缘优化**：效率足以在商用 GPU 和下一代移动 NPU 上运行。\n- 🔒 **隐私优先**：所有计算都保留在你的设备上。\n- 💰 **完全免费**：无需云端推理，没有持续的 API 费用。\n- 🎯 **高性能**：通过先进的训练，达到与 70亿–90亿参数模型相当的性能。\n\n---\n\n## 💡 研究亮点\n\n### 🔍 OpenPhone‑3B：轻量级智能体模型\n考虑到当前边缘设备的计算限制，**参数 ≤ 3B** 的模型在能力与可部署性之间取得了实用的平衡。基于此洞察，我们推出了 **OpenPhone‑3B**，一个轻量级但功能强大的端侧智能体模型。\n\n- **模型规模与架构**：专为在严格的移动计算限制下进行高效端侧推理而设计的视觉语言模型。\n- **原生边缘设计**：主要作为本地智能体，兼容消费级 GPU 和移动 NPU，消除了对云端的持续依赖。\n- **GUI 感知的操作能力**：经过训练，可在真实的移动任务中进行视觉理解、指令跟随和结构化操作生成。\n- **开源发布**：完整的模型权重、配置和推理栈，支持社区部署和开发。\n- **实用最佳点**：3B 规模提供了最佳平衡——性能显著强于微型模型，同时仍可在大型模型无法部署的场景下运行。\n\n### 为什么 3B 
是手机智能体的最佳选择\n- **硬件适配**：3B 参数完美契合消费级 GPU 内存（8-12GB）和新兴移动 NPU 的计算预算。\n- **速度优势**：与 7B 模型相比，3B 模型的推理速度快 3-5 倍，同时为亚秒级 GUI 响应保持了有竞争力的准确性。\n- **能效优势**：更小的模型占用延长了电池续航——这对于功耗影响用户体验的移动部署至关重要。\n- **隐私优先**：使手机任务完全在设备上运行，保护用户隐私，同时消除网络依赖。\n- **成本节约**：本地处理消除了昂贵的云端 API 和按请求计费，实现可持续运营。\n\n### 🦾 PhoneClaw：您的 iPhone 自主 AI 管家\n一个基于 **Ralph Loop** 构建的自主 iOS 手机管家——这是一种闭环执行方法，会一直运行直到每个子任务都通过其成功标准。其关键区别在于 **双层自学习记忆**，使得管家在每次会话后都能明显变得更智能：\n\n- **用户记忆** —— 维护一个持久的用户档案（推断出的姓名、城市、应用习惯、任务历史），并注入到每个规划提示中，因此智能体从一开始就能做出符合上下文的智能决策。重复的问题会直接从记忆中回答，**无需任何设备交互**。\n- **经验日志** —— 记录每次会话中特定于应用的导航知识：成功的点击坐标、失败模式、UI 时序特性。经验教训会进行语义去重，在确认后得到强化，并且当一个应用积累了 ≥ 20 条记录时会自动压缩——保持知识库的精简和高质量。\n- **智能规划**：视觉语言模型将每个任务分解为具有明确成功标准的子任务，从而实现精确的逐步骤评估和有针对性的重试，而非盲目重复。\n- **交互式守护进程模式**：连接一次，即可无限期接受不限量的任务——在整个会话期间，设备屏幕会自动保持开启。\n- **学习模式**：只需正常使用您的手机，PhoneClaw 会在一旁观察。它以约 8 fps 的速度捕获屏幕截图，通过计算机视觉（`HoughCircles` + 像素差异后备方案）检测点击位置，注释每一帧，并将您的操作提炼成可重用的导航经验，直接添加到经验日志中——无需手动标注。\n\n➜ [完整的 PhoneClaw 文档](.\u002FPhoneClaw\u002FREADME.md)\n\n---\n\n## 🚀 模型发布与资源\n\n### 📦 开箱即用的模型\n\n- **模型权重**：OpenPhone-3B 已在 Hugging Face 上提供，附带完整的许可，可用于研究和商业用途。\n- **生产就绪的服务**：预配置的 vLLM 推理脚本可实现高效部署，并优化吞吐量和内存使用。\n\n### 🛠️ 完整的训练流程\n- **可复现的方案**：完整的训练实现，包括我们新颖的两阶段方法（SFT + 使用合成 GUI 数据的 GRPO 风格 RL）。\n- **定制化支持**：`model_training\u002F` 中的详细文档允许研究人员针对特定领域的手机任务调整模型，或扩展到新的移动平台。\n- **数据生成范式**：用于大规模创建高质量训练数据的脚本和方法论。\n\n---\n\n## 📖 目录\n- [✨OpenPhone✨：面向 AI 手机的移动智能体基础模型](#openphone-面向-ai-手机的移动智能体基础模型)\n  - [🎯 什么是 OpenPhone？](#-什么是-openphone)\n  - [🤔 为什么是 30 亿参数？](#-为什么是-30-亿参数)\n  - [💡 研究亮点](#-研究亮点)\n    - [🔍 OpenPhone‑3B：轻量级智能体模型](#-openphone3b轻量级智能体模型)\n    - [为什么 30 亿参数是手机智能体的“甜点”](#为什么-30-亿参数是手机智能体的甜点)\n    - [🦾 PhoneClaw：您的 iPhone 自主 AI 管家](#-phoneclaw您的-iphone-自主-ai-管家)\n  - [🚀 模型发布与资源](#-模型发布与资源)\n    - [📦 开箱即用模型](#-开箱即用模型)\n    - [🛠️ 完整训练流程](#️-完整训练流程)\n  - [📖 目录](#-目录-1)\n  - [🚀 快速开始](#-快速开始)\n    - [📱 AndroidLab 基准测试环境设置](#-androidlab-基准测试环境设置)\n    - [🚀 模型部署与推理](#-模型部署与推理)\n    - [⚙️ 测试前配置](#️-测试前配置)\n  - [🌟 OpenPhone 主要特性](#-openphone-主要特性)\n    - [🤖 
轻量级智能体基础模型](#-轻量级智能体基础模型)\n    - [☁️ 设备-云端协作框架](#️-设备-云端协作框架)\n    - [🎯 全面的移动智能体评估平台](#-全面的移动智能体评估平台)\n  - [🌟 技术创新与实现](#-技术创新与实现)\n    - [🧠 模型训练：SFT+RL](#-模型训练sftrl)\n    - [☁️ 设备-云端协作框架](#️-设备-云端协作框架-1)\n    - [💾 面向移动智能体的高效记忆机制](#-面向移动智能体的高效记忆机制)\n  - [🧪 测试与评估](#-测试与评估)\n    - [单任务测试](#单任务测试)\n    - [批量评估脚本](#批量评估脚本)\n    - [附加应用文档](#附加应用文档)\n  - [📊 结果生成](#-结果生成)\n    - [LLM 评估器设置](#llm-评估器设置)\n    - [生成评估结果](#生成评估结果)\n    - [批量测试文件管理](#批量测试文件管理)\n  - [🎯 📊 OpenPhone 关键评估发现](#--openphone-关键评估发现)\n    - [🏆 小模型，大性能](#-小模型大性能)\n    - [🥊 有竞争力的性能](#-有竞争力的性能)\n    - [🔄 设备-云端框架有效](#-设备-云端框架有效)\n    - [🧠 更长的提示词并非总是有益](#-更长的提示词并非总是有益)\n  - [📈 手机智能体的设备-云端分布分析](#-手机智能体的设备-云端分布分析)\n    - [📊 工作负载分布](#-工作负载分布)\n    - [💰 效率提升](#-效率提升)\n    - [🎯 模型能力影响](#-模型能力影响)\n  - [⚡ 推理速度对比](#-推理速度对比)\n    - [🎯 速度优势](#-速度优势)\n    - [📊 量化对比](#-量化对比)\n    - [💡 实际意义](#-实际意义)\n  - [🌟 引用](#-引用)\n  - [🔗 相关项目](#-相关项目)\n  - [📜 许可证](#-许可证)\n\n---\n\n## 🚀 快速开始\n本项目包含三个核心组件，旨在为全面的移动智能体开发和评估提供支持：\n\n- ⚡ 关于**模型训练**，请参阅训练指南 [README](.\u002Fmodel_training\u002FREADME.md) 以获取完整的设置和执行说明。\n- 🔧 关于**数据生成流程**，请参阅数据准备指南 [README](.\u002Fprepare_data\u002FREADME.md) 以获取详细的实现步骤。\n\n下文将重点介绍使用 AndroidLab 基准测试框架进行评估。\n\n### 📱 AndroidLab 基准测试环境设置\n安装：请按照官方 AndroidLab 文档 [AndroidLab](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab) 完成完整的设置说明。\u003Cbr>\n\n**环境配置**：\n- 推荐模式：Mac (arm64) 上的 AVD（Android 虚拟设备）—— 已在我们的实验中验证。\u003Cbr>\n- 应用设置：需要手动安装应用并进行特定于任务的配置。\u003Cbr>\n- 兼容性说明：原始的 Docker 镜像与 AVD 环境不兼容。\u003Cbr>\n\n### 🚀 模型部署与推理\n**vLLM 集成**：\n- 推理脚本位于 .\u002Fvllm_script\u002F 目录中\u003Cbr>\n- 针对高效的小模型服务进行了优化\u003Cbr>\n\n**模型访问**：\n- OpenPhone 权重：托管在 HuggingFace 上的 30 亿参数模型\u003Cbr>\n- 部署流程：下载权重 → 通过 vLLM 部署 → 配置推理服务\u003Cbr>\n- 服务就绪：与评估流程无缝集成\u003Cbr>\n\n### ⚙️ 测试前配置\n- 需要设置 API：在 .\u002Fevaluation\u002Fevaluation.py 的第 63、75、81 行配置云端模型凭据\u003Cbr>\n- 即将推出：正在开发简化的配置界面\u003Cbr>\n\n---\n\n## 🌟 OpenPhone 主要特性\n\n### 🤖 轻量级智能体基础模型\n• **紧凑架构**：专为移动 GUI（图形用户界面）任务优化的**30亿参数规模**视觉语言模型，计算足迹最小。\u003Cbr>\n• 
**On-Device Deployment**: A genuinely smartphone-compatible model that maintains competitive performance while running locally with no cloud dependence.\n\n### ☁️ Device-Cloud Collaboration Framework\n• **Dynamic Orchestration**: Assesses task complexity in real time and intelligently switches between the on-device and cloud models based on execution needs.\u003Cbr>\n• **Cost-Performance Optimization**: Strategic resource allocation that leverages the cost-efficient on-device model while selectively using cloud models to cover its limitations.\n\n### 🎯 Comprehensive Mobile Agent Evaluation Platform\n• **Extended Benchmark Suite**: Goes beyond AndroidLab, integrating 25+ additional tasks across popular mobile apps for real-world validation.\u003Cbr>\n• **Multi-Dimensional Assessment**: Thorough evaluation covering performance metrics, computational efficiency, and practical deployment scenarios.\n\n---\n\n## 🌟 Technical Innovation & Implementation\n\n### 🧠 Model Training: SFT+RL\n• **Synthetic Data Generation**: Leverages advanced MLLMs (multimodal large language models) to create high-quality reasoning-chain training data, addressing the scarcity of human annotations.\u003Cbr>\n• **Two-Stage Training**: SFT (supervised fine-tuning) instills GUI fundamentals, while GRPO (Group Relative Policy Optimization) reinforcement learning optimizes task-completion accuracy.\u003Cbr>\n• **Small-Model Enhancement**: Structured training brings the 3B model to performance comparable with 7B-9B models on GUI tasks.\n\n### ☁️ Device-Cloud Collaboration Framework\n• **Dynamic Task Assessment**: Real-time complexity evaluation determines when and how often to monitor the on-device model's performance.\u003Cbr>\n• **Intelligent Orchestration**: Switches seamlessly between the device and cloud models based on execution progress and failure patterns.\u003Cbr>\n• **Cost-Performance Optimization**: Strategic resource allocation cuts cloud calls by roughly 10% while maintaining high task success rates.\n\n### 💾 Efficient Memory Mechanisms for Mobile Agents\n• **Long-Horizon Reasoning**: Strengthens decision-making through multi-step chain-of-thought reasoning with reflective error correction.\u003Cbr>\n• **Text-Based Summarization**: Compresses high-resolution screenshots into compact textual representations for efficient memory management.\u003Cbr>\n• **Structured Context Retention**: Maintains 10-20 steps of historical context in resource-constrained settings through optimized token usage.\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_7e8a0179144c.png\" style=\"zoom:100%;\" \u002F>\n\n---\n\n## 🧪 Testing & Evaluation\n\n### Single-Task Testing\nTest an individual task with the following command structure:\n\n```bash\npython eval.py -n test_name -c path\u002Fto\u002Fconfig.yaml --task_id task_id\n```\n\nExample usage:\n\n```bash\npython eval.py -n all_cloud_v1_hyper -c .\u002Fconfigs\u002Fexample_xml_cloud_hyper.yaml --task_id zoom_1\n```\n\n### Batch Evaluation Scripts\nConvenient batch testing scripts are provided in the `.\u002Ftest_script` directory:\n\n• `all_test_cloud_v1_hyper.sh`: Evaluates all 138 tasks in the AndroidLab benchmark\u003Cbr>\n• `all_test_cloud_v1_hyper_add.sh`: Evaluates the tasks for four additional mobile apps\u003Cbr>\n\n### Additional App Documentation\nFor full details on the four additional app tasks, see the documentation: [Additional App Documentation](.\u002Fdocs\u002Fnew_apps.md)\n\n---\n\n## 📊 Results Generation\n\n### LLM Evaluator Setup\nRequired configuration: set the LLM service credentials in .\u002Fevaluation\u002Ftasks\u002Fllm_evaluator.py:\n\n• Line 10: API configuration\u003Cbr>\n• Line 12: service URL\u003Cbr>\n\n💡 Enhancement: Our implementation replaces AndroidLab's rule-based evaluation with LLM-based assessment, providing a more nuanced and accurate measure of task completion.\n\n### Generating Evaluation Results\nRun results generation with the following command:\n\n```bash\npython generate_result.py --input_folder .\u002Flogs\u002Fevaluation\u002F --output_folder 
.\u002Flogs\u002Fevaluation\u002F --output_excel .\u002Flogs\u002Fevaluation\u002Ftest_name.xlsx\n```\n\n### 批量测试文件管理\n⚠️ 重要提示：当使用 .\u002Ftest_script\u002F 中的批量脚本时：\u003Cbr>\n• 需要手动转移：将脚本目录中生成的评估文件移动到 .\u002Flogs\u002F 目录下\u003Cbr>\n• 然后执行：运行上述结果生成命令\u003Cbr>\n• 错误预防：此步骤可防止文件路径冲突并确保正确的结果汇总\u003Cbr>\n\n---\n\n## 🎯 📊 OpenPhone 的关键评估发现\n\n### 🏆 小模型，大性能\n- **尺寸与性能**：OpenPhone-3B 实现了与 9B 模型相当的性能，同时保持了紧凑架构的部署优势。\n- **效率冠军**：确立了自己作为真正的“小巨人”的地位，挑战了移动 AI 中“越大越好”的假设。\n\n### 🥊 有竞争力的性能\n- **与专有模型对比**：在标准基准测试中，OpenPhone-3B 与专有模型的轻量级版本相比，表现出可观的性能。\n- **小模型的潜力**：展示了有希望的结果，验证了紧凑开源方法在移动代理开发中的可行性。\n\n### 🔄 设备-云端框架有效\n- **性能与效率兼备**：OpenPhone 的混合架构提供了接近最优的性能，同时显著减少了云端模型的使用。\n- **智能路由**：证明了智能任务路由可以在不牺牲能力的情况下创造实际的效率增益。\n\n### 🧠 更长的提示词并不总是有帮助\n- **上下文很重要**：扩展的提示策略只有在与足够强大的云端模型配对时才能提高性能。\n- **智能匹配**：强调了将推理复杂度与模型能力相匹配的重要性，而不是假设更长的提示词总是有帮助。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_1c723ec5f1a4.png\" width=\"90%\"\u002F>\n\u003C\u002Fp>\n\n## 📈 手机代理的设备-云端分布分析\n\n为了评估我们混合方法的实际效率，我们测量了不同 MLLM 的关键指标：每个任务的平均总步数、由设备端模型与云端模型处理的步骤比例，以及与纯云端基线相比云端调用的减少量。\n\n### 📊 工作负载分布\n云端模型仍然处理大约 65% 的执行步骤，这反映了较小的设备端模型在复杂推理任务上的计算限制。\n\n### 💰 效率增益\n引入设备端处理实现了大约 10% 的云端 API 调用减少，转化为直接的成本节约和延迟降低。\n\n### 🎯 模型能力的影响\n像 GLM-4.5V 这样的先进云端模型显示出云端依赖性的减少幅度较小，因为其卓越的能力使其能够更独立地完成任务，而无需设备端协助。\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_379a33d6e2b5.png\" width=\"49%\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_b2097f49145f.png\" width=\"47%\"\u002F>\n\u003C\u002Fp>\n\n## ⚡ 推理速度比较\n我们使用 vLLM 评估了不同 GPU 配置下每步的平均推理时间，以评估实际部署的可行性。请注意，由于上下文长度限制，GLM-4.1V-9B-Thinking 无法在单个 3090 GPU 上运行。\n\n\u003Cdiv align=\"center\">\n\n| 模型                      | GPU 配置      | 尺寸 | SR   | 时间成本 \u002F 步 |\n| ------------------------- | ------------- | ---- | ---- | ------------- |\n| Qwen2.5-VL-7B-Instruct | 单 3090       | 7B   | 10.1 
| 6289.15 ms    |\n| OpenPhone              | 单 3090       | 3B   | 15.2 | 4170.63 ms    |\n| GLM-4.1V-9B-Thinking   | 双 3090       | 9B   | 24.6 | 14584.89 ms   |\n| Qwen2.5-VL-7B-Instruct | 双 3090       | 7B   | 10.1 | 4587.79 ms    |\n| OpenPhone              | 双 3090       | 3B   | 15.2 | 3524.25 ms    |\n\n\u003C\u002Fdiv>\n\u003C\u002Fp>\n\n### 🎯 速度优势\n- **明显胜出**：得益于其轻量级的 3B 架构，OpenPhone 展现出显著的推理速度优势。\n- **适合实际应用**：在计算资源受限的情况下，速度优势变得更加明显，符合典型的边缘部署场景。\n\n### 📊 量化比较\n- **快 3.5 倍**：单 3090 上的 OpenPhone 对比双 3090 上的 GLM-4.1V-9B-Thinking。\n- **快 4 倍**：双 3090 上的 OpenPhone 对比双 3090 上的 GLM-4.1V-9B-Thinking。\n- **OpenPhone 的轻量级优势**：GLM-4.1V-9B-Thinking 无法在单 3090 上运行，严重限制了其边缘部署选项。\n\n### 💡 实际意义\n权衡是明确的：虽然像 GLM-4.1V-9B-Thinking 这样更大的模型实现了更高的任务性能，但 OpenPhone 的速度优势使其更适合对响应时间和硬件约束有要求的实际设备端场景。\n\n---\n\n## 🌟 引用\n\n如果您发现这项工作对您的研究有帮助，请考虑引用我们的论文。\n\n```\n@article{jiang2025lightagent,\n  title={LightAgent: Mobile Agentic Foundation Models},\n  author={Jiang, Yangqin and Huang, Chao},\n  journal={arXiv preprint arXiv:2510.22009},\n  year={2025}\n}\n```\n\n## 🔗 相关项目\n\nOpenPhone 建立在优秀的开源项目之上。我们衷心感谢这些项目的作者和贡献者：\n\n- [AndroidLab](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAndroid-Lab) - 基准测试框架。\n- [R1-V](https:\u002F\u002Fgithub.com\u002FStarsfieldAI\u002FR1-V) - GRPO 训练方法的具体实现细节。\n- [LLaMA Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory) - 支持高效模型微调的统一训练框架。\n\n## 📜 许可证\n\n本项目基于 [MIT 许可证](.\u002FLICENSE) 发布。\n\n\u003Cdiv align=\"center\">\n\n**如果本项目对您有帮助，请为我们点个 Star🌟**\n\n**🤖 用智能体赋能 AI 手机！**\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cem> ❤️ 感谢访问 ✨ OpenPhone！\u003C\u002Fem>\u003Cbr>\u003Cbr>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_readme_c577995f48ab.png\" alt=\"访问量\">\n\u003C\u002Fp>","# OpenPhone 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐 Ubuntu 20.04+ 或同等版本)\n- **Python**: 3.9 或 3.10\n- **CUDA**: 11.8 或更高版本 (如需 GPU 推理)\n- **内存**: 至少 8GB RAM\n- **存储**: 至少 10GB 可用空间\n\n### 前置依赖\n确保已安装以下基础工具：\n- 
Git\n- pip (the Python package manager)\n\n## Installation\n\n1. **Clone the repository**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone.git\n   cd OpenPhone\n   ```\n\n2. **Create and activate a Python virtual environment (recommended)**\n   ```bash\n   python -m venv openphone_env\n   source openphone_env\u002Fbin\u002Factivate  # Linux\u002FmacOS\n   # or: openphone_env\\Scripts\\activate  # Windows\n   ```\n\n3. **Install the Python dependencies**\n   ```bash\n   pip install -r requirements.txt\n   ```\n   *Tip: if you hit network issues, a PyPI mirror can speed up downloads, e.g.:*\n   ```bash\n   pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n4. **Download the model (optional, for local inference)**\n   - The model is available on Hugging Face:\n     ```bash\n     # using huggingface-cli (log in first)\n     huggingface-cli download hkuds\u002FOpenPhone_model --local-dir .\u002Fmodel_weights\n     ```\n   - Or download it manually from the Hugging Face page.\n\n## Basic Usage\n\n### 1. Run a basic inference example\nBelow is an example script (`demo_inference.py`) that runs simple inference with the OpenPhone-3B model:\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\n# Load the model and tokenizer (replace the path with your actual model path)\nmodel_path = \".\u002Fmodel_weights\"  # or \"hkuds\u002FOpenPhone_model\"\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_path,\n    torch_dtype=torch.float16,\n    device_map=\"auto\",\n    trust_remote_code=True\n)\n\n# Prepare the input\nprompt = \"Analyze the current phone screen and describe the actions you can take.\"\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n\n# Generate a response\nwith torch.no_grad():\n    outputs = model.generate(**inputs, max_new_tokens=100)\nresponse = tokenizer.decode(outputs[0], skip_special_tokens=True)\n\nprint(\"Model response:\", response)\n```\n\nRun the script:\n```bash\npython demo_inference.py\n```\n\n### 2. Test against an Android device (via AndroidLab)\nTo test agent functionality on a real or virtual Android device, follow the `AndroidLab` integration guide:\n1. Make sure the Android SDK is installed and `adb` is configured.\n2. Set up the device connection following the `AndroidLab` documentation in this project.\n3. Run the provided test scripts to interact with the device.\n\n### 3. 
体验 PhoneClaw (iOS 智能管家)\n如需在 iOS 设备上运行自主智能体，请参考 `PhoneClaw` 子模块文档：\n```bash\ncd PhoneClaw\n# 按照 .\u002FPhoneClaw\u002FREADME.md 中的说明进行配置和运行\n```\n\n## 下一步\n- 查看 `.\u002Fexamples\u002F` 目录获取更多使用示例。\n- 访问 [Hugging Face 模型页面](https:\u002F\u002Fhuggingface.co\u002Fhkuds\u002FOpenPhone_model) 获取最新的模型权重和详细配置。\n- 参考项目 Wiki 或 `Communication.md` 加入社区讨论。","**场景背景**：一位独立开发者小张，正在开发一款新的健康饮食 App。他需要频繁地在自己的安卓测试手机上手动操作，以验证各种用户交互流程是否顺畅，例如注册登录、浏览食谱、记录饮食等，这个过程耗时且重复。\n\n### 没有 OpenPhone 时\n- **手动执行耗时冗长**：小张需要亲自在测试手机上一步步点击，完成从启动 App、输入测试账号到查看特定食谱页面的完整流程，每次验证都要花费 5-10 分钟。\n- **测试覆盖不全面且易出错**：复杂的多步骤操作（如先搜索食材，再筛选食谱，最后收藏）很容易因手误而中断，难以保证每次测试路径的一致性，边缘用例更容易被忽略。\n- **问题复现与调试困难**：当收到用户反馈“在某个界面点击提交后卡顿”时，小张很难精确复现用户的操作序列和环境状态，导致定位问题效率低下。\n\n### 使用 OpenPhone 后\n- **自动化执行提升效率**：小张通过自然语言向 OpenPhone 的智能体描述测试任务（如“请用测试账号登录，并浏览‘低碳水晚餐’分类下的前三个食谱”），它便能自动规划并执行操作，将验证时间从数分钟缩短至秒级。\n- **可靠且全面的交互测试**：OpenPhone 的智能体能精准、稳定地执行复杂的多步操作流程，并能通过其“执行-评估-修复”循环自动处理意外弹窗或界面变化，确保测试路径的完整性和可重复性，轻松覆盖更多场景。\n- **精准复现与辅助诊断**：借助 OpenPhone 对手机状态的深度感知和控制能力，小张可以录制或精确指令化任何用户报错的操作序列，一键复现问题，并能利用其分析能力快速定位可能的原因（如特定控件未加载）。\n\nOpenPhone 将小张从重复、琐碎的手动测试操作中解放出来，使其能专注于更核心的创意与开发工作，同时大幅提升了测试的可靠性与开发迭代速度。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHKUDS_OpenPhone_1c723ec5.png","HKUDS","✨Data Intelligence Lab@HKU✨","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHKUDS_fc32cc87.jpg",null,"https:\u002F\u002Fsites.google.com\u002Fview\u002Fchaoh","https:\u002F\u002Fgithub.com\u002FHKUDS",[83,87,91,94],{"name":84,"color":85,"percentage":86},"Python","#3572A5",95.6,{"name":88,"color":89,"percentage":90},"Shell","#89e051",2.3,{"name":92,"color":93,"percentage":23},"Jupyter Notebook","#DA5B0B",{"name":95,"color":96,"percentage":97},"Makefile","#427819",0,757,151,"2026-04-05T00:32:23","MIT",4,"Android, iOS","需要，用于模型推理。支持消费级 GPU（8-12GB 显存）及移动设备 NPU。CUDA 版本未明确说明。","未说明",{"notes":107,"python":105,"dependencies":108},"该项目为移动端AI代理模型，核心是部署在手机等边缘设备上。运行环境需求主要围绕模型部署和推理，而非传统的PC端开发环境。需要下载 OpenPhone-3B 模型文件。包含独立的 iOS 代理组件 
PhoneClaw。",[109,110,111,112],"torch","transformers","accelerate","vLLM",[15],[115,116,117],"mobile-agents","ai-phone","phone-agent","2026-03-27T02:49:30.150509","2026-04-06T05:35:32.215363",[121],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},4027,"README中提到的“No Cloud Dependence”与论文中强调的“Device-cloud collaborative agent system”是否矛盾？","不矛盾。README中的“No Cloud Dependence”指的是我们训练的3B模型理论上支持部署在移动端设备以完成部分简单任务。而论文中强调“Device-cloud collaborative”是因为现阶段3B模型的性能仍有较大局限性，需要辅以先进的云端模型来提升任务成功率。","https:\u002F\u002Fgithub.com\u002FHKUDS\u002FOpenPhone\u002Fissues\u002F2",[]]