[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-taco-group--4KAgent":3,"tool-taco-group--4KAgent":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":111,"forks":112,"last_commit_at":113,"license":114,"difficulty_score":115,"env_os":116,"env_gpu":117,"env_ram":118,"env_deps":119,"category_tags":127,"github_topics":128,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":144,"updated_at":145,"faqs":146,"releases":175},8901,"taco-group\u002F4KAgent","4KAgent","[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. 
An intelligent computer vision agent that can magically restore any image to perfect-4K!","4KAgent 是一款荣获 NeurIPS 2025 收录的智能图像超分辨率工具，旨在将任意质量的图片“魔法般”地修复并提升至完美的 4K 分辨率。无论是严重模糊的老照片、低清的网络截图，还是复杂的科学显微图像乃至 AI 生成的画面，它都能通用处理，有效解决了传统算法在面对极端退化或特定领域图像时效果不佳的难题。\n\n其核心创新在于采用了多智能体协作架构：由“感知智能体”利用大型视觉语言模型分析图像内容与损伤情况并制定修复策略，再由“恢复智能体”执行包含反思与回滚机制的递归修复流程。此外，4KAgent 引入了质量驱动的混合专家策略（Q-MoE）以在每一步骤中优选最佳结果，并配备了专门的人脸增强管道和无需额外训练即可适配不同任务的配置模块。\n\n这款工具非常适合需要高质量图像放大的设计师、处理科研影像的研究人员、开发计算机视觉应用的工程师，以及希望提升个人照片清晰度的普通用户。通过智能化的流程设计，4KAgent 让高分辨率图像重建变得更加通用、精准且易于定制。","\u003Cdiv align=\"center\">\n\n\n\u003Ch1>4KAgent: Agentic Any Image to 4K Super-Resolution\u003C\u002Fh1>\n\n\u003Cdiv>\n    \u003Ca href='https:\u002F\u002Fyushenzuo.github.io' target='_blank'>Yushen Zuo\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    Qi Zheng\u003Csup>1†\u003C\u002Fsup>&emsp;\n    Mingyang Wu\u003Csup>1†\u003C\u002Fsup>&emsp;\n    Xinrui Jiang\u003Csup>2†\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fshadowiterator.github.io' target='_blank'>Renjie Li\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\u003Cbr>\n    \u003Ca href='https:\u002F\u002Fjianwang-cmu.github.io' target='_blank'>Jian Wang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fyzhang34.github.io\u002Fauthor\u002Fyide-zhang' target='_blank'>Yide Zhang\u003Csup>4\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fgengchenmai.github.io' target='_blank'>Gengchen Mai\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fcoilab.caltech.edu\u002Fmembers\u002Fdirectors-biography' target='_blank'>Lihong V. 
Wang\u003Csup>6\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fwww.james-zou.com' target='_blank'>James Zou\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\u003Cbr>\n    \u003Ca href='https:\u002F\u002Fwww.xiaoyumu.com' target='_blank'>Xiaoyu Wang\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Ffaculty.ucmerced.edu\u002Fmhyang' target='_blank'>Ming-Hsuan Yang\u003Csup>8\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fvztu.github.io' target='_blank'>Zhengzhong Tu\u003Csup>1*\u003C\u002Fsup>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr>\n\u003Cdiv>\n    \u003Csup>1\u003C\u002Fsup>Texas A&M University&emsp;  \u003Csup>2\u003C\u002Fsup>Stanford University&emsp;  \u003Csup>3\u003C\u002Fsup>Snap Inc.&emsp;  \u003Csup>4\u003C\u002Fsup>CU Boulder\u003Cbr>\n    \u003Csup>5\u003C\u002Fsup>UT Austin&emsp;  \u003Csup>6\u003C\u002Fsup>California Institute of Technology&emsp;  \u003Csup>7\u003C\u002Fsup>Topaz Labs&emsp;  \u003Csup>8\u003C\u002Fsup>UC Merced\u003Cbr>\n    \u003Csup>†\u003C\u002Fsup>Indicates Equal Contribution\u003Cbr>\n    \u003Csup>*\u003C\u002Fsup>Corresponding Author\n\u003C\u002Fdiv>\n\n\u003C!-- [[paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105) -->\n\u003Cbr>\n\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-8A2BE2)](https:\u002F\u002F4kagent.github.io)&nbsp;\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20paper-2507.07105-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105)&nbsp;\n[![🤗 Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Dataset-DIV--4K--50-yellow)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50)\n![visitors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_3d3fd09589a4.png)\n\n---\n\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  
\u003Cstrong>\u003Cem>Accepted by NeurIPS 2025\u003C\u002Fem>\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_f76c5d5638c4.jpg\" width=95%>\n\u003Cp>\n\n\n## Introduction\n\nWe present **4KAgent**, an agentic image super-resolution generalist designed to universally upscale any image to **4K resolution**, regardless of input type, degradation level, or domain. **4KAgent** offers these key features:\n\n- 🔥 **Framework**: **4KAgent** is the first AI agent framework for universal any-image-to-4K upscaling, capable of handling **all image categories**, ranging from classical and realistic degradations, extreme low-quality inputs, AI-generated imagery, to scientific imaging tasks such as remote sensing, microscopy, and biomedical inputs.\n\n- 🔥 **System Design**: **4KAgent** is a multi-agent system. The **Perception Agent** employs large vision-language models (VLMs) to analyze the content and distortions in the image and to produce a restoration plan for the Restoration Agent to execute. The **Restoration Agent** follows an execution-reflection-rollback procedure for recursive restoration and upscaling.\n\n- 🔥 **Q-MoE & Face Restoration pipeline**: At each step of the restoration plan, a Quality-Driven Mixture-of-Expert (**Q-MoE**) policy is applied during execution and reflection to select the optimal image. We further develop a **face restoration pipeline** to enhance faces in images.\n\n- 🔥 **Profile Module**: To expand the applicability of **4KAgent**, we propose a **Profile Module** that lets users customize the system for different restoration tasks. 
**4KAgent** can adapt to different restoration tasks without extra training.\n\n- 🔥 **[DIV4K-50 Dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50)**: We build the **DIV4K-50** dataset as a challenging test set for upscaling a low-quality (LQ) 256 × 256 image with multiple degradations to a high-quality (HQ) 4K image at 4096 × 4096 resolution.\n\n\n## Pipeline\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_efd40fa2e728.png\" width=95%>\n\u003Cp>\n\n\n## Dependencies and Installation\n\nPlease refer to the [Installation Guide](installation\u002FInstallation.md) for detailed instructions on setting up the environment and installing dependencies.\n\n## Inference\n\n**Prerequisite:** Before running 4KAgent, please fill in the API key in the [config file](config.yml).\n\n4KAgent inference is driven by a profile. Examples follow:\n\n**Profiles that use 'llama_vision' as the VLM in the perception agent:**\n\n**Classic SR (ExpSR_s4_F)**\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fclassicsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fclassicsr \\\n  --profile_name ExpSR_s4_F \\\n  --tool_run_gpu_id 2\n```\n\n**Real-World SR (ExpSR_s4_P)**\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Frealworldsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Frealworldsr \\\n  --profile_name ExpSR_s4_P \\\n  --tool_run_gpu_id 2\n```\n\n**Profiles that use 'depictqa' as the VLM in the perception agent:**\n\n**Joint IR and 4K SR:**\n```bash\n# Set up depictqa in portal A:\ncd .\u002FDepictQA\nconda activate depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 4KAgent inference in portal B:\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002F4ksr 
\\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002F4ksr \\\n  --profile_name FastGen4K_P \\\n  --tool_run_gpu_id 2\n```\n\nWe recommend the `FastGen4K_P` profile, which infers faster and has good perceptual quality. \n\n`tool_run_gpu_id` is used to specify the GPU to execute tools (restoration methods). For GPUs with larger VRAM, `tool_run_gpu_id` can be set as the same as `CUDA_VISIBLE_DEVICES`.\n\n**Old Photo 4K SR**\n```bash\n# Set up depictqa in portal A:\ncd .\u002FDepictQA\nconda activate depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 4KAgent inference in portal B:\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fopr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fopr \\\n  --profile_name OldP4K_P \\\n  --tool_run_gpu_id 2\n```\n\n**Multiple Degradation Image Restoration**\n```bash\n# Set up depictqa in portal A:\ncd .\u002FDepictQA\nconda activate depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 4KAgent inference in portal B:\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fmir \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fmir \\\n  --profile_name GenMIR_P \\\n  --tool_run_gpu_id 2\n```\n\n\n## Profile Setting\n\nWe provide several example profiles in the `pipeline\u002Fprofiles` as references for different use cases. Users can customize their own profiles based on these examples.\n\n\n## DIV4K-50 Dataset\n\nWe provide the [DIV4K-50 dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50) on 🤗 Hugging Face for easy access and reproducibility. 
To download the dataset, please ensure you have the huggingface_hub CLI installed:\n```bash\npython -m pip install \"huggingface_hub[cli]\"\n\n# run the following command to download the dataset to your local directory:\nhuggingface-cli download --repo-type dataset YSZuo\u002FDIV4K-50 --local-dir .\u002Fdataset\u002FDIV4K-50\n\n# unzip the dataset:\ncd .\u002Fdataset\u002FDIV4K-50\nunzip DIV4K-50.zip\n```\n\n\n## Useful Tools\n[1] Extract result images: [utils\u002Fimage_export.py](utils\u002Fimage_export.py) \n\nCurrently, 4KAgent generates a folder containing the logs and images produced during inference. If we only need the final output images for calculating metrics (e.g., PSNR \u002F SSIM \u002F LPIPS \u002F ...), we can use this script to extract every `output` image into a new folder under its original image name.\n\n[2] Extract result toolchain: [utils\u002Ftoolchain_export.py](utils\u002Ftoolchain_export.py) \n\nIf we process multiple images and want to know which tool-chain 4KAgent used for each one, we can use this script to extract the tool-chain of every image. For example,\n\n```\n001: defocus deblurring@diffplugin-brightening@gamma_correction-super-resolution@diffbir.\n002: defocus deblurring@drbnet-super-resolution@diffbir.\n003: defocus deblurring@restormer-super-resolution@pisasr.\n```\n\n[3] Extract result tool for face restoration: [utils\u002Fface_restoration_tool_export.py](utils\u002Fface_restoration_tool_export.py)\n\nIf we activate `face restoration` in the profile (set `FaceRestore` to true) and want to see which face restoration method is used, we can use this script. For example,\n\n```\n00006_01: codeformer\n00006_02: gfpgan\n00006_03: img\n```\n`img` means the original face.\n\n\n## Evaluation\nThe [eval](.\u002Feval\u002F) folder contains multiple evaluation scripts, which correspond to different tasks:\n\n[1] [test_metrics_classic](.\u002Feval\u002Ftest_metrics_classic.py): `crop_border=4`; used to evaluate images in the Classic SR task. 
(Set5, Set14, B100, Urban100, Manga109)\n\n[2] [test_metrics](.\u002Feval\u002Ftest_metrics.py): Used to evaluate images in Real-World SR task. (RealSR, DRealSR)\n\n[3] [test_metrics_mio](.\u002Feval\u002Ftest_metrics_mio.py): Used to evaluate images in Multi-Degradation Restoration task. (MiO100)\n\n[4] [test_metrics_nr](.\u002Feval\u002Ftest_metrics_nr.py): Used to evaluate images with non-reference metrics (NIQE, MUSIQ, MANIQA (pipal), CLIPIQA). (RealSRSet (16x SR), DIV4K-50) We can also use [test_metrics_nr_low_gpu](.\u002Feval\u002Ftest_metrics_nr_low_gpu.py) if the VRAM of GPU is limited (\u003C24G).\n\n\n## Experiment Results\n\nWe evaluate 4KAgent on 11 different image SR tasks. The overall experiment results are summarized as follows:\n| Task                          | Dataset           | Profile(s)                                      | Scale Factor | Result |\n|-------------------------------|-------------------|-------------------------------------------------|--------------|--------|\n| Classical SR                  | Set5              | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Ju24iX9mC8Yg_8NLcy-XbnQpW5l0GbAu?usp=drive_link)       |\n| Classical SR                  | Set14             | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KaLBVceysfmqeZgjV1OAyjQ3N0vyrWux?usp=drive_link)       |\n| Classical SR                  | B100              | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1G7MnvyDnP6bwCoanMGG-vcFCI1A6zXYH?usp=drive_link)       |\n| Classical SR                  | Urban100          | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Fd52RajdelcNSIfLAyXNWEZsVqVyMnvC?usp=drive_link)       |\n| 
Classical SR                  | Manga109          | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1hnIA63GOiyawMs8siiW3MN7AhuEfXuVR?usp=drive_link)       |\n| Real-World SR                 | DRealSR           | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1wDArrO8TjT56kxH0LF790hEQqQY9M3KE?usp=drive_link)       |\n| Real-World SR                 | RealSR            | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1vSkBn54ypcqd6k1QlLpZvey58eBC_4CY?usp=drive_link)       |\n| Multiple-Degradation IR       | MiO100            | GenMIR-P                                        |    4 *  | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12uC-KYuCaeoCA0RcgQHU3z9FuL_E0sG1?usp=drive_link)       |\n| Face Restoration              | WebPhoto-Test     | GenSRFR-s4-P                                    |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1hAGGg72A-tEtr78oqyN7JCdaLTr7kXzz?usp=drive_link)       |\n| 16x SR                        | RealSRSet         | Gen4K-P                                         |    16    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1K2pIG-PTc1GRzSmbxyvVSFpHcuI11TbA?usp=drive_link)       |\n| Joint IR + 4K SR              | DIV4K-50          | Gen4K-P                                         |    16    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1cDCRdfVzOrWX_AziLHInjJzhl5QR_rgN?usp=drive_link)       |\n| AIGC 4K SR **                    | GenAIBench-4K     | ExpSR-s4-P                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rw-evPx8w5uFtVqjVx97clKbDXsKB2Lx\u002Fview?usp=drive_link)       |\n| AIGC 4K SR **           
         | DiffusionDB-4K    | ExpSR-s4-P                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1L3XRiYb1_BiEmwvRtSkadWdicutY_OAF\u002Fview?usp=drive_link)       |\n| Remote Sensing SR             | AID               | AerSR-s4-F, AerSR-s4-P                          |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Ntm1xQZmFUvw-UcSSmyYGTA5CsAY6NEL?usp=drive_link)       |\n| Remote Sensing SR             | DIOR              | AerSR-s4-F, AerSR-s4-P                          |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1lgVlYb0_ob2skc0in8qrSOoVyS1592iN?usp=drive_link)       |\n| Remote Sensing SR             | DOTA              | AerSR-s4-F, AerSR-s4-P, Aer4K-F, Aer4K-P        |    4, 16    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Uxbu7bzJcO6L1ATZecfTxe7651SO_r1_?usp=drive_link)       |\n| Remote Sensing SR             | WorldStrat        | AerSR-s4-F, AerSR-s4-P                          |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1anwNZ4lzYw49g7X-1vBMje494hZp5jtZ?usp=drive_link)       |\n| Fluorescence Microscopy Image SR | SR-CACO-2       | ExpSR-s2-F, ExpSR-s4-F, ExpSR-s8-F             |    2, 4, 8    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k42i-eLdxhdaOJRSB0DSLH4ihUrLfPiC?usp=drive_link)       |\n| Pathology Image SR            | bcSR              | ExpSR-s4-F, ExpSR-s8-F                          |    4, 8    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1aF2fA-NVbd9B61KBRqSi3XuYtvt7kZHB?usp=drive_link)       |\n| Medical Image SR              | Chest X-ray 2017  | ExpSR-s4-F                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1cvplHEtYf6IKrYmqfgl2Gd9w1wdcD_TG?usp=drive_link)       |\n| Medical Image SR              | 
Chest X-ray 14    | ExpSR-s4-F                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1AxiFdhQic821vSqObIPzIYRERJYYwp6k?usp=drive_link)       |\n| Medical Image SR              | US-CASE           | ExpSR-s4-F                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1WB1hYNLcRf8AxjPgNsensIpGxF7JERs1?usp=drive_link)       |\n| Medical Image SR              | MMUS1K            | ExpSR-s4-F                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F174V8ehE3PUApvLPCTWsv-i1XAAktVeb3?usp=drive_link)       |\n| Medical Image SR              | DRIVE             | ExpSR-s4-F                                      |    4    | [Result](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1CdXT6aoS_2AtQuJaz4KaODgh9S-ABRyQ?usp=drive_link)       |\n\n\n*: For LQ image which triggers `super-resolution` in 4KAgent with `GenMIR-P` profile (based on the resolution of the LQ image), the scale factor is set to 4.\n\n**: We use the standard sample prompt to evaluate the performance of 4KAgent in the AIGC domain. We employ no reference metrics (NIQE, MUSIQ-P, MANIQA, CLIPIQA) for evaluation, and we provide the test prompts for generation. 
(MUSIQ-P: a patch-applied variant that computes MUSIQ scores over non-overlapping 512 x 512 patches and averages them, thereby improving sensitivity to localized artifacts in ultra-high-resolution content.)\n\nWe present the naming convention and details of the profiles used in these tasks in [profile_setup](.\u002Fpipeline\u002Fprofiles\u002Fprofile_setup.md).\n\n## License\nThis project is released under the [Apache 2.0 license](LICENSE).\n\n## Contact\nIf you have any questions, please feel free to contact: `zuoyushen12@gmail.com`\n\n\n## Citation\nIf you find our work useful in your research, please consider citing our paper:\n```bibtex\n@article{zuo20254kagent,\n      title={4KAgent: Agentic Any Image to 4K Super-Resolution}, \n      author={Yushen Zuo and Qi Zheng and Mingyang Wu and Xinrui Jiang and Renjie Li and Jian Wang and Yide Zhang and Gengchen Mai and Lihong V. Wang and James Zou and Xiaoyu Wang and Ming-Hsuan Yang and Zhengzhong Tu},\n      year={2025},\n      eprint={2507.07105},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105}, \n}\n```\n\n\n## Acknowledgements\n\nOur code is built upon [AgenticIR](https:\u002F\u002Fgithub.com\u002FKaiwen-Zhu\u002FAgenticIR), along with several excellent open-source restoration tools and vision-language models, which we summarize in [Toolbox](.\u002FToolbox.md). 
We gratefully acknowledge the authors for their valuable contributions to the community.\n\n","\u003Cdiv align=\"center\">\n\n\n\u003Ch1>4KAgent：基于智能体的任意图像至4K超分辨率重建\u003C\u002Fh1>\n\n\u003Cdiv>\n    \u003Ca href='https:\u002F\u002Fyushenzuo.github.io' target='_blank'>Yushen Zuo\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    Qi Zheng\u003Csup>1†\u003C\u002Fsup>&emsp;\n    Mingyang Wu\u003Csup>1†\u003C\u002Fsup>&emsp;\n    Xinrui Jiang\u003Csup>2†\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fshadowiterator.github.io' target='_blank'>Renjie Li\u003Csup>1\u003C\u002Fsup>\u003C\u002Fa>&emsp;\u003Cbr>\n    \u003Ca href='https:\u002F\u002Fjianwang-cmu.github.io' target='_blank'>Jian Wang\u003Csup>3\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fyzhang34.github.io\u002Fauthor\u002Fyide-zhang' target='_blank'>Yide Zhang\u003Csup>4\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fgengchenmai.github.io' target='_blank'>Gengchen Mai\u003Csup>5\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fcoilab.caltech.edu\u002Fmembers\u002Fdirectors-biography' target='_blank'>Lihong V. 
Wang\u003Csup>6\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fwww.james-zou.com' target='_blank'>James Zou\u003Csup>2\u003C\u002Fsup>\u003C\u002Fa>&emsp;\u003Cbr>\n    \u003Ca href='https:\u002F\u002Fwww.xiaoyumu.com' target='_blank'>Xiaoyu Wang\u003Csup>7\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Ffaculty.ucmerced.edu\u002Fmhyang' target='_blank'>Ming-Hsuan Yang\u003Csup>8\u003C\u002Fsup>\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fvztu.github.io' target='_blank'>Zhengzhong Tu\u003Csup>1*\u003C\u002Fsup>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr>\n\u003Cdiv>\n    \u003Csup>1\u003C\u002Fsup>德克萨斯农工大学&emsp;  \u003Csup>2\u003C\u002Fsup>斯坦福大学&emsp;  \u003Csup>3\u003C\u002Fsup>Snap Inc.&emsp;  \u003Csup>4\u003C\u002Fsup>科罗拉多大学博尔德分校\u003Cbr>\n    \u003Csup>5\u003C\u002Fsup>德克萨斯大学奥斯汀分校&emsp;  \u003Csup>6\u003C\u002Fsup>加州理工学院&emsp;  \u003Csup>7\u003C\u002Fsup>Topaz Labs&emsp;  \u003Csup>8\u003C\u002Fsup>加州大学默塞德分校\u003Cbr>\n    \u003Csup>†\u003C\u002Fsup>表示共同第一作者\u003Cbr>\n    \u003Csup>*\u003C\u002Fsup>通讯作者\n\u003C\u002Fdiv>\n\n\u003C!-- [[paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105) -->\n\u003Cbr>\n\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-8A2BE2)](https:\u002F\u002F4kagent.github.io)&nbsp;\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20paper-2507.07105-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105)&nbsp;\n[![🤗 Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Dataset-DIV--4K--50-yellow)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50)\n![visitors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_3d3fd09589a4.png)\n\n---\n\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  \u003Cstrong>\u003Cem>已被NeurIPS 2025接收\u003C\u002Fem>\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\n\u003Cp 
align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_f76c5d5638c4.jpg\" width=95%>\n\u003Cp>\n\n\n## 引言\n\n我们提出了**4KAgent**，一个基于智能体的通用图像超分辨率模型，旨在将任意图像无差别地提升至**4K分辨率**，无论输入类型、退化程度或领域如何。**4KAgent**具备以下关键特性：\n\n- 🔥 **框架**：**4KAgent**是首个用于通用任意图像至4K超分辨率的AI智能体框架，能够处理**所有图像类别**，从经典和现实中的退化图像、极端低质量输入，到AI生成图像，以及遥感、显微镜和生物医学等科学成像任务。\n\n- 🔥 **系统设计**：在**4KAgent**中，感知智能体采用大型视觉-语言模型（VLM）分析图像内容与失真，并制定修复方案供修复智能体执行。修复智能体则通过执行—反思—回滚的递归流程进行修复与超分辨率重建。\n\n- 🔥 **Q-MoE与人脸修复流水线**：在修复方案的每一步中，我们提出了一种基于质量驱动的专家混合（Q-MoE）策略，在执行与反思阶段选择最优图像。此外，我们还开发了**人脸修复流水线**，以增强图像中的人脸细节。\n\n- 🔥 **配置模块**：为扩展**4KAgent**的适用性，我们提出了**配置模块**，允许用户根据不同的修复任务自定义系统。**4KAgent**无需额外训练即可适应多种修复任务。\n\n- 🔥 **[DIV4K-50数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50)**：我们构建了**DIV4K-50**数据集，作为一项具有挑战性的测试集，用于将256×256分辨率的低质量（LQ）图像，经过多重退化后，提升至4096×4096分辨率的高质量（HQ）4K图像。\n\n\n## 流程\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_readme_efd40fa2e728.png\" width=95%>\n\u003Cp>\n\n\n## 依赖与安装\n\n请参阅[安装指南](installation\u002FInstallation.md)，获取关于环境搭建和依赖安装的详细说明。\n\n## 推理\n\n**前提条件**：在运行4KAgent之前，请在[配置文件](config.yml)中填写API密钥。\n\n4KAgent的推理依赖于配置文件，以下是示例：\n\n**使用‘llama_vision’作为感知智能体VLM的配置文件：**\n\n**经典SR（ExpSR_s4_F）**\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fclassicsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fclassicsr \\\n  --profile_name ExpSR_s4_F \\\n  --tool_run_gpu_id 2\n```\n\n**真实世界SR（ExpSR_s4_P）**\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Frealworldsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Frealworldsr \\\n  --profile_name ExpSR_s4_P \\\n  --tool_run_gpu_id 2\n```\n\n**使用‘depictqa’作为感知智能体VLM的配置文件：**\n\n**红外与4K SR联合处理：**\n```bash\n# 在门户A中设置depictqa：\ncd .\u002FDepictQA\nconda activate 
depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 在门户B中运行4KAgent推理：\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002F4ksr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002F4ksr \\\n  --profile_name FastGen4K_P \\\n  --tool_run_gpu_id 2\n```\n\n我们推荐`FastGen4K_P`配置文件，它推理速度更快，且具有良好的感知质量。\n\n`tool_run_gpu_id`用于指定执行工具（修复方法）的GPU。对于显存较大的GPU，可以将`tool_run_gpu_id`设置为与`CUDA_VISIBLE_DEVICES`相同。\n\n**老照片4K SR**\n```bash\n# 在门户A中设置depictqa：\ncd .\u002FDepictQA\nconda activate depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 在门户B中运行4KAgent推理：\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fopr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fopr \\\n  --profile_name OldP4K_P \\\n  --tool_run_gpu_id 2\n```\n\n**多重退化图像修复**\n```bash\n# 在门户A中设置depictqa：\ncd .\u002FDepictQA\nconda activate depictqa\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n\n# 在门户B中运行4KAgent推理：\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fmir \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fmir \\\n  --profile_name GenMIR_P \\\n  --tool_run_gpu_id 2\n```\n\n\n## 配置文件设置\n\n我们在`pipeline\u002Fprofiles`目录下提供了若干示例配置文件，供不同应用场景参考。用户可根据这些示例自定义自己的配置文件。\n\n## DIV4K-50 数据集\n\n我们已在 🤗 Hugging Face 上提供了 [DIV4K-50 数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50)，方便大家访问和复现实验。要下载该数据集，请确保已安装 huggingface_hub CLI：\n```bash\npython -m pip install \"huggingface_hub[cli]\"\n\n# 运行以下命令将数据集下载到本地目录：\nhuggingface-cli download --repo-type dataset YSZuo\u002FDIV4K-50 --local-dir .\u002Fdataset\u002FDIV4K-50\n\n# 解压数据集：\ncd .\u002Fdataset\u002FDIV4K-50\nunzip DIV4K-50.zip\n```\n\n\n## 实用工具\n[1] 提取结果图像：[utils\u002Fimage_export.py](utils\u002Fimage_export.py) \n\n目前，4KAgent 会生成一个包含日志和推理过程中生成图像的文件夹。如果我们仅需要最终输出图像来计算指标（例如 PSNR \u002F SSIM \u002F LPIPS 等），可以使用此脚本将每个 
`output` 图像按其原始文件名提取到一个新的文件夹中。\n\n[2] 提取结果工具链：[utils\u002Ftoolchain_export.py](utils\u002Ftoolchain_export.py) \n\n如果我们对多张图像进行推理，并希望了解每张图像对应的 4KAgent 工具链，可以使用此脚本提取每张图像的工具链。例如：\n\n```\n001: defocus deblurring@diffplugin-brightening@gamma_correction-super-resolution@diffbir.\n002: defocus deblurring@drbnet-super-resolution@diffbir.\n003: defocus deblurring@restormer-super-resolution@pisasr.\n```\n\n[3] 提取人脸修复结果工具：[utils\u002Fface_restoration_tool_export.py](utils\u002Fface_restoration_tool_export.py)\n\n如果在配置文件中启用了“人脸修复”功能（将 `FaceRestore` 设置为 `true`），并想查看具体使用了哪种人脸修复方法，可以使用此脚本。例如：\n\n```\n00006_01: codeformer\n00006_02: gfpgan\n00006_03: img\n```\n其中，“img”表示原始人脸图像。\n\n\n## 评估\n我们在 [eval](.\u002Feval\u002F) 文件夹中提供了多个评估脚本，分别对应不同的任务：\n\n[1] [test_metrics_classic](.\u002Feval\u002Ftest_metrics_classic.py)：`crop_border=4`，用于评估经典超分辨率任务中的图像。（Set5、Set14、B100、Urban100、Manga109）\n\n[2] [test_metrics](.\u002Feval\u002Ftest_metrics.py)：用于评估真实世界超分辨率任务中的图像。（RealSR、DRealSR）\n\n[3] [test_metrics_mio](.\u002Feval\u002Ftest_metrics_mio.py)：用于评估多退化修复任务中的图像。（MiO100）\n\n[4] [test_metrics_nr](.\u002Feval\u002Ftest_metrics_nr.py)：用于评估无参考指标的图像。（NIQE、MUSIQ、MANIQA (pipal)、CLIPIQA）。（RealSRSet（16倍超分辨率）、DIV4K-50）如果 GPU 显存有限（小于 24G），也可以使用 [test_metrics_nr_low_gpu](.\u002Feval\u002Ftest_metrics_nr_low_gpu.py)。\n\n## 实验结果\n\n我们在11个不同的图像超分辨率任务上评估了4KAgent。总体实验结果总结如下：\n\n| 任务                          | 数据集           | 配置文件                                      | 缩放因子 | 结果 |\n|-------------------------------|-------------------|-------------------------------------------------|--------------|--------|\n| 经典超分辨率                  | Set5              | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Ju24iX9mC8Yg_8NLcy-XbnQpW5l0GbAu?usp=drive_link)       |\n| 经典超分辨率                  | Set14             | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | 
[结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KaLBVceysfmqeZgjV1OAyjQ3N0vyrWux?usp=drive_link)       |\n| 经典超分辨率                  | B100              | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1G7MnvyDnP6bwCoanMGG-vcFCI1A6zXYH?usp=drive_link)       |\n| 经典超分辨率                  | Urban100          | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Fd52RajdelcNSIfLAyXNWEZsVqVyMnvC?usp=drive_link)       |\n| 经典超分辨率                  | Manga109          | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1hnIA63GOiyawMs8siiW3MN7AhuEfXuVR?usp=drive_link)       |\n| 真实世界超分辨率                 | DRealSR           | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1wDArrO8TjT56kxH0LF790hEQqQY9M3KE?usp=drive_link)       |\n| 真实世界超分辨率                 | RealSR            | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P              |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1vSkBn54ypcqd6k1QlLpZvey58eBC_4CY?usp=drive_link)       |\n| 多重退化图像修复                 | MiO100            | GenMIR-P                                        |    4 *  | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12uC-KYuCaeoCA0RcgQHU3z9FuL_E0sG1?usp=drive_link)       |\n| 人脸修复                      | WebPhoto-Test     | GenSRFR-s4-P                                    |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1hAGGg72A-tEtr78oqyN7JCdaLTr7kXzz?usp=drive_link)       |\n| 16倍超分辨率                  | RealSRSet         | Gen4K-P                                         |    16    | 
[结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1K2pIG-PTc1GRzSmbxyvVSFpHcuI11TbA?usp=drive_link)       |\n| 图像修复与4K超分辨率联合处理         | DIV4K-50          | Gen4K-P                                         |    16    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1cDCRdfVzOrWX_AziLHInjJzhl5QR_rgN?usp=drive_link)       |\n| AIGC 4K超分辨率 **                    | GenAIBench-4K     | ExpSR-s4-P                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rw-evPx8w5uFtVqjVx97clKbDXsKB2Lx\u002Fview?usp=drive_link)       |\n| AIGC 4K超分辨率 **                    | DiffusionDB-4K    | ExpSR-s4-P                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1L3XRiYb1_BiEmwvRtSkadWdicutY_OAF\u002Fview?usp=drive_link)       |\n| 遥感图像超分辨率               | AID               | AerSR-s4-F, AerSR-s4-P                          |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Ntm1xQZmFUvw-UcSSmyYGTA5CsAY6NEL?usp=drive_link)       |\n| 遥感图像超分辨率               | DIOR              | AerSR-s4-F, AerSR-s4-P                          |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1lgVlYb0_ob2skc0in8qrSOoVyS1592iN?usp=drive_link)       |\n| 遥感图像超分辨率               | DOTA              | AerSR-s4-F, AerSR-s4-P, Aer4K-F, Aer4K-P        |    4, 16    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1Uxbu7bzJcO6L1ATZecfTxe7651SO_r1_?usp=drive_link)       |\n| 遥感图像超分辨率               | WorldStrat        | AerSR-s4-F, AerSR-s4-P                          |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1anwNZ4lzYw49g7X-1vBMje494hZp5jtZ?usp=drive_link)       |\n| 荧光显微镜图像超分辨率         | SR-CACO-2       | ExpSR-s2-F, ExpSR-s4-F, ExpSR-s8-F             |    2, 4, 8    | 
[结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k42i-eLdxhdaOJRSB0DSLH4ihUrLfPiC?usp=drive_link)       |\n| 病理图像超分辨率              | bcSR              | ExpSR-s4-F, ExpSR-s8-F                          |    4, 8    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1aF2fA-NVbd9B61KBRqSi3XuYtvt7kZHB?usp=drive_link)       |\n| 医学图像超分辨率              | Chest X-ray 2017  | ExpSR-s4-F                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1cvplHEtYf6IKrYmqfgl2Gd9w1wdcD_TG?usp=drive_link)       |\n| 医学图像超分辨率              | Chest X-ray 14    | ExpSR-s4-F                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1AxiFdhQic821vSqObIPzIYRERJYYwp6k?usp=drive_link)       |\n| 医学图像超分辨率              | US-CASE           | ExpSR-s4-F                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1WB1hYNLcRf8AxjPgNsensIpGxF7JERs1?usp=drive_link)       |\n| 医学图像超分辨率              | MMUS1K            | ExpSR-s4-F                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F174V8ehE3PUApvLPCTWsv-i1XAAktVeb3?usp=drive_link)       |\n| 医学图像超分辨率              | DRIVE             | ExpSR-s4-F                                      |    4    | [结果](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1CdXT6aoS_2AtQuJaz4KaODgh9S-ABRyQ?usp=drive_link)       |\n\n\n*: 使用`GenMIR-P`配置文件时，4KAgent会根据低质量图像的分辨率决定是否触发`超分辨率`；对于触发了`超分辨率`的低质量图像，缩放因子设置为4。\n\n**: 我们使用标准样本提示来评估4KAgent在AIGC领域的性能。我们采用无参考指标（NIQE、MUSIQ-P、MANIQA、CLIPIQA）进行评估，并提供了用于生成的测试提示。（MUSIQ-P：MUSIQ的分块版本，它在互不重叠的512×512图像块上分别计算MUSIQ分数并取平均值，从而提高对超高分辨率内容中局部伪影的敏感性。）\n\n我们在[profile_setup](.\u002Fpipeline\u002Fprofiles\u002Fprofile_setup.md)中介绍了这些任务中使用的配置文件命名规范及详细信息。\n\n## 许可证\n本项目根据[Apache 2.0许可证](LICENSE)发布。\n\n## 联系方式\n如有任何问题，请随时联系：`zuoyushen12@gmail.com`\n\n## 
引用\n如果您在研究中使用了我们的工作，我们诚挚地希望您能考虑引用我们的论文：\n```bibtex\n@article{zuo20254kagent,\n      title={4KAgent: Agentic Any Image to 4K Super-Resolution}, \n      author={Yushen Zuo and Qi Zheng and Mingyang Wu and Xinrui Jiang and Renjie Li and Jian Wang and Yide Zhang and Gengchen Mai and Lihong V. Wang and James Zou and Xiaoyu Wang and Ming-Hsuan Yang and Zhengzhong Tu},\n      year={2025},\n      eprint={2507.07105},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07105}, \n}\n```\n\n\n## 致谢\n\n我们的代码基于 [AgenticIR](https:\u002F\u002Fgithub.com\u002FKaiwen-Zhu\u002FAgenticIR)，同时还借鉴了若干优秀的开源图像修复工具和视觉语言模型，这些内容已在 [工具箱](.\u002FToolbox.md) 中列出。我们衷心感谢各位作者对社区所做的宝贵贡献。","# 4KAgent 快速上手指南\n\n**4KAgent** 是一个通用的智能体图像超分辨率框架，能够将任意类型的低质量图像（包括经典退化、真实世界退化、AI 生成图像、遥感、显微及生物医学图像等）统一放大至 **4K 分辨率**。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu)\n*   **GPU**: NVIDIA GPU (支持 CUDA)，建议显存 ≥ 24GB 以获得最佳体验（小显存可使用低显存评估脚本，但推理仍建议大显存）。\n*   **Python**: 3.8 或更高版本\n*   **依赖管理**: Conda (推荐)\n*   **API Key**: 本项目依赖视觉语言模型 (VLM)，使用前需在配置文件中填入对应的 API Key。\n\n## 2. 安装步骤\n\n### 2.1 克隆项目\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftaco-group\u002F4KAgent.git\ncd 4KAgent\n```\n\n### 2.2 创建虚拟环境并安装依赖\n详细的环境配置请参考项目自带的 `installation\u002FInstallation.md`，以下是基础安装流程：\n\n```bash\n# 创建 conda 环境\nconda create -n 4kagent python=3.9 -y\nconda activate 4kagent\n\n# 安装 PyTorch (请根据您的 CUDA 版本调整，以下为示例)\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# 安装项目核心依赖\npip install -r requirements.txt\n```\n\n> **注意**：如果部分依赖下载缓慢，可尝试使用国内镜像源加速：\n> `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 2.3 配置 API Key\n运行前必须配置 VLM 的 API Key。编辑根目录下的 `config.yml` 文件，填入您的密钥。\n\n## 3. 
基本使用\n\n4KAgent 基于 **Profile（配置文件）** 运行，针对不同类型的图像（如经典超分、真实世界超分、老照片修复等）提供了预设 Profile。\n\n### 3.1 下载测试数据（可选）\n您可以使用提供的示例数据进行测试：\n```bash\n# 确保已安装 huggingface-cli\npip install \"huggingface_hub[cli]\"\n\n# 下载 DIV4K-50 数据集或其他测试样本（根据实际需求）\n# 此处以手动放置图片到 input_dir 为例，无需强制下载完整数据集即可运行\n```\n\n### 3.2 运行推理\n\n#### 场景 A：经典超分辨率 (Classic SR)\n使用 `llama_vision` 作为感知代理，适用于标准退化图像。\n\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fclassicsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fclassicsr \\\n  --profile_name ExpSR_s4_F \\\n  --tool_run_gpu_id 2\n```\n\n#### 场景 B：真实世界超分辨率 (Real-World SR)\n适用于包含复杂噪声和模糊的真实拍摄图像。\n\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Frealworldsr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Frealworldsr \\\n  --profile_name ExpSR_s4_P \\\n  --tool_run_gpu_id 2\n```\n\n#### 场景 C：老照片修复与多重退化恢复 (需部署 DepictQA)\n对于老照片 (`OldP4K_P`) 或多重退化 (`GenMIR_P`) 任务，需要额外启动 `DepictQA` 服务作为感知代理。\n\n**步骤 1：启动 DepictQA 服务 (终端 A)**\n```bash\ncd .\u002FDepictQA\nconda activate depictqa  # 假设已单独配置 depictqa 环境\nCUDA_VISIBLE_DEVICES=0 python src\u002Fapp_eval.py\n```\n\n**步骤 2：运行 4KAgent 推理 (终端 B)**\n```bash\nCUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \\\n  --input_dir .\u002Fassets\u002Fprofile_test_example\u002Fopr \\\n  --output_dir .\u002Foutputs\u002F4KAgent_test\u002Fopr \\\n  --profile_name OldP4K_P \\\n  --tool_run_gpu_id 2\n```\n\n### 3.3 参数说明\n*   `--input_dir`: 输入低质量图像文件夹路径。\n*   `--output_dir`: 输出高清 4K 图像保存路径。\n*   `--profile_name`: 预设配置文件名（位于 `pipeline\u002Fprofiles`），决定处理策略。\n    *   推荐尝试 `FastGen4K_P`：推理速度较快且感知质量良好。\n*   `--tool_run_gpu_id`: 指定执行具体修复工具（Restoration Tools）的 GPU ID。若显存充足，可与 `CUDA_VISIBLE_DEVICES` 设为相同值。\n\n### 3.4 结果提取\n推理完成后，输出目录包含日志和中间过程图。若只需最终结果用于指标计算，可使用工具脚本提取：\n\n```bash\n# 提取最终输出图像\npython utils\u002Fimage_export.py --input_dir .\u002Foutputs\u002F4KAgent_test\u002Fxxx 
--output_dir .\u002Ffinal_results\n\n# 查看每张图使用的工具链\npython utils\u002Ftoolchain_export.py --input_dir .\u002Foutputs\u002F4KAgent_test\u002Fxxx\n```","一位数字档案管理员正在处理一批珍贵的 20 世纪老照片，这些照片不仅分辨率极低（仅 256x256），还混杂着严重的划痕、噪点以及模糊的人脸细节，急需修复并放大至 4K 标准以供高清展览使用。\n\n### 没有 4KAgent 时\n- **工具碎片化严重**：需要分别使用去噪软件、超分模型和专门的人脸修复工具，手动串联流程极易出错且耗时。\n- **细节丢失与伪影**：传统算法在极端低质输入下容易产生模糊或奇怪的纹理伪影，无法还原真实的胶片质感。\n- **人脸修复失败**：通用超分模型对老旧照片中模糊不清的五官往往无能为力，导致人物面部扭曲或无法识别。\n- **缺乏智能判断**：无法根据照片具体的退化类型（如划痕 vs 噪点）自动调整策略，只能套用固定参数，效果参差不齐。\n\n### 使用 4KAgent 后\n- **一站式智能代理**：4KAgent 的多智能体系统自动分析图像退化情况，一键执行从感知、规划到修复的全流程，无需人工干预。\n- **高质量细节重建**：借助质量驱动的混合专家策略（Q-MoE），能精准去除噪点并生成自然的 4K 高频细节，完美保留历史韵味。\n- **专属人脸增强**：内置的人脸修复流水线专门针对模糊五官进行优化，即使原图极度模糊也能还原清晰、自然的面部特征。\n- **自适应场景处理**：无论是显微图像还是老旧胶片，4KAgent 的配置文件模块能自动适配不同任务，无需额外训练即可达到最佳效果。\n\n4KAgent 将繁琐的多步图像处理转化为智能化的单次交互，让任何低质图像都能以完美的 4K 画质重获新生。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaco-group_4KAgent_f76c5d56.jpg","taco-group","TACO-Group","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftaco-group_8c0e4012.png","Trustworthy, Autonomous, Human-Centered, Embodied (TACO) Intelligence Group @ Texas A&M University",null,"_vztu","taco-group.github.io","https:\u002F\u002Fgithub.com\u002Ftaco-group",[82,86,90,94,97,101,104,108],{"name":83,"color":84,"percentage":85},"Python","#3572A5",76.2,{"name":87,"color":88,"percentage":89},"Jupyter Notebook","#DA5B0B",23.3,{"name":91,"color":92,"percentage":93},"MATLAB","#e16737",0.2,{"name":95,"color":96,"percentage":93},"Shell","#89e051",{"name":98,"color":99,"percentage":100},"HTML","#e34c26",0.1,{"name":102,"color":103,"percentage":100},"JavaScript","#f1e05a",{"name":105,"color":106,"percentage":107},"CSS","#663399",0,{"name":109,"color":110,"percentage":107},"Dockerfile","#384d54",785,45,"2026-04-17T08:17:28","Apache-2.0",4,"Linux","必需 NVIDIA GPU。示例命令使用多卡并行（感知代理、推理主进程、工具执行分别占用不同 GPU）。显存需求较高：运行非参考指标评估时，若显存小于 24GB 需使用低显存脚本；建议大显存显卡以支持 4K 分辨率生成及多模型并发。","未说明",{"notes":120,"python":121,"dependencies":122},"1. 
架构为多智能体系统，包含感知代理（需调用 LLaMA-Vision 或 DepictQA 等视觉语言模型）和修复代理。\n2. 必须配置 API Key 到 config.yml 文件才能运行。\n3. 支持多 GPU 部署：感知代理和具体修复工具可指定在不同 GPU 上运行（通过 --tool_run_gpu_id 参数）。\n4. 若使用 DepictQA 作为感知模型，需单独启动其服务端口。\n5. 数据集 DIV4K-50 需通过 huggingface-cli 下载并解压。","未说明 (通过 conda 环境管理)",[123,124,125,126],"torch","transformers (用于 VLM)","huggingface_hub","conda",[14,15,35,13],[129,130,131,132,133,134,135,136,137,138,139,140,141,142,143],"agent","agentic-ai","computer-vision","image-enhancement","image-processing","image-restoration","large-language-models","low-level","super-resolution","vision-language-models","workflow","llm","mllm","neurips","neurips-2025","2026-03-27T02:49:30.150509","2026-04-18T14:25:59.304668",[147,152,157,162,167,171],{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},39913,"代码和预训练模型是否已经发布？如何获取？","是的，代码库和评估脚本已经发布，您可以直接在仓库中尝试使用。所有预训练模型来自其原始仓库，但项目方已确保可以通过单个脚本轻松配置。此外，预训练模型集合已发布在 Hugging Face 上：https:\u002F\u002Fhuggingface.co\u002FYSZuo\u002F4KAgent-Toolbox-Pretrained-Models，DIV4K-50 数据集也已发布：https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50。团队还计划准备 Docker 环境以简化部署。","https:\u002F\u002Fgithub.com\u002Ftaco-group\u002F4KAgent\u002Fissues\u002F1",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},39914,"是否可以使用本地 VLM（视觉语言模型）进行图像描述，而不是依赖在线 API？","可以。您需要先在 `_llm_` 文件夹下创建一个新的 `.py` 文件，并编写对应本地 VLM 的接口函数代码。然后，在管道代码文件（`.\u002Fpipeline\u002Fthe4kagent_pipeline.py`）中引用该文件。具体实现可参考现有的 `.\u002Fllm\u002Fllama_vision.py` 和 `.\u002Fllm\u002Fqwen_vl.py` 文件作为模板。","https:\u002F\u002Fgithub.com\u002Ftaco-group\u002F4KAgent\u002Fissues\u002F5",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},39915,"示例图片中的“超分辨率”图像是否正确？","经核实，部分示例图片确实存在错误。例如，皇家马德里体育城的图片实际上是来自 Google Earth 的高清参考图（拍摄于 2016 年 8 月 2 日），而非 4KAgent 的输出结果。维护者确认将把项目网站上的图片替换为正确的 4KAgent 生成结果。代码和实验结果已在仓库中发布供测试。","https:\u002F\u002Fgithub.com\u002Ftaco-group\u002F4KAgent\u002Fissues\u002F4",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},39916,"4KAgent 的预训练模型和 
DIV4K-50 数据集在哪里下载？","4KAgent 使用的预训练模型集合已上传至 Hugging Face：https:\u002F\u002Fhuggingface.co\u002FYSZuo\u002F4KAgent-Toolbox-Pretrained-Models。DIV4K-50 数据集也已发布在 Hugging Face Datasets：https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYSZuo\u002FDIV4K-50。您可以使用 `load_dataset(\"YSZuo\u002FDIV4K-50\")` 直接加载数据集。","https:\u002F\u002Fgithub.com\u002Ftaco-group\u002F4KAgent\u002Fissues\u002F2",{"id":168,"question_zh":169,"answer_zh":170,"source_url":151},39917,"项目是否支持 Docker 环境部署？","目前团队已明确表示计划准备 Docker 环境，以便用户更容易地配置和运行项目。虽然具体镜像尚未在评论中给出下载链接，但代码库已发布，用户可以先行通过脚本配置环境，后续可关注仓库更新获取 Docker 支持。",{"id":172,"question_zh":173,"answer_zh":174,"source_url":161},39918,"如何验证 4KAgent 的去噪能力，特别是针对 Sentinel-1 SAR 图像？","目前代码和实验结果已发布在仓库中，您可以下载代码自行测试去噪功能。虽然评论中未提供针对 Sentinel-1 SAR 图像的具体预设脚本，但既然代码已开源，您可以参考已有的管线代码（`.\u002Fpipeline\u002Fthe4kagent_pipeline.py`）并调整输入数据格式来测试 SAR 图像的去噪效果。",[176],{"id":177,"version":178,"summary_zh":179,"released_at":180},323417,"v1.0","4KAgent 的初始版本软件已发布。","2025-09-24T00:34:26"]