[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ziqihuangg--ReVersion":3,"tool-ziqihuangg--ReVersion":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 
既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":10,"env_os":94,"env_gpu":95,"env_ram":96,"env_deps":97,"category_tags":106,"github_topics":107,"view_count":23,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":114,"updated_at":115,"faqs":116,"releases":157},2034,"ziqihuangg\u002FReVersion","ReVersion","[SIGGRAPH Asia 2024] ReVersion: Diffusion-Based Relation Inversion from Images","ReVersion 是一个基于扩散模型的图像关系逆向工具，能从几张示例图像中自动学习并提取出物体之间的互动关系（如“猫坐在沙发上”），然后将这种关系迁移到新物体上，生成符合逻辑的新场景。它解决了传统图像生成模型难以精准控制物体间语义关系的难题——以往用户只能通过文字描述间接引导，而 ReVersion 能直接从图像中“读懂”关系，并复用它，让生成结果更准确、更自然。适合图像生成研究者、AI视觉设计师以及希望探索语义关系建模的开发者使用。其独特之处在于无需重新训练整个模型，仅需保存轻量级的“关系提示词”（relation prompt），即可在现有扩散模型上快速应用新关系，大幅降低计算成本。用户可通过 Hugging Face 在线体验，或本地部署进行定制化创作。","# ReVersion (SIGGRAPH Asia, 2024)\n\n\u003C!-- ![visitors](https:\u002F\u002Fvisitor-badge.glitch.me\u002Fbadge?page_id=ziqihuangg\u002FReVersion&right_color=MediumAquamarine) -->\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FUpdated-PDF-66cdaa?logo=arxiv&logoColor=66cdaa)](https:\u002F\u002Fziqihuangg.github.io\u002Fpapers\u002F2024SigAsia-ReVersion.pdf)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcs.CV-Paper-66cdaa?logo=arxiv&logoColor=66cdaa)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495)\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-66cdaa?logo=googlechrome&logoColor=66cdaa)](https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html)\n[![Video](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-Video-66cdaa?logo=youtube&logoColor=66cdaa)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ)\n![Visitors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_c66d2f4a1b2a.png)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-66cdaa)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)\n\nThis repository contains the implementation of the following paper:\n> **ReVersion: Diffusion-Based Relation Inversion from Images**\u003Cbr>\n> [Ziqi Huang](https:\u002F\u002Fziqihuangg.github.io\u002F)\u003Csup>∗\u003C\u002Fsup>, [Tianxing Wu](https:\u002F\u002Ftianxingwu.github.io\u002F)\u003Csup>∗\u003C\u002Fsup>, [Yuming Jiang](https:\u002F\u002Fyumingj.github.io\u002F), [Kelvin C.K. 
Chan](https:\u002F\u002Fckkelvinchan.github.io\u002F), [Ziwei Liu](https:\u002F\u002Fliuziwei7.github.io\u002F)\u003Cbr>\n\nFrom [MMLab@NTU](https:\u002F\u002Fwww.mmlab-ntu.com\u002F) affiliated with S-Lab, Nanyang Technological University\n\n\u003C!-- [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495)] | -->\n\u003C!-- [[Project Page](https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html)] | -->\n\u003C!-- [[Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ)] |  -->\n\u003C!-- [[Dataset](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)]  -->\n\u003C!-- [[Huggingface Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)] | -->\n\n\n## :open_book: Overview\n![overall_structure](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_38681cd9bf23.jpg)\n\nWe propose a new task, **Relation Inversion**: Given a few exemplar images, where a relation co-exists in every image, we aim to find a relation prompt **\\\u003CR>** to capture this interaction, and apply the relation to new entities to synthesize new scenes. The above images are generated by our **ReVersion** framework.\n\n## :heavy_check_mark: Updates\n- [12\u002F2024] We are presenting ReVersion at [SIGGRAPH Asia 2024, Tokyo](https:\u002F\u002Fasia.siggraph.org\u002F2024\u002F). Welcome to join our [presentation and discussion](https:\u002F\u002Fasia.siggraph.org\u002F2024\u002Fpresentation\u002F?id=papers_824&sess=sess105) on 3 December 2024.\n- [03\u002F2024] We optimized the code implementation. You only need to save and load the relation prompt, without having to save or load the entire text-to-image model.\n- [08\u002F2023] We released the [training code](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#relation-inversion) for Relation Inversion.\n- [04\u002F2023] We released the [ReVersion Benchmark](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing).\n- [04\u002F2023] Integrated into [Hugging Face 🤗](https:\u002F\u002Fhuggingface.co\u002Fspaces) using [Gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio). Try out the online Demo: [![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-66cdaa)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)\n- [03\u002F2023] [Arxiv paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495) available.\n- [03\u002F2023] Pre-trained models with relation prompts released at [this link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1apFk6TF3pGH00hHF1nO1S__tDlrcLQAh?usp=sharing).\n- [03\u002F2023] [Project page](https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html) and [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ) available.\n- [03\u002F2023] Inference code released.\n\n\n## :hammer: Installation\n\n1. Clone Repo\n\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\n   cd ReVersion\n   ```\n\n2. 
Create Conda Environment and Install Dependencies\n\n   ```bash\n   conda create -n reversion\n   conda activate reversion\n   conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch\n   pip install diffusers[\"torch\"]\n   pip install -r requirements.txt\n   ```\n## :page_with_curl: Usage\n\n### Relation Inversion\nGiven a set of exemplar images and their entities' coarse descriptions, you can optimize a relation prompt **\\\u003CR>** to capture the co-existing relation in these images, namely *Relation Inversion*.\n\n\n1. Prepare the exemplar images (\u003Cem>e.g.\u003C\u002Fem>, `0.jpg` - `9.jpg`) and coarse descriptions (`text.json`), and put them inside a folder. Feel free to use our ReVersion benchmark, or you can also prepare your own images. An example from our ReVersion benchmark is as follows:\n    ```\n    .reversion_benchmark_v1\n    ├── painted_on\n    │   ├── 0.jpg\n    │   ├── 1.jpg\n    │   ├── 2.jpg\n    │   ├── 3.jpg\n    │   ├── 4.jpg\n    │   ├── 5.jpg\n    │   ├── 6.jpg\n    │   ├── 7.jpg\n    │   ├── 8.jpg\n    │   ├── 9.jpg\n    │   └── text.json\n    ```\n\n2. Take the relation `painted_on` for example, you can start training using this script:\n    ```\n    accelerate launch \\\n        --config_file=\".\u002Fconfigs\u002Fsingle_gpu.yml\" \\\n        train.py \\\n        --seed=\"2023\" \\\n        --pretrained_model_name_or_path=\"runwayml\u002Fstable-diffusion-v1-5\" \\\n        --train_data_dir=\".\u002Freversion_benchmark_v1\u002Fpainted_on\" \\\n        --placeholder_token=\"\u003CR>\" \\\n        --initializer_token=\"and\" \\\n        --train_batch_size=\"2\" \\\n        --gradient_accumulation_steps=\"4\" \\\n        --max_train_steps=\"3000\" \\\n        --learning_rate='2.5e-04' --scale_lr \\\n        --lr_scheduler=\"constant\" \\\n        --lr_warmup_steps=\"0\" \\\n        --output_dir=\".\u002Fexperiments\u002Fpainted_on\" \\\n        --save_steps=\"1000\" \\\n        --importance_sampling \\\n        --denoise_loss_weight=\"1.0\" \\\n        --steer_loss_weight=\"0.01\" \\\n        --num_positives=\"4\" \\\n        --temperature=\"0.07\" \\\n        --only_save_embeds\n    ```\n\n    Where `train_data_dir` is the path to the exemplar images and coarse descriptions. `output_dir` is the path to save the inverted relation and the experiment logs. To generate relation-specific images, you can follow the next section [Generation](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#generation).\n\n    Note that the `only_save_embeds` option allows you to only save the relation prompt **\\\u003CR>**, without having to save the entire Stable Diffusion model. You can decide whether to turn it on.\n\n\n### :framed_picture: Generation\nWe can use the learned relation prompt **\\\u003CR>** to generate relation-specific images with new objects, backgrounds, and style.\n\n1. You can obtain a learned **\\\u003CR>** from [Relation Inversion](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#relation-inversion) using your customized data. You can also download the models from [here](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1apFk6TF3pGH00hHF1nO1S__tDlrcLQAh?usp=sharing), where we provide several pre-trained relation prompts for you to play with.\n\n2. 
Put the models (\u003Cem>i.e.\u003C\u002Fem>, learned relation prompt **\\\u003CR>**) under `.\u002Fexperiments\u002F` as follows:\n    ```\n    .\u002Fexperiments\u002F\n    ├── painted_on\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ├── carved_by\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ├── inside\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ...\n    ```\n\n3. Take the relation `painted_on` for example, you can either use the following script to generate images using a single prompt, *e.g.*, \"cat \\\u003CR> stone\":\n    ```\n    python inference.py \\\n    --model_id .\u002Fexperiments\u002Fpainted_on \\\n    --prompt \"cat \u003CR> stone\" \\\n    --placeholder_string \"\u003CR>\" \\\n    --num_samples 10 \\\n    --guidance_scale 7.5 \\\n    --only_load_embeds\n    ```\n    Or write a list prompts in `.\u002Ftemplates\u002Ftemplates.py` with the key name `$your_template_name` and generate images for every prompt in the list `$your_template_name`:\n    ```\n    your_template_name='painted_on_examples'\n    python inference.py \\\n    --model_id .\u002Fexperiments\u002Fpainted_on \\\n    --template_name $your_template_name \\\n    --placeholder_string \"\u003CR>\" \\\n    --num_samples 10 \\\n    --guidance_scale 7.5 \\\n    --only_load_embeds\n    ```\n    Where  `model_id` is the model directory, `num_samples` is the number of images to generate for each prompt, and `guidance_scale` is the classifier-free guidance scale.\n\n    We provide several example templates for each relation in `.\u002Ftemplates\u002Ftemplates.py`, such as `painted_on_examples`, `carved_by_examples`, etc.\n\n    Note that if you saved the entire model during the inversion process, that is, without the `only_save_embeds` flag turned on, then you should turn off the  `only_load_embeds` flag during inference. \n    The `only_load_embeds` option only loads the relation prompt **\\\u003CR>** from the experiment folder, and automatically loads the rest of the Stable Diffusion model (including other text token's embeddings) from the default cache location that contains the pre-trained Stable Diffusion model.\n\n\n### :hugs: Gradio Demo\n- We also provide a Gradio Demo to test our method using a UI. This demo supports relation-specific text-to-image generation on the fly. Running the following command will launch the demo:\n\n    ```\n    python app_gradio.py\n    ```\n- Alternatively, you can try the online demo [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion).\n\n### :art: Diverse Generation\nYou can also specify diverse prompts with the relation prompt **\\\u003CR>** to generate images of diverse backgrounds and style. For example, your prompt could be `\"michael jackson \u003CR> wall, in the desert\"`, `\"cat \u003CR> stone, on the beach\"`, \u003Cem>etc\u003C\u002Fem>.\n\n![diverse_results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_68a6cfe84e22.jpg)\n\n\n## :straight_ruler: The ReVersion Benchmark\nThe ReVersion Benchmark consists of diverse relations and entities, along with a set of well-defined text descriptions.\n\n- [\u003Cb>Relations and Entities\u003C\u002Fb>](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing). 
We define ten representative object relations with different abstraction levels, ranging from basic spatial relations (*e.g.*, “on top of”), entity interactions (*e.g.*, “shakes hands with”), to abstract concepts (*e.g.*, “is carved by”). A wide range of entities, such as animals, human, household items, are involved to further increase the diversity of the benchmark.\n- [\u003Cb>Exemplar Images and Text Descriptions\u003C\u002Fb>](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing). For each relation, we collect four to ten exemplar images containing different entities. We further annotate several text templates for each exemplar image to describe them with different levels of details. These training templates can be used for the optimization of the relation prompt.\n- [\u003Cb>Benchmark Scenarios\u003C\u002Fb>](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fblob\u002Fmaster\u002Ftemplates\u002Fbenchmark_scenarios.py). We design 100 inference templates composing of different object entities for each of the ten relations.\n\n## :fountain_pen: Citation\n\n   If you find our repo useful for your research, please consider citing our paper:\n\n   ```bibtex\n   @inproceedings{huang2023reversion,\n        title={{ReVersion}: Diffusion-Based Relation Inversion from Images},\n        author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},\n         booktitle={SIGGRAPH Asia 2024 Conference Papers},\n        year={2024}\n   }\n   ```\n\n\n## :white_heart: Acknowledgement\n\nThe codebase is maintained by [Ziqi Huang](https:\u002F\u002Fziqihuangg.github.io\u002F) and [Tianxing Wu](https:\u002F\u002Ftianxingwu.github.io\u002F).\n\nThis project is built using the following open source repositories:\n- [Stable Diffusion 1.5](https:\u002F\u002Fhuggingface.co\u002Frunwayml\u002Fstable-diffusion-v1-5)\n- [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n","# ReVersion（SIGGRAPH Asia，2024）\n\n\u003C!-- ![visitors](https:\u002F\u002Fvisitor-badge.glitch.me\u002Fbadge?page_id=ziqihuangg\u002FReVersion&right_color=MediumAquamarine) -->\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FUpdated-PDF-66cdaa?logo=arxiv&logoColor=66cdaa)](https:\u002F\u002Fziqihuangg.github.io\u002Fpapers\u002F2024SigAsia-ReVersion.pdf)\n[![论文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcs.CV-Paper-66cdaa?logo=arxiv&logoColor=66cdaa)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495)\n[![项目页面](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-66cdaa?logo=googlechrome&logoColor=66cdaa)](https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html)\n[![视频](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-Video-66cdaa?logo=youtube&logoColor=66cdaa)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ)\n![访问量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_c66d2f4a1b2a.png)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-66cdaa)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)\n\n本仓库包含以下论文的实现：\n> **ReVersion：基于扩散模型的图像关系反转**\u003Cbr>\n> [黄子琪](https:\u002F\u002Fziqihuangg.github.io\u002F) \u003Csup>∗\u003C\u002Fsup>, [吴天行](https:\u002F\u002Ftianxingwu.github.io\u002F) \u003Csup>∗\u003C\u002Fsup>, 
[江宇明](https:\u002F\u002Fyumingj.github.io\u002F)，[陈凯文](https:\u002F\u002Fckkelvinchan.github.io\u002F)，[刘子威](https:\u002F\u002Fliuziwei7.github.io\u002F) \u003Cbr>\n\n来自南洋理工大学S-Lab附属的[MMLab@NTU](https:\u002F\u002Fwww.mmlab-ntu.com\u002F)\n\n\u003C!-- [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495)] | -->\n\u003C!-- [[项目页面](https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html)] | -->\n\u003C!-- [[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ)] |  -->\n\u003C!-- [[数据集](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)]  -->\n\u003C!-- [[Huggingface演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)] | -->\n\n\n## :open_book: 概览\n![overall_structure](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_38681cd9bf23.jpg)\n\n我们提出了一项新任务——**关系反转**：给定若干示例图像，每张图像中都存在一种特定的关系，我们的目标是找到一个关系提示**\\\u003CR>**，以捕捉这种交互，并将该关系应用于新的实体，从而合成新的场景。上述图像由我们的**ReVersion**框架生成。\n\n## :heavy_check_mark: 更新内容\n- [2024年12月] 我们将在【SIGGRAPH Asia 2024，东京】上展示ReVersion。欢迎参加我们的【演示与讨论】，时间为2024年12月3日。\n- [2024年3月] 我们优化了代码实现。现在你只需保存和加载关系提示，而无需保存或加载整个文本到图像模型。\n- [2023年8月] 我们发布了关系反转的【训练代码】（https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#relation-inversion）。\n- [2023年4月] 我们发布了【ReVersion基准测试】（https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing）。\n- [2023年4月] 使用【Gradio】集成到【Hugging Face 🤗】（https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio）。试试在线演示：[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-66cdaa)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)\n- [2023年3月] 【Arxiv论文】（https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13495）已上线。\n- [2023年3月] 带有关系提示的预训练模型已发布，链接如下：（https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1apFk6TF3pGH00hHF1nO1S__tDlrcLQAh?usp=sharing）\n- [2023年3月] 【项目页面】（https:\u002F\u002Fziqihuangg.github.io\u002Fprojects\u002Freversion.html）和【视频】（https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=pkal3yjyyKQ）已上线。\n- [2023年3月] 推理代码已发布。\n\n\n## :hammer: 安装\n\n1. 克隆仓库\n\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\n   cd ReVersion\n   ```\n\n2. 创建Conda环境并安装依赖\n\n   ```bash\n   conda create -n reversion\n   conda activate reversion\n   conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch\n   pip install diffusers[\"torch\"]\n   pip install -r requirements.txt\n   ```\n## :page_with_curl: 使用方法\n\n### 关系反转\n给定一组示例图像及其实体的粗略描述，你可以优化一个关系提示**\\\u003CR>**，以捕捉这些图像中共同存在的关系，即“关系反转”。\n\n1. 准备示例图像（例如，`0.jpg`至`9.jpg`）和粗略描述文件（`text.json`），并将它们放在一个文件夹中。你可以使用我们的ReVersion基准测试，也可以自行准备图像。以下是我们的ReVersion基准测试的一个示例：\n    ```\n    .reversion_benchmark_v1\n    ├── painted_on\n    │   ├── 0.jpg\n    │   ├── 1.jpg\n    │   ├── 2.jpg\n    │   ├── 3.jpg\n    │   ├── 4.jpg\n    │   ├── 5.jpg\n    │   ├── 6.jpg\n    │   ├── 7.jpg\n    │   ├── 8.jpg\n    │   ├── 9.jpg\n    │   └── text.json\n    ```\n\n2. 
以关系`painted_on`为例，你可以使用以下脚本开始训练：\n    ```\n    accelerate launch \\\n        --config_file=\".\u002Fconfigs\u002Fsingle_gpu.yml\" \\\n        train.py \\\n        --seed=\"2023\" \\\n        --pretrained_model_name_or_path=\"runwayml\u002Fstable-diffusion-v1-5\" \\\n        --train_data_dir=\".\u002Freversion_benchmark_v1\u002Fpainted_on\" \\\n        --placeholder_token=\"\u003CR>\" \\\n        --initializer_token=\"and\" \\\n        --train_batch_size=\"2\" \\\n        --gradient_accumulation_steps=\"4\" \\\n        --max_train_steps=\"3000\" \\\n        --learning_rate='2.5e-04' --scale_lr \\\n        --lr_scheduler=\"constant\" \\\n        --lr_warmup_steps=\"0\" \\\n        --output_dir=\".\u002Fexperiments\u002Fpainted_on\" \\\n        --save_steps=\"1000\" \\\n        --importance_sampling \\\n        --denoise_loss_weight=\"1.0\" \\\n        --steer_loss_weight=\"0.01\" \\\n        --num_positives=\"4\" \\\n        --temperature=\"0.07\" \\\n        --only_save_embeds\n    ```\n\n    其中，`train_data_dir`是示例图像和粗略描述文件的路径。`output_dir`是保存反转后的关系及实验日志的路径。要生成特定于关系的图像，可以参考下一节【生成】（https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#generation）。\n\n    注意，`only_save_embeds`选项允许你仅保存关系提示**\\\u003CR>**，而无需保存整个Stable Diffusion模型。你可以根据需要决定是否启用它。\n\n### :framed_picture: 生成\n我们可以利用学习到的关系提示**\\\u003CR>**，生成包含新对象、背景和风格的关系特定图像。\n\n1. 您可以使用自定义数据从[关系反转](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Ftree\u002Fmaster#relation-inversion)中获取一个学习到的**\\\u003CR>**。您也可以从[这里](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1apFk6TF3pGH00hHF1nO1S__tDlrcLQAh?usp=sharing)下载模型，我们提供了几个预训练的关系提示供您试用。\n\n2. 将这些模型（即学习到的关系提示**\\\u003CR>**）放置在`.\u002Fexperiments\u002F`目录下，如下所示：\n    ```\n    .\u002Fexperiments\u002F\n    ├── painted_on\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ├── carved_by\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ├── inside\n    │   ├── checkpoint-500\n    │   ...\n    │   └── model_index.json\n    ...\n    ```\n\n3. 
以关系`painted_on`为例，您可以使用以下脚本，通过单个提示生成图像，例如：“猫 \\\u003CR> 石头”：\n    ```\n    python inference.py \\\n    --model_id .\u002Fexperiments\u002Fpainted_on \\\n    --prompt \"cat \u003CR> stone\" \\\n    --placeholder_string \"\u003CR>\" \\\n    --num_samples 10 \\\n    --guidance_scale 7.5 \\\n    --only_load_embeds\n    ```\n    或者，在`.\u002Ftemplates\u002Ftemplates.py`中编写一个提示列表，键名为 `$your_template_name`，然后为列表中的每个提示生成图像：\n    ```\n    your_template_name='painted_on_examples'\n    python inference.py \\\n    --model_id .\u002Fexperiments\u002Fpainted_on \\\n    --template_name $your_template_name \\\n    --placeholder_string \"\u003CR>\" \\\n    --num_samples 10 \\\n    --guidance_scale 7.5 \\\n    --only_load_embeds\n    ```\n    其中，`model_id`是模型目录，`num_samples`是每个提示要生成的图像数量，`guidance_scale`是无分类器引导尺度。\n\n    我们在`.\u002Ftemplates\u002Ftemplates.py`中为每种关系提供了几个示例模板，如`painted_on_examples`、`carved_by_examples`等。\n\n    注意，如果您在反转过程中保存了整个模型——即未开启`only_save_embeds`标志——那么在推理时应关闭`only_load_embeds`标志。\n    `only_load_embeds`选项仅从实验文件夹加载关系提示**\\\u003CR>**，并自动从包含预训练Stable Diffusion模型的默认缓存位置加载Stable Diffusion模型的其余部分（包括其他文本标记的嵌入）。\n\n### :hugs: Gradio演示\n- 我们还提供了一个Gradio演示，让您通过UI测试我们的方法。此演示支持即时进行关系特定的文生图生成。运行以下命令即可启动演示：\n\n    ```\n    python app_gradio.py\n    ```\n- 或者，您也可以尝试在线演示[这里](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)。\n\n### :art: 多样化生成\n您还可以利用关系提示**\\\u003CR>**指定多样化的提示，生成具有不同背景和风格的图像。例如，您的提示可以是：“迈克尔·杰克逊 \\\u003CR> 墙壁，位于沙漠中”，“猫 \\\u003CR> 石头，位于海滩上”等等。\n\n![diverse_results](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_readme_68a6cfe84e22.jpg)\n\n\n## :straight_ruler: ReVersion基准测试\nReVersion基准测试由多种关系和实体组成，并附有一组定义明确的文本描述。\n\n- [\u003Cb>关系与实体\u003C\u002Fb>](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)。我们定义了十种具有不同抽象层次的代表性物体关系，从基本的空间关系（例如，“在……之上”）、实体交互（例如，“与……握手”）到抽象概念（例如，“由……雕刻”）。此外，还涉及广泛的实体，如动物、人类、家居用品等，以进一步增加基准测试的多样性。\n- [\u003Cb>示例图像与文本描述\u003C\u002Fb>](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1FU1Ni-oDpxQCNYKo-ZLEfSGqO-j_Hw7X?usp=sharing)。对于每种关系，我们收集了四到十个包含不同实体的示例图像。我们还为每个示例图像标注了几种文本模板，以便用不同级别的细节描述它们。这些训练模板可用于优化关系提示。\n- [\u003Cb>基准测试场景\u003C\u002Fb>](https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fblob\u002Fmaster\u002Ftemplates\u002Fbenchmark_scenarios.py)。我们为十种关系中的每一种设计了100个推理模板，每个模板都包含不同的物体实体。\n\n## :fountain_pen: 引用\n\n如果您觉得我们的仓库对您的研究有帮助，请考虑引用我们的论文：\n\n```bibtex\n@inproceedings{huang2023reversion,\n        title={{ReVersion}: 基于扩散的关系反转从图像中实现},\n        author={黄子琪、吴天行、江宇明、陈凯文·C·K.、刘子威},\n         booktitle={SIGGRAPH Asia 2024会议论文},\n        year={2024}\n   }\n```\n\n\n## :white_heart: 致谢\n\n代码库由[黄子琪](https:\u002F\u002Fziqihuangg.github.io\u002F)和[吴天行](https:\u002F\u002Ftianxingwu.github.io\u002F)维护。\n\n该项目基于以下开源仓库构建：\n- [Stable Diffusion 1.5](https:\u002F\u002Fhuggingface.co\u002Frunwayml\u002Fstable-diffusion-v1-5)\n- [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)","# ReVersion 快速上手指南\n\n## 环境准备\n\n- **系统要求**：Linux \u002F Windows \u002F macOS，推荐使用 NVIDIA 显卡（显存 ≥ 8GB）\n- **前置依赖**：Python 3.8、CUDA 11.3（或兼容版本）、conda\n\n> 推荐使用国内镜像加速依赖安装：  \n> 在 `pip install` 前添加 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 安装步骤\n\n1. 克隆仓库  \n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\n   cd ReVersion\n   ```\n\n2. 创建并激活 Conda 环境  \n   ```bash\n   conda create -n reversion python=3.8\n   conda activate reversion\n   ```\n\n3. 
安装 PyTorch 与 CUDA 依赖  \n   ```bash\n   conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch\n   ```\n\n4. 安装其他依赖  \n   ```bash\n   pip install diffusers[\"torch\"] -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n## 基本使用\n\n### 1. 关系倒置（训练关系提示词）\n\n使用预设数据集 `painted_on` 示例，训练一个关系提示符 `\u003CR>`：\n\n```bash\naccelerate launch \\\n    --config_file=\".\u002Fconfigs\u002Fsingle_gpu.yml\" \\\n    train.py \\\n    --seed=\"2023\" \\\n    --pretrained_model_name_or_path=\"runwayml\u002Fstable-diffusion-v1-5\" \\\n    --train_data_dir=\".\u002Freversion_benchmark_v1\u002Fpainted_on\" \\\n    --placeholder_token=\"\u003CR>\" \\\n    --initializer_token=\"and\" \\\n    --only_save_embeds \\\n    --output_dir=\".\u002Fexperiments\u002Fpainted_on\"\n```\n\n> 训练完成后，仅保存 `\u003CR>` 提示词嵌入，不保存完整模型，节省存储。\n\n### 2. 生成图像（使用训练好的关系）\n\n使用训练好的 `\u003CR>` 生成新图像，例如：“cat \u003CR> stone”：\n\n```bash\npython inference.py \\\n    --model_id .\u002Fexperiments\u002Fpainted_on \\\n    --prompt \"cat \u003CR> stone\" \\\n    --placeholder_string \"\u003CR>\" \\\n    --num_samples 5 \\\n    --guidance_scale 7.5 \\\n    --only_load_embeds\n```\n\n> 默认从 `.\u002Fexperiments\u002Fpainted_on` 加载 `\u003CR>`，其余模型参数自动从 Hugging Face 缓存加载。\n\n### 3. 快速体验在线 Demo\n\n无需安装，直接访问：  \n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-66cdaa)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FZiqi\u002FReVersion)","一位独立游戏美术师正在为一款奇幻风格的独立游戏设计场景，需要为“精灵与发光蝴蝶共舞”的核心视觉主题生成多组不同角色与环境的变体图，但手绘效率低，且AI绘图工具无法稳定复现“精灵与蝴蝶之间动态互动”的关系。\n\n### 没有 ReVersion 时\n- 每次生成新场景都需要手动重写复杂提示词，如“精灵伸出手，蝴蝶围绕其手腕盘旋”，但AI常误解为蝴蝶只是飞在头顶。\n- 无法从已有成功样本中提取“互动关系”，只能靠试错调整，平均每个场景需15次以上生成尝试。\n- 不同角色（如男精灵、女精灵、幼精灵）的蝴蝶互动风格不一致，缺乏视觉统一性。\n- 使用Stable Diffusion等通用模型时，即使使用LoRA也难以捕捉“关系”而非“物体”，蝴蝶常脱离角色或静止不动。\n- 美术团队无法复用已验证的视觉关系，每次新角色都要从零开始设计提示。\n\n### 使用 ReVersion 后\n- 仅需上传3张成功示例图（精灵与蝴蝶互动的不同角度），ReVersion 自动提取出关系提示符“\\\u003CR>：蝴蝶围绕角色手腕呈螺旋轨迹飞舞”。\n- 新增角色时，只需输入“一个穿斗篷的男精灵”，ReVersion 自动将提取的关系\\\u003CR>注入生成过程，蝴蝶立刻自然环绕其手腕，无需重写提示。\n- 所有角色的蝴蝶互动风格高度一致，视觉语言统一，大幅降低美术风格管理成本。\n- 生成成功率从不足20%提升至85%以上，单场景平均生成次数从15次降至2次。\n- 提取的关系提示符可保存为轻量文件（\u003C1MB），团队共享后直接复用于新场景，无需重新训练模型。\n\nReVersion 让视觉关系从“玄学提示”变为可复用、可迁移的设计资产，彻底改变了AI生成中“只画物体，不画互动”的局限。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fziqihuangg_ReVersion_38681cd9.jpg","ziqihuangg","Ziqi Huang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fziqihuangg_2c954007.jpg","Ph.D. 
student at MMLab@NTU","Nanyang Technological University (NTU)","Singapore",null,"ziqi_huang_","https:\u002F\u002Fziqihuangg.github.io","https:\u002F\u002Fgithub.com\u002Fziqihuangg",[86],{"name":87,"color":88,"percentage":89},"Python","#3572A5",100,505,20,"2026-03-24T05:50:25","NOASSERTION","Linux, macOS","需要 NVIDIA GPU，显存 8GB+，CUDA 11.3","16GB+",{"notes":98,"python":99,"dependencies":100},"建议使用 conda 管理环境，首次运行需下载约 5GB 的 Stable Diffusion 模型文件，训练和推理时需确保网络畅通以加载预训练权重","3.8",[101,102,103,104,105],"torch==1.11.0","torchvision==0.12.0","diffusers","accelerate","transformers",[14],[108,109,110,111,112,113],"diffusion-model","image-generation","relation-modeling","stable-diffusion","aigc","gen-ai","2026-03-27T02:49:30.150509","2026-04-06T07:12:48.766173",[117,122,127,132,137,142,147,152],{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},9253,"ReVersion 训练需要多少显存？","当前训练设置最多需要 18GB 显存，可以在消费级显卡上运行。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F2",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},9254,"ReVersion 训练需要多长时间？","在 A100 GPU 上优化一个 \u003CR> 约需 40-50 分钟，在 V100 GPU 上约需 1 小时 45 分钟。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F4",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},9255,"如何解决 'Can't load tokenizer for 'runwayml\u002Fstable-diffusion-v1-5'' 错误？","由于 Runway 已删除其 SD1.5 模型仓库，请改用 Hugging Face 上的官方模型路径：https:\u002F\u002Fhuggingface.co\u002Fstable-diffusion-v1-5\u002Fstable-diffusion-v1-5，确保配置中使用该地址。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F9",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},9256,"在 Windows 系统上运行训练脚本时出现路径分隔符错误，如何修复？","将 train.py 第 465 行的路径分割符从正斜杠改为反斜杠：image_name = image_path.split('\\\\')[-1]，以兼容 Windows 文件路径格式。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F3",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},9257,"为什么 initializer_token 使用 'and' 而不是 'paint' 或 'painted'？","根据经验，不同的初始标记对最终学习的 \u003CR> 视觉效果影响不大，目前尚未系统实验，用户可自行探索其他词如 'paint' 或 'painted' 的效果。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F7",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},9258,"为什么使用 ReVersion 生成 'child \u003CR> panda' 图像时，孩子无法正确显示？","这是由于 Stable Diffusion 本身存在概念混合问题：当生成多个实体时，模型可能无法正确渲染其中一个实体或混合其特征。ReVersion 不微调 SD 的 UNet，因此受限于原始模型能力，建议尝试简化提示词或调整生成参数。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F5",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},9259,"Figure 3 中的 Part-of-Speech Clustering 图表示什么？","该图可视化了 CLIP 文本嵌入空间中单词的分布，展示了词性聚类情况，用于分析语言概念在嵌入空间中的结构。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F8",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},9260,"ReVersion 的训练代码何时发布？","训练代码已发布，请参考项目最新更新或仓库代码库获取完整训练脚本。","https:\u002F\u002Fgithub.com\u002Fziqihuangg\u002FReVersion\u002Fissues\u002F1",[]]
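
The README and quickstart above explain that `--only_save_embeds` stores only the learned relation prompt **\<R>**, and that `--only_load_embeds` later re-attaches it to an otherwise untouched Stable Diffusion model; the repository's own `inference.py` is the supported way to do this. As a minimal illustrative sketch only, the snippet below shows how such a saved embedding could in principle be loaded into a plain `diffusers` pipeline as a textual-inversion token. The file name `learned_embeds.bin` and the use of `load_textual_inversion` are assumptions based on the textual-inversion convention, not something the repository documents; the base-model id follows the FAQ's note that `runwayml/stable-diffusion-v1-5` is no longer available on Hugging Face.

```python
import torch
from diffusers import StableDiffusionPipeline

# Base SD 1.5 weights; the FAQ recommends this mirror because the original
# runwayml/stable-diffusion-v1-5 repository has been removed.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Assumed output of training with --only_save_embeds; adjust the path/file name
# to whatever your run actually wrote under ./experiments/painted_on/.
embedding_path = "./experiments/painted_on/learned_embeds.bin"

# Register the learned relation prompt under the <R> placeholder token.
pipe.load_textual_inversion(embedding_path, token="<R>")

# Compose the relation with new entities, as in the README's "cat <R> stone" example.
image = pipe("cat <R> stone", guidance_scale=7.5).images[0]
image.save("cat_R_stone.png")
```

For exact reproduction of the paper's results, prefer `python inference.py --only_load_embeds` as shown in the Generation section above.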