[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-tamarott--SinGAN":3,"tool-tamarott--SinGAN":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":10,"env_os":90,"env_gpu":91,"env_ram":90,"env_deps":92,"category_tags":97,"github_topics":98,"view_count":111,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":112,"updated_at":113,"faqs":114,"releases":150},295,"tamarott\u002FSinGAN","SinGAN","Official pytorch implementation of the paper: \"SinGAN: Learning a Generative Model from a Single Natural Image\"","SinGAN 是一个基于深度学习的图像生成与处理工具，其核心能力是**仅凭一张自然图像就能训练出一个生成模型**。这意味着用户不需要庞大的数据集，只需要一张照片，就能让模型学习这张图像的纹理、结构和分布特征，进而生成多样化的新样本。\n\n传统生成模型（如GAN）通常需要数千甚至数万张图像进行训练，而 SinGAN 打破了这一限制。它通过多尺度对抗训练策略，让模型逐层学习从粗糙到精细的图像特征，从而实现单样本学习。\n\nSinGAN 的应用场景非常丰富：可以生成随机样本、调整图像尺寸、创建短视频动画，还能进行图像编辑、物体融合（协调化）等操作。例如，设计师可以将一个物体自然地“粘贴”到另一张图片中，使其光照和纹理与背景融为一体。\n\n这个工具特别适合计算机视觉研究人员、深度学习开发者以及需要快速原型设计的创意工作者。需要注意的是，目前官方版本仅支持 PyTorch 1.4 及更早版本，对环境有一定要求。","# SinGAN\n\n[Project](https:\u002F\u002Ftamarott.github.io\u002FSinGAN.htm) | [Arxiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf) | 
[CVF](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FShaham_SinGAN_Learning_a_Generative_Model_From_a_Single_Natural_Image_ICCV_2019_paper.pdf) | [Supplementary materials](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fsupplemental\u002FShaham_SinGAN_Learning_a_ICCV_2019_supplemental.pdf) | [Talk (ICCV`19)](https:\u002F\u002Fyoutu.be\u002FmdAcPe74tZI?t=3191) \n### Official PyTorch implementation of the paper: \"SinGAN: Learning a Generative Model from a Single Natural Image\"\n#### ICCV 2019 Best paper award (Marr prize)\n\n\n## Random samples from a *single* image\nWith SinGAN, you can train a generative model from a single natural image, and then generate random samples from the given image, for example:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftamarott_SinGAN_readme_8cd3f0f4e8a5.png)\n\n\n## SinGAN's applications\nSinGAN can also be used for a range of image manipulation tasks, for example:\n ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftamarott_SinGAN_readme_cb21c27cc9e5.png)\nThis is done by injecting an image into the already trained model. 
See section 4 in our [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf) for more details.\n\n\n### Citation\nIf you use this code for your research, please cite our paper:\n\n```\n@inproceedings{rottshaham2019singan,\n  title={SinGAN: Learning a Generative Model from a Single Natural Image},\n  author={Rott Shaham, Tamar and Dekel, Tali and Michaeli, Tomer},\n  booktitle={Computer Vision (ICCV), IEEE International Conference on},\n  year={2019}\n}\n```\n\n## Code\n\n### Install dependencies\n\n```\npython -m pip install -r requirements.txt\n```\n\nThis code was tested with Python 3.6 and torch 1.4.\n\nPlease note: the code currently only supports torch 1.4 or earlier because of the optimization scheme.\n\nFor later torch versions, you may try this repository: https:\u002F\u002Fgithub.com\u002Fkligvasser\u002FSinGAN (results won't necessarily be identical to the official implementation).\n\n\n###  Train\nTo train a SinGAN model on your own image, put the desired training image under Input\u002FImages, and run\n\n```\npython main_train.py --input_name \u003Cinput_file_name>\n```\n\nThis will also use the resulting trained model to generate random samples starting from the coarsest scale (n=0).\n\nTo run this code on a CPU machine, specify `--not_cuda` when calling `main_train.py`.\n\n###  Random samples\nTo generate random samples from any starting generation scale, first train a SinGAN model on the desired image (as described above), then run \n\n```\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples --gen_start_scale \u003Cgeneration start scale number>\n```\n\nNote: to use the full model, set the generation start scale to 0; to start generation from the second scale, set it to 1, and so on. 
\n\n###  Random samples of arbitrary sizes\nTo generate random samples of arbitrary sizes, first train a SinGAN model on the desired image (as described above), then run \n\n```\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples_arbitrary_sizes --scale_h \u003Chorizontal scaling factor> --scale_v \u003Cvertical scaling factor>\n```\n\n###  Animation from a single image\n\nTo generate a short animation from a single image, run\n\n```\npython animation.py --input_name \u003Cinput_file_name> \n```\n\nThis will automatically start a new training phase with noise padding mode.\n\n###  Harmonization\n\nTo harmonize a pasted object into an image (see the example in Fig. 13 in [our paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf)), first train a SinGAN model on the desired background image (as described above), then save the naively pasted reference image and its binary mask under \"Input\u002FHarmonization\" (see saved images for an example). Run the command\n\n```\npython harmonization.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cnaively_pasted_reference_image_file_name> --harmonization_start_scale \u003Cscale to inject>\n\n```\n\nPlease note that different injection scales will produce different harmonization effects. The coarsest injection scale equals 1. \n\n###  Editing\n\nTo edit an image (see the example in Fig. 12 in [our paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf)), first train a SinGAN model on the desired non-edited image (as described above), then save the naive edit as a reference image under \"Input\u002FEditing\" with a corresponding binary map (see saved images for an example). 
Run the command\n\n```\npython editing.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cedited_image_file_name> --editing_start_scale \u003Cscale to inject>\n\n```\nBoth the masked and unmasked outputs will be saved.\nHere as well, different injection scales will produce different editing effects. The coarsest injection scale equals 1. \n\n###  Paint to Image\n\nTo turn a painting into a realistic image (see the example in Fig. 11 in [our paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf)), first train a SinGAN model on the desired image (as described above), then save your painting under \"Input\u002FPaint\", and run the command\n\n```\npython paint2image.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cpaint_image_file_name> --paint_start_scale \u003Cscale to inject>\n\n```\nHere as well, different injection scales will produce different editing effects. The coarsest injection scale equals 1. \n\nAdvanced option: set quantization_flag to True to re-train *only* the injection level of the model on a color-quantized version of the upsampled generated images from the previous scale. 
For some images, this might lead to more realistic results.\n\n### Super Resolution\nTo super resolve an image, please run:\n```\npython SR.py --input_name \u003CLR_image_file_name>\n```\nThis will automatically train a SinGAN model corresponding to a 4x upsampling factor (if one does not already exist).\nFor a different SR factor, specify it using the parameter `--sr_factor` when calling the script.\nSinGAN's results on the BSD100 dataset can be downloaded from the 'Downloads' folder.\n\n## Additional Data and Functions\n\n### Single Image Fréchet Inception Distance (SIFID score)\nTo calculate the SIFID between real images and their corresponding fake samples, please run:\n```\npython SIFID\u002Fsifid_score.py --path2real \u003Creal images path> --path2fake \u003Cfake images path> \n```  \nMake sure that each fake image's file name is identical to the file name of its corresponding real image. Images should be saved in `.jpg` format.\n\n### Super Resolution Results\nSinGAN's SR results on the BSD100 dataset can be downloaded from the 'Downloads' folder.\n\n### User Study\nThe data used for the user study can be found in the Downloads folder. 
\n\nreal folder: 50 real images, randomly picked from the [places database](http:\u002F\u002Fplaces.csail.mit.edu\u002F)\n\nfake_high_variance folder: random samples starting from n=N for each of the real images \n\nfake_mid_variance folder: random samples starting from n=N-1 for each of the real images \n\nFor additional details please see section 3.1 in our [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf)\n","# SinGAN\n\n[Project](https:\u002F\u002Ftamarott.github.io\u002FSinGAN.htm) | [Arxiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.01164.pdf) | [CVF](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FShaham_SinGAN_Learning_a_Generative_Model_From_a_Single_Natural_Image_ICCV_2019_paper.pdf) | [Supplementary materials](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fsupplemental\u002FShaham_SinGAN_Learning_a_ICCV_2019_supplemental.pdf) | [Talk (ICCV`19)](https:\u002F\u002Fyoutu.be\u002FmdAcPe74tZI?t=3191) \n### 论文\"SinGAN: Learning a Generative Model from a Single Natural Image\"的官方 PyTorch 实现\n#### ICCV 2019 最佳论文奖（Marr 奖）\n\n\n## 来自*单张*图像的随机样本\n使用 SinGAN，你可以从单张自然图像训练生成模型，然后从给定图像生成随机样本，例如：\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftamarott_SinGAN_readme_8cd3f0f4e8a5.png)\n\n\n## SinGAN 的应用\nSinGAN 还可以用于一系列图像处理任务，例如：\n ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftamarott_SinGAN_readme_cb21c27cc9e5.png)\n这是通过将图像注入已训练的模型来实现的。更多详细信息请参阅论文第 4 节。\n\n\n### 引用\n如果您在研究中使用此代码，请引用我们的论文：\n\n```\n@inproceedings{rottshaham2019singan,\n  title={SinGAN: Learning a Generative Model from a Single Natural Image},\n  author={Rott Shaham, Tamar and Dekel, Tali and Michaeli, Tomer},\n  booktitle={Computer Vision (ICCV), IEEE International Conference on},\n  year={2019}\n}\n```\n\n## 代码\n\n### 安装依赖\n\n```\npython -m pip install -r requirements.txt\n```\n\n此代码已在 python 3.6、torch 1.4 上测试\n\n请注意：由于优化方案的限制，该代码目前仅支持 torch 1.4 或更早版本。\n\n对于更新版本的 
torch，您可以尝试此仓库：https:\u002F\u002Fgithub.com\u002Fkligvasser\u002FSinGAN（结果可能与官方实现不完全相同）。\n\n\n### 训练\n要训练您自己的图像的 SinGAN 模型，请将所需的训练图像放在 Input\u002FImages 下，然后运行\n\n```\npython main_train.py --input_name \u003Cinput_file_name>\n```\n\n这还将使用生成的训练模型从最粗尺度（n=0）开始生成随机样本。\n\n要在 CPU 机器上运行此代码，请在调用 `main_train.py` 时指定 `--not_cuda`\n\n### 随机样本\n要生成从任何起始生成尺度的随机样本，请首先在所需图像上训练 SinGAN 模型（如上所述），然后运行 \n\n```\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples --gen_start_scale \u003Cgeneration start scale number>\n```\n\n请注意：要使用完整模型，请将生成起始尺度指定为 0，要从第二尺度开始生成，请将其指定为 1，依此类推。 \n\n### 任意尺寸的随机样本\n要生成任意尺寸的随机样本，请首先在所需图像上训练 SinGAN 模型（如上所述），然后运行 \n\n```\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples_arbitrary_sizes --scale_h \u003Chorizontal scaling factor> --scale_v \u003Cvertical scaling factor>\n```\n\n### 从单张图像生成动画\n\n要从单张图像生成短动画，请运行\n\n```\npython animation.py --input_name \u003Cinput_file_name> \n```\n\n这将自动启动一个新的训练阶段，使用噪声填充模式。\n\n### 协调\n\n要将粘贴的对象协调到图像中（请参阅论文图 13 中的示例），请首先在所需的背景图像上训练 SinGAN 模型（如上所述），然后将简单粘贴的参考图像及其二进制掩码保存到\"Input\u002FHarmonization\"中（请参阅保存的图像作为示例）。运行命令\n\n```\npython harmonization.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cnaively_pasted_reference_image_file_name> --harmonization_start_scale \u003Cscale to inject>\n\n```\n\n请注意，不同的注入尺度会产生不同的协调效果。最粗的注入尺度等于 1。 \n\n### 编辑\n\n要编辑图像（请参阅论文图 12 中的示例），请首先在所需的未编辑图像上训练 SinGAN 模型（如上所述），然后将简单编辑保存为\"Input\u002FEditing\"下的参考图像，并附上相应的二进制映射（请参阅保存的图像作为示例）。运行命令\n\n```\npython editing.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cedited_image_file_name> --editing_start_scale \u003Cscale to inject>\n\n```\n带掩码和不带掩码的输出都将被保存。同样，不同的注入尺度会产生不同的编辑效果。最粗的注入尺度等于 1。 \n\n### 绘画转图像\n\n要将绘画转换为逼真的图像（请参阅论文图 11 中的示例），请首先在所需图像上训练 SinGAN 模型（如上所述），然后将您的绘画保存到\"Input\u002FPaint\"，并运行命令\n\n```\npython paint2image.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cpaint_image_file_name> --paint_start_scale \u003Cscale to 
inject>\n\n```\n同样，不同的注入尺度会产生不同的编辑效果。最粗的注入尺度等于 1。 \n\n高级选项：将 quantization_flag 指定为 True，仅重新训练模型的注入级别，以获得上一尺度上采样生成图像的颜色量化版本。对于某些图像，这可能会产生更逼真的结果。\n\n### 超分辨率\n要超分辨率处理图像，请运行：\n```\npython SR.py --input_name \u003CLR_image_file_name>\n```\n这将自动训练一个对应于 4 倍上采样因子的 SinGAN 模型（如果尚不存在）。\n对于不同的 SR 因子，请在调用函数时使用参数 `--sr_factor` 指定。\nSinGAN 在 BSD100 数据集上的结果可以从\"Downloads\"文件夹下载。\n\n## 附加数据和功能\n\n### 单图像 Fréchet Inception Distance（SIFID 分数）\n要计算真实图像与其对应的伪造样本之间的 SIFID，请运行：\n```\npython SIFID\u002Fsifid_score.py --path2real \u003Creal images path> --path2fake \u003Cfake images path> \n```  \n请确保每个伪造图像的文件名与其对应的真实图像的文件名相同。图像应以 `.jpg` 格式保存。\n\n### 超分辨率结果\nSinGAN 在 BSD100 数据集上的 SR 结果可以从\"Downloads\"文件夹下载。\n\n### 用户研究\n用户研究中使用的数据可以在 Downloads 文件夹中找到。\n\nreal 文件夹：50 张真实图像，从 [places 数据库](http:\u002F\u002Fplaces.csail.mit.edu\u002F) 中随机选取\n\nfake_high_variance 文件夹：从 n=N 开始的每个真实图像的随机样本\n\nfake_mid_variance 文件夹：从 n=N-1 开始的每个真实图像的随机样本\n\n更多详细信息请参阅论文第 3.1 节","# SinGAN 快速上手指南\n\nSinGAN 是一个基于单张自然图像训练生成模型的深度学习框架，可用于图像生成、编辑、超分辨率等多种任务。\n\n## 环境准备\n\n- **Python**: 3.6\n- **PyTorch**: 1.4 或更早版本（由于优化方案限制，后续版本可能不兼容）\n- **系统**: Linux \u002F Windows \u002F macOS\n\n## 安装步骤\n\n1. 克隆仓库并进入目录：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN.git\ncd SinGAN\n```\n\n2. 安装依赖：\n\n```bash\npython -m pip install -r requirements.txt\n```\n\n> **国内加速**：如遇网络问题，可使用清华镜像源：\n> ```bash\n> pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n### 1. 训练模型\n\n将训练图像放入 `Input\u002FImages` 目录，然后运行：\n\n```bash\npython main_train.py --input_name \u003Cinput_file_name>\n```\n\n训练完成后会自动生成随机样本。\n\n**CPU 模式**：\n\n```bash\npython main_train.py --input_name \u003Cinput_file_name> --not_cuda\n```\n\n### 2. 生成随机样本\n\n```bash\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples --gen_start_scale 0\n```\n\n- `--gen_start_scale 0`：从最粗尺度开始生成\n- `--gen_start_scale 1`：从第二尺度开始，以此类推\n\n### 3. 
生成任意尺寸样本\n\n```bash\npython random_samples.py --input_name \u003Ctraining_image_file_name> --mode random_samples_arbitrary_sizes --scale_h \u003C水平缩放因子> --scale_v \u003C垂直缩放因子>\n```\n\n## 常用功能速查\n\n| 功能 | 命令 |\n|------|------|\n| 图像动画 | `python animation.py --input_name \u003Cinput_file_name>` |\n| 图像协调 | `python harmonization.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cref_image> --harmonization_start_scale \u003Cscale>` |\n| 图像编辑 | `python editing.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cedit_image> --editing_start_scale \u003Cscale>` |\n| 绘画转图像 | `python paint2image.py --input_name \u003Ctraining_image_file_name> --ref_name \u003Cpaint_image> --paint_start_scale \u003Cscale>` |\n| 超分辨率 | `python SR.py --input_name \u003CLR_image_file_name>` |\n\n## 注意事项\n\n- 首次使用需下载预训练模型或等待训练完成\n- 不同注入尺度（scale）会产生不同的效果，最粗尺度为 1\n- 如需使用更新版 PyTorch，可尝试社区分支：https:\u002F\u002Fgithub.com\u002Fkligvasser\u002FSinGAN","场景背景：某电商团队的设计师小李负责为新品发布会制作一系列营销海报，需要多张风格统一但细节不同的背景纹理图，用于不同尺寸的社交媒体配图。\n\n### 没有 SinGAN 时\n\n- 设计师需要手动在 Photoshop 中逐张绘制或修改背景，工作量巨大且难以保证风格完全一致\n- 使用网络图片存在版权风险，付费购买素材库会员成本较高，单张图片费用从几十到几百元不等\n- 传统图像拼接或滤镜效果生成的纹理显得生硬、不自然，无法达到高端海报的视觉要求\n- 每次需要新尺寸的纹理时，都需要重新设计或调整，无法快速生成任意尺寸的变体\n- 团队沟通成本高，设计师需要反复根据需求修改，难以在短时间内提供多个版本供选择\n\n### 使用 SinGAN 后\n\n- 只需准备一张高质量的背景纹理图，SinGAN 即可自动学习图像的分布特征，生成无限数量的风格一致变体\n- 生成的图像属于原创内容，完全规避版权风险，无需额外付费即可商用\n- 生成结果保留原图的视觉特征和纹理细节，自然度高，看起来如同真实拍摄的不同场景\n- 支持生成任意尺寸的图像，水平\u002F垂直缩放比例灵活可调，满足不同平台的海报需求\n- 可快速生成多个版本供团队挑选，极大缩短设计周期，从数小时缩短至几分钟\n\nSinGAN 让设计师能够以极低的成本获得大量风格统一且合法的图像变体，显著提升创意工作效率。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftamarott_SinGAN_8cd3f0f4.png","tamarott","Tamar Rott Shaham","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftamarott_89a22742.jpg",null,"https:\u002F\u002Ftamarott.github.io\u002F","https:\u002F\u002Fgithub.com\u002Ftamarott",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,3347,621,"2026-02-27T10:04:40","NOASSERTION","未说明","需要 NVIDIA GPU（支持 
CUDA），未说明具体型号和显存要求",{"notes":93,"python":94,"dependencies":95},"代码默认需要 CUDA 支持，如需在 CPU 上运行请使用 --not_cuda 参数。该代码仅支持 torch 1.4 或更早版本，因优化方案与新版本不兼容。如需在更高版本 torch 上运行，可尝试 https:\u002F\u002Fgithub.com\u002Fkligvasser\u002FSinGAN（但结果可能与官方实现不一致）。安装依赖命令：python -m pip install -r requirements.txt","3.6",[96],"torch 1.4",[35,14],[99,100,101,102,103,104,105,106,107,108,109,110],"singan","gan","official","single-image","harmonization","animation","single-image-animation","single-image-generation","image-edit","arbitrery-sizes","single-image-super-resolution","super-resolution",14,"2026-03-27T02:49:30.150509","2026-04-06T05:37:11.134853",[115,120,125,130,135,140,145],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},979,"SinGAN 是否支持 1024x1024 像素的高分辨率图像？","是的，SinGAN 支持高分辨率图像。1200x900（相当于 1024x1024）的图像可以在 A100 80GB GPU 上运行。Google Colab 的 GPU 也允许处理 1024x1024 图像，但需要使用适当的 GPU 配置（如 --max_size=1024 参数）。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F52",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},978,"运行 SR.py 时出现 RuntimeError: The size of tensor a (276) must match the size of tensor b (282) 错误如何解决？","这个错误通常是由于 PyTorch 版本兼容性问题导致的。解决方案是使用 torch==1.4.0 和 torchvision==0.5.0 版本。可以在 conda 环境中运行：conda install pytorch=1.4.0 torchvision=0.5.0 -c pytorch","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F2",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},980,"如何在 CPU 上运行 SinGAN（不使用 CUDA）？","SinGAN 提供了 --not_cuda 参数用于在 CPU 上运行。使用方法是在训练或生成命令中添加 --not_cuda 标志，例如：python main_train.py --input_name image.jpg --not_cuda。维护者后来也添加了 --no_cuda 参数作为替代。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F22",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},981,"训练时出现 RuntimeError: Kernel size can't be greater than actual input size 错误如何解决？","这个错误通常是因为输入图像尺寸太小，导致卷积核大小超过了输入尺寸。解决方案是使用更大尺寸的输入图像，或者调整 --max_size 
参数确保图像在每个尺度上都足够大。维护者已修复了这个问题，请确保使用最新版本的代码。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F28",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},982,"能否用小分辨率图像训练后生成大分辨率图像？","可以，但需要注意内存问题。使用 random_samples.py 的 random_samples_arbitrary_sizes 模式可以生成任意大小的图像，例如：python random_samples.py --input_name mountain.jpeg --mode random_samples_arbitrary_sizes --scale_h 480 --scale_v 600。如果出现内存不足错误，需要降低 --max_size 参数重新训练。训练时如果因内存不足导致训练中断，检查点可能无法正常工作，需要降低分辨率重新训练。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F29",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},983,"训练卡在某个 scale 的最后一个 epoch [1999\u002F2000] 无法继续怎么办？","这是训练过程中的一个已知问题。如果训练在某个 scale 卡住，可以尝试以下方法：1) 降低 --max_size 参数重新训练；2) 从较小的 scale 开始训练（例如只训练到 scale 3），然后使用该模型生成任意大小的图像；3) 检查 GPU 内存是否充足。如果设置了过高的 --max_size 导致内存不足，训练不会留下可用的检查点，需要降低参数重新开始。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F19",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},984,"运行 random_samples.py 后输出文件夹为空，没有生成图像怎么办？","这个问题可能由多种原因导致：1) 确保训练已正常完成所有 scale，检查点已保存；2) 确认使用的 --input_name 与训练时一致；3) 检查 --gen_start_scale 参数是否合理（建议从 scale 2 开始）；4) 确保在正确的 conda 环境中运行。如果训练过程中强制中断，可能导致检查点损坏，需要重新完整训练。","https:\u002F\u002Fgithub.com\u002Ftamarott\u002FSinGAN\u002Fissues\u002F7",[]]