[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-naver--deep-image-retrieval":3,"tool-naver--deep-image-retrieval":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,2,"2026-04-10T11:39:34",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[43,13,15,14],"插件",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[52,15,13,14],"语言模型",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,61],"视频",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":77,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":32,"env_os":90,"env_gpu":91,"env_ram":90,"env_deps":92,"category_tags":103,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":105,"updated_at":106,"faqs":107,"releases":137},7981,"naver\u002Fdeep-image-retrieval","deep-image-retrieval","End-to-end learning of deep visual representations for image retrieval","deep-image-retrieval 
是一个专注于图像检索任务的开源深度学习项目，旨在通过端到端的学习方式，让计算机更精准地理解并查找相似图片。它主要解决了传统方法在海量图片库中难以快速、准确匹配目标图像的难题，能够将任意图片压缩成一个紧凑的特征向量，从而高效计算图片间的相似度。\n\n该项目特别适合计算机视觉领域的研究人员和开发者使用，尤其是那些需要构建以图搜图系统、进行特征表示学习或复现前沿学术成果的技术人员。其核心亮点在于网络的全程可微性：从卷积神经网络提取特征，到利用广义平均池化（GeM）进行全局聚合，再到最终的向量归一化，所有环节均可联合训练。此外，项目不仅支持经典的三元组损失函数，还创新性地引入了直接优化平均精度（Average Precision）的列表级损失函数，显著提升了检索排序的质量。代码基于 PyTorch 构建，提供了多种在牛津和巴黎等权威数据集上表现优异的预训练模型，方便用户直接评估或作为基线进行二次开发。","# Deep Image Retrieval\n\nThis repository contains the models and the evaluation scripts (in Python3 and Pytorch 1.0+) of the papers:\n\n**[1] End-to-end Learning of Deep Visual Representations for Image Retrieval**\nAlbert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus, IJCV 2017 [\\[PDF\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.07940)\n\n**[2] Learning with Average Precision: Training Image Retrieval with a Listwise Loss**\nJerome Revaud, Jon Almazan, Rafael S. Rezende, Cesar de Souza, ICCV 2019 [\\[PDF\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.07589)\n\n\nBoth papers tackle the problem of image retrieval and explore different ways to learn deep visual representations for this task. In both cases, a CNN is used to extract a feature map that is aggregated into a compact, fixed-length representation by a global-aggregation layer*. Finally, this representation is first projected using a FC layer, and L2 normalized so images can be efficiently compared with the dot product.\n\n\n![dir_network](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnaver_deep-image-retrieval_readme_d1fdb93f5a62.png)\n\nAll components in this network, including the aggregation layer, are differentiable, which makes it end-to-end trainable for the end task. In [1], a Siamese architecture that combines three streams with a triplet loss was proposed to train this network.  
In [2], this work was extended by replacing the triplet loss with a new loss that directly optimizes for Average Precision.\n\n![Losses](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnaver_deep-image-retrieval_readme_70b0ce13b466.png)\n\n\\* Originally, [1] used R-MAC pooling [3] as the global-aggregation layer. However, we have replaced the R-MAC pooling layer with the Generalized-mean pooling layer (GeM) proposed in [4], which is more efficient and performs better. You can find the original implementation of [1] in Caffe following [this link](https:\u002F\u002Feurope.naverlabs.com\u002FResearch\u002FComputer-Vision\u002FLearning-Visual-Representations\u002FDeep-Image-Retrieval\u002F).\n\n\n## News\n\n- **(6\u002F9\u002F2019)** AP loss, Tie-aware AP loss, Triplet Margin loss, and Triplet LogExp loss added for reference\n- **(5\u002F9\u002F2019)** Updated evaluation and AP numbers for all the benchmarks\n- **(22\u002F7\u002F2019)** Paper **_Learning with Average Precision: Training Image Retrieval with a Listwise Loss_** accepted at ICCV 2019\n\n\n## Pre-requisites\n\nIn order to run this toolbox you will need:\n\n- Python3 (tested with Python 3.7.3)\n- PyTorch (tested with version 1.4)\n- The following packages: numpy, matplotlib, tqdm, scikit-learn\n\nWith conda you can run the following commands:\n\n```\nconda install numpy matplotlib tqdm scikit-learn\nconda install pytorch torchvision -c pytorch\n```\n\n## Installation\n\n```\n# Download the code\ngit clone https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval.git\n\n# Create env variables\ncd deep-image-retrieval\nexport DIR_ROOT=$PWD\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n# for example: export DB_ROOT=$PWD\u002Fdirtorch\u002Fdata\u002Fdatasets\n```\n\n\n## Evaluation\n\n\n### Pre-trained models\n\nThe table below contains the pre-trained models that we provide with this library, together with their mAP performance on some of the most well-known image retrieval 
benchmarks: [Oxford5K](http:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fdata\u002Foxbuildings\u002F), [Paris6K](http:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fdata\u002Fparisbuildings\u002F), and their Revisited versions ([ROxford5K and RParis6K](https:\u002F\u002Fgithub.com\u002Ffilipradenovic\u002Frevisitop)).\n\n\n| Model | Oxford5K | Paris6K |  ROxford5K (med\u002Fhard) | RParis6K (med\u002Fhard) |\n|---\t|:-:|:-:|:-:|:-:|\n|  [Resnet101-TL-MAC](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13MUGNwn_CYGZvqDBD8FGD8fVYxThsSDg\u002Fview?usp=sharing) |  85.6\t| 90.1 |  63.3 \u002F 35.7 \t|   76.6 \u002F 55.5  |\n|  [Resnet101-TL-GeM](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1vhm1GYvn8T3-1C4SPjPNJOuTU9UxKAG6) | 85.7 | **93.4** | 64.5 \u002F 40.9 |  78.8 \u002F 59.2  |\n|  [Resnet50-AP-GeM](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1oPtE_go9tnsiDLkWjN4NMpKjh-_md1G5\u002Fview?usp=sharing) | 87.7 \t| 91.9 |  65.5 \u002F 41.0 | 77.6 \u002F 57.1 |\n|  [Resnet101-AP-GeM](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1UWJGDuHtzaQdFhSMojoYVQjmCXhIwVvy) | **89.1** | **93.0** | **67.1** \u002F **42.3** |  **80.3**\u002F**60.9** |\n|  [Resnet101-AP-GeM-LM18](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1r76NLHtJsH-Ybfda4aLkUIoW3EEsi25I)** |  88.1\t| **93.1** | 66.3 \u002F **42.5**\t|   **80.2** \u002F **60.8**  |\n\n\nThe name of the model encodes the backbone architecture of the network and the loss that has been used to train it (TL for triplet loss and AP for Average Precision loss). All models use **Generalized-mean pooling (GeM)** [4] as the global pooling mechanism, except for the model in the first row that uses MAC [3] \\(i.e. max-pooling), and have been trained on the **Landmarks-clean** [1] dataset (the clean version of the [Landmarks dataset](http:\u002F\u002Fsites.skoltech.ru\u002Fcompvision\u002Fprojects\u002Fneuralcodes\u002F)) directly **fine-tuning from ImageNet**. 
These numbers have been obtained using a **single resolution** and applying **whitening** to the output features (the whitening has also been learned on Landmarks-clean). For a detailed explanation of all the hyper-parameters see [1] and [2] for the triplet loss and AP loss models, respectively.\n\n** For the sake of completeness, we have added an extra model, `Resnet101-AP-GeM-LM18`, which has been trained on the [Google-Landmarks Dataset](https:\u002F\u002Fwww.kaggle.com\u002Fgoogle\u002Fgoogle-landmarks-dataset), a large dataset consisting of more than 1M images and 15K classes.\n\n### Reproducing the results\n\nThe script `test_dir.py` can be used to evaluate the pre-trained models provided and to reproduce the results above:\n\n```\npython -m dirtorch.test_dir --dataset DATASET --checkpoint PATH_TO_MODEL \\\n\t\t[--whiten DATASET] [--whitenp POWER] [--aqe ALPHA-QEXP] \\\n\t\t[--trfs TRANSFORMS] [--gpu ID] [...]\n```\n\n- `--dataset`: selects the dataset (e.g. Oxford5K, Paris6K, ROxford5K, RParis6K) [**required**]\n- `--checkpoint`: path to the model weights [**required**]\n- `--whiten`: applies whitening to the output features [default 'Landmarks_clean']\n- `--whitenp`: whitening power [default: 0.25]\n- `--aqe`: alpha-query expansion parameters [default: None]\n- `--trfs`: input image transformations (can be used to apply multi-scale) [default: None]\n- `--gpu`: selects the GPU ID (-1 selects the CPU)\n\nFor example, to reproduce the results of the Resnet101-AP_loss model on the RParis6K dataset, download the model `Resnet-101-AP-GeM.pt` from [here](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1mi50tG6oXY1eE9yJnmGCPdTmlIjG7mr0) and run:\n\n```\ncd $DIR_ROOT\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n\npython -m dirtorch.test_dir --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\nAnd you should see the following output:\n\n```\n>> Evaluation...\n * mAP-easy 
= 0.907568\n * mAP-medium = 0.803098\n * mAP-hard = 0.608556\n```\n\n**Note:** this script integrates an automatic downloader for the Oxford5K, Paris6K, ROxford5K, and RParis6K datasets (kudos to Filip Radenovic ;)). The datasets will be saved in `$DB_ROOT`.\n\n## Feature extractor\n\nYou can also use the pre-trained models to extract features from your own datasets or collection of images. For that we provide the script `extract_features.py`:\n\n```\npython -m dirtorch.extract_features --dataset DATASET --checkpoint PATH_TO_MODEL \\\n\t\t--output PATH_TO_FILE [--whiten DATASET] [--whitenp POWER] \\\n\t\t[--trfs TRANSFORMS] [--gpu ID] [...]\n```\n\nwhere `--output` is used to specify the destination where the features will be saved. The rest of the parameters are the same as seen above.\n\nFor example, this is how the script can be used to extract a feature representation for each one of the images in the RParis6K dataset using the `Resnet-101-AP-GeM.pt` model, and store them in `rparis6k_features.npy`:\n\n```\ncd $DIR_ROOT\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n\npython -m dirtorch.extract_features --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--output rparis6k_features.npy \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\nThe library also provides a **generic dataset class** (`ImageList`) that allows you to specify the list of images by providing a simple text file.\n\n```\n--dataset 'ImageList(\"PATH_TO_TEXTFILE\" [, \"IMAGES_ROOT\"])'\n```\n\nEach row of the text file should contain a single path to a given 
image:\n\n```\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage1.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage2.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage3.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage4.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage5.jpg\n```\n\nAlternatively, you can also use relative paths, and use `IMAGES_ROOT` to specify the root folder.\n\n## Feature extraction with kapture datasets\n\nKapture is a pivot file format, based on text and binary files, used to describe SFM (Structure From Motion) and more generally sensor-acquired data.\n\nIt is available at https:\u002F\u002Fgithub.com\u002Fnaver\u002Fkapture.\nIt contains conversion tools for popular formats and several popular datasets are directly available in kapture.\n\nIt can be installed with:\n```bash\npip install kapture\n```\n\nDatasets can be downloaded with:\n```bash\nkapture_download_dataset.py update\nkapture_download_dataset.py list\n# e.g.: install mapping and query of Extended-CMU-Seasons_slice22\nkapture_download_dataset.py install \"Extended-CMU-Seasons_slice22_*\"\n```\nIf you want to convert your own dataset into kapture, please find some examples [here](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fkapture\u002Fblob\u002Fmaster\u002Fdoc\u002Fdatasets.adoc).\n\nOnce installed, you can extract global features for your kapture dataset with:\n```bash\ncd $DIR_ROOT\npython -m dirtorch.extract_kapture --kapture-root pathto\u002Fyourkapturedataset --checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM-LM18.pt --gpu 0\n```\n\nRun `python -m dirtorch.extract_kapture --help` for more information on the extraction parameters. \n\n## Citations\n\nPlease consider citing the following papers in your publications if this helps your research.\n\n```\n@article{GARL17,\n title = {End-to-end Learning of Deep Visual Representations for Image Retrieval},\n author = {Gordo, A. 
and Almazan, J. and Revaud, J. and Larlus, D.},\n journal = {IJCV},\n year = {2017}\n}\n\n@inproceedings{RARS19,\n title = {Learning with Average Precision: Training Image Retrieval with a Listwise Loss},\n author = {Revaud, J. and Almazan, J. and Rezende, R.S. and de Souza, C.R.},\n booktitle = {ICCV},\n year = {2019}\n}\n```\n\n## Contributors\n\nThis library has been developed by Jerome Revaud, Rafael de Rezende, Cesar de Souza, Diane Larlus, and Jon Almazan at **[Naver Labs Europe](https:\u002F\u002Feurope.naverlabs.com)**.\n\n\n**Special thanks to [Filip Radenovic](https:\u002F\u002Fgithub.com\u002Ffilipradenovic).** In this library, we have used the ROxford5K and RParis6K downloader from his awesome **[CNN-imageretrieval repository](https:\u002F\u002Fgithub.com\u002Ffilipradenovic\u002Fcnnimageretrieval-pytorch)**. Consider checking it out if you want to train your own models for image retrieval!\n\n## References\n\n[1] Gordo, A., Almazan, J., Revaud, J., Larlus, D., [End-to-end Learning of Deep Visual Representations for Image Retrieval](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.07940). IJCV 2017\n\n[2] Revaud, J., Almazan, J., Rezende, R.S., de Souza, C., [Learning with Average Precision: Training Image Retrieval with a Listwise Loss](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.07589). ICCV 2019\n\n[3] Tolias, G., Sicre, R., Jegou, H., [Particular object retrieval with integral max-pooling of CNN activations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05879). ICLR 2016\n\n[4] Radenovic, F., Tolias, G., Chum, O., [Fine-tuning CNN Image Retrieval with No Human Annotation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.02512). TPAMI 2018\n","# 深度图像检索\n\n本仓库包含以下论文中的模型及评估脚本（基于 Python3 和 PyTorch 1.0+）：\n\n**[1] 面向图像检索的深度视觉表征端到端学习**  \nAlbert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus，IJCV 2017 [\\[PDF\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.07940)\n\n**[2] 基于平均精度的学习：使用列表级损失训练图像检索**  \nJerome Revaud, Jon Almazan, Rafael S. 
Rezende, Cesar de Souza，ICCV 2019 [\\[PDF\\]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.07589)\n\n\n这两篇论文均针对图像检索问题，探讨了为该任务学习深度视觉表征的不同方法。在两种情况下，都使用卷积神经网络提取特征图，并通过全局聚合层将其聚合为紧凑的固定长度表示*。随后，该表示先经过全连接层投影，再进行 L2 归一化，以便能够高效地利用点积来比较图像。\n\n![dir_network](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnaver_deep-image-retrieval_readme_d1fdb93f5a62.png)\n\n该网络中的所有组件，包括聚合层，均可微分，因此可以针对最终任务进行端到端训练。在 [1] 中，提出了一种结合三条流并采用三元组损失的暹罗架构来训练该网络。而在 [2] 中，这项工作被进一步扩展，用一种直接优化平均精度的新损失函数取代了三元组损失。\n\n![Losses](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnaver_deep-image-retrieval_readme_70b0ce13b466.png)\n\n\\* 最初，[1] 使用 R-MAC 池化 [3] 作为全局聚合层。然而，由于其效率更高且性能更优，我们已将 R-MAC 池化层替换为 [4] 中提出的广义均值池化层 (GeM)。您可以通过此链接找到 [1] 的原始 Caffe 实现：[链接](https:\u002F\u002Feurope.naverlabs.com\u002FResearch\u002FComputer-Vision\u002FLearning-Visual-Representations\u002FDeep-Image-Retrieval\u002F)。\n\n\n## 新闻\n\n- **(6\u002F9\u002F2019)** 添加了 AP 损失、考虑并列情况的 AP 损失、三元组边界损失和三元组 LogExp 损失，供参考\n- **(5\u002F9\u002F2019)** 更新了所有基准测试的评估结果及 AP 指标\n- **(22\u002F7\u002F2019)** 论文 **_基于平均精度的学习：使用列表级损失训练图像检索_** 被 ICCV 2019 接受\n\n\n## 先决条件\n\n为了运行本工具箱，您需要：\n\n- Python3（经测试版本为 3.7.3）\n- PyTorch（经测试版本为 1.4）\n- 以下软件包：numpy、matplotlib、tqdm、scikit-learn\n\n使用 conda 时，您可以运行以下命令：\n\n```\nconda install numpy matplotlib tqdm scikit-learn\nconda install pytorch torchvision -c pytorch\n```\n\n\n## 安装\n\n```\n# 下载代码\ngit clone https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval.git\n\n# 设置环境变量\ncd deep-image-retrieval\nexport DIR_ROOT=$PWD\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n# 例如：export DB_ROOT=$PWD\u002Fdirtorch\u002Fdata\u002Fdatasets\n```\n\n\n## 评估\n\n\n### 预训练模型\n\n下表包含了我们随本库提供的预训练模型及其在一些最著名的图像检索基准上的 mAP 性能：[Oxford5K](http:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fdata\u002Foxbuildings\u002F)、[Paris6K](http:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fdata\u002Fparisbuildings\u002F)，以及它们的重访版本（[ROxford5K 和 
RParis6K](https:\u002F\u002Fgithub.com\u002Ffilipradenovic\u002Frevisitop)）。\n\n\n| 模型 | Oxford5K | Paris6K | ROxford5K (中\u002F难) | RParis6K (中\u002F难) |\n|---\t|:-:|:-:|:-:|:-:|\n|  [Resnet101-TL-MAC](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13MUGNwn_CYGZvqDBD8FGD8fVYxThsSDg\u002Fview?usp=sharing) |  85.6\t| 90.1 |  63.3 \u002F 35.7 \t|   76.6 \u002F 55.5  |\n|  [Resnet101-TL-GeM](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1vhm1GYvn8T3-1C4SPjPNJOuTU9UxKAG6) | 85.7 | **93.4** | 64.5 \u002F 40.9 |  78.8 \u002F 59.2  |\n|  [Resnet50-AP-GeM](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1oPtE_go9tnsiDLkWjN4NMpKjh-_md1G5\u002Fview?usp=sharing) | 87.7 \t| 91.9 |  65.5 \u002F 41.0 | 77.6 \u002F 57.1 |\n|  [Resnet101-AP-GeM](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1UWJGDuHtzaQdFhSMojoYVQjmCXhIwVvy) | **89.1** | **93.0** | **67.1** \u002F **42.3** |  **80.3**\u002F**60.9** |\n|  [Resnet101-AP-GeM-LM18](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1r76NLHtJsH-Ybfda4aLkUIoW3EEsi25I)** |  88.1\t| **93.1** | 66.3 \u002F **42.5**\t|   **80.2** \u002F **60.8**  |\n\n\n模型名称编码了网络的主干架构以及用于训练它的损失类型（TL 表示三元组损失，AP 表示平均精度损失）。除第一行的模型使用 MAC [3]（即最大池化）外，所有模型均采用 **广义均值池化 (GeM)** [3] 作为全局池化机制，并且均在 **Landmarks-clean** [1] 数据集上训练（[Landmarks 数据集](http:\u002F\u002Fsites.skoltech.ru\u002Fcompvision\u002Fprojects\u002Fneuralcodes\u002F)的干净版本），直接从 ImageNet 进行 **微调**。这些指标是在使用 **单一分辨率**并对输出特征应用 **白化**处理后获得的（白化参数同样是在 Landmarks-clean 上学习得到的）。有关所有超参数的详细说明，请参阅分别针对三元组损失和 AP 损失模型的 [1] 和 [2]。\n\n** 为完整起见，我们还添加了一个额外的模型 `Resnet101-AP-LM18`，该模型是在 [Google-Landmarks 数据集](https:\u002F\u002Fwww.kaggle.com\u002Fgoogle\u002Fgoogle-landmarks-dataset)上训练的，这是一个包含超过 100 万张图像和 1.5 万个类别的大型数据集。\n\n### 复现结果\n\n脚本 `test_dir.py` 可用于评估提供的预训练模型并复现上述结果：\n\n```\npython -m dirtorch.test_dir --dataset DATASET --checkpoint PATH_TO_MODEL \\\n\t\t[--whiten DATASET] [--whitenp POWER] [--aqe ALPHA-QEXP] \\\n\t\t[--trfs TRANSFORMS] [--gpu ID] [...]\n```\n\n- `--dataset`: 
选择数据集（例如：Oxford5K、Paris6K、ROxford5K、RParis6K）[**必填**]\n- `--checkpoint`: 模型权重路径 [**必填**]\n- `--whiten`: 对输出特征应用白化处理 [默认为 'Landmarks_clean']\n- `--whitenp`: 白化强度 [默认为 0.25]\n- `--aqe`: alpha-query 扩展参数 [默认为 None]\n- `--trfs`: 输入图像变换（可用于多尺度处理）[默认为 None]\n- `--gpu`: 选择 GPU ID（-1 表示使用 CPU）\n\n例如，要复现 Resnet101-AP_loss 模型在 RParis6K 数据集上的结果，请从 [此处](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1mi50tG6oXY1eE9yJnmGCPdTmlIjG7mr0)下载模型 `Resnet-101-AP-GeM.pt`，然后执行：\n\n```\ncd $DIR_ROOT\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n\npython -m dirtorch.test_dir --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\n您应该会看到如下输出：\n\n```\n>> 评估...\n * mAP-易 = 0.907568\n * mAP-中 = 0.803098\n * mAP-难 = 0.608556\n```\n\n**注意：** 此脚本集成了 Oxford5K、Paris6K、ROxford5K 和 RParis6K 数据集的自动下载功能（感谢 Filip Radenovic ;))。数据集将保存在 `$DB_ROOT` 目录下。\n\n## 特征提取器\n\n你也可以使用预训练模型从自己的数据集或图像集合中提取特征。为此，我们提供了脚本 `feature_extractor.py`：\n\n```\npython -m dirtorch.extract_features --dataset DATASET --checkpoint PATH_TO_MODEL \\\n\t\t--output PATH_TO_FILE [--whiten DATASET] [--whitenp POWER] \\\n\t\t[--trfs TRANSFORMS] [--gpu ID] [...]\n```\n\n其中 `--output` 用于指定特征将保存的目标路径。其余参数与上述相同。\n\n例如，以下是如何使用该脚本通过 `Resnet-101-AP-GeM.pt` 模型为 RParis6K 数据集中每张图像提取特征表示，并将其存储在 `rparis6k_features.npy` 中：\n\n```\ncd $DIR_ROOT\nexport DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n\npython -m dirtorch.extract_features --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--output rparis6k_features.npy \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\n该库还提供了一个**通用数据集类**（`ImageList`），允许通过提供一个简单的文本文件来指定图像列表。\n\n```\n--dataset 'ImageList(\"PATH_TO_TEXTFILE\" [, 
\"IMAGES_ROOT\"])'\n```\n\n文本文件的每一行应包含单个图像的路径：\n\n```\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage1.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage2.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage3.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage4.jpg\n\u002FPATH\u002FTO\u002FYOUR\u002FDATASET\u002Fimages\u002Fimage5.jpg\n```\n\n或者，你也可以使用相对路径，并通过 `IMAGES_ROOT` 指定根目录。\n\n## 使用 kapture 数据集进行特征提取\n\nKapture 是一种基于文本和二进制文件的枢纽文件格式，用于描述 SFM（运动恢复结构）以及更广泛意义上的传感器采集数据。\n\n它可在 https:\u002F\u002Fgithub.com\u002Fnaver\u002Fkapture 上获取。该库包含常用格式的转换工具，并且多个流行的数据集可以直接以 kapture 格式提供。\n\n可以通过以下命令安装：\n```bash\npip install kapture\n```\n\n数据集可以使用以下命令下载：\n```bash\nkapture_download_dataset.py update\nkapture_download_dataset.py list\n# 例如：安装 Extended-CMU-Seasons_slice22 的映射和查询数据\nkapture_download_dataset.py install \"Extended-CMU-Seasons_slice22_*\"\n```\n\n如果你想将自己的数据集转换为 kapture 格式，请参阅 [此处](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fkapture\u002Fblob\u002Fmaster\u002Fdoc\u002Fdatasets.adoc) 的示例。\n\n安装完成后，你可以使用以下命令为你的 kapture 数据集提取全局特征：\n```bash\ncd $DIR_ROOT\npython -m dirtorch.extract_kapture --kapture-root pathto\u002Fyourkapturedataset --checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM-LM18.pt --gpu 0\n```\n\n运行 `python -m dirtorch.extract_kapture --help` 可以获取更多关于提取参数的信息。\n\n## 引用\n\n如果你的研究受益于本项目，请在你的出版物中引用以下论文：\n\n```\n@article{GARL17,\n title = {端到端学习用于图像检索的深度视觉表示},\n author = {Gordo, A. 和 Almazan, J. 和 Revaud, J. 和 Larlus, D.}\n journal = {IJCV},\n year = {2017}\n}\n\n@inproceedings{RARS19,\n title = {基于平均精度的学习：使用列表损失训练图像检索},\n author = {Revaud, J. 和 Almazan, J. 和 Rezende, R.S. 
和 de Souza, C.R.}\n booktitle = {ICCV},\n year = {2019}\n}\n```\n\n## 贡献者\n\n本库由 Jerome Revaud、Rafael de Rezende、Cesar de Souza、Diane Larlus 和 Jon Almazan 在 **[Naver Labs Europe](https:\u002F\u002Feurope.naverlabs.com)** 开发。\n\n\n**特别感谢 [Filip Radenovic](https:\u002F\u002Fgithub.com\u002Ffilipradenovic)。** 在本库中，我们使用了来自他优秀的 **[CNN-imageretrieval 仓库](https:\u002F\u002Fgithub.com\u002Ffilipradenovic\u002Fcnnimageretrieval-pytorch)** 的 ROxford5K 和 RParis6K 下载工具。如果你想训练自己的图像检索模型，不妨去看看这个仓库！\n\n## 参考文献\n\n[1] Gordo, A., Almazan, J., Revaud, J., Larlus, D.，《端到端学习用于图像检索的深度视觉表示》（https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.07940）。IJCV 2017\n\n[2] Revaud, J., Almazan, J., Rezende, R.S., de Souza, C.，《基于平均精度的学习：使用列表损失训练图像检索》（https:\u002F\u002Farxiv.org\u002Fabs\u002F1906.07589）。ICCV 2019\n\n[3] Tolias, G., Sicre, R., Jegou, H.，《利用 CNN 激活的积分最大池化进行特定目标检索》（https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05879）。ICLR 2016\n\n[4] Radenovic, F., Tolias, G., Chum, O.，《无需人工标注的 CNN 图像检索微调》（https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.02512）。TPAMI 2018","# deep-image-retrieval 快速上手指南\n\n`deep-image-retrieval` 是一个基于 PyTorch 的图像检索工具库，提供了用于提取深度视觉特征预训练模型及评估脚本。该项目实现了两篇顶级会议论文（IJCV 2017, ICCV 2019）中的算法，支持使用 GeM 池化和平均精度（Average Precision）损失函数进行端到端训练与推理。\n\n## 环境准备\n\n在运行此工具箱之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux \u002F macOS (Windows 需自行配置兼容环境)\n*   **Python**: Python 3.7+ (推荐 3.7.3)\n*   **深度学习框架**: PyTorch 1.4+ (推荐最新版)\n*   **依赖包**: `numpy`, `matplotlib`, `tqdm`, `scikit-learn`\n\n### 安装依赖\n\n推荐使用 `conda` 创建虚拟环境并安装依赖。国内用户可使用清华源或中科大源加速下载：\n\n```bash\n# 创建并激活环境 (可选)\nconda create -n dir_env python=3.7\nconda activate dir_env\n\n# 安装基础依赖 (使用清华源加速)\nconda install -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain numpy matplotlib tqdm scikit-learn\n\n# 安装 PyTorch (根据 CUDA 版本选择，此处以 CPU 版为例，GPU 版请访问 pytorch.org 获取对应命令)\n# 国内加速源示例 (清华源)\nconda install pytorch torchvision cpuonly -c pytorch -c 
https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fcloud\u002Fpytorch\u002F\n```\n\n## 安装步骤\n\n1.  **克隆代码仓库**\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval.git\n    cd deep-image-retrieval\n    ```\n\n2.  **配置环境变量**\n\n    设置项目根目录和数据集存放路径：\n\n    ```bash\n    export DIR_ROOT=$PWD\n    # 请将 \u002FPATH\u002FTO\u002FYOUR\u002FDATASETS 替换为您实际的数据集存储路径\n    export DB_ROOT=\u002FPATH\u002FTO\u002FYOUR\u002FDATASETS\n    ```\n\n## 基本使用\n\n本工具主要包含两个核心功能：**模型评估**（复现论文结果）和**特征提取**（应用于自定义图片）。\n\n### 1. 评估预训练模型\n\n您可以下载预训练模型并在标准数据集（如 Oxford5K, Paris6K 等）上进行评估。脚本会自动下载测试数据集到 `$DB_ROOT` 目录。\n\n**示例**：使用 `Resnet101-AP-GeM` 模型在 `RParis6K` 数据集上评估 mAP。\n\n首先，从 [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fopen?id=1mi50tG6oXY1eE9yJnmGCPdTmlIjG7mr0) 下载模型文件 `Resnet101-AP-GeM.pt` 并放入 `dirtorch\u002Fdata\u002F` 目录（或指定任意路径）。\n\n然后运行以下命令：\n\n```bash\ncd $DIR_ROOT\n\npython -m dirtorch.test_dir --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\n**参数说明**：\n*   `--dataset`: 数据集名称 (Oxford5K, Paris6K, ROxford5K, RParis6K)。\n*   `--checkpoint`: 模型权重文件路径。\n*   `--whiten`: 应用白化处理 (默认使用在 Landmarks_clean 上学到的参数)。\n*   `--gpu`: 指定 GPU ID，若设为 `-1` 则使用 CPU。\n\n预期输出将包含不同难度下的 mAP 值：\n```text\n>> Evaluation...\n * mAP-easy = 0.907568\n * mAP-medium = 0.803098\n * mAP-hard = 0.608556\n```\n\n### 2. 
提取自定义图像特征\n\n您可以使用预训练模型为自己的图片集合提取特征向量。\n\n#### 方式 A：使用内置数据集格式\n如果图片已整理为支持的格式，直接指定 dataset 名称即可。\n\n```bash\npython -m dirtorch.extract_features --dataset RParis6K \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--output rparis6k_features.npy \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\n#### 方式 B：使用自定义图片列表 (推荐)\n创建一个文本文件（例如 `images.txt`），每行包含一张图片的绝对路径或相对于根目录的路径：\n\n```text\n\u002Fdata\u002Fimages\u002Fimage1.jpg\n\u002Fdata\u002Fimages\u002Fimage2.jpg\n\u002Fdata\u002Fimages\u002Fimage3.jpg\n```\n\n运行提取命令，通过 `ImageList` 类加载该文本文件：\n\n```bash\npython -m dirtorch.extract_features --dataset 'ImageList(\"images.txt\", \"\u002Fdata\u002Fimages\")' \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM.pt \\\n\t\t--output my_custom_features.npy \\\n\t\t--whiten Landmarks_clean --whitenp 0.25 --gpu 0\n```\n\n*   `--output`: 指定保存特征文件的 `.npy` 路径。\n*   `ImageList(\"文本文件路径\", \"图片根目录\")`: 第二个参数为可选，若文本中使用相对路径则必须提供。\n\n### 3. 进阶：支持 Kapture 数据集格式\n\n如果您使用 `kapture` 格式管理数据（常用于 SLAM 和视觉定位），请先安装 kapture：\n\n```bash\npip install kapture\n# 下载示例数据集 (可选)\nkapture_download_dataset.py update\nkapture_download_dataset.py install \"Extended-CMU-Seasons_slice22_*\"\n```\n\n提取特征命令如下：\n\n```bash\npython -m dirtorch.extract_kapture --kapture-root pathto\u002Fyourkapturedataset \\\n\t\t--checkpoint dirtorch\u002Fdata\u002FResnet101-AP-GeM-LM18.pt \\\n\t\t--gpu 0\n```","某大型在线艺术品交易平台的技术团队正致力于升级其“以图搜图”功能，以便买家能通过上传局部细节图快速找到同款或相似风格的画作。\n\n### 没有 deep-image-retrieval 时\n- **检索精度低**：传统特征提取方法难以理解艺术品的深层语义，导致搜索“印象派笔触”时返回大量无关的写实风格图片。\n- **细粒度识别差**：无法有效区分构图相似但细节不同的作品，用户很难通过局部截图找到原画的全貌。\n- **优化目标偏差**：模型训练使用通用的分类损失函数，未直接优化检索核心指标（如平均精度 mAP），导致排序结果不符合业务需求。\n- **迭代成本高**：特征提取与聚合模块分离，无法端到端联合调优，每次调整都需要繁琐的多阶段训练。\n\n### 使用 deep-image-retrieval 后\n- **语义匹配精准**：利用端到端学习的深度视觉表示，模型能准确捕捉艺术风格，搜索结果与查询意图高度契合。\n- **局部定位能力强**：借助 GeM 全局聚合层，即使输入仅为画作的一角，也能高效召回包含该特征的完整高分辨率图像。\n- **排序效果显著提升**：采用直接优化平均精度（Average Precision）的列表级损失函数，在 Oxford5K 等基准测试中将 mAP 提升至 90% 以上，大幅改善用户体验。\n- 
**训练流程简化**：整个网络（包括聚合层）均可微分，支持端到端训练，团队能快速针对特定数据集进行微调并部署。\n\ndeep-image-retrieval 通过将检索指标直接融入端到端训练，彻底解决了传统方法在细粒度图像搜索中“查不准、排不好”的核心难题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnaver_deep-image-retrieval_a78d510d.png","naver","NAVER","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fnaver_b4681208.png","",null,"opensource@navercorp.com","http:\u002F\u002Fdevelopers.naver.com","https:\u002F\u002Fgithub.com\u002Fnaver",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,681,102,"2026-04-12T07:08:04","BSD-3-Clause","未说明","非必需（支持 CPU，通过 --gpu -1 指定），若使用 GPU 需兼容 PyTorch 的 NVIDIA 显卡，具体型号和显存未说明",{"notes":93,"python":94,"dependencies":95},"该工具基于 PyTorch 1.4+ 和 Python 3.7.3 测试。可通过 conda 安装主要依赖。支持自动下载 Oxford5K、Paris6K 等基准数据集。若处理大规模数据或使用 Kapture 格式，需额外安装 kapture 库。模型训练和推理均支持端到端微分，包含 GeM 池化层及多种损失函数（如 Triplet Loss, AP Loss）。","3.7+",[96,97,98,99,100,101,102],"torch>=1.4","numpy","matplotlib","tqdm","scikit-learn","torchvision","kapture (可选)",[15,104],"其他","2026-03-27T02:49:30.150509","2026-04-16T10:47:42.523133",[108,113,118,123,128,132],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},35738,"在 Oxford\u002FParis 数据集上计算平均精度（AP）时，应该使用哪种方法？代码中使用的 sklearn 方法与标准方法有何不同？","Oxford\u002FParis 数据集及其修订版（Revisited）的标准惯例是使用“插值法”（interpolation method），即通过平均两个相邻的精度点然后乘以召回率步长来计算。这与 `sklearn.metrics.average_precision_score` 使用的“有限和”（finite sum）方法不同，后者通常会导致更高的 AP 值。维护者已根据社区反馈更新了代码和 README 中的数值，以采用标准的插值法进行计算。","https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval\u002Fissues\u002F8",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},35739,"运行特征提取脚本时出现 'KeyError: Lankmarks_clean' 错误，如何解决？","这是一个拼写错误。命令行参数 `--whiten` 后面的数据集名称应为 `Landmarks_clean`（注意 'Landmarks' 中有 'd'），而不是 'Lankmarks_clean'。请修正命令如下：\n`python -m dirtorch.extract_features --dataset 'ImageList(\"dirtorch\u002Fimage.txt\")' --checkpoint dirtorch\u002Fdata\u002FResnet-101-AP-GeM.pt --output dirtorch\u002Fdata\u002Fresults --whiten Landmarks_clean --whitenp 0.25 --gpu 
0`\n此外，请确保模型文件名与命令行中指定的一致（注意连字符的使用，如 `Resnet-101-AP-GeM.pt`）。","https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval\u002Fissues\u002F6",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},35740,"如何获取用于训练的 Landmarks-clean 数据集？原始链接似乎已失效。","官方并未直接提供清洗后的 Landmarks-clean 数据集。原始数据集的部分图片链接已失效。建议的获取途径包括：\n1. 尝试联系原始 Landmarks 数据集的作者获取数据。\n2. 参考相关论文或项目（如 Radenovic 等人的工作）中提供的数据下载指引。\n3. 社区用户曾尝试重新实现并清洗数据，但目前仓库维护者手中也没有该清洗版本，若有用户获取到，欢迎共享。","https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval\u002Fissues\u002F1",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},35741,"在 Google Landmarks 2018 数据集上训练的 GeM 池化参数 `p` 最终收敛到的值是多少？","在使用 ResNet-101 架构并在 Google Landmarks 2018 数据集（模型代号 `Resnet101-AP-GeM-LM18`）上进行训练的实验中，GeM 池化的 `p` 参数最终收敛到了 `2.99`。","https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval\u002Fissues\u002F9",{"id":129,"question_zh":130,"answer_zh":131,"source_url":117},35742,"测试 RParis6K 数据集时，如何配置本地路径以避免重复下载？需要额外的文件吗？","如果数据集已存储在 `$DB_ROOT\u002Fparis6k` 目录下，下载器会自动检测到并跳过下载过程。\n注意：代码使用的是由 Radenovic 等人提供的真值文件（ground-truth file），你需要单独下载该文件并将其放在同一目录下。下载地址为：http:\u002F\u002Fcmp.felk.cvut.cz\u002Fcnnimageretrieval\u002Fdata\u002Ftest\u002Fparis6k\u002Fgnd_paris6k.pkl",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},35743,"在实现多阶段反向传播（multi-staged backpropagation）时，模型中的原地操作（in-place operation）导致梯度计算失败，如何修复？","在 `dirtorch\u002Fnets` 目录下的网络定义文件（如 `rmac_resnet_fpn.py`, `rmac_resnet.py`, `rmac_resnext.py`）中，存在使用原地操作 `.squeeze_()` 的问题，这会阻碍梯度的正确计算。\n解决方案是将原地操作改为非原地操作。例如，将代码：\n`x.squeeze_()`\n修改为：\n`x = x.squeeze()`\n请对所有涉及此类操作的地方进行相同修改。","https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdeep-image-retrieval\u002Fissues\u002F19",[]]