[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-robodhruv--visualnav-transformer":3,"tool-robodhruv--visualnav-transformer":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 
恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":76,"owner_website":79,"owner_url":80,"languages":81,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":94,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":108,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":110,"updated_at":111,"faqs":112,"releases":147},4646,"robodhruv\u002Fvisualnav-transformer","visualnav-transformer","Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.","visualnav-transformer 是伯克利人工智能研究实验室开源的移动机器人基础模型套件，核心包含 GNM、ViNT 和 NoMaD 三大模型。它旨在解决传统机器人导航策略泛化能力弱、难以跨机型复用的痛点。通过在海量的跨形态机器人数据上进行训练，visualnav-transformer 能够作为通用的“视觉导航大脑”，让机器人在无需额外训练（零样本）的情况下，仅凭摄像头画面即可理解环境并导航至指定目标。\n\n该项目最大的技术亮点在于其强大的迁移能力：不仅支持直接控制多种不同构型的机器人，还允许开发者利用少量新数据进行高效微调，快速适配特定任务或全新硬件。此外，NoMaD 模型创新性地引入了目标掩码扩散策略，显著提升了机器人在复杂未知环境中的探索效率。\n\nvisualnav-transformer 主要面向机器人领域的研究人员与开发者。仓库提供了完整的训练脚本、数据处理工具以及在 TurtleBot 等主流平台上的部署示例，帮助用户轻松复现前沿论文成果，或基于此构建自定义的自主导航系统。无论是希望验证新算法的学者，还是致力于开发通用移动机器人的工程师，都能从中获得强有力的支持。","# General Navigation Models: GNM, ViNT and NoMaD\n\n**Contributors**: Dhruv Shah, Ajay Sridhar, Nitish Dashora, Catherine Glossop, Kyle Stachowicz, Arjun Bhorkar, Kevin Black, Noriaki Hirose, Sergey Levine\n\n_Berkeley AI Research_\n\n[Project Page](https:\u002F\u002Fgeneral-navigation-models.github.io) | [Citing](https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer#citing) | [Pre-Trained Models](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)\n\n---\n\nGeneral Navigation Models are general-purpose goal-conditioned visual navigation policies trained on diverse, cross-embodiment training data, and can control many different robots in zero-shot. They can also be efficiently fine-tuned, or adapted, to new robots and downstream tasks. Our family of models is described in the following research papers (and growing):\n1. [GNM: A General Navigation Model to Drive Any Robot](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdrive-any-robot) (_October 2022_, presented at ICRA 2023)\n2. 
[ViNT: A Foundation Model for Visual Navigation](https:\u002F\u002Fgeneral-navigation-models.github.io\u002Fvint\u002Findex.html) (_June 2023_, presented at CoRL 2023)\n3. [NoMaD: Goal Masking Diffusion Policies for Navigation and Exploration](https:\u002F\u002Fgeneral-navigation-models.github.io\u002Fnomad\u002Findex.html) (_October 2023_)\n\n## Overview\nThis repository contains code for training our family of models with your own data, pre-trained model checkpoints, as well as example code to deploy it on a TurtleBot2\u002FLoCoBot robot. The repository follows the organization from [GNM](https:\u002F\u002Fgithub.com\u002FPrieureDeSion\u002Fdrive-any-robot).\n\n- `.\u002Ftrain\u002Ftrain.py`: training script to train or fine-tune the ViNT model on your custom data.\n- `.\u002Ftrain\u002Fvint_train\u002Fmodels\u002F`: contains model files for GNM, ViNT, and some baselines.\n- `.\u002Ftrain\u002Fprocess_*.py`: scripts to process rosbags or other formats of robot trajectories into training data.\n- `.\u002Fdeployment\u002Fsrc\u002Frecord_bag.sh`: script to collect a demo trajectory as a ROS bag in the target environment on the robot. This trajectory is subsampled to generate a topological graph of the environment.\n- `.\u002Fdeployment\u002Fsrc\u002Fcreate_topomap.sh`: script to convert a ROS bag of a demo trajectory into a topological graph that the robot can use to navigate.\n- `.\u002Fdeployment\u002Fsrc\u002Fnavigate.sh`: script that deploys a trained GNM\u002FViNT\u002FNoMaD model on the robot to navigate to a desired goal in the generated topological graph. Please see relevant sections below for configuration settings.\n- `.\u002Fdeployment\u002Fsrc\u002Fexplore.sh`: script that deploys a trained NoMaD model on the robot to randomly explore its environment. Please see relevant sections below for configuration settings.\n\n## Train\n\nThis subfolder contains code for processing datasets and training models from your own data.\n\n### Pre-requisites\n\nThe codebase assumes access to a workstation running Ubuntu (tested on 18.04 and 20.04), Python 3.7+, and a GPU with CUDA 10+. It also assumes access to conda, but you can modify it to work with other virtual environment packages, or a native setup.\n### Setup\nRun the commands below inside the `vint_release\u002F` (topmost) directory:\n1. Set up the conda environment:\n    ```bash\n    conda env create -f train\u002Ftrain_environment.yml\n    ```\n2. Source the conda environment:\n    ```\n    conda activate vint_train\n    ```\n3. Install the vint_train packages:\n    ```bash\n    pip install -e train\u002F\n    ```\n4. Install the `diffusion_policy` package from this [repo](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fdiffusion_policy):\n    ```bash\n    git clone git@github.com:real-stanford\u002Fdiffusion_policy.git\n    pip install -e diffusion_policy\u002F\n    ```\n\n\n### Data-Wrangling\nIn the [papers](https:\u002F\u002Fgeneral-navigation-models.github.io), we train on a combination of publicly available and unreleased datasets. 
Below is a list of publicly available datasets used for training; please contact the respective authors for access to the unreleased data.\n- [RECON](https:\u002F\u002Fsites.google.com\u002Fview\u002Frecon-robot\u002Fdataset)\n- [TartanDrive](https:\u002F\u002Fgithub.com\u002Fcastacks\u002Ftartan_drive)\n- [SCAND](https:\u002F\u002Fwww.cs.utexas.edu\u002F~xiao\u002FSCAND\u002FSCAND.html#Links)\n- [GoStanford2 (Modified)](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1RYseCpbtHEFOsmSX2uqNY_kvSxwZLVP_?usp=sharing)\n- [SACSoN\u002FHuRoN](https:\u002F\u002Fsites.google.com\u002Fview\u002Fsacson-review\u002Fhuron-dataset)\n\nWe recommend you download these (and any other datasets you may want to train on) and run the processing steps below.\n\n#### Data Processing \n\nWe provide some sample scripts to process these datasets, either directly from a rosbag or from a custom format like HDF5s:\n1. Run `process_bags.py` with the relevant args, or `process_recon.py` for processing RECON HDF5s. You can also manually add your own dataset by following our structure below (if you are adding a custom dataset, please check out the [Custom Datasets](#custom-datasets) section).\n2. Run `data_split.py` on your dataset folder with the relevant args.\n\nAfter step 1 of data processing, the processed dataset should have the following structure:\n\n```\n├── \u003Cdataset_name>\n│   ├── \u003Cname_of_traj1>\n│   │   ├── 0.jpg\n│   │   ├── 1.jpg\n│   │   ├── ...\n│   │   ├── T_1.jpg\n│   │   └── traj_data.pkl\n│   ├── \u003Cname_of_traj2>\n│   │   ├── 0.jpg\n│   │   ├── 1.jpg\n│   │   ├── ...\n│   │   ├── T_2.jpg\n│   │   └── traj_data.pkl\n│   ...\n└── └── \u003Cname_of_trajN>\n    \t├── 0.jpg\n    \t├── 1.jpg\n    \t├── ...\n        ├── T_N.jpg\n        └── traj_data.pkl\n```  \n\nEach `*.jpg` file contains a forward-facing RGB observation from the robot, and they are temporally labeled. The `traj_data.pkl` file is the odometry data for the trajectory. It’s a pickled dictionary with the keys:\n- `\"position\"`: An np.ndarray [T, 2] of the xy-coordinates of the robot at each image observation.\n- `\"yaw\"`: An np.ndarray [T,] of the yaws of the robot at each image observation.\n\n\nAfter step 2 of data processing, the processed data-split should have the following structure inside `vint_release\u002Ftrain\u002Fvint_train\u002Fdata\u002Fdata_splits\u002F`:\n\n```\n├── \u003Cdataset_name>\n│   ├── train\n|   |   └── traj_names.txt\n└── └── test\n        └── traj_names.txt \n``` \n\n### Training your General Navigation Models\nRun this inside the `vint_release\u002Ftrain` directory:\n```bash\npython train.py -c \u003Cpath_of_train_config_file>\n```\nThe premade config yaml files are in the `train\u002Fconfig` directory. \n\n#### Custom Config Files\nYou can use one of the premade yaml files as a starting point and change the values as you need. `config\u002Fvint.yaml` is a good choice since it has commented arguments. `config\u002Fdefaults.yaml` contains the default config values (don't directly train with this config file since it does not specify any datasets for training).\n\n#### Custom Datasets\nMake sure your dataset and data-split directory follow the structures provided in the [Data Processing](#data-processing) section. 
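As a quick sanity check before wiring a new dataset into the config files, you can verify that each processed trajectory folder matches the layout above (temporally ordered `*.jpg` frames plus a `traj_data.pkl` holding `position` and `yaw` arrays of matching length). The snippet below is a minimal illustrative sketch and not a script shipped with this repository; the example path in the commented call is hypothetical:

```python
import os
import pickle

import numpy as np

def check_traj(traj_dir: str) -> None:
    """Sanity-check one processed trajectory folder against the expected layout."""
    frames = sorted(f for f in os.listdir(traj_dir) if f.endswith(".jpg"))
    with open(os.path.join(traj_dir, "traj_data.pkl"), "rb") as f:
        traj_data = pickle.load(f)
    position = np.asarray(traj_data["position"])  # expected shape [T, 2]
    yaw = np.asarray(traj_data["yaw"])            # expected shape [T,]
    assert position.ndim == 2 and position.shape[1] == 2, position.shape
    assert yaw.ndim == 1, yaw.shape
    # one odometry entry per image observation
    assert len(frames) == len(position) == len(yaw), (len(frames), len(position), len(yaw))

# check_traj("<dataset_name>/<name_of_traj1>")  # hypothetical example path
```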
Locate `train\u002Fvint_train\u002Fdata\u002Fdata_config.yaml` and append the following:\n\n```\n\u003Cdataset_name>:\n    metric_waypoints_distance: \u003Caverage_distance_in_meters_between_waypoints_in_the_dataset>\n```\n\nLocate your training config file and add the following text under the `datasets` argument (feel free to change the values of `end_slack`, `goals_per_obs`, and `negative_mining`):\n```\n\u003Cdataset_name>:\n    data_folder: \u003Cpath_to_the_dataset>\n    train: data\u002Fdata_splits\u002F\u003Cdataset_name>\u002Ftrain\u002F \n    test: data\u002Fdata_splits\u002F\u003Cdataset_name>\u002Ftest\u002F \n    end_slack: 0 # how many timesteps to cut off from the end of each trajectory  (in case many trajectories end in collisions)\n    goals_per_obs: 1 # how many goals are sampled per observation\n    negative_mining: True # negative mining from the ViNG paper (Shah et al.)\n```\n\n#### Training your model from a checkpoint\nInstead of training from scratch, you can also load an existing checkpoint from the published results.\nAdd `load_run: \u003Cproject_name>\u002F\u003Clog_run_name>`to your .yaml config file in `vint_release\u002Ftrain\u002Fconfig\u002F`. The `*.pth` of the file you are loading to be saved in this file structure and renamed to “latest”: `vint_release\u002Ftrain\u002Flogs\u002F\u003Cproject_name>\u002F\u003Clog_run_name>\u002Flatest.pth`. This makes it easy to train from the checkpoint of a previous run since logs are saved this way by default. Note: if you are loading a checkpoint from a previous run, check for the name the run in the `vint_release\u002Ftrain\u002Flogs\u002F\u003Cproject_name>\u002F`, since the code appends a string of the date to each run_name specified in the config yaml file of the run to avoid duplicate run names. \n\n\nIf you want to use our checkpoints, you can download the `*.pth` files from [this link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing).\n\n\n## Deployment\nThis subfolder contains code to load a pre-trained ViNT and deploy it on the open-source [LoCoBot indoor robot platform](http:\u002F\u002Fwww.locobot.org\u002F) with a [NVIDIA Jetson Orin Nano](https:\u002F\u002Fwww.amazon.com\u002FNVIDIA-Jetson-Orin-Nano-Developer\u002Fdp\u002FB0BZJTQ5YP\u002Fref=asc_df_B0BZJTQ5YP\u002F?tag=hyprod-20&linkCode=df0&hvadid=652427572954&hvpos=&hvnetw=g&hvrand=12520404772764575478&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=1013585&hvtargid=pla-2112361227514&psc=1&gclid=CjwKCAjw4P6oBhBsEiwAKYVkq7dqJEwEPz0K-H33oN7MzjO0hnGcAJDkx2RdT43XZHdSWLWHKDrODhoCmnoQAvD_BwE). It can be easily adapted to be run on alternate robots, and researchers have been able to independently deploy it on the following robots – Clearpath Jackal, DJI Tello, Unitree A1, TurtleBot2, Vizbot – and in simulated environments like CARLA.\n\n### LoCoBot Setup\n\nThis software was tested on a LoCoBot running Ubuntu 20.04.\n\n\n#### Software Installation (in this order)\n1. ROS: [ros-noetic](https:\u002F\u002Fwiki.ros.org\u002Fnoetic\u002FInstallation\u002FUbuntu)\n2. ROS packages: \n    ```bash\n    sudo apt-get install ros-noetic-usb-cam ros-noetic-joy\n    ```\n3. [kobuki](http:\u002F\u002Fwiki.ros.org\u002Fkobuki\u002FTutorials\u002FInstallation)\n4. Conda \n    - Install anaconda\u002Fminiconda\u002Fetc. 
for managing environments\n    - Make conda env with environment.yml (run this inside the `vint_release\u002F` directory)\n        ```bash\n        conda env create -f deployment\u002Fdeployment_environment.yaml\n        ```\n    - Source env \n        ```bash\n        conda activate vint_deployment\n        ```\n    - (Recommended) add to `~\u002F.bashrc`: \n        ```bash\n        echo \"conda activate vint_deployment\" >> ~\u002F.bashrc \n        ```\n5. Install the `vint_train` packages (run this inside the `vint_release\u002F` directory):\n    ```bash\n    pip install -e train\u002F\n    ```\n6. Install the `diffusion_policy` package from this [repo](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fdiffusion_policy):\n    ```bash\n    git clone git@github.com:real-stanford\u002Fdiffusion_policy.git\n    pip install -e diffusion_policy\u002F\n    ```\n7. (Recommended) Install [tmux](https:\u002F\u002Fgithub.com\u002Ftmux\u002Ftmux\u002Fwiki\u002FInstalling) if not present.\n    Many of the bash scripts rely on tmux to launch multiple screens with different commands. This will be useful for debugging because you can see the output of each screen.\n\n#### Hardware Requirements\n- LoCoBot: http:\u002F\u002Flocobot.org (just the navigation stack)\n- A wide-angle RGB camera: [Example](https:\u002F\u002Fwww.amazon.com\u002FELP-170degree-Fisheye-640x480-Resolution\u002Fdp\u002FB00VTHD17W). The `vint_locobot.launch` file uses camera parameters that work with cameras like the ELP fisheye wide angle; feel free to modify them for your own setup. Adjust the camera parameters in `vint_release\u002Fdeployment\u002Fconfig\u002Fcamera.yaml` for your camera accordingly (used for visualization).\n- [Joystick](https:\u002F\u002Fwww.amazon.com\u002FLogitech-Wireless-Nano-Receiver-Controller-Vibration\u002Fdp\u002FB0041RR0TW)\u002F[keyboard teleop](http:\u002F\u002Fwiki.ros.org\u002Fteleop_twist_keyboard) that works with Linux. Add the index mapping for the _deadman_switch_ on the joystick to the `vint_release\u002Fdeployment\u002Fconfig\u002Fjoystick.yaml`. You can find the mapping from buttons to indices for common joysticks in the [wiki](https:\u002F\u002Fwiki.ros.org\u002Fjoy). \n\n\n### Loading the model weights\n\nSave the model weights *.pth file in the `vint_release\u002Fdeployment\u002Fmodel_weights` folder. Our model's weights are available at [this link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing).\n\n### Collecting a Topological Map\n\n_Make sure to run these scripts inside the `vint_release\u002Fdeployment\u002Fsrc\u002F` directory._\n\n\nThis section discusses a simple way to create a topological map of the target environment for deployment. For simplicity, we will use the robot in “path-following” mode, i.e. given a single trajectory in an environment, the task is to follow the same trajectory to the goal. The environment may have new\u002Fdynamic obstacles, lighting variations etc.\n\n#### Record the rosbag: \n```bash\n.\u002Frecord_bag.sh \u003Cbag_name>\n```\n\nRun this command to teleoperate the robot with the joystick and camera. This command opens up three windows:\n1. `roslaunch vint_locobot.launch`: This launch file opens the `usb_cam` node for the camera, the joy node for the joystick, and nodes for the robot’s mobile base.\n2. `python joy_teleop.py`: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.\n3. 
`rosbag record \u002Fusb_cam\u002Fimage_raw -o \u003Cbag_name>`: This command isn’t run immediately (you have to press Enter). It will be run in the vint_release\u002Fdeployment\u002Ftopomaps\u002Fbags directory, where we recommend you store your rosbags.\n\nOnce you are ready to record the bag, run the `rosbag record` script and teleoperate the robot on the map you want the robot to follow. When you are finished with recording the path, kill the `rosbag record` command, and then kill the tmux session.\n\n#### Make the topological map: \n```bash\n.\u002Fcreate_topomap.sh \u003Ctopomap_name> \u003Cbag_filename>\n```\n\nThis command opens up 3 windows:\n1. `roscore`\n2. `python create_topomap.py --dt 1 --dir \u003Ctopomap_dir>`: This command creates a directory in `\u002Fvint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` and saves an image as a node in the map every second the bag is played.\n3. `rosbag play -r 1.5 \u003Cbag_filename>`: This command plays the rosbag at 1.5x speed, so the python script is actually recording nodes 1.5 seconds apart. The `\u003Cbag_filename>` should be the entire bag name with the .bag extension. You can change this value in the `make_topomap.sh` file. The command does not run until you hit Enter, which you should only do once the python script gives its waiting message. Once you play the bag, move to the screen where the python script is running so you can kill it when the rosbag stops playing.\n\nWhen the bag stops playing, kill the tmux session.\n\n\n### Running the model \n#### Navigation\n_Make sure to run this script inside the `vint_release\u002Fdeployment\u002Fsrc\u002F` directory._\n\n```bash\n.\u002Fnavigate.sh \"--model \u003Cmodel_name> --dir \u003Ctopomap_dir>\"\n```\n\nTo deploy one of the models from the published results, we are releasing model checkpoints that you can download from [this link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing).\n\n\nThe `\u003Cmodel_name>` is the name of the model in the `vint_release\u002Fdeployment\u002Fconfig\u002Fmodels.yaml` file. In this file, you specify these parameters for each model (defaults used):\n- `config_path` (str): path of the *.yaml file in `vint_release\u002Ftrain\u002Fconfig\u002F` used to train the model\n- `ckpt_path` (str): path of the *.pth file in `vint_release\u002Fdeployment\u002Fmodel_weights\u002F`\n\n\nMake sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are listed in this yaml file for your reference.\n\nThe `\u003Ctopomap_dir>` is the name of the directory in `vint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` that has the images corresponding to the nodes in the topological map. The images are ordered by name from 0 to N.\n\nThis command opens up 4 windows:\n\n1. `roslaunch vint_locobot.launch`: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot’s mobile base.\n2. `python navigate.py --model \u003Cmodel_name> --dir \u003Ctopomap_dir>`: This python script starts a node that reads in image observations from the `\u002Fusb_cam\u002Fimage_raw` topic, inputs the observations and the map into the model, and publishes actions to the `\u002Fwaypoint` topic.\n3. `python joy_teleop.py`: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.\n4. 
`python pd_controller.py`: This python script starts a node that reads messages from the `\u002Fwaypoint` topic (waypoints from the model) and outputs velocities to navigate the robot’s base.\n\nWhen the robot is finished navigating, kill the `pd_controller.py` script, and then kill the tmux session. If you want to take control of the robot while it is navigating, the `joy_teleop.py` script allows you to do so with the joystick.\n\n#### Exploration\n_Make sure to run this script inside the `vint_release\u002Fdeployment\u002Fsrc\u002F` directory._\n\n```bash\n.\u002Fexploration.sh \"--model \u003Cmodel_name>\"\n```\n\nTo deploy one of the models from the published results, we are releasing model checkpoints that you can download from [this link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing).\n\n\nThe `\u003Cmodel_name>` is the name of the model in the `vint_release\u002Fdeployment\u002Fconfig\u002Fmodels.yaml` file (note that only NoMaD works for exploration). In this file, you specify these parameters for each model (defaults used):\n- `config_path` (str): path of the *.yaml file in `vint_release\u002Ftrain\u002Fconfig\u002F` used to train the model\n- `ckpt_path` (str): path of the *.pth file in `vint_release\u002Fdeployment\u002Fmodel_weights\u002F`\n\n\nMake sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are listed in this yaml file for your reference.\n\nThe `\u003Ctopomap_dir>` is the name of the directory in `vint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` that has the images corresponding to the nodes in the topological map. The images are ordered by name from 0 to N.\n\nThis command opens up 4 windows:\n\n1. `roslaunch vint_locobot.launch`: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot’s mobile base.\n2. `python explore.py --model \u003Cmodel_name>`: This python script starts a node that reads in image observations from the `\u002Fusb_cam\u002Fimage_raw` topic, inputs the observations and the map into the model, and publishes exploration actions to the `\u002Fwaypoint` topic.\n3. `python joy_teleop.py`: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.\n4. `python pd_controller.py`: This python script starts a node that reads messages from the `\u002Fwaypoint` topic (waypoints from the model) and outputs velocities to navigate the robot’s base.\n\nWhen the robot is finished navigating, kill the `pd_controller.py` script, and then kill the tmux session. If you want to take control of the robot while it is navigating, the `joy_teleop.py` script allows you to do so with the joystick.\n\n\n### Adapting this code to different robots\n\nWe hope that this codebase is general enough to allow you to deploy it to your favorite ROS-based robots. You can change the robot configuration parameters in `vint_release\u002Fdeployment\u002Fconfig\u002Frobot.yaml`, like the max angular and linear velocities of the robot and the topics to publish to teleop and control the robot. 
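For a new robot, the deployment nodes mainly need the velocity limits and the command topic from that file. The sketch below only illustrates the general pattern of reading `robot.yaml` and clamping commanded velocities before publishing; the key names `max_v`, `max_w`, and `vel_navi_topic` are assumptions for illustration, so verify them against the config file actually shipped in the repository:

```python
import yaml

def load_limits(cfg_path: str = "../config/robot.yaml"):
    """Read velocity limits and the command topic from robot.yaml (assumed keys)."""
    with open(cfg_path, "r") as f:
        robot_cfg = yaml.safe_load(f)
    # "max_v", "max_w", "vel_navi_topic" are assumed key names; check them
    # against vint_release/deployment/config/robot.yaml before relying on them.
    return float(robot_cfg["max_v"]), float(robot_cfg["max_w"]), robot_cfg["vel_navi_topic"]

def clamp(value: float, limit: float) -> float:
    """Keep a commanded velocity within the robot's configured limit."""
    return max(-limit, min(limit, value))

# In a pd_controller-style loop you would clamp the computed command before
# publishing it (e.g. as a geometry_msgs/Twist) on the configured topic:
#   max_v, max_w, vel_topic = load_limits()
#   v, w = clamp(v, max_v), clamp(w, max_w)
```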
Please feel free to create a Github Issue or reach out to the authors at shah@cs.berkeley.edu.\n\n\n## Citing\n```\n@inproceedings{shah2022gnm,\n  author    = {Dhruv Shah and Ajay Sridhar and Arjun Bhorkar and Noriaki Hirose and Sergey Levine},\n  title     = {{GNM: A General Navigation Model to Drive Any Robot}},\n  booktitle = {International Conference on Robotics and Automation (ICRA)},\n  year      = {2023},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03370}\n}\n\n@inproceedings{shah2023vint,\n  title     = {Vi{NT}: A Foundation Model for Visual Navigation},\n  author    = {Dhruv Shah and Ajay Sridhar and Nitish Dashora and Kyle Stachowicz and Kevin Black and Noriaki Hirose and Sergey Levine},\n  booktitle = {7th Annual Conference on Robot Learning},\n  year      = {2023},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.14846}\n}\n\n@article{sridhar2023nomad,\n  author  = {Ajay Sridhar and Dhruv Shah and Catherine Glossop and Sergey Levine},\n  title   = {{NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration}},\n  journal = {arXiv pre-print},\n  year    = {2023},\n  url     = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.xxxx}\n}\n```\n","# 通用导航模型：GNM、ViNT 和 NoMaD\n\n**贡献者**: Dhruv Shah, Ajay Sridhar, Nitish Dashora, Catherine Glossop, Kyle Stachowicz, Arjun Bhorkar, Kevin Black, Noriaki Hirose, Sergey Levine\n\n_伯克利人工智能研究组_\n\n[项目页面](https:\u002F\u002Fgeneral-navigation-models.github.io) | [引用](https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer#citing) | [预训练模型](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)\n\n---\n\n通用导航模型是一类通用的目标条件视觉导航策略，它们基于多样化的跨机器人平台训练数据进行训练，能够在零样本条件下控制多种不同的机器人。此外，这些模型还可以高效地进行微调或适配，以应对新的机器人和下游任务。我们的一系列模型已在以下研究论文中有所介绍（且仍在不断扩展）：\n1. [GNM：一种可驱动任意机器人的通用导航模型](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdrive-any-robot)（2022年10月，于ICRA 2023会议上发表）\n2. [ViNT：用于视觉导航的基础模型](https:\u002F\u002Fgeneral-navigation-models.github.io\u002Fvint\u002Findex.html)（2023年6月，于CoRL 2023会议上发表）\n3. [NoMaD：用于导航与探索的目标掩码扩散策略](https:\u002F\u002Fgeneral-navigation-models.github.io\u002Fnomad\u002Findex.html)（2023年10月）\n\n## 概述\n本仓库包含使用您自己的数据训练我们系列模型的代码、预训练模型检查点，以及在TurtleBot2\u002FLoCoBot机器人上部署这些模型的示例代码。该仓库的组织结构沿用了[GNM](https:\u002F\u002Fgithub.com\u002FPrieureDeSion\u002Fdrive-any-robot)的设计。\n\n- `.\u002Ftrain\u002Ftrain.py`：用于在您的自定义数据上训练或微调ViNT模型的训练脚本。\n- `.\u002Ftrain\u002Fvint_train\u002Fmodels\u002F`：包含GNM、ViNT及一些基线模型的文件。\n- `.\u002Ftrain\u002Fprocess_*.py`：用于将rosbag或其他格式的机器人轨迹处理为训练数据的脚本。\n- `.\u002Fdeployment\u002Fsrc\u002Frecord_bag.sh`：用于在目标环境中收集演示轨迹并将其保存为ROS bag文件的脚本。该轨迹会被下采样以生成环境的拓扑图。\n- `.\u002Fdeployment\u002Fsrc\u002Fcreate_topomap.sh`：用于将演示轨迹的ROS bag文件转换为机器人可用于导航的拓扑图的脚本。\n- `.\u002Fdeployment\u002Fsrc\u002Fnavigate.sh`：用于在机器人上部署训练好的GNM\u002FViNT\u002FNoMaD模型，使其根据生成的拓扑图导航到指定目标的脚本。请参阅下方的相关部分以了解配置设置。\n- `.\u002Fdeployment\u002Fsrc\u002Fexplore.sh`：用于在机器人上部署训练好的NoMaD模型，使其随机探索周围环境的脚本。请参阅下方的相关部分以了解配置设置。\n\n## 训练\n\n此子文件夹包含处理数据集以及使用您自己的数据训练模型的代码。\n\n### 前置条件\n\n该代码库假设您拥有一台运行Ubuntu系统的工作站（已测试过18.04和20.04版本），Python 3.7及以上版本，以及配备CUDA 10+的GPU。此外，还假设有conda环境，但您可以对其进行修改以适应其他虚拟环境工具或原生安装方式。\n\n### 设置\n请在`vint_release\u002F`（最顶层）目录下执行以下命令：\n1. 创建conda环境：\n    ```bash\n    conda env create -f train\u002Ftrain_environment.yml\n    ```\n2. 激活conda环境：\n    ```\n    conda activate vint_train\n    ```\n3. 安装vint_train相关包：\n    ```bash\n    pip install -e train\u002F\n    ```\n4. 
从这个[仓库](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fdiffusion_policy)安装`diffusion_policy`包：\n    ```bash\n    git clone git@github.com:real-stanford\u002Fdiffusion_policy.git\n    pip install -e diffusion_policy\u002F\n    ```\n\n### 数据处理\n\n在我们的[论文](https:\u002F\u002Fgeneral-navigation-models.github.io)中，我们结合了公开可用和未公开的数据集进行训练。以下是用于训练的公开数据集列表；如需访问未公开的数据，请联系相应作者。\n- [RECON](https:\u002F\u002Fsites.google.com\u002Fview\u002Frecon-robot\u002Fdataset)\n- [TartanDrive](https:\u002F\u002Fgithub.com\u002Fcastacks\u002Ftartan_drive)\n- [SCAND](https:\u002F\u002Fwww.cs.utexas.edu\u002F~xiao\u002FSCAND\u002FSCAND.html#Links)\n- [GoStanford2（修改版）](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1RYseCpbtHEFOsmSX2uqNY_kvSxwZLVP_?usp=sharing)\n- [SACSoN\u002FHuRoN](https:\u002F\u002Fsites.google.com\u002Fview\u002Fsacson-review\u002Fhuron-dataset)\n\n我们建议您下载这些（以及您可能希望用于训练的其他数据集）并按照以下步骤进行数据处理。\n\n#### 数据处理\n\n我们提供了一些示例脚本，用于直接从rosbag或自定义格式（如HDF5）处理这些数据集：\n1. 使用相关参数运行`process_bags.py`，或运行`process_recon.py`来处理RECON的HDF5文件。您也可以按照我们下面的结构手动添加自己的数据集（如果您要添加自定义数据集，请参阅[自定义数据集](#custom-datasets)部分）。\n2. 在您的数据集文件夹中使用相关参数运行`data_split.py`。\n\n完成第一步数据处理后，处理后的数据集应具有如下结构：\n\n```\n├── \u003Cdataset_name>\n│   ├── \u003Cname_of_traj1>\n│   │   ├── 0.jpg\n│   │   ├── 1.jpg\n│   │   ├── ...\n│   │   ├── T_1.jpg\n│   │   └── traj_data.pkl\n│   ├── \u003Cname_of_traj2>\n│   │   ├── 0.jpg\n│   │   ├── 1.jpg\n│   │   ├── ...\n│   │   ├── T_2.jpg\n│   │   └── traj_data.pkl\n│   ...\n└── └── \u003Cname_of_trajN>\n    \t├── 0.jpg\n    \t├── 1.jpg\n    \t├── ...\n        ├── T_N.jpg\n        └── traj_data.pkl\n```  \n\n每个`*.jpg`文件都包含机器人前方的RGB观测图像，并按时间顺序进行了标记。`traj_data.pkl`文件则存储了该轨迹的里程计数据，它是一个包含以下键的pickle字典：\n- `\"position\"`：一个形状为[T, 2]的np.ndarray，记录了机器人在每张图像观测时刻的xy坐标。\n- `\"yaw\"`：一个形状为[T,]的np.ndarray，记录了机器人在每张图像观测时刻的偏航角。\n\n\n完成第二步数据处理后，处理后的数据分割应在`vint_release\u002Ftrain\u002Fvint_train\u002Fdata\u002Fdata_splits\u002F`目录下呈现如下结构：\n\n```\n├── \u003Cdataset_name>\n│   ├── train\n|   |   └── traj_names.txt\n└── └── test\n        └── traj_names.txt \n```\n\n### 训练你的通用导航模型\n在 `vint_release\u002Ftrain` 目录下运行以下命令：\n```bash\npython train.py -c \u003C训练配置文件路径>\n```\n预先准备好的配置 YAML 文件位于 `train\u002Fconfig` 目录中。\n\n#### 自定义配置文件\n你可以以其中一个预设的 YAML 文件为起点，根据需要修改其中的参数。`config\u002Fvint.yaml` 是一个不错的选择，因为它包含了注释说明。而 `config\u002Fdefaults.yaml` 则包含了默认的配置值（请勿直接使用此配置文件进行训练，因为它未指定任何用于训练的数据集）。\n\n#### 自定义数据集\n确保你的数据集和数据划分目录遵循【数据处理】部分中提供的结构。找到 `train\u002Fvint_train\u002Fdata\u002Fdata_config.yaml` 文件，并添加以下内容：\n\n```\n\u003C数据集名称>:\n    metric_waypoints_distance: \u003C该数据集中航点之间的平均距离（米）>\n```\n\n然后找到你的训练配置文件，在 `datasets` 参数下添加以下内容（你可以根据需要调整 `end_slack`、`goals_per_obs` 和 `negative_mining` 的值）：\n```\n\u003C数据集名称>:\n    data_folder: \u003C数据集路径>\n    train: data\u002Fdata_splits\u002F\u003C数据集名称>\u002Ftrain\u002F \n    test: data\u002Fdata_splits\u002F\u003C数据集名称>\u002Ftest\u002F \n    end_slack: 0 # 从每条轨迹末尾截取的时间步数（以防许多轨迹因碰撞而终止）\n    goals_per_obs: 1 # 每次观测采样的目标数量\n    negative_mining: True # 来自 ViNG 论文中的负样本挖掘方法（Shah 等人）\n```\n\n#### 从检查点继续训练模型\n你也可以不从头开始训练，而是加载已发布的成果中的现有检查点。在 `vint_release\u002Ftrain\u002Fconfig\u002F` 目录下的 YAML 配置文件中添加 `load_run: \u003C项目名称>\u002F\u003C日志运行名称>`。要加载的 `.pth` 文件应保存在以下目录结构中，并重命名为“latest”：`vint_release\u002Ftrain\u002Flogs\u002F\u003C项目名称>\u002F\u003C日志运行名称>\u002Flatest.pth`。这样可以方便地从先前运行的检查点继续训练，因为日志默认就是按这种方式保存的。注意：如果你是从之前的运行中加载检查点，请先在 `vint_release\u002Ftrain\u002Flogs\u002F\u003C项目名称>\u002F` 中确认运行名称，因为代码会在每个运行的配置 YAML 
文件中指定的运行名称后附加日期字符串，以避免运行名称重复。\n\n如果你想使用我们的检查点，可以从[这个链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)下载 `.pth` 文件。\n\n## 部署\n本子文件夹包含用于加载预训练的 ViNT 并将其部署到开源 [LoCoBot 室内机器人平台](http:\u002F\u002Fwww.locobot.org\u002F)上的代码，该平台搭载了 [NVIDIA Jetson Orin Nano](https:\u002F\u002Fwww.amazon.com\u002FNVIDIA-Jetson-Orin-Nano-Developer\u002Fdp\u002FB0BZJTQ5YP\u002Fref=asc_df_B0BZJTQ5YP\u002F?tag=hyprod-20&linkCode=df0&hvadid=652427572954&hvpos=&hvnetw=g&hvrand=12520404772764575478&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=1013585&hvtargid=pla-2112361227514&psc=1&gclid=CjwKCAjw4P6oBhBsEiwAKYVkq7dqJEwEPz0K-H33oN7MzjO0hnGcAJDkx2RdT43XZHdSWLWHKDrODhoCmnoQAvD_BwE)。该软件可以轻松适配到其他机器人上，研究人员已成功将其独立部署到以下机器人及仿真环境中：Clearpath Jackal、DJI Tello、Unitree A1、TurtleBot2、Vizbot，以及 CARLA 等仿真环境。\n\n### LoCoBot 设置\n\n本软件已在运行 Ubuntu 20.04 的 LoCoBot 上进行了测试。\n\n#### 软件安装（按顺序）\n1. ROS：[ros-noetic](https:\u002F\u002Fwiki.ros.org\u002Fnoetic\u002FInstallation\u002FUbuntu)\n2. ROS 软件包：\n    ```bash\n    sudo apt-get install ros-noetic-usb-cam ros-noetic-joy\n    ```\n3. [kobuki](http:\u002F\u002Fwiki.ros.org\u002Fkobuki\u002FTutorials\u002FInstallation)\n4. Conda\n    - 安装 Anaconda、Miniconda 等工具来管理环境\n    - 使用 environment.yml 创建 Conda 环境（在 `vint_release\u002F` 目录下执行）\n        ```bash\n        conda env create -f deployment\u002Fdeployment_environment.yaml\n        ```\n    - 激活环境\n        ```bash\n        conda activate vint_deployment\n        ```\n    - （推荐）将以下内容添加到 `~\u002F.bashrc`：\n        ```bash\n        echo “conda activate vint_deployment” >> ~\u002F.bashrc \n        ```\n5. 安装 `vint_train` 软件包（在 `vint_release\u002F` 目录下执行）：\n    ```bash\n    pip install -e train\u002F\n    ```\n6. 从该 [仓库](https:\u002F\u002Fgithub.com\u002Freal-stanford\u002Fdiffusion_policy) 安装 `diffusion_policy` 包：\n    ```bash\n    git clone git@github.com:real-stanford\u002Fdiffusion_policy.git\n    pip install -e diffusion_policy\u002F\n    ```\n7. （推荐）如果尚未安装，可安装 [tmux](https:\u002F\u002Fgithub.com\u002Ftmux\u002Ftmux\u002Fwiki\u002FInstalling)。许多 Bash 脚本依赖 tmux 启动多个终端窗口并分别运行不同命令。这将有助于调试，因为你可以在每个窗口中查看输出。\n\n#### 硬件要求\n- LoCoBot：http:\u002F\u002Flocobot.org（仅导航栈）\n- 广角 RGB 摄像头：[示例](https:\u002F\u002Fwww.amazon.com\u002FELP-170degree-Fisheye-640x480-Resolution\u002Fdp\u002FB00VTHD17W)。`vint_locobot.launch` 文件使用的摄像头参数适用于类似 ELP 广角鱼眼镜头的设备，你可以根据自己的摄像头进行相应调整。请根据你的摄像头情况，在 `vint_release\u002Fdeployment\u002Fconfig\u002Fcamera.yaml` 中调整摄像头参数（用于可视化）。\n- 可与 Linux 兼容的 [操纵杆](https:\u002F\u002Fwww.amazon.com\u002FLogitech-Wireless-Nano-Receiver-Controller-Vibration\u002Fdp\u002FB0041RR0TW)\u002F[键盘遥控](http:\u002F\u002Fwiki.ros.org\u002Fteleop_twist_keyboard)。将操纵杆上的“死人开关”按键映射添加到 `vint_release\u002Fdeployment\u002Fconfig\u002Fjoystick.yaml` 文件中。常见操纵杆的按键与索引对应关系可在 [ROS 维基](https:\u002F\u002Fwiki.ros.org\u002Fjoy) 中找到。\n\n### 加载模型权重\n\n将模型权重 `.pth` 文件保存到 `vint_release\u002Fdeployment\u002Fmodel_weights` 文件夹中。我们模型的权重可在[此链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)获取。\n\n### 收集拓扑地图\n\n_请确保在 `vint_release\u002Fdeployment\u002Fsrc\u002F` 目录下运行这些脚本。_\n\n\n本节介绍一种简单的方法，用于为目标部署环境创建拓扑地图。为简化起见，我们将机器人置于“路径跟踪”模式：即给定环境中的一条轨迹，任务是沿着该轨迹到达目标。环境可能会出现新的或动态变化的障碍物、光照条件变化等情况。\n\n#### 录制 rosbag：\n```bash\n.\u002Frecord_bag.sh \u003Cbag_name>\n```\n\n运行此命令以使用操纵杆和摄像头遥控机器人。该命令会打开三个窗口：\n1. `roslaunch vint_locobot.launch`：此启动文件会启动摄像头的 `usb_cam` 节点、操纵杆的 `joy` 节点以及机器人移动基座的相关节点。\n2. 
`python joy_teleop.py`：此 Python 脚本会启动一个节点，从 `joy` 主题读取输入，并将其输出到用于遥控机器人基座的主题上。\n3. `rosbag record \u002Fusb_cam\u002Fimage_raw -o \u003Cbag_name>`：此命令不会立即执行（需要按 Enter 键）。它将在 `vint_release\u002Fdeployment\u002Ftopomaps\u002Fbags` 目录中运行，我们建议将 rosbag 文件存储在此目录中。\n\n准备好录制 bag 后，运行 `rosbag record` 命令，并在希望机器人遵循的地图上遥控机器人。完成路径录制后，终止 `rosbag record` 命令，然后关闭 tmux 会话。\n\n#### 生成拓扑地图：\n```bash\n.\u002Fcreate_topomap.sh \u003Ctopomap_name> \u003Cbag_filename>\n```\n\n此命令会打开 3 个窗口：\n1. `roscore`\n2. `python create_topomap.py --dt 1 --dir \u003Ctopomap_dir>`：此命令会在 `\u002Fvint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` 中创建一个目录，并在播放 bag 的每一秒保存一张图像作为地图中的一个节点。\n3. `rosbag play -r 1.5 \u003Cbag_filename>`：此命令以 1.5 倍速播放 rosbag，因此 Python 脚本实际上是以 1.5 秒的间隔记录节点。`\u003Cbag_filename>` 应该是包含 .bag 扩展名的完整 bag 文件名。您可以在 `make_topomap.sh` 文件中更改此值。该命令需要按下 Enter 键才会运行，且应仅在 Python 脚本发出等待消息后再按下 Enter 键。开始播放 bag 后，请切换到运行 Python 脚本的屏幕，以便在 rosbag 播放结束后终止该脚本。\n\n当 bag 播放结束时，终止 tmux 会话。\n\n\n### 运行模型\n#### 导航\n_请确保在 `vint_release\u002Fdeployment\u002Fsrc\u002F` 目录下运行此脚本。_\n\n```bash\n.\u002Fnavigate.sh \"--model \u003Cmodel_name> --dir \u003Ctopomap_dir>\"\n```\n\n为了部署已发表结果中的模型，我们发布了模型检查点，您可以从[此链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)下载。\n\n\n`\u003Cmodel_name>` 是 `vint_release\u002Fdeployment\u002Fconfig\u002Fmodels.yaml` 文件中模型的名称。在此文件中，您需要为每个模型指定以下参数（默认值）：\n- `config_path`（str）：用于训练模型的位于 `vint_release\u002Ftrain\u002Fconfig\u002F` 中的 *.yaml 文件路径。\n- `ckpt_path`（str）：位于 `vint_release\u002Fdeployment\u002Fmodel_weights\u002F` 中的 *.pth 文件路径。\n\n\n请确保这些配置与您训练模型时使用的配置一致。我们提供的权重对应的模型配置已在 yaml 文件中列出，供您参考。\n\n`\u003Ctopomap_dir>` 是 `vint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` 中的目录名称，其中包含与拓扑地图节点相对应的图像。这些图像按名称从 0 到 N 排序。\n\n此命令会打开 4 个窗口：\n\n1. `roslaunch vint_locobot.launch`：此启动文件会开启摄像头的 usb_cam 节点、操纵杆的 joy 节点以及机器人移动基座的多个节点。\n2. `python navigate.py --model \u003Cmodel_name> --dir \u003Ctopomap_dir>`：此 Python 脚本会启动一个节点，从 `\u002Fusb_cam\u002Fimage_raw` 主题读取图像观测，将观测和地图输入模型，并发布动作到 `\u002Fwaypoint` 主题。\n3. `python joy_teleop.py`：此 Python 脚本会启动一个节点，从 joy 主题读取输入，并将其输出到用于遥控机器人基座的主题上。\n4. `python pd_controller.py`：此 Python 脚本会启动一个节点，从 `\u002Fwaypoint` 主题读取消息（来自模型的航路点），并输出速度指令以控制机器人基座的运动。\n\n当机器人完成导航时，终止 `pd_controller.py` 脚本，然后终止 tmux 会话。如果您想在机器人导航过程中接管控制，可以使用 `joy_teleop.py` 脚本通过操纵杆进行操作。\n\n#### 探索\n_请确保在 `vint_release\u002Fdeployment\u002Fsrc\u002F` 目录下运行此脚本。_\n\n```bash\n.\u002Fexploration.sh \"--model \u003Cmodel_name>\"\n```\n\n为了部署已发表结果中的模型，我们发布了模型检查点，您可以从[此链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1a9yWR2iooXFAqjQHetz263--4_2FFggg?usp=sharing)下载。\n\n\n`\u003Cmodel_name>` 是 `vint_release\u002Fdeployment\u002Fconfig\u002Fmodels.yaml` 文件中模型的名称（请注意，只有 NoMaD 可用于探索）。在此文件中，您需要为每个模型指定以下参数（默认值）：\n- `config_path`（str）：用于训练模型的位于 `vint_release\u002Ftrain\u002Fconfig\u002F` 中的 *.yaml 文件路径。\n- `ckpt_path`（str）：位于 `vint_release\u002Fdeployment\u002Fmodel_weights\u002F` 中的 *.pth 文件路径。\n\n\n请确保这些配置与您训练模型时使用的配置一致。我们提供的权重对应的模型配置已在 yaml 文件中列出，供您参考。\n\n`\u003Ctopomap_dir>` 是 `vint_release\u002Fdeployment\u002Ftopomaps\u002Fimages` 中的目录名称，其中包含与拓扑地图节点相对应的图像。这些图像按名称从 0 到 N 排序。\n\n此命令会打开 4 个窗口：\n\n1. `roslaunch vint_locobot.launch`：此启动文件会开启摄像头的 usb_cam 节点、操纵杆的 joy 节点以及机器人移动基座的多个节点。\n2. `python explore.py --model \u003Cmodel_name>`：此 Python 脚本会启动一个节点，从 `\u002Fusb_cam\u002Fimage_raw` 主题读取图像观测，将观测和地图输入模型，并发布探索动作到 `\u002Fwaypoint` 主题。\n3. `python joy_teleop.py`：此 Python 脚本会启动一个节点，从 joy 主题读取输入，并将其输出到用于遥控机器人基座的主题上。\n4. 
`python pd_controller.py`：此 Python 脚本会启动一个节点，从 `\u002Fwaypoint` 主题读取消息（来自模型的航路点），并输出速度指令以控制机器人基座的运动。\n\n当机器人完成导航时，终止 `pd_controller.py` 脚本，然后终止 tmux 会话。如果您想在机器人导航过程中接管控制，可以使用 `joy_teleop.py` 脚本通过操纵杆进行操作。\n\n### 将此代码适配到不同机器人\n\n我们希望这个代码库足够通用，以便您可以将其部署到您喜爱的基于 ROS 的机器人上。您可以在 `vint_release\u002Fdeployment\u002Fconfig\u002Frobot.yaml` 中修改机器人配置参数，例如机器人的最大角速度和线速度，以及用于遥控和控制机器人的发布话题。如果您有任何问题或建议，欢迎随时创建 GitHub 问题，或通过 shah@cs.berkeley.edu 联系作者。\n\n\n## 引用\n```\n@inproceedings{shah2022gnm,\n  author    = {Dhruv Shah 和 Ajay Sridhar 和 Arjun Bhorkar 和 Noriaki Hirose 和 Sergey Levine},\n  title     = {{GNM：一种可驱动任意机器人的通用导航模型}},\n  booktitle = {国际机器人与自动化会议（ICRA）},\n  year      = {2023},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.03370}\n}\n\n@inproceedings{shah2023vint,\n  title     = {Vi{NT}：视觉导航的基础模型},\n  author    = {Dhruv Shah 和 Ajay Sridhar 和 Nitish Dashora 和 Kyle Stachowicz 和 Kevin Black 和 Noriaki Hirose 和 Sergey Levine},\n  booktitle = {第七届机器人学习年度会议},\n  year      = {2023},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.14846}\n}\n\n@article{sridhar2023nomad,\n  author  = {Ajay Sridhar 和 Dhruv Shah 和 Catherine Glossop 和 Sergey Levine},\n  title   = {{NoMaD：用于导航与探索的目标掩码扩散策略}},\n  journal = {arXiv 预印本},\n  year    = {2023},\n  url     = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.xxxx}\n}\n```","# visualnav-transformer 快速上手指南\n\nvisualnav-transformer 是一个通用的目标条件视觉导航模型库（包含 GNM、ViNT 和 NoMaD），支持在多种机器人上进行零样本导航或微调适配。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Ubuntu 18.04 或 20.04（推荐 20.04）\n- **Python**: 3.7+\n- **GPU**: 支持 CUDA 10+ 的 NVIDIA 显卡\n- **包管理器**: Conda (Anaconda 或 Miniconda)\n\n### 硬件依赖（部署阶段）\n- 机器人平台：LoCoBot, TurtleBot2, Clearpath Jackal 等（或仿真环境如 CARLA）\n- 摄像头：广角 RGB 摄像头（如 ELP Fisheye）\n- 控制设备：兼容 Linux 的游戏手柄或键盘\n\n## 安装步骤\n\n以下命令请在项目根目录 `vint_release\u002F` 下执行。\n\n### 1. 创建并激活 Conda 环境\n```bash\nconda env create -f train\u002Ftrain_environment.yml\nconda activate vint_train\n```\n\n### 2. 安装核心训练包\n```bash\npip install -e train\u002F\n```\n\n### 3. 安装扩散策略依赖\n需要克隆并安装 `diffusion_policy` 仓库：\n```bash\ngit clone git@github.com:real-stanford\u002Fdiffusion_policy.git\npip install -e diffusion_policy\u002F\n```\n> **提示**：如果国内网络克隆缓慢，可尝试使用 Gitee 镜像或配置 Git 代理。\n\n### 4. 部署环境安装（仅在机器人端需要）\n若在机器人（如 LoCoBot）上部署，还需安装 ROS Noetic 及相关驱动：\n```bash\n# 安装 ROS Noetic (参考官方文档)\n# 安装 ROS 组件\nsudo apt-get install ros-noetic-usb-cam ros-noetic-joy\n\n# 创建部署专用环境\nconda env create -f deployment\u002Fdeployment_environment.yaml\nconda activate vint_deployment\n\n# 重复安装上述 python 包\npip install -e train\u002F\npip install -e diffusion_policy\u002F\n```\n\n## 基本使用\n\n### 1. 数据准备与处理\n模型训练需要特定的数据格式。你可以使用公开数据集（如 RECON, TartanDrive 等）或自定义数据。\n\n**处理 Rosbag 或自定义数据：**\n```bash\n# 处理 rosbag 文件\npython train\u002Fprocess_bags.py --input_path \u003Cpath_to_bag> --output_path \u003Coutput_dir>\n\n# 或者处理 RECON 格式的 HDF5 数据\npython train\u002Fprocess_recon.py --input_path \u003Cpath_to_hdf5> --output_path \u003Coutput_dir>\n```\n\n**划分训练集\u002F测试集：**\n```bash\npython train\u002Fdata_split.py --data_dir \u003Cpath_to_processed_dataset>\n```\n处理后数据结构应包含按时间序列命名的图片 (`0.jpg`, `1.jpg`...) 和里程计文件 (`traj_data.pkl`)。\n\n### 2. 
配置训练\n修改配置文件 `train\u002Fconfig\u002Fvint.yaml`（推荐以此为基础修改），指定数据集路径和参数。\n\n若添加自定义数据集，需编辑 `train\u002Fvint_train\u002Fdata\u002Fdata_config.yaml` 添加数据集元数据，并在训练配置文件的 `datasets` 字段中引用：\n```yaml\n\u003Cdataset_name>:\n    data_folder: \u003Cpath_to_the_dataset>\n    train: data\u002Fdata_splits\u002F\u003Cdataset_name>\u002Ftrain\u002F \n    test: data\u002Fdata_splits\u002F\u003Cdataset_name>\u002Ftest\u002F \n    end_slack: 0\n    goals_per_obs: 1\n    negative_mining: True\n```\n\n### 3. 开始训练\n使用配置文件启动训练或微调：\n```bash\ncd train\npython train.py -c config\u002Fvint.yaml\n```\n若要加载预训练权重进行微调，请在配置文件中添加 `load_run: \u003Cproject_name>\u002F\u003Clog_run_name>`，并将下载的 `.pth` 文件重命名为 `latest.pth` 放入对应的 `logs` 目录中。\n\n### 4. 机器人部署示例\n部署流程分为三步：采集演示、构建拓扑地图、导航。\n\n**步骤 A: 采集演示轨迹 (生成 Rosbag)**\n在机器人终端运行：\n```bash\n.\u002Fdeployment\u002Fsrc\u002Frecord_bag.sh\n```\n手动遥控机器人走一遍环境，脚本将保存为 Rosbag 文件。\n\n**步骤 B: 构建拓扑地图**\n将采集的 Rosbag 转换为导航用的拓扑图：\n```bash\n.\u002Fdeployment\u002Fsrc\u002Fcreate_topomap.sh \u003Cpath_to_bag_file>\n```\n\n**步骤 C: 执行导航**\n加载训练好的模型进行导航：\n```bash\n.\u002Fdeployment\u002Fsrc\u002Fnavigate.sh --model_path \u003Cpath_to_model_weights> --topomap_path \u003Cpath_to_generated_map>\n```\n对于 NoMaD 模型的随机探索任务，可使用：\n```bash\n.\u002Fdeployment\u002Fsrc\u002Fexplore.sh --model_path \u003Cpath_to_model_weights>\n```\n\n> **注意**：使用前请确保已将预训练权重文件 (`*.pth`) 下载至 `vint_release\u002Fdeployment\u002Fmodel_weights` 目录，并根据实际摄像头和手柄型号调整 `deployment\u002Fconfig\u002F` 下的 YAML 配置文件。","某物流仓储团队正试图让一台新采购的轮式机器人，在未经过专门地图构建和长时间训练的仓库环境中，自主导航至指定货架取货。\n\n### 没有 visualnav-transformer 时\n- **开发周期漫长**：工程师必须为这台新机器人采集数千条专属轨迹数据，并从头训练导航策略，耗时数周才能上线。\n- **泛化能力极差**：一旦仓库灯光变化或地面材质不同，原本训练好的模型就会失效，需要重新调整参数甚至返工。\n- **多机型适配困难**：若后续引入不同底盘结构的机器人，整个感知与控制代码需大幅重构，无法复用已有成果。\n- **零样本任务不可行**：机器人完全无法在未见过的新区域直接执行“去某地”的指令，必须先由人工遥控建图。\n\n### 使用 visualnav-transformer 后\n- **实现零样本部署**：利用预训练的 GNM 或 ViNT 模型，机器人无需任何额外训练，即可直接在陌生仓库中理解视觉指令并导航。\n- **跨场景鲁棒性强**：得益于多样化的跨机体训练数据，模型能自适应不同的光照、地面纹理及动态障碍物干扰。\n- **高效迁移适配**：仅需少量新机器人的演示数据（ROS bag），通过 `train.py` 脚本微调，即可快速将通用策略迁移到新硬件上。\n- **拓扑地图自动构建**：通过 `create_topomap.sh` 脚本，机器人可基于单次演示轨迹自动生成导航用的拓扑图，大幅降低部署门槛。\n\nvisualnav-transformer 将移动机器人的导航开发从“定制化手工打造”转变为“通用模型即插即用”，显著降低了多场景、多机型的落地成本。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frobodhruv_visualnav-transformer_63a62a53.png","robodhruv","Dhruv Shah","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frobodhruv_3239a0ed.jpg","Roboticist. Tinkerer. 
Programmer.",null,"Berkeley, CA","shah@eecs.berkeley.edu","cs.berkeley.edu\u002F~shah","https:\u002F\u002Fgithub.com\u002Frobodhruv",[82,86],{"name":83,"color":84,"percentage":85},"Python","#3572A5",97.6,{"name":87,"color":88,"percentage":89},"Shell","#89e051",2.4,1178,177,"2026-04-02T11:27:24","MIT",4,"Linux (Ubuntu 18.04, 20.04)","需要 NVIDIA GPU，支持 CUDA 10+，具体显存大小未说明","未说明",{"notes":99,"python":100,"dependencies":101},"训练和部署均基于 Ubuntu 系统。训练端需安装 conda 环境及 diffusion_policy 包；部署端（如 LoCoBot）需额外安装 ROS Noetic、相关 ROS 包及 kobuki 驱动。代码库组织遵循 GNM 项目结构，支持通过配置文件自定义数据集和模型参数。","3.7+",[102,103,104,105,106,107],"conda","torch (隐含)","diffusion_policy","ros-noetic (部署端)","kobuki (部署端)","tmux (推荐)",[15,14,109],"其他","2026-03-27T02:49:30.150509","2026-04-07T06:14:51.565465",[113,118,123,127,132,137,142],{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},21129,"如何修复 NoMaD 模型权重无法加载训练参数的问题？","维护者已确认该问题并进行了修复。请拉取最新的代码更新以解决权重加载问题。具体的修复提交记录为：https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fcommit\u002Fb3957458a0bcc9a3501e7066e3593292ebf13321。如果更新后仍遇到问题，请重新反馈。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F10",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},21130,"在自定义机器人上部署模型时出现碰撞或导航效果不佳怎么办？","这通常是因为预训练模型未针对特定机器人的传感器（如鱼眼相机）和底盘动力学进行微调。用户反馈表明，直接使用零样本（zero-shot）部署在自定义硬件（如松灵底盘、鱼眼相机）上会导致左右晃动、避障能力差甚至碰撞。建议收集自己机器人的数据进行微调（Fine-tuning），而不是直接使用默认权重。此外，检查相机视场角（FOV）设置，有用户尝试将 FOV 从 90 度调整到 120 度以改善性能，但需重新验证路点生成的准确性。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F13",{"id":124,"question_zh":125,"answer_zh":126,"source_url":122},21131,"如何在图像上可视化路点（Waypoints）和轨迹？","可以使用代码中的 `plot_trajs_and_points_on_image()` 函数将路点绘制在图像上。具体步骤是：首先使用 `project_points()` 函数将模型输出的动作（naction）转换为 2D 坐标，然后调用绘图函数将其渲染到图像上。如果需要集成到回调函数中，可能需要修改 `callback_obs()`。",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},21132,"启动时 closest_node 节点值闪烁或不更新导致机器人无法找到目标，如何调试？","这是一个已知问题，特别是在使用广角鱼眼相机（如 190 度 FOV）且未进行微调的情况下。表现为机器人在静止时最近节点在 0 和 5 之间跳变，或移动时丢失跟踪。目前社区尚未给出统一的代码级修复方案，主要建议是：1. 检查拓扑地图（Topomap）的构建质量；2. 尝试对特定相机视角进行模型微调；3. 确保相机参数与模型训练时的假设一致。如果问题持续，建议在 Issue 中提供拓扑地图截图以便进一步分析。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F19",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},21133,"如何解决 CARLA 模拟器与客户端版本不匹配的警告？","当出现 \"Simulator \u002F Client version mismatch\" 警告时，可能会导致连接错误或渲染问题。解决方法包括：1. 确保安装的 CARLA 服务器版本与 Python 客户端 API 版本完全一致（例如都是 0.9.13）；2. 检查 CARLA 的 Content 文件夹是否包含完整的蓝图（blueprints）资源；3. 如果是渲染相关问题，可以尝试设置 `RenderingOffScreen` 选项。版本不一致可能导致无法正确生成路点或接收图像数据。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F14",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},21134,"在 CARLA 仿真中如何根据速度估算车辆位移和生成路点？","可以通过假设一个较小的时间增量（time delta）来估算。具体方法是：角位移 = 角速度 × 时间增量；线性位移大小 = 线速度 × 时间增量。利用这两个值可以粗略估计车辆的下一位置并生成路点。虽然这种方法不是完美的（需要导数计算），但在时间增量足够小的情况下可以提供有效的近似值。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F16",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},21135,"NoMaD 模型生成的路点看起来散乱或起始点距离车辆过远是否正常？","路点散乱或起始点异常通常表明模型输出或投影计算存在问题。这可能是由于：1. 相机内参或外参配置错误；2. 视场角（FOV）调整后未相应调整投影逻辑；3. 
模型在未微调的情况下对当前场景泛化能力不足。建议检查 `project_points()` 函数的输入参数，并对比不同 FOV 设置下的可视化结果。如果路点随机散射，可能会严重影响导航性能，需重点排查传感器标定数据。","https:\u002F\u002Fgithub.com\u002Frobodhruv\u002Fvisualnav-transformer\u002Fissues\u002F22",[148],{"id":149,"version":150,"summary_zh":151,"released_at":152},127176,"code_release","用于配合我们2023年CoRL论文《ViNT：视觉导航基础模型》的代码发布。","2023-10-06T23:28:49"]