[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-opendilab--LMDrive":3,"tool-opendilab--LMDrive":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":74,"owner_website":77,"owner_url":79,"languages":80,"stars":116,"forks":117,"last_commit_at":118,"license":119,"difficulty_score":120,"env_os":121,"env_gpu":122,"env_ram":123,"env_deps":124,"category_tags":135,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":137,"updated_at":138,"faqs":139,"releases":180},8968,"opendilab\u002FLMDrive","LMDrive","[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models","LMDrive 是一个基于大语言模型（LLM）的端到端自动驾驶框架，旨在实现闭环自主驾驶。它不仅能处理多视角传感器数据来感知动态环境，还能理解自然语言指令，让车辆像人类一样“听懂”导航要求并做出相应驾驶决策。\n\n传统自动驾驶系统往往难以灵活应对复杂的长尾场景或遵循模糊的人类指令。LMDrive 通过引入大语言模型的强大推理能力，解决了这一痛点，将视觉感知与语言理解深度融合，使车辆能够在开放环境中进行更智能、更具解释性的规划与控制。\n\n这款工具主要适合自动驾驶领域的研究人员和开发者使用。如果你正在探索多模态融合、具身智能或语言引导的机器人控制，LMDrive 提供了完整的代码库、预训练模型以及基于 CARLA 
仿真器构建的数据集，便于复现论文成果或开展二次开发。\n\n其核心技术亮点在于“闭环”与“语言引导”。不同于仅做单次预测的开环系统，LMDrive 能根据执行结果不断调整策略；同时，它利用大模型作为决策中枢，实现了从传感器输入到控制输出的端到端训练，为下一代可交互、可解释的智能驾驶系统提供了新的研究范式。","# LMDrive: Closed-Loop End-to-End Driving with Large Language Models\n*An end-to-end, closed-loop, language-based autonomous driving framework, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.*\n\n[[Project Page](https:\u002F\u002Fhao-shao.com\u002Fprojects\u002Flmdrive.html)] [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488)]  [[Dataset(hugging face)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive)]  [[Model Zoo](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FOpenDILabCommunity\u002Flmdrive-658aee50ce38d143c4925a98)]\n\n[[Dataset(OpenXlab)](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive)]\n[[Model Zoo(OpenXLab)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002Fdeepcs233\u002FLMDrive)]\n\n\n[![Hits](https:\u002F\u002Fhits.seeyoufarm.com\u002Fapi\u002Fcount\u002Fincr\u002Fbadge.svg?url=https%3A%2F%2Fgithub.com%2Fopendilab%2FLMDrive&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false)](https:\u002F\u002Fhits.seeyoufarm.com)\n[![Code License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-green.svg)](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002FLICENSE)\n[![Data License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20License-CC%20By%20NC%204.0-red.svg)](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002FDATA_LICENSE)\n\n## News\n\n- `[02\u002F27]` [LMDrive](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488) is accepted by CVPR 2024 🎉🎉🎉\n- `[01\u002F25]` We uploaded our models to [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002Fdeepcs233\u002FLMDrive)\n- 
`[01\u002F23]` We gave a talk at [ZhiDongXi (智东西)](https:\u002F\u002Fwqpoq.xetslk.com\u002Fsl\u002F3D1aRZ)\n- `[01\u002F20]` We uploaded our dataset to [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive)\n- `[12\u002F21]` We released our project website [here](https:\u002F\u002Fhao-shao.com\u002Fprojects\u002Flmdrive.html)\n\n****\n\n\u003Cdiv align=\"center\">\n  \u003Cimg width=\"800\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_LMDrive_readme_5561e4ee08b4.png\">\u003C\u002Fimg>\n\u003C\u002Fdiv>\n\n> [Hao Shao](http:\u002F\u002Fhao-shao.com\u002F), Yuxuan Hu, [Letian Wang](https:\u002F\u002Fletianwang0.wixsite.com\u002Fmyhome), [Steven L. Waslander](https:\u002F\u002Fwww.trailab.utias.utoronto.ca\u002Fstevenwaslander), [Yu Liu](https:\u002F\u002Fliuyu.us\u002F), [Hongsheng Li](http:\u002F\u002Fwww.ee.cuhk.edu.hk\u002F~hsli\u002F).\n\nThis repository contains code for the paper [LMDrive: Closed-Loop End-to-End Driving with Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488). This work proposes a novel language-guided, end-to-end, closed-loop autonomous driving framework.\n\n## Demo Video\n\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo width=\"800\" src=\"https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fassets\u002F17512647\u002F65b2785d-e8bc-4ec1-ac86-e077299a465d\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n## Contents\n1. [Setup](#setup)\n2. [Model Weights](#lmdrive-Weights)\n3. [Dataset](#dataset)\n      1. [Overview](#overview)\n      1. [Data Generation](#data-generation)\n      2. [Data Pre-procession](#data-pre-procession)\n      3. [Data Parsing](#data-parsing)\n4. [Training](#training)\n      1. [Vision encoder pre-training](#vision-encoder-pre-training)\n      2. [Instruction finetuning](#instruction-finetuning)\n5. [Evaluation](#evaluation)\n6. [Citation](#citation)\n7. 
[Acknowledgements](#acknowledgements)\n\n## Setup\n\nOur project is built on three parts: (1) vision encoder (corresponding repo: timm); (2) vision LLM (corresponding repo: LAVIS); (3) data collection, agent controller (corresponding repos: InterFuser, Leaderboard, ScenarioRunner). \n\nInstall Anaconda\n```Shell\nwget https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-2020.11-Linux-x86_64.sh\nbash Anaconda3-2020.11-Linux-x86_64.sh\nsource ~\u002F.bashrc\n```\n\nClone the repo and build the environment\n\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive.git\ncd LMDrive\nconda create -n lmdrive python=3.8\nconda activate lmdrive\ncd vision_encoder\npip3 install -r requirements.txt\npython setup.py develop # if you have installed timm before, please uninstall it first\ncd ..\u002FLAVIS\npip3 install -r requirements.txt\npython setup.py develop # if you have installed LAVIS before, please uninstall it first\n\npip install flash-attn --no-build-isolation # optional\n```\n\nDownload and set up CARLA 0.9.10.1\n```Shell\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\npip install carla\n```\n\n> If you encounter problems related to CARLA, please refer to [Carla Issues](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla\u002Fissues) and [InterFuser Issues](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser) first.\n\n\n## LMDrive Weights\nIf you would like any other details included in the Model Zoo, please open an issue :)\n\n\n| Version | Size |  Checkpoint | VisionEncoder | LLM-base | DS (LangAuto) | DS (LangAuto-short) |\n|---------|------|------------|----------------|-----------|:---:|:---:|\n| LMDrive-1.0 (LLaVA-v1.5-7B) | 7B |  [LMDrive-llava-v1.5-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-llava-v1.5-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | 
[LLaVA-v1.5-7B](https:\u002F\u002Fhuggingface.co\u002Fliuhaotian\u002Fllava-v1.5-7b) | 36.2 | 50.6|\n| LMDrive-1.0 (Vicuna-v1.5-7B) | 7B |  [LMDrive-vicuna-v1.5-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vicuna-v1.5-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | [Vicuna-v1.5-7B](https:\u002F\u002Fhuggingface.co\u002Flmsys\u002Fvicuna-7b-v1.5-16k) | 33.5 | 45.3 |\n| LMDrive-1.0 (LLaMA-7B) | 7B |  [LMDrive-llama-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-llama-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | [LLaMA-7B](https:\u002F\u002Fhuggingface.co\u002Fhuggyllama\u002Fllama-7b) | 31.3 | 42.8 |\n\n*DS denotes the driving score*\n\n## Dataset\n\nWe aim to develop an intelligent driving agent that can generate driving actions based on three sources of input: 1) sensor data (multi-view camera and LiDAR), so that the agent can generate actions that are aware of and compliant with the current scene; 2) navigation instructions (e.g. lane changing, turning), so that the agent can drive to meet the requirement in natural language (instruction from humans or navigation software); and 3) human notice instruction, so that the agent can interact with humans and adapt to human's suggestions and preferences (e.g. pay attention to adversarial events, deal with long-tail events, etc).\n\nWe provide a dataset with about 64K data clips, where each clip includes one navigation instruction, several notice instructions, a sequence of multi-modal multi-view sensor data, and control signals. The duration of the clip spans from 2 to 20 seconds. The dataset used in our paper can be downloaded [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive). 
If you want to create your own dataset, please follow the steps outlined below.\n\n### Overview\nThe data is generated with ```leaderboard\u002Fteam_code\u002Fauto_pilot.py``` in 8 CARLA towns, using the route and scenario files provided at ```leaderboard\u002Fdata``` on CARLA 0.9.10.1. The dataset is collected at a high frequency (~10Hz).\n\nOnce you have downloaded our dataset or collected your own, organize the data systematically as follows. DATASET_ROOT is the root directory where your dataset is stored.\n\n```\n├── $DATASET_ROOT\n│   └── dataset_index.txt  # for vision encoder pretraining\n│   └── navigation_instruction_list.txt  # for instruction finetuning\n│   └── notice_instruction_list.json  # for instruction finetuning\n│   └── routes_town06_long_w7_11_28_18_28_35  # data folder\n│   └── routes_town01_short_w2_11_16_08_27_10\n│   └── routes_town02_short_w2_11_16_22_55_25\n│   └── routes_town01_short_w2_11_16_11_44_08 \n      ├── rgb_full\n      ├── lidar\n      └── ...\n```\n\nThe `navigation_instruction_list.txt` and `notice_instruction_list.json` files can be generated with the data parsing [scripts](#data-parsing).\nEach subfolder in the dataset you've collected should be structured as follows:\n\n```\n- routes_town(town_id)_{tiny,short,long}_w(weather_id)_timestamp: corresponding to different towns and routes files\n    - routes_X: contains data for an individual route\n        - rgb_full: a large multi-view camera image at 400x1200 resolution, which can be split into four images (left, center, right, rear)\n        - lidar: 3D point cloud in .npy format. It includes only the LiDAR points captured within 1\u002F20 second, covering 180 degrees of horizontal view. 
If you want to utilize a full 360 degrees of view, you need to merge it with the data from lidar_odd.\n        - lidar_odd: 3D point cloud in .npy format.\n        - birdview: top-down segmentation images; LAV and LBC used this type of data for training\n        - topdown: similar to birdview, but captured by the downward-facing camera\n        - 3d_bbs: 3D bounding boxes for different agents\n        - affordances: different types of affordances\n        - actors_data: contains the positions, velocities, and other metadata of surrounding vehicles and traffic lights\n        - measurements: contains the ego agent's position, velocity, future waypoints, and other metadata\n        - measurements_full: merges measurements and actors_data\n        - measurements_all.json: merges the files in measurements_full into a single file\n```\nThe `$DATASET_ROOT` directory must contain a file named `dataset_index.txt`, which can be generated by our data pre-processing [script](#data-pre-procession). It should list the training and evaluation data in the following format:\n\n```\n\u003Crelative_route_path_dir> \u003Cnum_data_frames_in_this_dir>\nroutes_town06_long_w7_11_28_18_28_35\u002F 1062\nroutes_town01_short_w2_11_16_08_27_10\u002F 1785\nroutes_town01_short_w2_11_16_09_55_05\u002F 918\nroutes_town02_short_w2_11_16_22_55_25\u002F 134\nroutes_town01_short_w2_11_16_11_44_08\u002F 569\n```\n\nHere, `\u003Crelative_route_path_dir>` should be a path relative to `$DATASET_ROOT`. The training code will concatenate `$DATASET_ROOT` and `\u003Crelative_route_path_dir>` to create the full path for loading the data. 
\nIn this format, 1062 represents the number of frames in the routes_town06_long_w7_11_28_18_28_35\u002Frgb_full directory (or routes_town06_long_w7_11_28_18_28_35\u002Flidar, etc.).\n\n### Data Generation\n#### Data Generation with multiple CARLA Servers\nIn addition to the dataset, we have also provided all the scripts used for generating data, and these can be modified as required for different CARLA versions. The dataset is collected by a rule-based expert agent under different weather conditions and in different towns.\n\n##### Running CARLA Servers\n```bash\n# Start 4 CARLA servers: ip [localhost], ports [2000, 2002, 2004, 2006]. You can adjust the number of CARLA servers to your situation; more servers collect more data. If you use N servers, each route is collected N times, with random weather and traffic scenarios each time.\n\ncd carla\nCUDA_VISIBLE_DEVICES=0 .\u002FCarlaUE4.sh --world-port=2000 -opengl &\nCUDA_VISIBLE_DEVICES=1 .\u002FCarlaUE4.sh --world-port=2002 -opengl &\nCUDA_VISIBLE_DEVICES=2 .\u002FCarlaUE4.sh --world-port=2004 -opengl &\nCUDA_VISIBLE_DEVICES=3 .\u002FCarlaUE4.sh --world-port=2006 -opengl &\n```\n\nInstructions for setting up Docker are available [here](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html#docker). 
Pull the docker image of CARLA 0.9.10.1: ```docker pull carlasim\u002Fcarla:0.9.10.1```.\n\nDocker 18:\n```\ndocker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim\u002Fcarla:0.9.10.1 .\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\nDocker 19:\n```Shell\ndocker run -it --rm --net=host --gpus '\"device=0\"' carlasim\u002Fcarla:0.9.10.1 .\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\nIf the Docker container doesn't start properly, add another environment variable: ```-e SDL_AUDIODRIVER=dsp```.\n\n##### Run the Autopilot\nGenerate scripts for collecting data in batches.\n```bash\ncd dataset\npython init_dir.py\ncd ..\ncd data_collection\n\n# You can modify FPS, waypoint distribution strength, etc. in auto_agent.yaml\n\n# If you are not using 4 servers, you need to modify the following Python scripts\npython generate_bashs.py\npython generate_batch_collect.py\ncd ..\n```\n\nRun the batch-run scripts for the towns and route types you want to collect.\n```bash\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town01_long.sh\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town01_short.sh\n...\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town07_tiny.sh\n...\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town10_tiny.sh\n```\n\n**Note:** Our scripts use a random weather condition for data collection.\n\n##### Data Generation with a single CARLA Server\nWith a single CARLA server, roll out the autopilot to start data generation.\n```Shell\ncarla\u002FCarlaUE4.sh --world-port=2000 -opengl\n.\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\nThe expert agent used for data generation is defined in ```leaderboard\u002Fteam_code\u002Fauto_pilot.py```. The variables that need to be set are specified in ```leaderboard\u002Fscripts\u002Frun_evaluation.sh```. 
\n\n### Data Pre-procession\nWe provide some Python scripts for pre-processing the collected data in `tools\u002Fdata_preprocessing`, some of which are optional. Please execute them **in order**:\n1. `python get_list_file.py $DATASET_ROOT`: obtain the dataset_list.txt.\n2. `python batch_merge_data.py $DATASET_ROOT`: merge several scattered data files into one file to reduce IO time when training. **[Optional]**\n3. `python batch_rm_rgb_data.py $DATASET_ROOT`: delete redundant files after we have merged them into new files. **[Optional]**\n4. `python batch_stat_blocked_data.py $DATASET_ROOT`: find the frames in which the ego-vehicle is blocked for a long time. Removing them improves the data distribution and decreases the overall data size.\n5. `python batch_rm_blocked_data.py $DATASET_ROOT`: delete the blocked frames.\n6. `python batch_recollect_data.py $DATASET_ROOT`: since we have removed some frames, reorganize the remaining ones to ensure that the frame IDs are continuous.\n7. `python batch_merge_measurements.py $DATASET_ROOT`: merge the measurement files from all frames in one route folder to reduce IO time.\n\n### Data Parsing\nAfter collecting and pre-processing the data, we need to parse the navigation and notice instruction data with the Python scripts in `tools\u002Fdata_parsing`.\n\nThe script for parsing navigation instructions:\n```bash\npython3 parse_instruction.py $DATASET_ROOT\n```\n\nThe parsed navigation clips will be saved in `$DATASET_ROOT\u002Fnavigation_instruction_list.txt`, under the root directory of the dataset.\n\n\nThe script for parsing notice instructions:\n```bash\npython3 parse_notice.py $DATASET_ROOT\n```\n\nThe parsed notice clips will be saved in `$DATASET_ROOT\u002Fnotice_instruction_list.txt`.\n\n\nThe script for parsing misleading instructions:\n```bash\npython3 parse_misleading.py $DATASET_ROOT\n```\n\nThe parsed misleading clips will be saved in `$DATASET_ROOT\u002Fmisleading_data.txt`.\n\n\n## Training\n\nLMDrive's training consists of two stages: 1) the vision encoder pre-training stage, which learns to generate visual tokens from sensor inputs; and 2) the instruction-finetuning stage, which aligns instructions and visual tokens with control signals.\n\nLMDrive is trained on 8 A100 GPUs with 80GB memory (the first stage can be trained on GPUs with 32GB memory). To train on fewer GPUs, you can reduce the `batch-size` and the `learning-rate` while maintaining their proportion.\nIf you do not collect the dataset yourself, please download the multi-modal instruction dataset we collected in the CARLA simulator for the paper from [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive) or [OpenXLab (uploading)](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive). You can download just part of it to verify our framework or your improvements.\n\n### Vision encoder pre-training\n\nPre-training the vision encoder takes around 2~3 days on 8x A100 (80G). Once the training is completed, you can locate the checkpoint of the vision encoder in the `output\u002F` directory.\n\n```bash\ncd vision_encoder\nbash scripts\u002Ftrain.sh\n```\n\nSome options to note:\n\n- `GPU_NUM`: the number of GPUs you want to use. By default, it is set to 8.\n- `DATASET_ROOT`: the root directory for storing the dataset.\n- `--model`: the architecture of the vision model. You can choose memfuser_baseline_e1d3_r26, which replaces ResNet50 with ResNet26. It's also possible to create new model variants in `vision_encoder\u002Ftimm\u002Fmodels\u002Fmemfuser.py`.\n- `--train-towns\u002Ftrain-weathers`: the data filter for the training dataset. Similarly, the corresponding options `val-towns\u002Fval-weathers` filter the validation dataset.\n\n### Instruction finetuning\n\nInstruction finetuning takes around 2~3 days on 8x A100 (80G). 
Once the training is completed, you can locate the checkpoints of the adapter and Q-Former in the `lavis\u002Foutput\u002F` directory.\n\n```bash\ncd LAVIS\nbash run.sh 8 lavis\u002Fprojects\u002Flmdrive\u002Fnotice_llava15_visual_encoder_r50_seq40.yaml # 8 is the GPU number\n```\n\nSome options in the config.yaml to note:\n\n- `preception_model`: the model architecture of the vision encoder.\n- `preception_model_ckpt`: the checkpoint path of the vision encoder.\n- `llm_model`: the checkpoint path of the LLM (Vicuna\u002FLLaVA).\n- `use_notice_prompt`: whether to use notice instruction data during training.\n- `split_section_num_for_visual_encoder`: the number of sections the frames are divided into during the forward encoding of visual features. Higher values save more memory; the value must be a factor of `token_max_length`.\n- **datasets:**\n  - `storage`: the root directory for storing the dataset.\n  - `towns\u002Fweathers`: the data filter for training\u002Fevaluating.\n  - `token_max_length`: the maximum number of frames; if the number of frames exceeds this value, they will be truncated.\n  - `sample_interval`: the interval at which frames are sampled.\n\n## Evaluation\nStart a CARLA server (described above) and run the required agent. 
The required route and scenario files are provided in ```leaderboard\u002Fdata``` and the required variables need to be set in ```leaderboard\u002Fscripts\u002Frun_evaluation.sh```.\n\nSome options need to be updated in `leaderboard\u002Fteam_code\u002Flmdrive_config.py`:\n- `preception_model`: the model architecture of the vision encoder.\n- `preception_model_ckpt`: the checkpoint path of the vision encoder (obtained in the vision encoder pre-training stage).\n- `llm_model`: the checkpoint path of the LLM (LLaMA\u002FVicuna\u002FLLaVA).\n- `lmdrive_ckpt`: the checkpoint path of LMDrive (obtained in the instruction finetuning stage).\n\nUpdate ```leaderboard\u002Fscripts\u002Frun_evaluation.sh``` to include the following code for evaluating the model on the LangAuto benchmark.\n```shell\nexport CARLA_ROOT=\u002Fpath\u002Fto\u002Fcarla\u002Froot\nexport TEAM_AGENT=leaderboard\u002Fteam_code\u002Flmdrive_agent.py\nexport TEAM_CONFIG=leaderboard\u002Fteam_code\u002Flmdrive_config.py\nexport CHECKPOINT_ENDPOINT=results\u002Flmdrive_result.json\nexport SCENARIOS=leaderboard\u002Fdata\u002Fofficial\u002Fall_towns_traffic_scenarios_public.json\nexport ROUTES=leaderboard\u002Fdata\u002FLangAuto\u002Flong.xml\n```\n\n```shell\nCUDA_VISIBLE_DEVICES=0 .\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\n\nTo evaluate the agent on the LangAuto-Short benchmark, replace the `long.json` and `long.xml` files with `short.json` and `short.xml`.\n\nFor LangAuto-Tiny benchmark evaluation, replace the `long.json` and `long.xml` files with `tiny.json` and `tiny.xml`:\n\n```shell\nexport SCENARIOS=leaderboard\u002Fdata\u002FLangAuto\u002Ftiny.json\nexport ROUTES=leaderboard\u002Fdata\u002FLangAuto\u002Ftiny.xml\n```\n\n### LangAuto-Notice\n\nSet `agent_use_notice` to True in `lmdrive_config.py`.\n\n\n## Citation\nIf you find our repo, dataset, or paper useful, please cite us as\n```bibtex\n@misc{shao2023lmdrive,\n      title={LMDrive: Closed-Loop 
End-to-End Driving with Large Language Models}, \n      author={Hao Shao and Yuxuan Hu and Letian Wang and Steven L. Waslander and Yu Liu and Hongsheng Li},\n      year={2023},\n      eprint={2312.07488},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n## Acknowledgements\nThis implementation is based on code from several repositories.\n- [InterFuser](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser)\n- [Transfuser](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser)\n- [2020_CARLA_challenge](https:\u002F\u002Fgithub.com\u002Fbradyz\u002F2020_CARLA_challenge)\n- [CARLA Leaderboard](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fleaderboard)\n- [Scenario Runner](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fscenario_runner)\n- [LAVIS](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS)\n- [pytorch-image-models](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models)\n\n\n## License\nAll code within this repository is under [Apache License 2.0](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0).\n","# LMDrive: 基于大型语言模型的闭环端到端自动驾驶\n*一个端到端、闭环、基于语言的自动驾驶框架，通过多模态多视角传感器数据和自然语言指令与动态环境交互。*\n\n[[项目页面](https:\u002F\u002Fhao-shao.com\u002Fprojects\u002Flmdrive.html)] [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488)]  [[数据集(Hugging Face)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive)]  
[[模型库](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FOpenDILabCommunity\u002Flmdrive-658aee50ce38d143c4925a98)]\n\n[[数据集(OpenXlab)](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive)]\n[[模型库(OpenXLab)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002Fdeepcs233\u002FLMDrive)]\n\n\n[![访问量](https:\u002F\u002Fhits.seeyoufarm.com\u002Fapi\u002Fcount\u002Fincr\u002Fbadge.svg?url=https%3A%2F%2Fgithub.com%2Fopendilab%2FLMDrive&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false)](https:\u002F\u002Fhits.seeyoufarm.com)\n[![代码许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-green.svg)](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002FLICENSE)\n[![数据许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData%20License-CC%20By%20NC%204.0-red.svg)](https:\u002F\u002Fgithub.com\u002Ftatsu-lab\u002Fstanford_alpaca\u002Fblob\u002Fmain\u002FDATA_LICENSE)\n\n## 新闻\n\n- `[02\u002F27]` [LMDrive](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488) 被 CVPR 2024 接收 🎉🎉🎉\n- `[01\u002F25]` 我们将模型上传至 [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002Fdeepcs233\u002FLMDrive)\n- `[01\u002F23]` 我们在 [智东西](https:\u002F\u002Fwqpoq.xetslk.com\u002Fsl\u002F3D1aRZ) 进行了演讲\n- `[01\u002F20]` 我们将数据集上传至 [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive)\n- `[12\u002F21]` 我们发布了项目官网 [这里](https:\u002F\u002Fhao-shao.com\u002Fprojects\u002Flmdrive.html)\n\n****\n\n\u003Cdiv align=\"center\">\n  \u003Cimg width=\"800\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_LMDrive_readme_5561e4ee08b4.png\">\u003C\u002Fimg>\n\u003C\u002Fdiv>\n\n> 
[邵浩](http:\u002F\u002Fhao-shao.com\u002F)、胡宇轩、[王乐天](https:\u002F\u002Fletianwang0.wixsite.com\u002Fmyhome)、[史蒂文·L·瓦斯兰德](https:\u002F\u002Fwww.trailab.utias.utoronto.ca\u002Fstevenwaslander)、[刘宇](https:\u002F\u002Fliuyu.us\u002F)、[李宏升](http:\u002F\u002Fwww.ee.cuhk.edu.hk\u002F~hsli\u002F)。\n\n本仓库包含论文 [LMDrive: 基于大型语言模型的闭环端到端自动驾驶](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.07488) 的代码。这项工作提出了一种新颖的、由语言引导的、端到端、闭环的自动驾驶框架。\n\n## 演示视频\n\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo width=\"800\" src=\"https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fassets\u002F17512647\u002F65b2785d-e8bc-4ec1-ac86-e077299a465d\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n## 目录\n1. [设置](#setup)\n2. [模型权重](#lmdrive-Weights)\n3. [数据集](#dataset)\n      1. [概述](#overview)\n      1. [数据生成](#data-generation)\n      2. [数据预处理](#data-pre-procession)\n      3. [数据解析](#data-parsing)\n4. [训练](#training)\n      1. [视觉编码器预训练](#vision-encoder-pre-training)\n      2. [指令微调](#instruction-finetuning)\n5. [评估](#evaluation)\n6. [引用](#citation)\n7. 
[致谢](#acknowledgements)\n\n## 设置\n\n我们的项目基于三个部分构建：(1) 视觉编码器（对应仓库：timm）；(2) 视觉大语言模型（对应仓库：LAVIS）；(3) 数据采集、智能体控制器（对应仓库：InterFuser、Leaderboard、ScenarioRunner）。\n\n安装 Anaconda\n```Shell\nwget https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-2020.11-Linux-x86_64.sh\nbash Anaconda3-2020.11-Linux-x86_64.sh\nsource ~\u002F.bashrc\n```\n\n克隆仓库并搭建环境\n\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive.git\ncd LMDrive\nconda create -n lmdrive python=3.8\nconda activate lmdrive\ncd vision_encoder\npip3 install -r requirements.txt\npython setup.py develop # 如果你之前已经安装过 timm，请先卸载\ncd ..\u002FLAVIS\npip3 install -r requirements.txt\npython setup.py develop # 如果你之前已经安装过 LAVIS，请先卸载\n\npip install flash-attn --no-build-isolation # 可选\n```\n\n下载并设置 CARLA 0.9.10.1\n```Shell\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\npip install carla\n```\n\n> 如果你在使用 Carla 时遇到问题，请先参考 [Carla Issues](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla\u002Fissues) 和 [InterFuser Issues](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser)。\n\n\n## LMDrive 权重\n如果你有兴趣在模型库中添加其他内容，请提交一个问题 :)\n\n\n| 版本 | 大小 | 检查点 | 视觉编码器 | LLM 基础 | DS (LangAuto) | DS (LangAuto-short) |\n|---------|------|------------|----------------|-----------|:---:|:---:|\n| LMDrive-1.0 (LLaVA-v1.5-7B) | 7B |  [LMDrive-llava-v1.5-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-llava-v1.5-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | [LLaVA-v1.5-7B](https:\u002F\u002Fhuggingface.co\u002Fliuhaotian\u002Fllava-v1.5-7b) | 36.2 | 50.6|\n| LMDrive-1.0 (Vicuna-v1.5-7B) | 7B |  [LMDrive-vicuna-v1.5-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vicuna-v1.5-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | [Vicuna-v1.5-7B](https:\u002F\u002Fhuggingface.co\u002Flmsys\u002Fvicuna-7b-v1.5-16k) | 
33.5 | 45.3 |\n| LMDrive-1.0 (LLaMA-7B) | 7B |  [LMDrive-llama-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-llama-7b-v1.0) | [R50](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0) | [LLaMA-7B](https:\u002F\u002Fhuggingface.co\u002Fhuggyllama\u002Fllama-7b) | 31.3 | 42.8 |\n\n*DS 表示驾驶评分*\n\n## 数据集\n\n我们旨在开发一种智能驾驶代理，该代理能够根据三种输入源生成驾驶动作：1) 传感器数据（多视角摄像头和 LiDAR），使代理能够生成既了解当前场景又符合场景要求的动作；2) 导航指令（例如变道、转弯），使代理能够按照自然语言指令（来自人类或导航软件）进行驾驶；以及 3) 人工提示指令，使代理能够与人类互动，并适应人类的建议和偏好（例如关注对抗性事件、处理长尾事件等）。\n\n我们提供了一个包含约 6.4 万个数据片段的数据集，每个片段包括一条导航指令、若干条提示指令、一系列多模态多视角传感器数据以及控制信号。每个片段的时长从 2 秒到 20 秒不等。我们论文中使用的数据集可从[这里](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive)下载。如果你想创建自己的数据集，请按照我们下面列出的步骤操作。\n\n### 概述\n数据是使用 CARLA 0.9.10.1 中 `leaderboard\u002Fdata` 路径下的路线和场景文件，在 8 个 CARLA 城镇中，通过 `leaderboard\u002Fteam_code\u002Fauto_pilot.py` 生成的。数据集以高频率（约 10Hz）采集。\n\n在下载我们的数据集或自行采集数据后，需要按照以下方式系统地组织数据。`DATASET_ROOT` 是您存储数据集的根目录。\n\n```\n├── $DATASET_ROOT\n│   └── dataset_index.txt  # 用于视觉编码器预训练\n│   └── navigation_instruction_list.txt  # 用于指令微调\n│   └── notice_instruction_list.json  # 用于指令微调\n│   └── routes_town06_long_w7_11_28_18_28_35  # 数据文件夹\n│   └── routes_town01_short_w2_11_16_08_27_10\n│   └── routes_town02_short_w2_11_16_22_55_25\n│   └── routes_town01_short_w2_11_16_11_44_08 \n      ├── rgb_full\n      ├── lidar\n      └── ...\n```\n\n`navigation_instruction_list.txt` 和 `notice_instruction_list.json` 可以通过我们的[数据解析脚本](#data-parsing)生成。\n\n您收集的数据集中每个子文件夹应按如下结构组织：\n\n```\n- routes_town(town_id)_{tiny,short,long}_w(weather_id)_timestamp：对应不同的城镇和路线文件\n    - routes_X：包含单条路线的数据\n        - rgb_full：分辨率为 400x1200 的多视角大图，可拆分为四张图像（左、中、右、后）\n        - lidar：.npy 格式的 3D 点云数据。仅包含 1\u002F20 秒内采集的 LiDAR 点，覆盖 180 度水平视野。若需 360 度视野，则需与 lidar_odd 数据合并。\n        - lidar_odd：.npy 格式的 3D 点云数据。\n        - birdview：俯视分割图像，LAV 和 LBC 曾使用此类数据进行训练。\n        - topdown：类似于 birdview，但由向下拍摄的摄像头捕捉。\n        - 
3d_bbs：不同目标的 3D 包围盒。\n        - affordances：各类可供性信息。\n        - actors_data：包含周围车辆及交通信号灯的位置、速度等元数据。\n        - measurements：包含自车位置、速度、未来航点等元数据。\n        - measurements_full：将 measurements 和 actors_data 合并。\n        - measurements_all.json：将 measurements_full 中的文件合并为一个单独的文件。\n```\n\n`$DATASET_ROOT` 目录下必须包含名为 `dataset_index.txt` 的文件，该文件可通过我们的[数据预处理脚本](#data-pre-procession)生成。文件应按以下格式列出训练和评估数据：\n\n```\n\u003Crelative_route_path_dir> \u003Cnum_data_frames_in_this_dir>\nroutes_town06_long_w7_11_28_18_28_35\u002F 1062\nroutes_town01_short_w2_11_16_08_27_10\u002F 1785\nroutes_town01_short_w2_11_16_09_55_05\u002F 918\nroutes_town02_short_w2_11_16_22_55_25\u002F 134\nroutes_town01_short_w2_11_16_11_44_08\u002F 569\n```\n\n其中 `\u003Crelative_route_path_dir>` 应为相对于 `$DATASET_ROOT` 的相对路径。训练代码会将 `$DATASET_ROOT` 和 `\u003Crelative_route_path_dir>` 拼接起来，形成加载数据的完整路径。例如，1062 表示 `routes_town06_long_w7_11_28_18_28_35\u002Frgb_full` 或 `routes_town06_long_w7_11_28_18_28_35\u002Flidar` 等目录中的帧数。\n\n### 数据生成\n#### 使用多个 CARLA 服务器生成数据\n除了数据集外，我们还提供了所有用于生成数据的脚本，并可根据不同 CARLA 版本的需求进行修改。数据集由基于规则的专家智能体在不同天气和城镇中采集。\n\n##### 启动 CARLA 服务器\n```bash\n# 启动 4 个 CARLA 服务器：IP [localhost]，端口 [2000, 2002, 2004, 2006]。您可以根据实际情况调整 CARLA 服务器的数量，更多的服务器可以采集更多数据。如果您使用 N 个服务器采集数据，则意味着您在每条路线上采集了 N 次数据，每次的天气和交通场景都是随机的。\n\ncd carla\nCUDA_VISIBLE_DEVICES=0 .\u002FCarlaUE4.sh --world-port=2000 -opengl &\nCUDA_VISIBLE_DEVICES=1 .\u002FCarlaUE4.sh --world-port=2002 -opengl &\nCUDA_VISIBLE_DEVICES=2 .\u002FCarlaUE4.sh --world-port=2004 -opengl &\nCUDA_VISIBLE_DEVICES=3 .\u002FCarlaUE4.sh --world-port=2006 -opengl &\n```\n\nDocker 设置说明请参见 [此处](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Finstall-guide.html#docker)。拉取 CARLA 0.9.10.1 的 Docker 镜像：`docker pull carlasim\u002Fcarla:0.9.10.1`。\n\nDocker 18：\n```\ndocker run -it --rm -p 2000-2002:2000-2002 --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 carlasim\u002Fcarla:0.9.10.1 .\u002FCarlaUE4.sh --world-port=2000 
-opengl\n```\n\nDocker 19：\n```Shell\ndocker run -it --rm --net=host --gpus '\"device=0\"' carlasim\u002Fcarla:0.9.10.1 .\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\n如果 Docker 容器无法正常启动，请添加环境变量 `-e SDL_AUDIODRIVER=dsp`。\n\n##### 运行自动驾驶程序\n生成用于批量采集数据的脚本：\n```bash\ncd dataset\npython init_dir.py\ncd ..\ncd data_collection\n\n# 您可以修改 auto_agent.yaml 中的 FPS、航点分布强度等参数...\n\n# 如果未使用 4 个服务器，则需要使用以下 Python 脚本来调整：\npython generate_bashs.py\npython generate_batch_collect.py \ncd ..\n```\n\n运行您需要采集的城镇和路线类型的批量执行脚本：\n```bash\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town01_long.sh\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town01_short.sh\n...\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town07_tiny.sh\n...\nbash data_collection\u002Fbatch_run\u002Frun_route_routes_town10_tiny.sh\n```\n\n**注意：** 我们的脚本在采集数据时会随机选择天气条件。\n\n#### 使用单个 CARLA 服务器生成数据\n使用单个 CARLA 服务器，运行自动驾驶程序开始数据生成。\n```Shell\ncarla\u002FCarlaUE4.sh --world-port=2000 -opengl\n.\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\n\n用于数据生成的专家智能体定义在 `leaderboard\u002Fteam_code\u002Fauto_pilot.py` 中。需要设置的不同变量在 `leaderboard\u002Fscripts\u002Frun_evaluation.sh` 中指定。\n\n### 数据预处理\n我们在 `tools\u002Fdata_preprocessing` 中提供了一些用于预处理采集数据的 Python 脚本，其中部分为可选步骤。请按照以下顺序执行：\n1. `python get_list_file.py $DATASET_ROOT`：生成 `dataset_index.txt`。\n2. `python batch_merge_data.py $DATASET_ROOT`：将分散的多个数据文件合并为一个文件，以减少训练时的 IO 时间。**[可选]**\n3. `python batch_rm_rgb_data.py $DATASET_ROOT`：在将数据合并到新文件后，删除冗余文件。**[可选]**\n4. `python batch_stat_blocked_data.py $DATASET_ROOT`：查找自车长时间被阻挡的帧。移除这些帧可以改善数据分布并减小整体数据量。\n5. `python batch_rm_blocked_data.py $DATASET_ROOT`：删除被阻挡的帧。\n6. `python batch_recollect_data.py $DATASET_ROOT`：由于我们已移除部分帧，需要重新整理数据，以确保帧 ID 连续。\n7. 
`python batch_merge_measurements.py $DATASET_ROOT`：将单个路线文件夹中所有帧的测量文件合并，以减少 IO 时间。\n\n### 数据解析\n在完成数据采集和预处理后，我们需要使用 `tools\u002Fdata_parsing` 中的一些 Python 脚本来解析导航指令和提示指令数据。\n\n解析导航指令的脚本：\n```bash\npython3 parse_instruction.py $DATASET_ROOT\n```\n\n解析后的导航片段将保存在数据集根目录下的 `$DATASET_ROOT\u002Fnavigation_instruction_list.txt` 中。\n\n解析提示指令的脚本：\n```bash\npython3 parse_notice.py $DATASET_ROOT\n```\n\n解析后的提示片段将保存在 `$DATASET_ROOT\u002Fnotice_instruction_list.json` 中。\n\n解析误导性指令的脚本：\n```bash\npython3 parse_misleading.py $DATASET_ROOT\n```\n\n解析后的误导性片段将保存在 `$DATASET_ROOT\u002Fmisleading_data.txt` 中。\n\n## 训练\n\nLMDrive 的训练分为两个阶段：1) 视觉编码器预训练阶段，用于从传感器输入中生成视觉 token；2) 指令微调阶段，用于对齐指令\u002F视觉信息与控制信号。\n\nLMDrive 使用 8 张 80GB 显存的 A100 GPU 进行训练（第一阶段也可在 32GB 显存的 GPU 上进行）。若使用较少的 GPU，可在保持比例不变的情况下降低 `batch-size` 和 `learning-rate`。\n\n如果您未自行采集数据，请下载我们在 CARLA 模拟器中采集、并在论文中使用的多模态指令数据集，链接如下：[Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive) 或 [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive)。您也可以仅下载部分数据来验证我们的框架或您的改进。\n\n### 视觉编码器预训练\n\n在 8 张 A100（80G）上，视觉编码器的预训练大约需要 2~3 天。训练完成后，您可以在 `output\u002F` 目录下找到视觉编码器的检查点。\n\n```bash\ncd vision_encoder\nbash scripts\u002Ftrain.sh\n```\n\n需要注意的几个选项：\n- `GPU_NUM`：您希望使用的 GPU 数量，默认为 8。\n- `DATASET_ROOT`：存储数据集的根目录。\n- `--model`：视觉模型的结构。您可以选择 memfuser_baseline_e1d3_r26，它用 ResNet26 替代了 ResNet50。此外，您还可以在 `vision_encoder\u002Ftimm\u002Fmodels\u002Fmemfuser.py` 中创建新的模型变体。\n- `--train-towns\u002Ftrain-weathers`：用于筛选训练数据集的过滤条件。同样地，也有对应的 `val-towns\u002Fval-weathers` 选项来筛选验证数据集。\n\n### 指令微调\n\n在 8 张 A100（80G）上，指令微调大约需要 2~3 天。训练完成后，您可以在 `lavis\u002Foutput\u002F` 目录下找到适配器和 qformer 的检查点。\n\n```bash\ncd LAVIS\nbash run.sh 8 lavis\u002Fprojects\u002Flmdrive\u002Fnotice_llava15_visual_encoder_r50_seq40.yaml # 8 表示 GPU 数量\n```\n\nconfig.yaml 中需要注意的几个选项：\n- `preception_model`：视觉编码器的模型架构。\n- `preception_model_ckpt`：视觉编码器的检查点路径。\n- `llm_model`：LLM（Vicuna\u002FLLaVA）的检查点路径。\n- 
`use_notice_prompt`：训练时是否使用提示指令数据。\n- `split_section_num_for_visual_encoder`：视觉特征前向编码过程中，帧被分割成的段数。数值越高越节省显存，且必须是 `token_max_length` 的因数。\n- **数据集：**\n  - `storage`：存储数据集的根目录。\n  - `towns\u002Fweathers`：用于训练和评估的数据筛选条件。\n  - `token_max_length`：最大帧数，超过此值的帧会被截断。\n  - `sample_interval`：采样间隔。\n\n## 评估\n启动 CARLA 服务器（如上所述）并运行所需的智能体。充足的路线和场景文件位于 `leaderboard\u002Fdata` 中，所需变量需在 `leaderboard\u002Fscripts\u002Frun_evaluation.sh` 中设置。\n\n需要在 `leaderboard\u002Fteam_code\u002Flmdrive_config.py` 中更新以下选项：\n- `preception_model`：视觉编码器的模型架构。\n- `preception_model_ckpt`：视觉编码器的检查点路径（在视觉编码器预训练阶段获得）。\n- `llm_model`：LLM（LLaMA\u002FVicuna\u002FLLaVA）的检查点路径。\n- `lmdrive_ckpt`：LMDrive 的检查点路径（在指令微调阶段获得）。\n\n更新 `leaderboard\u002Fscripts\u002Frun_evaluation.sh`，加入以下代码以评估模型在 LangAuto（Long）基准上的表现：\n```shell\nexport CARLA_ROOT=\u002Fpath\u002Fto\u002Fcarla\u002Froot\nexport TEAM_AGENT=leaderboard\u002Fteam_code\u002Flmdrive_agent.py\nexport TEAM_CONFIG=leaderboard\u002Fteam_code\u002Flmdrive_config.py\nexport CHECKPOINT_ENDPOINT=results\u002Flmdrive_result.json\nexport SCENARIOS=leaderboard\u002Fdata\u002Fofficial\u002Fall_towns_traffic_scenarios_public.json\nexport ROUTES=leaderboard\u002Fdata\u002FLangAuto\u002Flong.xml\n```\n\n```shell\nCUDA_VISIBLE_DEVICES=0 .\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\n\n在此处，将 `ROUTES` 中的 `long.xml` 替换为 `short.xml`（场景文件相应替换为 `short.json`），即可评估智能体在 LangAuto-Short 基准上的表现。\n\n对于 LangAuto-Tiny 基准的评估，将 `long.json` 和 `long.xml` 替换为 `tiny.json` 和 `tiny.xml`：\n\n```shell\nexport SCENARIOS=leaderboard\u002Fdata\u002FLangAuto\u002Ftiny.json\nexport ROUTES=leaderboard\u002Fdata\u002FLangAuto\u002Ftiny.xml\n```\n\n### LangAuto-Notice\n在 `lmdrive_config.py` 中将 `agent_use_notice` 设置为 True。\n\n## 引用\n如果您认为我们的仓库、数据集或论文有用，请按以下格式引用：\n```bibtex\n@misc{shao2023lmdrive,\n      title={LMDrive: Closed-Loop End-to-End Driving with Large Language Models}, \n      author={Hao Shao and Yuxuan Hu and Letian Wang and Steven L. 
Waslander and Yu Liu and Hongsheng Li},\n      year={2023},\n      eprint={2312.07488},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n## 致谢\n本实现基于多个仓库中的代码。\n- [InterFuser](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser)\n- [Transfuser](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser)\n- [2020_CARLA_challenge](https:\u002F\u002Fgithub.com\u002Fbradyz\u002F2020_CARLA_challenge)\n- [CARLA Leaderboard](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fleaderboard)\n- [Scenario Runner](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fscenario_runner)\n- [LAVIS](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS)\n- [pytorch-image-models](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models)\n\n\n## 许可证\n本仓库中的所有代码均采用 [Apache License 2.0](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0) 许可证。","# LMDrive 快速上手指南\n\nLMDrive 是一个基于大语言模型（LLM）的端到端、闭环自动驾驶框架。它能够通过多模态多视角传感器数据（摄像头、激光雷达）和自然语言指令与动态环境进行交互。\n\n## 1. 环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐 Ubuntu)\n- **Python 版本**: 3.8\n- **GPU**: 支持 CUDA 的 NVIDIA 显卡（建议显存 >= 16GB 以运行 7B 模型）\n- **模拟器**: CARLA 0.9.10.1\n\n### 前置依赖\n- Anaconda\n- Git\n- CUDA Toolkit (与你的显卡驱动匹配)\n\n## 2. 
安装步骤\n\n### 2.1 安装 Anaconda (如未安装)\n```Shell\nwget https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-2020.11-Linux-x86_64.sh\nbash Anaconda3-2020.11-Linux-x86_64.sh\nsource ~\u002F.bashrc\n```\n\n### 2.2 克隆项目并构建环境\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive.git\ncd LMDrive\n\n# 创建并激活虚拟环境\nconda create -n lmdrive python=3.8\nconda activate lmdrive\n\n# 安装视觉编码器依赖 (timm)\ncd vision_encoder\npip3 install -r requirements.txt\npython setup.py develop \n# 注意：如果之前安装过 timm，请先卸载\n\n# 安装视觉大模型依赖 (LAVIS)\ncd ..\u002FLAVIS\npip3 install -r requirements.txt\npython setup.py develop \n# 注意：如果之前安装过 LAVIS，请先卸载\n\n# 返回根目录并安装可选加速组件\ncd ..\npip install flash-attn --no-build-isolation\n```\n\n### 2.3 设置 CARLA 模拟器\n项目依赖 CARLA 0.9.10.1 版本。\n```Shell\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\npip install carla\n```\n> **提示**: 如果遇到 CARLA 相关问题，请参考 [Carla Issues](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla\u002Fissues) 或 [InterFuser Issues](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser)。\n\n## 3. 基本使用\n\n### 3.1 下载预训练模型\n你可以从 Hugging Face 或 OpenXLab（国内加速推荐）下载模型权重。以下是基于 LLaVA-v1.5-7B 的模型示例：\n\n**Hugging Face:**\n- [LMDrive-llava-v1.5-7b-v1.0](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-llava-v1.5-7b-v1.0)\n- [Vision Encoder (R50)](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity\u002FLMDrive-vision-encoder-r50-v1.0)\n\n**OpenXLab (国内推荐):**\n- [Model Zoo](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002Fdeepcs233\u002FLMDrive)\n\n将下载的权重放置在项目指定的 `weights` 或 `checkpoints` 目录下（具体路径参考配置文件）。\n\n### 3.2 数据集准备\n如果你直接使用官方数据集：\n1. 从 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FOpenDILabCommunity\u002FLMDrive) 或 [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fdatasets\u002Fdeepcs233\u002FLMDrive) 下载数据。\n2. 
按照以下结构组织数据（`$DATASET_ROOT` 为你的数据根目录）：\n   ```text\n   ├── $DATASET_ROOT\n   │   ├── dataset_index.txt\n   │   ├── navigation_instruction_list.txt\n   │   ├── notice_instruction_list.json\n   │   └── routes_townXX_...  # 数据文件夹\n   │       ├── rgb_full\n   │       ├── lidar\n   │       └── ...\n   ```\n3. 确保生成 `dataset_index.txt` 文件，格式如下：\n   ```text\n   routes_town06_long_w7_11_28_18_28_35\u002F 1062\n   routes_town01_short_w2_11_16_08_27_10\u002F 1785\n   ```\n\n### 3.3 运行推理\u002F评估\n在配置好模型权重和数据集路径后，使用提供的评估脚本进行测试。假设你已配置好环境变量和数据路径：\n\n```Shell\n# 示例：运行评估脚本 (具体脚本名称请参考 repo 中的 evaluation 部分)\npython eval.py \\\n    --model_path .\u002Fcheckpoints\u002FLMDrive-llava-v1.5-7b-v1.0 \\\n    --data_root $DATASET_ROOT \\\n    --config configs\u002Feval_config.yaml\n```\n\n*注：具体的推理命令参数需根据 `configs` 目录下的配置文件进行调整，主要指定模型路径、数据根目录及测试路线。*\n\n### 3.4 训练 (可选)\n如果需要从头训练或微调，流程分为两步：\n1. **视觉编码器预训练**:\n   ```Shell\n   # 进入 vision_encoder 目录执行相关训练脚本\n   ```\n2. **指令微调 (Instruction Finetuning)**:\n   ```Shell\n   # 使用准备好的导航指令和注意指令数据进行 LLM 微调\n   ```\n详细训练命令请参考仓库中的 `Training` 章节及对应脚本。","某自动驾驶研发团队正在复杂城市路口测试车辆对突发指令的响应能力，要求车辆能理解“在前方施工区域减速并绕行”这类自然语言命令。\n\n### 没有 LMDrive 时\n- **指令解析僵化**：传统系统依赖预定义代码规则，无法理解“施工区域”、“绕行”等非结构化自然语言，必须人工编写特定脚本。\n- **感知决策割裂**：视觉传感器识别到的障碍物数据与决策模块分离，导致车辆看到路障后无法立即结合语音指令调整路径。\n- **闭环响应滞后**：从接收指令到执行动作需经过多个独立模块转换，延迟高，难以应对动态变化的交通流。\n- **场景泛化性差**：遇到训练数据未覆盖的罕见路况（如临时改道），系统往往直接报错或停止运行。\n\n### 使用 LMDrive 后\n- **自然语言直连控制**：LMDrive 利用大语言模型直接理解“施工区域绕行”指令，无需中间代码翻译，瞬间转化为驾驶策略。\n- **多模态端到端融合**：系统将多视角摄像头数据与语言指令在模型内部深度融合，边看路边理解意图，实时规划避让轨迹。\n- **低延迟闭环执行**：基于端到端架构，感知到执行的链路极短，车辆能流畅地完成减速、变道等连续动作。\n- **强泛化适应能力**：凭借大模型的推理能力，即使面对从未见过的临时路障组合，也能依据常识做出合理驾驶判断。\n\nLMDrive 通过语言大模型重构了自动驾驶的感知决策闭环，让车辆真正具备了像人类司机一样“听懂话、看懂路、灵活开”的核心能力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_LMDrive_a40bb6b4.png","opendilab","OpenDILab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopendilab_83f31d72.png","Open-source Decision Intelligence (DI) 
Platform",null,"opendilab@pjlab.org.cn","https:\u002F\u002Fgithub.com\u002Fopendilab",[81,85,89,93,97,100,104,107,110,113],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",74.7,{"name":86,"color":87,"percentage":88},"Python","#3572A5",24.6,{"name":90,"color":91,"percentage":92},"XSLT","#EB8CEB",0.4,{"name":94,"color":95,"percentage":96},"Shell","#89e051",0.1,{"name":98,"color":99,"percentage":96},"HTML","#e34c26",{"name":101,"color":102,"percentage":103},"Dockerfile","#384d54",0,{"name":105,"color":106,"percentage":103},"CSS","#663399",{"name":108,"color":109,"percentage":103},"JavaScript","#f1e05a",{"name":111,"color":112,"percentage":103},"Ruby","#701516",{"name":114,"color":115,"percentage":103},"Batchfile","#C1F12E",886,76,"2026-04-13T09:59:22","Apache-2.0",5,"Linux","必需 NVIDIA GPU。README 中启动 CARLA 服务器的命令使用了 CUDA_VISIBLE_DEVICES，且建议安装 flash-attn，隐含需要支持 CUDA 的 NVIDIA 显卡。具体显存大小未说明（取决于所选 LLM 基座模型，如 7B 模型通常建议 16GB+）。","未说明",{"notes":125,"python":126,"dependencies":127},"1. 必须安装特定版本的仿真器 CARLA 0.9.10.1。\n2. 项目依赖三个主要部分：视觉编码器 (timm)、视觉大语言模型 (LAVIS) 和数据采集\u002F控制器 (基于 InterFuser\u002FLeaderboard)。\n3. 安装过程中若之前安装过 timm 或 LAVIS，需先卸载再重新以 develop 模式安装本项目修改版。\n4. 
数据集生成需要运行多个 CARLA 服务器实例，对 GPU 数量或多卡并行能力有较高要求。","3.8",[128,129,130,131,132,133,134],"timm","LAVIS","flash-attn (可选)","carla==0.9.10.1","InterFuser","Leaderboard","ScenarioRunner",[35,15,13,136],"其他","2026-03-27T02:49:30.150509","2026-04-18T16:27:29.457992",[140,145,150,155,160,165,170,175],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},40242,"引入大语言模型（LLM）相比传统 Transformer 模型在端到端自动驾驶中的性能优势体现在哪里？","这是将 LLM 应用于端到端自动驾驶的初步尝试。虽然目前的闭环测试结果可能尚未完全超越传统模型（如 InterFuser），但 LLM 的核心优势在于能够理解和利用自然语言指令（notice instructions），从而显著提升长尾场景下的整体安全性能。未来计划支持更多类型的指令交互，例如"我迟到了，请忽略舒适度尽快到达"或"我不舒服，请慢速谨慎驾驶"，这是传统离散命令难以实现的灵活人机交互能力。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F1",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},40243,"由于政策限制无法从 Hugging Face 下载数据集，是否有其他下载来源？","维护者已将完整数据集上传至 OpenXLab 平台。如果部分 tar 文件下载不完整或损坏，通常不会影响最终的训练结果。建议前往 OpenXLab 获取数据。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F6",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},40244,"为什么项目选择使用 BLIP-2 框架中的 Qformer 进行视觉与语言模态对齐，而不是 LLaVA 框架中的线性层？","该问题涉及模型架构设计的具体考量。虽然官方回复中未详细展开理论对比，但该项目采用了 BLIP-2 框架的 Qformer 机制来处理多模态大型语言模型的视觉-语言对齐任务。这通常是为了更有效地压缩视觉特征并促进跨模态交互，具体实现可参考项目中集成的 BLIP-2 相关代码结构。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F11",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},40245,"训练过程中出现数值为 NaN 的情况，应该如何设置精度（fp16\u002Ffp32）来解决？","建议对不同的网络部分采用混合精度设置：LLM backbone（大语言模型主干）和 Vision encoder（视觉编码器）应使用 fp16 精度，而网络其余部分应使用 fp32 精度。如果整个网络强制使用 fp16 训练，可能会导致数值不稳定从而产生 NaN。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F5",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},40246,"如何修改 CARLA 版本以解决地图不匹配（The CARLA server uses the wrong map）的错误？","必须严格保持 CARLA 模拟器版本与地图包版本一致。如果遇到 Town10HD 地图加载错误，请确保使用的是 CARLA 0.9.10.1 版本，并搭配该版本对应的地图包。不要随意升级到 CARLA 0.9.13 
等更高版本，除非同时更换了完全兼容的地图资源，否则会报地图不匹配异常。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F122",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},40247,"如何移除仿真中突然出现的行人或其他背景车辆？","可以通过修改场景配置文件来实现：\n1. 移除特定类型的场景：检查 `town05_all_scenarios.json` 文件，根据 `scenario_type` 过滤掉不需要的场景类型。场景类型 ID 与类的映射关系可在 `scenario_runner\u002Fsrunner\u002Fscenarios\u002Froute_scenario.py` 第 53 行附近找到。\n2. 修改背景活动：对于行人和车辆的动态生成逻辑，可以参考 `leaderboard\u002Fleaderboard\u002Fscenarios\u002Fbackground_activity.py` 第 20 行附近的类进行修改。\n此外，查阅 CARLA Leaderboard 和 Scenario Runner 的官方文档也能获得帮助。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F81",{"id":171,"question_zh":172,"answer_zh":173,"source_url":174},40248,"论文中提到的“误导性指令（misleading instructions）”训练设置在代码中如何实现？","目前对应的专用代码和数据集尚未完全开源。不过，现有的代码和数据集训练的 Agent 已具备有限的拒绝误导性指令的能力。其原理是让 Agent 识别出当前的“误导性”指令应该在极短时间内结束（通过识别特定的完成标志），从而忽略该指令的后续影响。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F16",{"id":176,"question_zh":177,"answer_zh":178,"source_url":179},40249,"评估运行时 Pygame 窗口崩溃或程序卡死怎么办？","该问题通常发生在初始化 Pygame 显示界面时（`pygame.display.set_mode`）。如果单独运行 Pygame 初始化代码正常，但在项目评估脚本中崩溃，可能是由于环境变量、显示服务器配置或与 CARLA 渲染窗口的冲突导致。建议检查运行环境的图形驱动设置，或尝试在无头模式（headless）下运行评估脚本以避免弹出窗口冲突。","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive\u002Fissues\u002F13",[]]