[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-autonomousvision--transfuser":3,"tool-autonomousvision--transfuser":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":77,"owner_url":78,"languages":79,"stars":113,"forks":114,"last_commit_at":115,"license":116,"difficulty_score":117,"env_os":118,"env_gpu":119,"env_ram":118,"env_deps":120,"category_tags":129,"github_topics":130,"view_count":10,"oss_zip_url":77,"oss_zip_packed_at":77,"status":16,"created_at":135,"updated_at":136,"faqs":137,"releases":169},778,"autonomousvision\u002Ftransfuser","transfuser","[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving","TransFuser 是一个面向自动驾驶领域的开源项目，致力于实现基于 Transformer 的多传感器融合模仿学习。它旨在解决传统自动驾驶系统中多模态数据（如摄像头图像、深度图等）难以高效协同的难题，通过端到端的架构直接将传感器输入转化为驾驶控制指令。\n\n其核心技术亮点在于利用 Transformer 强大的特征处理能力，对不同传感器的信息进行深度融合，从而显著提升模型在复杂动态环境下的感知精度与决策稳定性。作为 CVPR 2021 和 PAMI 2023 论文的官方实现，TransFuser 提供了完整的代码框架及基于 CARLA 仿真器生成的训练数据集。\n\n该项目非常适合自动驾驶领域的研究人员、算法工程师以及高校团队使用。无论是希望复现前沿论文成果，还是在 CARLA 平台上探索新的端到端驾驶策略，TransFuser 都能提供坚实的基础设施支持，帮助大家更高效地推进无人驾驶技术的研发进程。","# TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving\n\n## [Paper](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI.pdf) | [Supplementary](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI_supplementary.pdf) | [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-GMhYcxOiEU) | [Poster](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI_poster.pdf) | [Slides](https:\u002F\u002Fkashyap7x.github.io\u002Fassets\u002Fpdf\u002Ftalks\u002FChitta2022AIR.pdf)\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Ftransfuser-imitation-with-transformer-based\u002Fautonomous-driving-on-carla-leaderboard)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fautonomous-driving-on-carla-leaderboard?p=transfuser-imitation-with-transformer-based)\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fautonomousvision_transfuser_readme_5cf38012046e.gif\">\n\nThis repository contains the code for the PAMI 2023 paper [TransFuser: Imitation with 
Transformer-Based Sensor Fusion for Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.15997). This work is a journal extension of the CVPR 2021 paper [Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.09224). The code for the CVPR 2021 paper is available in the [cvpr2021](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Ftree\u002Fcvpr2021) branch.\n\nIf you find our code or papers useful, please cite:\n\n```bibtex\n@article{Chitta2023PAMI,\n  author = {Chitta, Kashyap and\n            Prakash, Aditya and\n            Jaeger, Bernhard and\n            Yu, Zehao and\n            Renz, Katrin and\n            Geiger, Andreas},\n  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},\n  journal = {Pattern Analysis and Machine Intelligence (PAMI)},\n  year = {2023},\n}\n```\n\n```bibtex\n@inproceedings{Prakash2021CVPR,\n  author = {Prakash, Aditya and\n            Chitta, Kashyap and\n            Geiger, Andreas},\n  title = {Multi-Modal Fusion Transformer for End-to-End Autonomous Driving},\n  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year = {2021}\n}\n```\n\nAlso, check out the code for other recent work on CARLA from our group:\n- [Jaeger et al., Hidden Biases of End-to-End Driving Models (ICCV 2023)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fcarla_garage)\n- [Renz et al., PlanT: Explainable Planning Transformers via Object-Level Representations (CoRL 2022)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fplant)\n- [Hanselmann et al., KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients (ECCV 2022)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fking)\n- [Chitta et al., NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fneat)\n\n## Contents\n\n1. [Setup](#setup)\n2. [Dataset and Training](#dataset-and-training)\n3. [Evaluation](#evaluation)\n\n\n## Setup\n\nClone the repo, setup CARLA 0.9.10.1, and build the conda environment:\n\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser.git\ncd transfuser\ngit checkout 2022\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\nconda env create -f environment.yml\nconda activate tfuse\npip install --only-binary=torch-scatter torch-scatter -f https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-1.12.0+cu113.html\npip install --only-binary=mmcv-full mmcv-full==1.6.0 -f https:\u002F\u002Fdownload.openmmlab.com\u002Fmmcv\u002Fdist\u002Fcu113\u002Ftorch1.12.0\u002Findex.html\npip install mmsegmentation==0.25.0\npip install mmdet==2.25.0\n```\n\n## Dataset and Training\nOur dataset is generated via a privileged agent which we call the autopilot (`\u002Fteam_code_autopilot\u002Fautopilot.py`) in 8 CARLA towns using the routes and scenario files provided in [this folder](.\u002Fleaderboard\u002Fdata\u002Ftraining\u002F). See the [tools\u002Fdataset](.\u002Ftools\u002Fdataset) folder for detailed documentation regarding the training routes and scenarios. 
You can download the dataset (210GB) by running:\n\n```Shell\nchmod +x download_data.sh\n.\u002Fdownload_data.sh\n```\n\nThe dataset is structured as follows:\n```\n- Scenario\n    - Town\n        - Route\n            - rgb: camera images\n            - depth: corresponding depth images\n            - semantics: corresponding segmentation images\n            - lidar: 3d point cloud in .npy format\n            - topdown: topdown segmentation maps\n            - label_raw: 3d bounding boxes for vehicles\n            - measurements: contains ego-agent's position, velocity and other metadata\n```\n\n### Data generation\nIn addition to the dataset itself, we have provided the scripts for data generation with our autopilot agent. To generate data, the first step is to launch a CARLA server:\n\n```Shell\n.\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\nFor more information on running CARLA servers (e.g. on a machine without a display), see the [official documentation](https:\u002F\u002Fcarla.readthedocs.io\u002Fen\u002Fstable\u002Fcarla_headless\u002F). Once the server is running, use the script below for generating training data:\n```Shell\n.\u002Fleaderboard\u002Fscripts\u002Fdatagen.sh \u003Ccarla root> \u003Cworking directory of this repo (*\u002Ftransfuser\u002F)>\n```\n\nThe main variables to set for this script are `SCENARIOS` and `ROUTES`. \n\n### Training script\n\nThe code for training via imitation learning is provided in [train.py](.\u002Fteam_code_transfuser\u002Ftrain.py). \\\nA minimal example of running the training script on a single machine:\n```Shell\ncd team_code_transfuser\npython train.py --batch_size 10 --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 0\n```\nThe training script has many more useful features documented at the start of the main function. \nOne of them is parallel training. \nThe script has to be started differently when training on a multi-GPU node:\n```Shell\ncd team_code_transfuser\nCUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node=2 --max_restarts=0 --rdzv_id=1234576890 --rdzv_backend=c10d train.py --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 1\n```\nEnumerate the GPUs you want to train on with CUDA_VISIBLE_DEVICES.\nSet the variable OMP_NUM_THREADS to the number of CPUs available on your system.\nSet OPENBLAS_NUM_THREADS=1 if you want to avoid threads spawning other threads.\nSet --nproc_per_node to the number of available GPUs on your node.\n\nThe evaluation agent file is built to evaluate models trained with multiple GPUs. \nIf you want to evaluate a model trained with a single GPU, you need to remove [this line](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Fblob\u002Fa7d4db684c160095dec03851aff5ce92e36b2387\u002Fteam_code_transfuser\u002Fsubmission_agent.py#LL95C18-L95C18).\n\n\n## Evaluation\n\n### Longest6 benchmark\nWe make some minor modifications to the CARLA leaderboard code for the Longest6 benchmark, which are documented [here](.\u002Fleaderboard). 
See the [leaderboard\u002Fdata\u002Flongest6](.\u002Fleaderboard\u002Fdata\u002Flongest6\u002F) folder for a description of Longest6 and how to evaluate on it.\n\n### Pretrained agents\nPre-trained agent files for all 4 methods can be downloaded from [AWS](https:\u002F\u002Fs3.eu-central-1.amazonaws.com\u002Favg-projects\u002Ftransfuser\u002Fmodels_2022.zip):\n\n```Shell\nmkdir model_ckpt\nwget https:\u002F\u002Fs3.eu-central-1.amazonaws.com\u002Favg-projects\u002Ftransfuser\u002Fmodels_2022.zip -P model_ckpt\nunzip model_ckpt\u002Fmodels_2022.zip -d model_ckpt\u002F\nrm model_ckpt\u002Fmodels_2022.zip\n```\n\n### Running an agent\nTo evaluate a model, we first launch a CARLA server:\n\n```Shell\n.\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\nOnce the CARLA server is running, evaluate an agent with the script:\n```Shell\n.\u002Fleaderboard\u002Fscripts\u002Flocal_evaluation.sh \u003Ccarla root> \u003Cworking directory of this repo (*\u002Ftransfuser\u002F)>\n```\n\nBy editing the arguments in `local_evaluation.sh`, we can benchmark performance on the Longest6 routes. You can evaluate both privileged agents (such as [autopilot.py](.\u002Fteam_code_autopilot\u002Fautopilot.py)) and sensor-based models. To evaluate the sensor-based models, use [submission_agent.py](.\u002Fteam_code_transfuser\u002Fsubmission_agent.py) as the `TEAM_AGENT` and point to the folder you downloaded the model weights into for the `TEAM_CONFIG`. The code is automatically configured to use the correct method based on the `args.txt` file in the model folder.\n\nYou can look at qualitative examples of the expected driving behavior of TransFuser on the Longest6 routes [here](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DZS-U3-iV0s&list=PL6LvknlY2HlQG3YQ2nMIx7WcnyzgK9meO).\n\n### Parsing longest6 results\nTo compute additional statistics from the results of evaluation runs, we provide a parser script [tools\u002Fresult_parser.py](.\u002Ftools\u002Fresult_parser.py).\n\n```Shell\n${WORK_DIR}\u002Ftools\u002Fresult_parser.py --xml ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Flongest6\u002Flongest6.xml --results \u002Fpath\u002Fto\u002Ffolder\u002Fwith\u002Fjson_results\u002F --save_dir \u002Fpath\u002Fto\u002Foutput --town_maps ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Ftown_maps_xodr\n```\n\nIt will generate a `results.csv` file containing the average results of the run as well as additional statistics. 
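\n\nIf you have several evaluation runs, the same flags can be applied to each result folder in a loop (a minimal sketch; the run and output locations below are placeholder paths, not files shipped with the repository):\n\n```Shell\n# Parse every evaluation run found under a results directory (placeholder paths, adjust to your setup)\nfor run in \u002Fpath\u002Fto\u002Fruns\u002F*; do\n  ${WORK_DIR}\u002Ftools\u002Fresult_parser.py --xml ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Flongest6\u002Flongest6.xml --results $run --save_dir \u002Fpath\u002Fto\u002Foutput\u002F$(basename $run) --town_maps ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Ftown_maps_xodr\ndone\n```\n\n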
The parser also generates town maps and marks the locations where infractions occurred.\n\n### Submitting to the CARLA leaderboard\nTo submit to the CARLA leaderboard, you need Docker installed on your system.\nEdit the paths at the start of [make_docker.sh](.\u002Fleaderboard\u002Fscripts\u002Fmake_docker.sh).\nCreate the folder *team_code_transfuser\u002Fmodel_ckpt\u002Ftransfuser*.\nCopy the *model.pth* files and *args.txt* that you want to evaluate to *team_code_transfuser\u002Fmodel_ckpt\u002Ftransfuser*.\nIf you want to evaluate an ensemble, simply copy multiple .pth files into the folder; the code will load all of them and ensemble the predictions.\n\n```Shell\ncd leaderboard\ncd scripts\n.\u002Fmake_docker.sh\n```\nThe script will create a Docker image with the name transfuser-agent.\nFollow the instructions on the [leaderboard](https:\u002F\u002Fleaderboard.carla.org\u002Fsubmit\u002F) to create an account and install the AlphaDrive `alpha` CLI.\n\n```Shell\nalpha login\nalpha benchmark:submit --split 3 transfuser-agent:latest\n```\nThe command will upload the Docker image to the cloud and evaluate it.\n\n\u003C!-- ### Building docker image\n\nAdd the following paths to your ```~\u002F.bashrc```\n```\nexport CARLA_ROOT=\u003Cpath_to_carla_root>\nexport SCENARIO_RUNNER_ROOT=\u003Cpath_to_scenario_runner_in_this_repo>\nexport LEADERBOARD_ROOT=\u003Cpath_to_leaderboard_in_this_repo>\nexport PYTHONPATH=\"${CARLA_ROOT}\u002FPythonAPI\u002Fcarla\u002F\":\"${SCENARIO_RUNNER_ROOT}\":\"${LEADERBOARD_ROOT}\":${PYTHONPATH}\n```\n\nEdit the contents of ```leaderboard\u002Fscripts\u002FDockerfile.master``` to specify the required dependencies, agent code and model checkpoints. Add all the required information in the area delimited by the tags ```BEGINNING OF USER COMMANDS``` and ```END OF USER COMMANDS```. The current Dockerfile works for all the models in this repository.\n\nSpecify a name for the docker image in ```leaderboard\u002Fscripts\u002Fmake_docker.sh``` and run:\n```\nleaderboard\u002Fscripts\u002Fmake_docker.sh\n```\n\nRefer to the Transfuser example for the directory structure and where to include the code and checkpoints.\n\n### Testing the docker image locally\n\nSpin up a CARLA server:\n```\nSDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 .\u002FCarlaUE4.sh -world-port=2000 -opengl\n```\n\nRun the docker container:  \nDocker 19:  \n```\ndocker run -it --rm --net=host --gpus '\"device=0\"' -e PORT=2000 \u003Cdocker_image> .\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\nIf the docker container doesn't start properly, add another environment variable ```SDL_AUDIODRIVER=dsp```.\n\n### Submitting docker image to the leaderboard\n\nRegister on [AlphaDriver](https:\u002F\u002Fapp.alphadrive.ai\u002F), create a team and apply to the CARLA Leaderboard.\n\nInstall AlphaDrive cli:\n```\ncurl http:\u002F\u002Fdist.alphadrive.ai\u002Finstall-ubuntu.sh | sh -\n```\n\nLogin to alphadrive and submit the docker image:\n```\nalpha login\nalpha benchmark:submit --split \u003C2\u002F3> \u003Cdocker_image>\n```\nUse ```split 2``` for MAP track and ```split 3``` for SENSORS track. 
-->\n","# TransFuser：基于 Transformer(变换器) 的传感器融合模仿学习用于自动驾驶\n\n## [论文](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI.pdf) | [补充材料](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI_supplementary.pdf) | [演讲](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-GMhYcxOiEU) | [海报](http:\u002F\u002Fwww.cvlibs.net\u002Fpublications\u002FChitta2022PAMI_poster.pdf) | [幻灯片](https:\u002F\u002Fkashyap7x.github.io\u002Fassets\u002Fpdf\u002Ftalks\u002FChitta2022AIR.pdf)\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Ftransfuser-imitation-with-transformer-based\u002Fautonomous-driving-on-carla-leaderboard)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fautonomous-driving-on-carla-leaderboard?p=transfuser-imitation-with-transformer-based)\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fautonomousvision_transfuser_readme_5cf38012046e.gif\">\n\n本仓库包含 PAMI 2023 论文 [TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.15997) 的代码。这项工作是对 CVPR 2021 论文 [Multi-Modal Fusion Transformer for End-to-End Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.09224) 的期刊扩展版本。CVPR 2021 论文的代码可在 [cvpr2021](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Ftree\u002Fcvpr2021) 分支中找到。\n\n如果您发现我们的代码或论文有用，请引用：\n\n```bibtex\n@article{Chitta2023PAMI,\n  author = {Chitta, Kashyap and\n            Prakash, Aditya and\n            Jaeger, Bernhard and\n            Yu, Zehao and\n            Renz, Katrin and\n            Geiger, Andreas},\n  title = {TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving},\n  journal = {Pattern Analysis and Machine Intelligence (PAMI)},\n  year = {2023},\n}\n```\n\n```bibtex\n@inproceedings{Prakash2021CVPR,\n  author = {Prakash, Aditya and\n            Chitta, Kashyap and\n            Geiger, Andreas},\n  title = {Multi-Modal Fusion Transformer for End-to-End Autonomous Driving},\n  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year = {2021}\n}\n```\n\n此外，请查看我们团队关于 CARLA 的其他近期工作的代码：\n- [Jaeger et al., Hidden Biases of End-to-End Driving Models (ICCV 2023)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fcarla_garage)\n- [Renz et al., PlanT: Explainable Planning Transformers via Object-Level Representations (CoRL 2022)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fplant)\n- [Hanselmann et al., KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients (ECCV 2022)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fking)\n- [Chitta et al., NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021)](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fneat)\n\n## 目录\n\n1. [环境搭建](#setup)\n2. [数据集与训练](#dataset-and-training)\n3. 
[评估](#evaluation)\n\n\n## 环境搭建\n\n克隆仓库，设置 CARLA 0.9.10.1，并构建 conda 环境：\n\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser.git\ncd transfuser\ngit checkout 2022\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\nconda env create -f environment.yml\nconda activate tfuse\npip install --only-binary=torch-scatter torch-scatter -f https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-1.12.0+cu113.html\npip install --only-binary=mmcv-full mmcv-full==1.6.0 -f https:\u002F\u002Fdownload.openmmlab.com\u002Fmmcv\u002Fdist\u002Fcu113\u002Ftorch1.12.0\u002Findex.html\npip install mmsegmentation==0.25.0\npip install mmdet==2.25.0\n```\n\n## 数据集与训练\n我们的数据集是通过一个特权代理生成的，我们称之为自动驾驶仪（`\u002Fteam_code_autopilot\u002Fautopilot.py`），在 8 个 CARLA 城镇中使用 [此文件夹](.\u002Fleaderboard\u002Fdata\u002Ftraining\u002F) 中提供的路线和场景文件生成。有关训练路线和场景的详细文档，请参阅 [tools\u002Fdataset](.\u002Ftools\u002Fdataset) 文件夹。您可以运行以下命令下载数据集（210GB）：\n\n```Shell\nchmod +x download_data.sh\n.\u002Fdownload_data.sh\n```\n\n数据集结构如下：\n```\n- Scenario\n    - Town\n        - Route\n            - rgb: camera images\n            - depth: corresponding depth images\n            - semantics: corresponding segmentation images\n            - lidar: 3d point cloud in .npy format\n            - topdown: topdown segmentation maps\n            - label_raw: 3d bounding boxes for vehicles\n            - measurements: contains ego-agent's position, velocity and other metadata\n```\n\n### 数据生成\n除了数据集本身外，我们还提供了使用自动驾驶代理进行数据生成的脚本。要生成数据，第一步是启动 CARLA 服务器：\n\n```Shell\n.\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\n有关运行 CARLA 服务器的更多信息（例如在无显示器的机器上），请参阅 [官方文档](https:\u002F\u002Fcarla.readthedocs.io\u002Fen\u002Fstable\u002Fcarla_headless\u002F)。一旦服务器运行，使用以下脚本生成训练数据：\n```Shell\n.\u002Fleaderboard\u002Fscripts\u002Fdatagen.sh \u003Ccarla root> \u003Cworking directory of this repo (*\u002Ftransfuser\u002F)>\n```\n\n该脚本需要设置的主要变量是 `SCENARIOS` 和 `ROUTES`。 \n\n### 训练脚本\n\n通过模仿学习进行训练的代码提供在 [train.py](.\u002Fteam_code_transfuser\u002Ftrain.py) 中。\\\n在单台机器上运行训练脚本的最小示例：\n```Shell\ncd team_code_transfuser\npython train.py --batch_size 10 --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 0\n```\n训练脚本在主函数开头记录了许多其他有用的功能。 \n其中之一是并行训练。 \n当在多 GPU(图形处理器) 节点上进行训练时，启动脚本的方式有所不同：\n```Shell\ncd team_code_transfuser\nCUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node=2 --max_restarts=0 --rdzv_id=1234576890 --rdzv_backend=c10d train.py --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 1\n```\n使用 `CUDA_VISIBLE_DEVICES` 列出您想要在其上训练的 GPU。\n将变量 `OMP_NUM_THREADS` 设置为系统上可用的 CPU(中央处理器) 数量。\n如果您希望避免线程派生其他线程，请将 `OPENBLAS_NUM_THREADS` 设置为 1。\n将 `--nproc_per_node` 设置为节点上可用的 GPU 数量。\n\n评估代理文件旨在评估使用多个 GPU 训练的模型。 \n如果您想评估使用单个 GPU 训练的模型，您需要删除 [这一行](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Fblob\u002Fa7d4db684c160095dec03851aff5ce92e36b2387\u002Fteam_code_transfuser\u002Fsubmission_agent.py#LL95C18-L95C18)。\n\n\n## 评估\n\n### Longest6 基准测试\n我们对 CARLA leaderboard 代码进行了一些微小修改以用于 Longest6 基准测试，这些修改记录在 [此处](.\u002Fleaderboard)。有关 Longest6 的描述以及如何对其进行评估，请参阅 [leaderboard\u002Fdata\u002Flongest6](.\u002Fleaderboard\u002Fdata\u002Flongest6\u002F) 文件夹。\n\n### 预训练智能体 (Agents)\n所有 4 种方法的预训练智能体 (Agent) 文件均可从 [AWS](https:\u002F\u002Fs3.eu-central-1.amazonaws.com\u002Favg-projects\u002Ftransfuser\u002Fmodels_2022.zip) 下载：\n\n```Shell\nmkdir model_ckpt\nwget 
https:\u002F\u002Fs3.eu-central-1.amazonaws.com\u002Favg-projects\u002Ftransfuser\u002Fmodels_2022.zip -P model_ckpt\nunzip model_ckpt\u002Fmodels_2022.zip -d model_ckpt\u002F\nrm model_ckpt\u002Fmodels_2022.zip\n```\n\n### 运行智能体 (Agent)\n要评估模型，我们首先启动一个 CARLA 服务器：\n\n```Shell\n.\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\n一旦 CARLA 服务器运行起来，使用脚本评估智能体 (Agent)：\n```Shell\n.\u002Fleaderboard\u002Fscripts\u002Flocal_evaluation.sh \u003Ccarla root> \u003Cworking directory of this repo (*\u002Ftransfuser\u002F)>\n```\n\n通过编辑 `local_evaluation.sh` 中的参数，我们可以对 Longest6 路线的性能进行基准测试。您可以评估特权智能体 (Privileged Agents，例如 `autopilot.py`) 和基于传感器的模型。要评估基于传感器的模型，请将 `submission_agent.py` 用作 `TEAM_AGENT`，并将 `TEAM_CONFIG` 指向您下载模型权重的文件夹。代码会根据模型文件夹中的 `args.txt` 文件自动配置以使用正确的方法。\n\n您可以在 [此处](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DZS-U3-iV0s&list=PL6LvknlY2HlQG3YQ2nMIx7WcnyzgK9meO) 查看 TransFuser 在 Longest6 路线上预期驾驶行为的定性示例。\n\n### 解析 longest6 结果\n为了从评估运行的结果中计算额外的统计信息，我们提供了一个解析器脚本 `tools\u002Fresult_parser.py`。\n\n```Shell\n${WORK_DIR}\u002Ftools\u002Fresult_parser.py --xml ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Flongest6\u002Flongest6.xml --results \u002Fpath\u002Fto\u002Ffolder\u002Fwith\u002Fjson_results\u002F --save_dir \u002Fpath\u002Fto\u002Foutput --town_maps ${WORK_DIR}\u002Fleaderboard\u002Fdata\u002Ftown_maps_xodr\n```\n\n它将生成一个 `results.csv` 文件，其中包含运行的平均结果以及额外的统计信息。它还会生成城镇地图并标记违规行为发生的位置。\n\n### 提交到 CARLA 排行榜\n要向 CARLA 排行榜提交，您的系统需要安装 Docker。\n编辑 `make_docker.sh` 开头的路径。\n创建文件夹 `team_code_transfuser\u002Fmodel_ckpt\u002Ftransfuser`。\n将您想要评估的 `model.pth` 文件和 `args.txt` 复制到 `team_code_transfuser\u002Fmodel_ckpt\u002Ftransfuser`。\n如果您想评估集成 (Ensemble) 模型，只需将多个 `.pth` 文件复制到该文件夹中，代码将加载所有文件并对预测进行集成。\n\n```Shell\ncd leaderboard\ncd scripts\n.\u002Fmake_docker.sh\n```\n该脚本将创建一个名为 `transfuser-agent` 的 Docker 镜像。\n请按照 [排行榜](https:\u002F\u002Fleaderboard.carla.org\u002Fsubmit\u002F) 上的说明创建账户并安装 Alpha。\n\n```Shell\nalpha login\nalpha benchmark:submit  --split 3 transfuser-agent:latest\n```\n该命令将上传 Docker 镜像到云端并进行评估。\n\n\u003C!-- ### 构建 Docker 镜像\n\n将以下路径添加到您的 `~\u002F.bashrc` 中\n```\nexport CARLA_ROOT=\u003Cpath_to_carla_root>\nexport SCENARIO_RUNNER_ROOT=\u003Cpath_to_scenario_runner_in_this_repo>\nexport LEADERBOARD_ROOT=\u003Cpath_to_leaderboard_in_this_repo>\nexport PYTHONPATH=\"${CARLA_ROOT}\u002FPythonAPI\u002Fcarla\u002F\":\"${SCENARIO_RUNNER_ROOT}\":\"${LEADERBOARD_ROOT}\":${PYTHONPATH}\n```\n\n编辑 `leaderboard\u002Fscripts\u002FDockerfile.master` 的内容以指定所需的依赖项、智能体 (Agent) 代码和模型检查点 (Checkpoints)。在所有由标签 `BEGINNING OF USER COMMANDS` 和 `END OF USER COMMANDS` 分隔的区域中添加所有所需的信息。当前的 Dockerfile 适用于此仓库中的所有模型。\n\n在 `leaderboard\u002Fscripts\u002Fmake_docker.sh` 中指定 Docker 镜像的名称并运行：\n```\nleaderboard\u002Fscripts\u002Fmake_docker.sh\n```\n\n请参考 Transfuser 示例以了解目录结构以及代码和检查点 (Checkpoints) 的存放位置。\n\n### 在本地测试 Docker 镜像\n\n启动一个 CARLA 服务器：\n```\nSDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 .\u002FCarlaUE4.sh -world-port=2000 -opengl\n```\n\n运行 Docker 容器：  \nDocker 19：  \n```\ndocker run -it --rm --net=host --gpus '\"device=0\"' -e PORT=2000 \u003Cdocker_image> .\u002Fleaderboard\u002Fscripts\u002Frun_evaluation.sh\n```\n如果 Docker 容器无法正常启动，请添加另一个环境变量 `SDL_AUDIODRIVER=dsp`。\n\n### 提交 Docker 镜像到排行榜\n\n在 [AlphaDriver](https:\u002F\u002Fapp.alphadrive.ai\u002F) 注册，创建团队并申请加入 CARLA 排行榜。\n\n安装 AlphaDrive CLI：\n```\ncurl http:\u002F\u002Fdist.alphadrive.ai\u002Finstall-ubuntu.sh | sh -\n```\n\n登录 AlphaDrive 并提交 Docker 镜像：\n```\nalpha login\nalpha benchmark:submit --split \u003C2\u002F3> \u003Cdocker_image>\n```\nMAP 赛道使用 
`split 2`，SENSORS 赛道使用 `split 3`。 -->","# TransFuser 快速上手指南\n\n**TransFuser** 是一种基于 Transformer 的传感器融合模仿学习自动驾驶工具，支持端到端自动驾驶任务。本指南帮助您快速搭建开发环境并运行训练与评估流程。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n- **操作系统**: Linux (推荐使用 Ubuntu)\n- **CUDA**: 需安装兼容的 NVIDIA CUDA Toolkit (根据 PyTorch 版本选择)\n- **依赖管理**: 已安装 Anaconda 或 Miniconda\n- **CARLA**: 需要安装 CARLA 0.9.10.1 版本\n- **存储空间**: 数据集约 210GB，请预留足够磁盘空间\n\n> **注意**: 由于涉及大量数据下载（AWS S3）及模型权重获取，建议网络环境稳定。\n\n## 2. 安装步骤\n\n### 克隆仓库并配置基础环境\n\n```Shell\ngit clone https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser.git\ncd transfuser\ngit checkout 2022\nchmod +x setup_carla.sh\n.\u002Fsetup_carla.sh\n```\n\n### 创建 Conda 环境并安装依赖\n\n```Shell\nconda env create -f environment.yml\nconda activate tfuse\npip install --only-binary=torch-scatter torch-scatter -f https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-1.12.0+cu113.html\npip install --only-binary=mmcv-full mmcv-full==1.6.0 -f https:\u002F\u002Fdownload.openmmlab.com\u002Fmmcv\u002Fdist\u002Fcu113\u002Ftorch1.12.0\u002Findex.html\npip install mmsegmentation==0.25.0\npip install mmdet==2.25.0\n```\n\n## 3. 基本使用\n\n### 下载数据集\n\n运行脚本下载训练所需的数据集（约 210GB）：\n\n```Shell\nchmod +x download_data.sh\n.\u002Fdownload_data.sh\n```\n\n数据集结构如下：\n```\n- Scenario\n    - Town\n        - Route\n            - rgb: 相机图像\n            - depth: 深度图像\n            - semantics: 分割图像\n            - lidar: .npy 格式的 3D 点云\n            - topdown: 俯视分割图\n            - label_raw: 车辆 3D 边界框\n            - measurements: 自车位置、速度等元数据\n```\n\n### 模型训练\n\n进入训练目录，运行单卡训练示例（可根据实际情况调整参数）：\n\n```Shell\ncd team_code_transfuser\npython train.py --batch_size 10 --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 0\n```\n\n如需多卡并行训练，请使用 `torchrun` 启动：\n\n```Shell\ncd team_code_transfuser\nCUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 OPENBLAS_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node=2 --max_restarts=0 --rdzv_id=1234576890 --rdzv_backend=c10d train.py --logdir \u002Fpath\u002Fto\u002Flogdir --root_dir \u002Fpath\u002Fto\u002Fdataset_root\u002F --parallel_training 1\n```\n\n### 模型评估\n\n#### 1. 下载预训练模型\n从 AWS 下载预训练权重：\n\n```Shell\nmkdir model_ckpt\nwget https:\u002F\u002Fs3.eu-central-1.amazonaws.com\u002Favg-projects\u002Ftransfuser\u002Fmodels_2022.zip -P model_ckpt\nunzip model_ckpt\u002Fmodels_2022.zip -d model_ckpt\u002F\nrm model_ckpt\u002Fmodels_2022.zip\n```\n\n#### 2. 启动 CARLA 服务器\n在另一个终端启动 CARLA 服务：\n\n```Shell\n.\u002FCarlaUE4.sh --world-port=2000 -opengl\n```\n\n#### 3. 
运行评估脚本\n返回主目录，运行本地评估脚本：\n\n```Shell\n.\u002Fleaderboard\u002Fscripts\u002Flocal_evaluation.sh \u003Ccarla root> \u003Cworking directory of this repo (*\u002Ftransfuser\u002F)>\n```\n\n> **提示**: 可通过编辑 `local_evaluation.sh` 中的参数来指定不同的评测路线（如 Longest6）。对于传感器模型，请设置 `TEAM_AGENT` 为 `submission_agent.py` 并指向模型权重文件夹。","某自动驾驶算法团队正在 CARLA 仿真环境中构建城市复杂路口的端到端导航系统，急需解决多传感器协同与实时决策难题。\n\n### 没有 transfuser 时\n- 传统流水线需独立搭建感知、预测与控制模块，系统耦合度高且接口调试极其繁琐\n- 摄像头图像与激光雷达点云数据存在时空对齐误差，导致夜间或逆光场景下障碍物漏检\n- 面对突发加塞或行人遮挡等长尾场景，单一模态输入缺乏上下文理解能力，易引发急刹\n- 多个独立模型串行推理累积延迟，难以满足车辆高速过弯时的实时控制需求\n\n### 使用 transfuser 后\n- 利用 Transformer 架构直接融合 RGB 图像与深度语义信息，无需手动设计复杂的特征拼接逻辑\n- 端到端模仿学习框架将感知与决策打通，显著减少了中间环节带来的误差传播\n- 多模态注意力机制能自动关注关键区域，即使在部分传感器失效时也能保持稳定的路径规划\n- 统一网络结构优化了计算效率，在保持高精度的同时将推理速度提升至实时运行水平\n\n核心价值在于通过统一的 Transformer 架构实现了高鲁棒性的多传感器融合驾驶决策。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fautonomousvision_transfuser_5e42fba7.png","autonomousvision","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fautonomousvision_e0fa4bb2.png",null,"https:\u002F\u002Fgithub.com\u002Fautonomousvision",[80,84,88,92,96,100,103,106,109],{"name":81,"color":82,"percentage":83},"Python","#3572A5",91.4,{"name":85,"color":86,"percentage":87},"XSLT","#EB8CEB",6.3,{"name":89,"color":90,"percentage":91},"HTML","#e34c26",1.2,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.6,{"name":97,"color":98,"percentage":99},"Dockerfile","#384d54",0.1,{"name":101,"color":102,"percentage":99},"CSS","#663399",{"name":104,"color":105,"percentage":99},"JavaScript","#f1e05a",{"name":107,"color":108,"percentage":99},"Ruby","#701516",{"name":110,"color":111,"percentage":112},"SCSS","#c6538c",0,1531,235,"2026-04-03T03:34:20","MIT",4,"未说明","需要 NVIDIA GPU，CUDA 11.3+，显存大小未说明",{"notes":121,"python":118,"dependencies":122},"需下载约 210GB 数据集；依赖 CARLA 0.9.10.1 仿真环境；数据生成及评估需启动 CARLA 服务器；多卡训练需配置 torchrun；提交排行榜需构建 Docker 镜像",[123,124,125,126,127,128],"torch>=1.12.0","mmcv-full==1.6.0","mmsegmentation==0.25.0","mmdet==2.25.0","torch-scatter","CARLA==0.9.10.1",[15,26],[131,132,133,134],"imitation-learning","autonomous-driving","transformers","sensor-fusion","2026-03-27T02:49:30.150509","2026-04-06T08:09:03.460554",[138,143,147,151,156,160,164],{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},3341,"如何理解注意力图（attention map）的维度结构？","维度通常为 24_4_128_128。其中 '4' 代表架构中的 4 个注意力头（attention heads），第一维是 batch size，后两维 128x128 表示包含所有输入 token 的注意力矩阵。","https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Fissues\u002F89",{"id":144,"question_zh":145,"answer_zh":146,"source_url":142},3342,"如何判断注意力权重属于图像还是激光雷达（LiDAR）？","总共有 128 个 token（64 个图像 + 64 个激光雷达）。每个 token 生成一个 1x128 的注意力向量。若向量中前 64 个值的最高值出现，则属于图像；否则属于激光雷达。",{"id":148,"question_zh":149,"answer_zh":150,"source_url":142},3343,"在哪里可以找到融合部分的实现细节代码？","可以参考 `viz.py` 文件获取具体的实现细节，该文件位于 transfuser 目录下。",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},3344,"训练时是否应该包含长路线（long routes）数据？","不建议。观察发现包含长路线（如 town 1,2,3,4,6,7,10）会严重扭曲训练数据分布，导致性能下降。评估时仅考虑 Town05 长路线，其他路线可用于其他工作（如 NEAT）。","https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Fissues\u002F37",{"id":157,"question_zh":158,"answer_zh":159,"source_url":155},3345,"如何复现论文中报告的评估结果？","论文结果使用的是 `clear_weather_data`，而排行榜提交使用的是 `14_weathers_data`。若要复现论文结果，必须在 `clear_weather_data` 上进行训练。",{"id":161,"question_zh":162,"answer_zh":163,"source_url":155},3346,"为什么多次评估的 DS\u002FRC 指标波动较大？","这是正常现象。作者提到在论文中训练了 3 个不同模型，每个模型运行 3 次评估，共 9 个结果。如果想考虑多天气设置的评估，可参考 NEAT 项目或直接提交到 CARLA 
排行榜。",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},3347,"如何在评估过程中保存可视化图像（如 TopDown、BEV）？","可以在 `forward_ego` 方法中临时将 `debug` 参数硬编码为 `True`。这样配置后，TopDown、BEV 和前视摄像头的图像将会被保存到指定的 `save_path` 目录中。","https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Ftransfuser\u002Fissues\u002F291",[]]