[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-EvolvingLMMs-Lab--Otter":3,"tool-EvolvingLMMs-Lab--Otter":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160784,2,"2026-04-19T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":78,"owner_website":76,"owner_url":79,"languages":80,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":97,"env_os":98,"env_gpu":99,"env_ram":98,"env_deps":100,"category_tags":108,"github_topics":109,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":121,"updated_at":122,"faqs":123,"releases":154},9622,"EvolvingLMMs-Lab\u002FOtter","Otter","🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.","Otter 是一款基于 OpenFlamingo 架构开源的多模态人工智能模型，旨在提升机器对图像、视频与文本混合输入的理解能力。它通过在 MIMIC-IT 等高质量数据集上进行训练，显著增强了模型遵循复杂指令以及在上下文中快速学习新任务的能力，有效解决了传统多模态模型在细粒度视觉理解和交互灵活性上的不足。\n\n这款工具特别适合 AI 研究人员、开发者以及希望探索多模态大模型应用的技术团队使用。无论是进行学术实验、模型微调，还是构建需要处理高分辨率图像与视频的智能应用，Otter 都提供了坚实的基座。其最新推出的 OtterHD 版本更是一项技术亮点：它基于 Fuyu-8B 改进而来，创新地去除了独立的视觉编码器模块，直接将图像块线性变换后与文本令牌共同处理。这种设计不仅架构更加优雅，还能在不牺牲性能的前提下，实现对高分辨率视觉输入的精细化解读，甚至能识别仅占图像 1% 大小的微小物体及其空间关系。此外，项目方还开源了高效的微调脚本并支持多种主流基准测试，帮助使用者以更低的成本验证和部署自己的多模态解决方案。","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_503397b76a09.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\n---\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fotter-v0.3-darkcyan)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002Fcloudposse.svg?style=social&label=Follow%20%40Us)](https:\u002F\u002Ftwitter.com\u002FBoLi68567011)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fluodian\u002Fotter?style=social)\n[![Hits](https:\u002F\u002Fhits.seeyoufarm.com\u002Fapi\u002Fcount\u002Fincr\u002Fbadge.svg?url=https%3A%2F%2Fgithub.com%2FLuodian%2Fotter&count_bg=%23FFA500&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false)](https:\u002F\u002Fhits.seeyoufarm.com)\n[![litellm](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere-blue?color=green)](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm)\n\n[Project 
Credits](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002Fmain\u002Fdocs\u002Fcredits.md) | [Otter Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03726) | [OtterHD Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.04219) | [MIMIC-IT Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425)\n\n**Checkpoints:**\n\n- [luodian\u002FOTTER-Image-MPT7B](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-Image-MPT7B)\n- [luodian\u002FOTTER-Video-LLaMA7B-DenseCaption](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-Video-LLaMA7B-DenseCaption)\n\nFor users in mainland China: [![Open in OpenXLab](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Image-MPT7B) | [![Open in OpenXLab](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Video-LLaMA7B-DenseCaption)\n\n**Disclaimer:** The code may not be perfectly polished and refactored, but **all open-sourced code is tested and runnable**, as we also use it to support our research. If you have any questions, please feel free to open an issue. We eagerly look forward to suggestions and PRs that improve the code quality.\n\n## 🦾 Update\n\n**[2023-11]: Supporting GPT4V's Evaluation on 8 Benchmarks; Announcing OtterHD-8B, improved from Fuyu-8B. Check out [OtterHD](.\u002Fdocs\u002FOtterHD.md) for details.**\n\n\u003Cdiv style=\"text-align:center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_e0ec8426b692.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fdiv>\n\n1. 🦦 Added [OtterHD](.\u002Fdocs\u002FOtterHD.md), a multimodal model fine-tuned from [Fuyu-8B](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b) to facilitate fine-grained interpretation of high-resolution visual input *without an explicit vision encoder module*. All image patches are linearly transformed and processed together with text tokens. This is a very innovative and elegant exploration. Fascinated by this approach, we open-sourced the fine-tuning script for Fuyu-8B and improved training throughput by 4-5x with [Flash-Attention-2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention). Try our fine-tuning script at [OtterHD](.\u002Fdocs\u002FOtterHD.md).\n2. 🔍 Added [MagnifierBench](.\u002Fdocs\u002FOtterHD.md), an evaluation benchmark tailored to assess whether the model can identify tiny objects (1% of the image size) and their spatial relationships.\n3. Improved pipelines for [Pretrain](pipeline\u002Ftrain\u002Fpretraining.py) | [SFT](pipeline\u002Ftrain\u002Finstruction_following.py) | [RLHF]() with (part of) the current leading LMMs.\n   1. **Models**: [Otter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03726) | [OpenFlamingo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01390) | [Idefics](https:\u002F\u002Fhuggingface.co\u002FHuggingFaceM4\u002Fidefics-80b-instruct) | [Fuyu](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b)\n   2. **Training Datasets Interface: (Pretrain)** MMC4 | LAION2B | CC3M | CC12M, **(SFT)** MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...\n        - *We tested the above datasets for both pretraining and instruction tuning with OpenFlamingo and Otter. We also tested the datasets with Idefics and Fuyu for instruction tuning. We will open-source the training scripts gradually.*\n   3. 
[**Benchmark Interface**](https:\u002F\u002Fhuggingface.co\u002FOtter-AI): MagnifierBench\u002FMMBench\u002FMM-VET\u002FMathVista\u002FPOPE\u002FMME\u002FScienceQA\u002FSeedBench. They can be run in one click; please see [Benchmark](.\u002Fdocs\u002Fbenchmark_eval.md) for details.\n    ```yaml\n        datasets:\n        - name: magnifierbench\n            split: test\n            prompt: Answer with the option's letter from the given choices directly.\n            api_key: [Your API Key] # GPT4 or GPT3.5 to evaluate the answers and ground truth.\n            debug: true # setting debug=true saves the model response to the log file.\n        - name: mme\n            split: test\n            debug: true\n        - name: mmbench\n            split: test\n            debug: true\n\n        models:\n        - name: gpt4v\n            api_key: [Your API Key] # to call the GPT4V model.\n    ```\n   4. **Code refactoring** to **organize multiple groups of datasets with an integrated YAML file**; see details at [managing datasets in MIMIC-IT format](docs\u002Fmimicit_format.md). For example, \n    ```yaml\n        IMAGE_TEXT: # Group name should be in [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]\n            LADD: # Dataset name can be any name you want\n                mimicit_path: azure_storage\u002Fjson\u002FLA\u002FLADD_instructions.json # Path of the instruction json file\n                images_path: azure_storage\u002FParquets\u002FLA.parquet # Path of the image parquet file\n                num_samples: -1 # Number of samples to use; -1 means use all samples. If not set, the default is -1.\n            M3IT_CAPTIONING:\n                mimicit_path: azure_storage\u002Fjson\u002FM3IT\u002Fcaptioning\u002Fcoco\u002Fcoco_instructions.json\n                images_path: azure_storage\u002FParquets\u002Fcoco.parquet\n                num_samples: 20000\n    ```\n   *This is a major change and may leave previous code unrunnable; please check the details.*\n\n**[2023-08]**\n\n1. Added support for using Azure, Anthropic, Palm, and Cohere models for Self-Instruct with the Syphus pipeline. To use them, modify [this line](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002F16d73b399fac6352ebff7504b1acb1f228fbf3f4\u002Fmimic-it\u002Fsyphus\u002Ffile_utils.py#L53) to your selected model and set your API keys in the environment. For more information, see [LiteLLM](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002F).\n\n**[2023-07]: Announcing the MIMIC-IT dataset for multiple interleaved image-text\u002Fvideo instruction tuning.**\n\n1. 🤗 Check out [MIMIC-IT](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fpufanyi\u002FMIMICIT) on Hugging Face Datasets.\n2. 🥚 Updated the [Eggs](.\u002Fmimic-it\u002FREADME.md\u002F#eggs) section for downloading the MIMIC-IT dataset.\n3. 🥃 Contact us **if you wish to develop Otter for your scenarios** (for satellite images or funny videos?). We aim to support and assist with Otter's diverse use cases. OpenFlamingo and Otter are strong models built on [Flamingo](https:\u002F\u002Fwww.deepmind.com\u002Fblog\u002Ftackling-multiple-tasks-with-a-single-visual-language-model)'s excellently designed architecture, which accepts multiple images\u002Fvideos or other modality inputs. Let's build more interesting models together.\n\n**[2023-06]**\n\n1. 🧨 [Download MIMIC-IT Dataset](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Flibo0013_e_ntu_edu_sg\u002FEo9bgNV5cjtEswfA-HfjNNABiKsjDzSWAl5QYAlRZPiuZA?e=M9isDT). 
For more details on navigating the dataset, please refer to the [MIMIC-IT Dataset README](mimic-it\u002FREADME.md).\n2. 🏎️ [Run Otter Locally](.\u002Fpipeline\u002Fdemo). You can run our model locally with at least 16G of GPU memory for tasks like image\u002Fvideo tagging, captioning, and identifying harmful content. We fixed a bug related to video inference where `frame tensors` were mistakenly unsqueezed into an incorrect `vision_x`.\n   > Make sure to adjust the `sys.path.append(\"..\u002F..\")` correctly to access `otter.modeling_otter` in order to launch the model.\n3. 🤗 Check out our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425) introducing MIMIC-IT in detail. Meet MIMIC-IT, the first multimodal in-context instruction tuning dataset with 2.8M instructions! From general scene understanding to spotting subtle differences and enhancing egocentric view comprehension for AR headsets, our MIMIC-IT dataset has it all.\n\n## 🦦 Why In-Context Instruction Tuning?\n\nLarge Language Models (LLMs) have demonstrated exceptional universal aptitude as few\u002Fzero-shot learners for numerous tasks, owing to their pre-training on extensive text data. Among these LLMs, GPT-3 stands out as a prominent model with significant capabilities. Additionally, variants of GPT-3, namely InstructGPT and ChatGPT, have proven effective in interpreting natural language instructions to perform complex real-world tasks, thanks to instruction tuning.\n\nMotivated by the upstream interleaved-format pretraining of the Flamingo model, we present 🦦 Otter, a multi-modal model based on OpenFlamingo (the open-sourced version of DeepMind's Flamingo). We train Otter with in-context instruction tuning on our proposed **M**ult**I**-**M**odal **I**n-**C**ontext **I**nstruction **T**uning (**MIMIC-IT**) dataset. Otter showcases improved instruction-following and in-context learning ability on both images and videos.\n\n## 🗄 MIMIC-IT Dataset Details\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Fi.postimg.cc\u002FyYMm1G5X\u002Fmimicit-logo.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\nMIMIC-IT enables the development of an egocentric visual assistant model that can answer questions like **Hey, do you think I left my keys on the table?** Harness the power of MIMIC-IT to unlock the full potential of your AI-driven visual assistant and elevate your interactive vision-language tasks to new heights.\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_0d8177a74229.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\nWe also introduce **Syphus**, an automated pipeline for generating high-quality instruction-response pairs in multiple languages. Building upon the framework proposed by LLaVA, we utilize ChatGPT to generate instruction-response pairs based on visual content. 
To ensure the quality of the generated instruction-response pairs, our pipeline incorporates system messages, visual annotations, and in-context examples as prompts for ChatGPT.\n\nFor more details, please check the [MIMIC-IT dataset](mimic-it\u002FREADME.md).\n\n## 🤖 Otter Model Details\n\n\u003Cdiv style=\"text-align:center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_8be3a4e5ce7a.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fdiv>\n\nOtter is designed to support multi-modal in-context instruction tuning based on the OpenFlamingo model, which involves conditioning the language model on the corresponding media, such as an image that corresponds to a caption or an instruction-response pair.\n\nWe train Otter on the MIMIC-IT dataset with approximately 2.8 million in-context instruction-response pairs, which are structured into a cohesive template to facilitate various tasks. Otter supports video inputs (frames are arranged as in the original Flamingo implementation) and multiple image inputs as in-context examples, making it **the first multi-modal instruction-tuned model** of this kind.\n\nThe following template encompasses images, user instructions, and model-generated responses, utilizing the `User` and `GPT` role labels to enable seamless user-assistant interactions.\n\n```python\nprompt = f\"\u003Cimage>User: {instruction} GPT:\u003Canswer> {response}\u003Cendofchunk>\"\n```\n\nTraining the Otter model on the MIMIC-IT dataset allows it to acquire different capabilities, as demonstrated by the LA and SD tasks. Trained on the LA task, the model exhibits exceptional scene comprehension, reasoning abilities, and multi-round conversation capabilities.\n\n```python\n# multi-round conversation\nprompt = f\"\u003Cimage>User: {first_instruction} GPT:\u003Canswer> {first_response}\u003Cendofchunk>User: {second_instruction} GPT:\u003Canswer>\"\n```\n\nRegarding the concept of organizing visual-language in-context examples, we demonstrate here the acquired ability of the Otter model to follow inter-contextual instructions after training on the LA-T2T task. The organized input data format is as follows:\n\n```python\n# Multiple in-context examples with similar instructions\nprompt = f\"\u003Cimage>User:{ict_first_instruction} GPT: \u003Canswer>{ict_first_response}\u003C|endofchunk|>\u003Cimage>User:{ict_second_instruction} GPT: \u003Canswer>{ict_second_response}\u003C|endofchunk|>\u003Cimage>User:{query_instruction} GPT: \u003Canswer>\"\n```\n\nFor details on other tasks, please refer to the appendix of our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425).\n\n## 🗂️ Environments\n\n1. Compare the CUDA version reported by `nvidia-smi` with the one reported by `nvcc --version`. They need to match, or at least the version reported by `nvcc --version` should be \u003C= the version reported by `nvidia-smi`.\n2. Install the PyTorch build that matches your CUDA version (e.g., CUDA 11.7 with torch 2.0.0). We have successfully run this code with CUDA 11.1 + torch 1.10.1 and CUDA 11.7 + torch 2.0.0. You can refer to PyTorch's documentation, [Latest](https:\u002F\u002Fpytorch.org\u002F) or [Previous](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Fprevious-versions\u002F).\n3. You may install the environment via `conda env create -f environment.yml`. In particular, make sure `transformers>=4.28.0` and `accelerate>=0.18.0`.\n\nAfter configuring the environment, you can use the 🦩 Flamingo model \u002F 🦦 Otter model as a 🤗 Hugging Face model with only a few lines! 
One-click and then model configs\u002Fweights are downloaded automatically. Please refer to [Huggingface Otter\u002FFlamingo](.\u002Fdocs\u002Fhuggingface_compatible.md) for details.\n\n## ☄️ Training\n\nOtter is trained based on OpenFlamingo. You may need to use converted weights at [luodian\u002FOTTER-9B-INIT](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-9B-INIT) or [luodian\u002FOTTER-MPT7B-Init](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-MPT7B-Init). They are respectively converted from [OpenFlamingo-LLaMA7B-v1](https:\u002F\u002Fhuggingface.co\u002Fopenflamingo\u002FOpenFlamingo-9B) and [OpenFlamingo-MPT7B-v2](https:\u002F\u002Fhuggingface.co\u002Fopenflamingo\u002FOpenFlamingo-9B-vitl-mpt7b), we added a `\u003Canswer>` token for Otter's downstream instruction tuning.\n\nYou may also use any trained Otter weights to start with your training on top of ours, see them at [Otter Weights](https:\u002F\u002Fhuggingface.co\u002Fluodian). You can refer to [MIMIC-IT](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Ftree\u002Fmain\u002Fmimic-it) for preparing image\u002Finstruction\u002Ftrain json files.\n\n```bash\nexport PYTHONPATH=.\nRUN_NAME=\"Otter_MPT7B\"\nGPU=8\nWORKERS=$((${GPU}*2))\n\necho \"Using ${GPU} GPUs and ${WORKERS} workers\"\necho \"Running ${RUN_NAME}\"\n\naccelerate launch --config_file=.\u002Fpipeline\u002Faccelerate_configs\u002Faccelerate_config_zero3.yaml \\\n    --num_processes=${GPU} \\\n    pipeline\u002Ftrain\u002Finstruction_following.py \\\n    --pretrained_model_name_or_path=luodian\u002FOTTER-MPT7B-Init \\\n    --model_name=otter \\\n    --instruction_format=simple \\\n    --training_data_yaml=.\u002Fshared_scripts\u002FDemo_Data.yaml \\\n    --batch_size=8 \\\n    --num_epochs=3 \\\n    --report_to_wandb \\\n    --wandb_entity=ntu-slab \\\n    --external_save_dir=.\u002Fcheckpoints \\\n    --run_name=${RUN_NAME} \\\n    --wandb_project=Otter_MPTV \\\n    --workers=${WORKERS} \\\n    --lr_scheduler=cosine \\\n    --learning_rate=2e-5 \\\n    --warmup_steps_ratio=0.01 \\\n    --save_hf_model \\\n    --max_seq_len=1024 \\\n```\n\n## 📑 Citation\n\nIf you found this repository useful, please consider citing:\n\n```\n@article{li2023otter,\n  title={Otter: A Multi-Modal Model with In-Context Instruction Tuning},\n  author={Li, Bo and Zhang, Yuanhan and Chen, Liangyu and Wang, Jinghao and Yang, Jingkang and Liu, Ziwei},\n  journal={arXiv preprint arXiv:2305.03726},\n  year={2023}\n}\n\n@article{li2023mimicit,\n    title={MIMIC-IT: Multi-Modal In-Context Instruction Tuning},\n    author={Bo Li and Yuanhan Zhang and Liangyu Chen and Jinghao Wang and Fanyi Pu and Jingkang Yang and Chunyuan Li and Ziwei Liu},\n    year={2023},\n    eprint={2306.05425},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV}\n}\n```\n\n### 👨‍🏫 Acknowledgements\n\nWe thank [Jack Hessel](https:\u002F\u002Fjmhessel.com\u002F) for the advise and support, as well as the [OpenFlamingo](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_flamingo) team for their great contribution to the open source community.\n\nHuge accolades to [Flamingo](https:\u002F\u002Fwww.deepmind.com\u002Fblog\u002Ftackling-multiple-tasks-with-a-single-visual-language-model) and [OpenFlamingo](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_flamingo) team for the work on this great architecture.\n\n### 📝 Related Projects\n\n- [LLaVA: Visual Instruction Tuning](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA)\n- [Instruction Tuning with 
GPT4](https:\u002F\u002Fgithub.com\u002FInstruction-Tuning-with-GPT-4\u002FGPT-4-LLM)\n","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_503397b76a09.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\n---\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fotter-v0.3-darkcyan)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttps\u002Ftwitter.com\u002Fcloudposse.svg?style=social&label=Follow%20%40Us)](https:\u002F\u002Ftwitter.com\u002FBoLi68567011)\n![](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fluodian\u002Fotter?style=social)\n[![访问量](https:\u002F\u002Fhits.seeyoufarm.com\u002Fapi\u002Fcount\u002Fincr\u002Fbadge.svg?url=https%3A%2F%2Fgithub.com%2FLuodian%2Fotter&count_bg=%23FFA500&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false)](https:\u002F\u002Fhits.seeyoufarm.com)\n[![litellm](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere-blue?color=green)](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm)\n\n**项目致谢**：[credits.md](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002Fmain\u002Fdocs\u002Fcredits.md) | **论文**：[Otter 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03726) | [OtterHD 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.04219) | [MIMIC-IT 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425)\n\n**检查点**：\n\n- [luodian\u002FOTTER-Image-MPT7B](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-Image-MPT7B)\n- [luodian\u002FOTTER-Video-LLaMA7B-DenseCaption](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-Video-LLaMA7B-DenseCaption)\n\n适用于中国大陆用户：[![在 OpenXLab 中打开](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Image-MPT7B) | [![在 OpenXLab 中打开](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Video-LLaMA7B-DenseCaption)\n\n**免责声明**：代码可能尚未经过完美优化和重构，但**所有开源代码均已测试并通过运行验证**，因为我们也在使用这些代码来支持我们的研究。如果您有任何问题，请随时提交 issue。我们热切期待您的建议和 Pull Request，以进一步提升代码质量。\n\n## 🦾 更新\n\n**[2023-11]: 支持 GPT4V 在 8 个基准上的评估；宣布推出基于 Fuyu-8B 改进的 OtterHD-8B。详情请参阅 [OtterHD](.\u002Fdocs\u002FOtterHD.md)。**\n\n\u003Cdiv style=\"text-align:center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_e0ec8426b692.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fdiv>\n\n1. 🦦 新增了 [OtterHD](.\u002Fdocs\u002FOtterHD.md)，它是基于 [Fuyu-8B](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b) 进行多模态微调的模型，旨在无需显式视觉编码器模块的情况下，对高分辨率视觉输入进行细粒度解读。所有图像块都经过线性变换，并与文本标记一起处理。这是一项非常创新且优雅的探索。我们对此深感着迷，并以此为基础开源了 Fuyu-8B 的微调脚本，同时借助 [Flash-Attention-2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention) 将训练吞吐量提升了 4–5 倍。欢迎在 [OtterHD](.\u002Fdocs\u002FOtterHD.md) 中尝试我们的微调脚本。\n2. 🔍 新增了 [MagnifierBench](.\u002Fdocs\u002FOtterHD.md)，这是一个专门用于评估模型能否识别极小物体信息（占图像大小的 1%）及其空间关系的评测基准。\n3. 针对当前领先的 LMM 模型，优化了 [预训练](pipeline\u002Ftrain\u002Fpretraining.py) | [SFT](pipeline\u002Ftrain\u002Finstruction_following.py) | [RLHF]() 的流程。\n   1. 
**模型**: [Otter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03726) | [OpenFlamingo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01390) | [Idefics](https:\u002F\u002Fhuggingface.co\u002FHuggingFaceM4\u002Fidefics-80b-instruct) | [Fuyu](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b)\n   2. **训练数据集接口**: (预训练) MMC4 | LAION2B | CC3M | CC12M, (SFT) MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...\n        - *我们使用 OpenFlamingo 和 Otter 测试了上述数据集的预训练和指令微调，并用 Idefics 和 Fuyu 测试了指令微调数据集。我们将逐步开源这些训练脚本。*\n   3. [**基准测试接口**](https:\u002F\u002Fhuggingface.co\u002FOtter-AI): MagnifierBench\u002FMMBench\u002FMM-VET\u002FMathVista\u002FPOPE\u002FMME\u002FSicenceQA\u002FSeedBench。只需一键即可运行，详细信息请参阅 [Benchmark](.\u002Fdocs\u002Fbenchmark_eval.md)。\n    ```yaml\n        datasets:\n        - name: magnifierbench\n            split: test\n            prompt: Answer with the option's letter from the given choices directly.\n            api_key: [Your API Key] # GPT4 or GPT3.5 to evaluate the answers and ground truth.\n            debug: true # put debug=true will save the model response in log file.\n        - name: mme\n            split: test\n            debug: true\n        - name: mmbench\n            split: test\n            debug: true\n\n        models:\n        - name: gpt4v\n            api_key: [Your API Key] # to call GPT4V model.\n    ```\n   4. **代码重构**，以 **通过集成的 YAML 文件组织多组数据集**，详情请参阅 [管理 MIMIC-IT 格式的数据集](docs\u002Fmimicit_format.md)。例如：\n    ```yaml\n        IMAGE_TEXT: # 组名应为 [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]\n            LADD: # 数据集名称可任意命名\n                mimicit_path: azure_storage\u002Fjson\u002FLA\u002FLADD_instructions.json # 指令 JSON 文件路径\n                images_path: azure_storage\u002FParquets\u002FLA.parquet # 图像 Parquet 文件路径\n                num_samples: -1 # 要使用的样本数量，-1 表示使用全部样本，若未设置则默认为 -1。\n            M3IT_CAPTIONING:\n                mimicit_path: azure_storage\u002Fjson\u002FM3IT\u002Fcaptioning\u002Fcoco\u002Fcoco_instructions.json\n                images_path: azure_storage\u002FParquets\u002Fcoco.parquet\n                num_samples: 20000\n    ```\n   *这是一项重大变更，可能导致旧代码无法运行，请仔细查看相关说明。*\n\n**[2023-08]**\n\n1. 新增支持使用 Azure、Anthropic、Palm、Cohere 等模型通过 Syphus 流程进行自指导训练。如需了解使用方法，请修改 [此行](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002F16d73b399fac6352ebff7504b1acb1f228fbf3f4\u002Fmimic-it\u002Fsyphus\u002Ffile_utils.py#L53)，替换为您选择的模型，并在环境变量中设置您的 API 密钥。更多信息请参阅 [LiteLLM](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm\u002F)。\n\n**[2023-07]: 宣布推出 MIMIC-IT 数据集，用于多模态上下文指令微调。**\n\n1. 🤗 请在 Huggingface 数据集上查看 [MIMIC-IT](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fpufanyi\u002FMIMICIT)。\n2. 🥚 更新了 [Eggs](.\u002Fmimic-it\u002FREADME.md\u002F#eggs) 部分，以便下载 MIMIC-IT 数据集。\n3. 🥃 如果您希望针对特定场景开发 Otter（例如卫星图像或趣味视频），请联系我们。我们致力于支持和协助 Otter 的多样化应用场景。OpenFlamingo 和 Otter 是基于 [Flamingo](https:\u002F\u002Fwww.deepmind.com\u002Fblog\u002Ftackling-multiple-tasks-with-a-single-visual-language-model) 卓越架构的强大模型，该架构能够接受多张图片、视频或其他模态输入。让我们携手打造更多有趣的模型。\n\n**[2023-06]**\n\n1. 🧨 [下载 MIMIC-IT 数据集](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Flibo0013_e_ntu_edu_sg\u002FEo9bgNV5cjtEswfA-HfjNNABiKsjDzSWAl5QYAlRZPiuZA?e=M9isDT)。有关数据集导航的更多详情，请参阅 [MIMIC-IT 数据集 README](mimic-it\u002FREADME.md)。\n2. 
🏎️ [本地运行 Otter](.\u002Fpipeline\u002Fdemo)。您可以在至少配备 16G 显存的 GPU 上本地运行我们的模型，用于图像\u002F视频标注、字幕生成以及有害内容识别等任务。我们修复了一个与视频推理相关的错误，即 `frame tensors` 被错误地解压缩成了不正确的 `vision_x`。\n   > 请确保正确调整 `sys.path.append(\"..\u002F..\")` 以访问 `otter.modeling_otter`，从而启动模型。\n3. 🤗 请查阅我们的 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425)，其中详细介绍了 MIMIC-IT。认识一下 MIMIC-IT——首个包含 280 万条指令的多模态上下文指令微调数据集！从通用场景理解到捕捉细微差异，再到增强 AR 头戴设备的自我中心视角理解，我们的 MIMIC-IT 数据集应有尽有。\n\n## 🦦 为什么采用上下文指令微调？\n\n大型语言模型（LLMs）凭借其在海量文本数据上的预训练，在众多任务中展现出卓越的零\u002F少样本学习能力。在这些 LLMs 中，GPT-3 以其强大的能力脱颖而出。此外，GPT-3 的变体 InstructGPT 和 ChatGPT 通过指令微调，能够有效理解自然语言指令并完成复杂的现实世界任务。\n\n受 Flamingo 模型上游交错格式预训练的启发，我们推出了 🦦 Otter，这是一款基于 OpenFlamingo（DeepMind Flamingo 的开源版本）的多模态模型。我们采用上下文指令微调的方式，在我们提出的 **MI**-**M**odal **I**n-**C**ontext **I**nstruction **T**uning (**MIMIC-IT**) 数据集上训练 Otter。Otter 在图像和视频方面均表现出更强的指令遵循和上下文学习能力。\n\n## 🗄 MIMIC-IT 数据集详情\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_06ec0bf9c5c3.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\nMIMIC-IT 能够支持以第一视角为核心的视觉助手模型，该模型可以回答诸如“嘿，你觉得我把钥匙落在桌子上了吗？”之类的问题。借助 MIMIC-IT，您可以充分发挥 AI 驱动的视觉助手潜力，将交互式视觉-语言任务提升至全新高度。\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_0d8177a74229.png\"  width=\"80%\" height=\"80%\">\n\u003C\u002Fp>\n\n我们还推出了 **Syphus**，这是一个用于生成多语言高质量指令-响应对的自动化流水线。基于 LLaVA 提出的框架，我们利用 ChatGPT 根据视觉内容生成指令-响应对。为确保生成的指令-响应对的质量，我们的流水线在提示中加入了系统消息、视觉标注以及上下文示例，以引导 ChatGPT 的生成。\n\n更多详情，请参阅 [MIMIC-IT 数据集](mimic-it\u002FREADME.md)。\n\n## 🤖 Otter 模型详情\n\n\u003Cdiv style=\"text-align:center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_readme_8be3a4e5ce7a.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fdiv>\n\nOtter 模型旨在支持基于 OpenFlamingo 模型的多模态上下文指令微调，即根据相应的媒体（如与字幕或指令-响应对对应的图像）来调整语言模型。\n\n我们使用包含约 280 万个上下文指令-响应对的 MIMIC-IT 数据集对 Otter 进行了训练，这些数据被组织成一个连贯的模板，以方便执行各种任务。Otter 支持视频输入（帧的排列方式与原始 Flamingo 实现一致）以及作为上下文示例的多张图片输入，这使其成为 **首个经过多模态指令微调的模型**。\n\n以下模板包含了图像、用户指令和模型生成的响应，并使用 `User` 和 `GPT` 角色标签来实现流畅的用户-助手交互：\n\n```python\nprompt = f\"\u003Cimage>User: {instruction} GPT:\u003Canswer> {response}\u003Cendofchunk>\"\n```\n\n通过在 MIMIC-IT 数据集上训练 Otter 模型，它能够获得不同的能力，这一点在 LA 和 SD 任务中得到了验证。在 LA 任务上训练后，该模型展现出卓越的场景理解能力、推理能力和多轮对话能力。\n\n```python\n# 多轮对话\nprompt = f\"\u003Cimage>User: {first_instruction} GPT:\u003Canswer> {first_response}\u003Cendofchunk>User: {second_instruction} GPT:\u003Canswer>\"\n```\n\n关于组织视觉-语言上下文示例的概念，我们在 LA-T2T 任务上训练 Otter 模型后，展示了其遵循跨上下文指令的能力。组织后的输入数据格式如下：\n\n```python\n# 包含相似指令的多个上下文示例\nprompt = f\"\u003Cimage>User:{ict_first_instruction} GPT: \u003Canswer>{ict_first_response}\u003C|endofchunk|>\u003Cimage>User:{ict_second_instruction} GPT: \u003Canswer>{ict_second_response}\u003C|endofchunk|>\u003Cimage>User:{query_instruction} GPT: \u003Canswer>\"\n```\n\n更多详情，请参阅我们的 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05425) 附录中的其他任务部分。\n\n## 🗂️ 环境配置\n\n1. 比较 `nvidia-smi` 和 `nvcc --version` 返回的 CUDA 版本，两者必须匹配。或者至少，`nvcc --version` 返回的版本应小于或等于 `nvidia-smi` 返回的版本。\n2. 安装与您的 CUDA 版本匹配的 PyTorch。（例如，CUDA 11.7 对应 PyTorch 2.0.0）。我们已在 CUDA 11.1 + PyTorch 1.10.1 和 CUDA 11.7 + PyTorch 2.0.0 上成功运行此代码。您可以参考 PyTorch 的官方文档，[最新版](https:\u002F\u002Fpytorch.org\u002F) 或 [历史版本](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Fprevious-versions\u002F)。\n3. 
您可以通过 `conda env create -f environment.yml` 来安装环境。尤其要确保安装 `transformers>=4.28.0` 和 `accelerate>=0.18.0`。\n\n环境配置完成后，您只需几行代码即可将 🦩 Flamingo 模型 \u002F 🦦 Otter 模型作为 🤗 Hugging Face 模型使用！一键操作即可自动下载模型配置和权重。详细信息请参阅 [Huggingface Otter\u002FFlamingo](.\u002Fdocs\u002Fhuggingface_compatible.md)。\n\n## ☄️ 训练\n\nOtter 是基于 OpenFlamingo 训练的。您可能需要使用位于 [luodian\u002FOTTER-9B-INIT](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-9B-INIT) 或 [luodian\u002FOTTER-MPT7B-Init](https:\u002F\u002Fhuggingface.co\u002Fluodian\u002FOTTER-MPT7B-Init) 的转换权重。它们分别由 [OpenFlamingo-LLaMA7B-v1](https:\u002F\u002Fhuggingface.co\u002Fopenflamingo\u002FOpenFlamingo-9B) 和 [OpenFlamingo-MPT7B-v2](https:\u002F\u002Fhuggingface.co\u002Fopenflamingo\u002FOpenFlamingo-9B-vitl-mpt7b) 转换而来。为了便于 Otter 的下游指令微调，我们为其添加了一个 `\u003Canswer>` 标记。\n\n您也可以使用任何已训练好的 Otter 权重，在我们的基础上继续训练，相关权重可在 [Otter Weights](https:\u002F\u002Fhuggingface.co\u002Fluodian) 中找到。准备图像、指令和训练 JSON 文件时，可参考 [MIMIC-IT](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Ftree\u002Fmain\u002Fmimic-it)。\n\n```bash\nexport PYTHONPATH=.\nRUN_NAME=\"Otter_MPT7B\"\nGPU=8\nWORKERS=$((${GPU}*2))\n\necho \"使用 ${GPU} 张 GPU 卡和 ${WORKERS} 个工作进程\"\necho \"正在运行 ${RUN_NAME}\"\n\naccelerate launch --config_file=.\u002Fpipeline\u002Faccelerate_configs\u002Faccelerate_config_zero3.yaml \\\n    --num_processes=${GPU} \\\n    pipeline\u002Ftrain\u002Finstruction_following.py \\\n    --pretrained_model_name_or_path=luodian\u002FOTTER-MPT7B-Init \\\n    --model_name=otter \\\n    --instruction_format=simple \\\n    --training_data_yaml=.\u002Fshared_scripts\u002FDemo_Data.yaml \\\n    --batch_size=8 \\\n    --num_epochs=3 \\\n    --report_to_wandb \\\n    --wandb_entity=ntu-slab \\\n    --external_save_dir=.\u002Fcheckpoints \\\n    --run_name=${RUN_NAME} \\\n    --wandb_project=Otter_MPTV \\\n    --workers=${WORKERS} \\\n    --lr_scheduler=cosine \\\n    --learning_rate=2e-5 \\\n    --warmup_steps_ratio=0.01 \\\n    --save_hf_model \\\n    --max_seq_len=1024 \\\n```\n\n## 📑 引用\n\n如果您觉得本仓库对您有所帮助，请考虑引用以下文献：\n\n```\n@article{li2023otter,\n  title={Otter: 一种具有上下文指令微调能力的多模态模型},\n  author={Li, Bo and Zhang, Yuanhan and Chen, Liangyu and Wang, Jinghao and Yang, Jingkang and Liu, Ziwei},\n  journal={arXiv 预印本 arXiv:2305.03726},\n  year={2023}\n}\n\n@article{li2023mimicit,\n    title={MIMIC-IT：多模态上下文指令微调},\n    author={Bo Li and Yuanhan Zhang and Liangyu Chen and Jinghao Wang and Fanyi Pu and Jingkang Yang and Chunyuan Li and Ziwei Liu},\n    year={2023},\n    eprint={2306.05425},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV}\n}\n```\n\n### 👨‍🏫 致谢\n\n我们感谢 [Jack Hessel](https:\u002F\u002Fjmhessel.com\u002F) 的建议和支持，同时也感谢 [OpenFlamingo](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_flamingo) 团队为开源社区所做的杰出贡献。\n\n向 [Flamingo](https:\u002F\u002Fwww.deepmind.com\u002Fblog\u002Ftackling-multiple-tasks-with-a-single-visual-language-model) 和 [OpenFlamingo](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_flamingo) 团队致以崇高敬意，感谢他们在这项卓越架构上的辛勤工作。\n\n### 📝 相关项目\n\n- [LLaVA: 视觉指令微调](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA)\n- [基于 GPT4 的指令微调](https:\u002F\u002Fgithub.com\u002FInstruction-Tuning-with-GPT-4\u002FGPT-4-LLM)","# Otter 快速上手指南\n\nOtter 是一个基于 OpenFlamingo 架构的多模态大语言模型，支持图像和视频的上下文指令微调（In-Context Instruction Tuning）。它基于 MIMIC-IT 数据集训练，具备强大的多轮对话、场景理解及细粒度视觉识别能力。\n\n## 1. 
环境准备\n\n### 系统要求\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+)\n*   **GPU**: 至少 16GB 显存（用于本地推理和微调）\n*   **Python**: 3.8 或更高版本\n*   **CUDA**: 建议 11.7+\n\n### 前置依赖\n确保已安装 `git` 和 `conda`（推荐）。Otter 依赖 `litellm` 支持多种后端模型，并使用了 `Flash-Attention-2` 加速训练。\n\n## 2. 安装步骤\n\n### 方案 A：使用国内镜像源（推荐中国大陆用户）\n对于中国大陆开发者，推荐使用 **OpenXLab** 平台直接加载模型，或使用 Git 镜像加速克隆代码库。\n\n1.  **克隆代码库**\n    ```bash\n    git clone https:\u002F\u002Fgitee.com\u002Fmirrors\u002Fotter.git # 如有国内镜像请使用，否则使用官方源\n    # 若无国内镜像，使用官方源：\n    # git clone https:\u002F\u002Fgithub.com\u002FLuodian\u002Fotter.git\n    cd otter\n    ```\n\n2.  **创建虚拟环境并安装依赖**\n    ```bash\n    conda create -n otter python=3.10 -y\n    conda activate otter\n    \n    # 安装 PyTorch (根据实际 CUDA 版本调整)\n    pip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n    \n    # 安装 Flash-Attention-2 (可选但强烈推荐，提升训练速度 4-5 倍)\n    pip install flash-attn --no-build-isolation\n    \n    # 安装项目依赖\n    pip install -r requirements.txt\n    ```\n\n### 方案 B：模型获取\n*   **Hugging Face**:\n    *   图像模型: `luodian\u002FOTTER-Image-MPT7B`\n    *   视频模型: `luodian\u002FOTTER-Video-LLaMA7B-DenseCaption`\n*   **OpenXLab (国内加速)**:\n    *   [OTTER-Image-MPT7B](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Image-MPT7B)\n    *   [OTTER-Video-LLaMA7B-DenseCaption](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels\u002Fdetail\u002FYuanhanZhang\u002FOTTER-Video-LLaMA7B-DenseCaption)\n\n## 3. 基本使用\n\n### 本地推理示例\nOtter 支持图像标注、视频描述生成及有害内容识别。运行前请确保正确设置路径以导入 `otter.modeling_otter`。\n\n**注意**：在运行脚本前，需调整 `sys.path`。\n\n```python\nimport sys\nimport torch\nfrom PIL import Image\n\n# 关键步骤：修正路径以访问 otter 模块\nsys.path.append(\"..\u002F..\") \n\nfrom otter.modeling_otter import OtterForConditionalGeneration\n\n# 加载模型 (示例：加载图像模型)\n# 请替换为本地下载后的模型路径或 HuggingFace 模型 ID\nmodel_path = \"luodian\u002FOTTER-Image-MPT7B\" \n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n# 初始化模型 (具体加载方式需参考 pipeline\u002Fdemo 中的实现)\n# 此处为概念性代码，具体实例化请参考官方 demo 脚本\n# model = OtterForConditionalGeneration.from_pretrained(model_path).to(device)\n\n# 准备输入\nimage = Image.open(\"path\u002Fto\u002Fyour\u002Fimage.jpg\").convert(\"RGB\")\ninstruction = \"Describe this image in detail.\"\n\n# 构建 Prompt 模板\n# 格式：\u003Cimage>User: {instruction} GPT:\u003Canswer>\nprompt = f\"\u003Cimage>User: {instruction} GPT:\u003Canswer>\"\n\n# 执行推理 (伪代码，具体 tensor 处理见 pipeline\u002Fdemo)\n# inputs = process_image_and_text(image, prompt)\n# outputs = model.generate(**inputs)\n# print(outputs)\n```\n\n> **提示**：完整的本地运行脚本请参考 `.\u002Fpipeline\u002Fdemo` 目录。该目录修复了视频推理中 `frame tensors` 的错误压缩问题。\n\n### 使用 YAML 配置评估基准\nOtter 支持一键运行多个基准测试（如 MMBench, MME, MagnifierBench 等）。创建一个 `eval_config.yaml` 文件：\n\n```yaml\ndatasets:\n  - name: magnifierbench\n    split: test\n    prompt: Answer with the option's letter from the given choices directly.\n    api_key: [Your API Key] # 用于 GPT4\u002FGPT3.5 评估答案\n    debug: true # 设为 true 将保存模型响应到日志\n  - name: mme\n    split: test\n    debug: true\n  - name: mmbench\n    split: test\n    debug: true\n\nmodels:\n  - name: gpt4v\n    api_key: [Your API Key] # 调用 GPT4V 模型的密钥\n```\n\n运行评估脚本（具体脚本名请参考 `docs\u002Fbenchmark_eval.md`）：\n```bash\npython pipeline\u002Feval\u002Frun_benchmark.py --config eval_config.yaml\n```\n\n### 进阶：多模态上下文学习\nOtter 的核心优势在于支持多轮对话和多图像上下文示例。\n\n**多轮对话示例：**\n```python\nprompt = f\"\u003Cimage>User: {first_instruction} GPT:\u003Canswer> {first_response}\u003Cendofchunk>User: {second_instruction} 
GPT:\u003Canswer>\"\n```\n\n**多图像上下文示例（In-Context Learning）：**\n```python\nprompt = f\"\u003Cimage>User:{ict_first_instruction} GPT: \u003Canswer>{ict_first_response}\u003C|endofchunk|>\u003Cimage>User:{ict_second_instruction} GPT: \u003Canswer>{ict_second_response}\u003C|endofchunk|>\u003Cimage>User: {target_instruction} GPT:\u003Canswer>\"\n```\n\n---\n*注：本指南基于 Otter v0.3 及以上版本。代码仍在持续迭代中，如遇问题欢迎提交 Issue 或 PR。*","某电商平台的视觉算法团队正致力于构建一个能自动分析商品短视频并生成详细营销文案的智能系统。\n\n### 没有 Otter 时\n- **细粒度识别困难**：传统多模态模型难以捕捉视频中占比极小（如仅占画面 1%）的商品细节或标签信息，导致生成的描述笼统模糊。\n- **指令遵循能力弱**：模型无法准确理解复杂的自然语言指令（如“请重点描述模特佩戴的配饰材质”），往往输出固定模板式的回答。\n- **高分辨率处理瓶颈**：处理高清商品视频时需要额外的视觉编码器进行预处理，流程繁琐且显存占用高，推理速度缓慢。\n- **上下文学习缺失**：面对新的商品类别或营销风格示例，模型无法通过少量样本快速调整输出风格，每次都需要重新微调训练。\n\n### 使用 Otter 后\n- **精准捕捉微小细节**：借助 OtterHD 的高分辨率处理能力，模型能直接识别视频中微小的商品纹理和空间关系，生成极具画面感的细节描述。\n- **完美执行复杂指令**：基于 MIMIC-IT 数据集训练的强指令遵循能力，让 Otter 能精准响应“突出展示特定卖点”等定制化需求，输出灵活多变。\n- **架构精简高效**：Otter 无需显式的独立视觉编码器，直接将图像块与文本令牌联合处理，大幅降低了高清视频分析的延迟和资源消耗。\n- **强大的少样本适应力**：利用其卓越的上下文学习能力，只需在提示词中提供几个新风格的文案示例，Otter 即可立即模仿并应用于新商品视频。\n\nOtter 通过突破性的多模态架构，将电商视频内容从“粗略识别”升级为“细粒度理解与定制化创作”，显著提升了自动化营销内容的质量与生产效率。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEvolvingLMMs-Lab_Otter_503397b7.png","EvolvingLMMs-Lab","LMMs-Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEvolvingLMMs-Lab_cafaf396.png","Feeling and building multimodal intelligence.",null,"drluodian@gmail.com","lmmslab","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab",[81,85,89],{"name":82,"color":83,"percentage":84},"Python","#3572A5",96.9,{"name":86,"color":87,"percentage":88},"Shell","#89e051",2.1,{"name":90,"color":91,"percentage":92},"Jupyter Notebook","#DA5B0B",1,3365,211,"2026-04-16T18:25:25","MIT",4,"未说明","需要 NVIDIA GPU，本地运行至少需要 16GB 显存 (提及 'at least 16G GPU mem')",{"notes":101,"python":98,"dependencies":102},"1. 本地运行模型进行图像\u002F视频标记、字幕生成等任务至少需要 16GB GPU 显存。2. 代码使用了 Flash-Attention-2 以提升训练吞吐量。3. 运行前需正确调整 sys.path 以访问 otter.modeling_otter 模块。4. 支持通过 LiteLLM 调用 Azure、Anthropic、Palm、Cohere 等 API 模型。5. 
项目包含多个模型变体（如 OTTER-Image-MPT7B, OtterHD-8B），具体显存需求可能随模型大小变化。",[103,104,105,106,107],"torch","transformers","flash-attn (Flash-Attention-2)","litellm","openflamingo",[14,35],[110,111,112,113,114,115,116,117,118,119,120],"gpt-4","visual-language-learning","artificial-inteligence","deep-learning","foundation-models","multi-modality","machine-learning","chatgpt","instruction-tuning","large-scale-models","embodied-ai","2026-03-27T02:49:30.150509","2026-04-20T04:04:27.200423",[124,129,134,139,144,149],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},43200,"为什么 openflamingo-9b-hf 模型的参数文件大小（30GB）远大于 LLaMA-7B（13GB）？","这是因为精度存储格式不同。LLaMA-7B 的 13GB 版本通常是以 `torch.float16` 格式存储的，而 OpenFlamingo 默认将参数存储为 `torch.float32`，导致文件大小约为两倍。维护者表示未来可能会考虑发布 `float16` 版本的模型以减小体积。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F92",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},43201,"运行 Demo 时遇到 'RuntimeError: GET was unable to find an engine to execute this computation' 错误怎么办？","该错误通常由 CUDA 版本不匹配引起。请检查并对比 `nvidia-smi` 显示的驱动支持版本与 `nvcc --version` 显示的编译器版本是否一致。确保 PyTorch、CUDA 驱动和 nvcc 版本相互兼容。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F147",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},43202,"如何下载 MIMICIT 数据集时解决 'AccessDenied' 或部分文件无法下载的问题？","部分数据集文件（如 SN 和 TVC）可能曾存在上传问题。维护者已重新上传了相关文件。如果遇到特定分片（parquet 文件）访问被拒绝，请稍后重试或检查 Hugging Face 仓库的最新状态，维护者通常会修复此类权限问题。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F338",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},43203,"下载的 MIMICIT 数据集指令数量（约 117 万）与论文中提到的（220 万）不符，缺少了哪些数据？","这通常是因为某些子数据集（如 VST）在特定时间点下载不完整。请检查 Hugging Face 仓库是否有更新，特别是针对 VST 等缺失样本的补充。维护者确认 LA 等其他数据集是正常的，建议重新拉取最新的数据集文件。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F318",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},43204,"如何将旧的 Open Flamingo 权重转换为新的 Hugging Face 格式并加载模型？","首先使用脚本转换权重：`python flamingo_hf\u002Fconverting_flamingo_to_pytorch.py --old old_ckpt.pt --new new_ckpt.pt`。然后使用以下代码加载：\n```python\nimport torch\nfrom configuration_flamingo import FlamingoConfig\nfrom modeling_flamingo import FlamingoModel\n\nconfig = FlamingoConfig.from_json_file(\"config.json\")\nmodel = FlamingoModel(config)\nmodel.load_state_dict(torch.load(\"new_ckpt.pt\", map_location=\"cpu\"), strict=False)\nmodel.save_pretrained(\"保存路径\")\n```\n注意文本生成时需设置 `padding_side = \"left\"` 并在输入中加入 `\u003Cimage>` 和 `\u003C|endofchunk|>` 特殊标记。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F24",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},43205,"运行 OtterHD 的 Demo 脚本时，因缺少 Azure 存储路径导致失败，如何解决？","目前大部分所需文件已可通过 Hugging Face 获取。用户需要从 HF 下载数据，并修改 `Demo_Data.yaml` 配置文件，将原本指向 `azure_storage` 的路径替换为本地下载后的实际路径。需注意匹配对应的 `instructions.json` 文件和 `parquet` 数据文件。","https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002FOtter\u002Fissues\u002F311",[155,160,165],{"id":156,"version":157,"summary_zh":158,"released_at":159},342872,"v0.3.0","**[2023-11]：支持GPT4V在8个基准上的评估；宣布推出基于Fuyu-8B改进的OtterHD-8B。详情请参阅[OtterHD](.\u002Fdocs\u002FOtterHD.md)。**\n\n\u003Cdiv style=\"text-align:center\">\n\u003Cimg src=\"https:\u002F\u002Fi.postimg.cc\u002FdtxQQzt6\u002Fdemo0.png\"  width=\"100%\" height=\"100%\">\n\u003C\u002Fdiv>\n\n1. 
🦦 新增了[OtterHD](.\u002Fdocs\u002FOtterHD.md)，它是基于[Fuyu-8B](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b)进行多模态微调的模型，旨在无需显式视觉编码器模块的情况下，对高分辨率视觉输入进行细粒度的理解与解析。所有图像块均经过线性变换，并与文本标记一同处理。这是一次极具创新性和优雅性的探索。我们对此深感着迷，并在此基础上开源了Fuyu-8B的微调脚本，同时借助[Flash-Attention-2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)将训练吞吐量提升了4至5倍。欢迎尝试我们的微调脚本，详见[OtterHD](.\u002Fdocs\u002FOtterHD.md)。\n2. 🔍 新增了[MagnifierBench](.\u002Fdocs\u002FOtterHD.md)，这是一个专门用于评估模型能否识别图像中极小物体信息（占图像面积的1%）及其空间关系的评测基准。\n3. 针对当前领先的多模态大模型，优化了[预训练](pipeline\u002Ftrain\u002Fpretraining.py)、[监督微调](pipeline\u002Ftrain\u002Finstruction_following.py)以及[RLHF]()的流程。\n   1. **模型**：[Otter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.03726) | [OpenFlamingo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01390) | [Idefics](https:\u002F\u002Fhuggingface.co\u002FHuggingFaceM4\u002Fidefics-80b-instruct) | [Fuyu](https:\u002F\u002Fhuggingface.co\u002Fadept\u002Ffuyu-8b)\n   2. **训练数据集接口：(预训练)** MMC4 | LAION2B | CC3M | CC12M，**(SFT)** MIMIC-IT | M3IT | LLAVAR | LRV | SVIT…\n        - *我们已使用上述数据集对OpenFlamingo和Otter进行了预训练和指令微调的测试，并针对Idefics和Fuyu也进行了指令微调的测试。我们将逐步开源相关训练脚本。*\n   3. 【**基准评测接口**】(https:\u002F\u002Fhuggingface.co\u002FOtter-AI)：MagnifierBench\u002FMMBench\u002FMM-VET\u002FMathVista\u002FPOPE\u002FMME\u002FSicenceQA\u002FSeedBench。这些评测可一键运行，详情请参阅[Benchmark](.\u002Fdocs\u002Fbenchmark_eval.md)。\n    ```yaml\n        datasets:\n        - name: magnifierbench\n            split: test\n            prompt: 请直接从给定选项中选择答案字母作答。\n            api_key: [您的API密钥] # 使用GPT4或GPT3.5评估答案与真实标签。\n            debug: true # 设置debug=true会将模型响应保存到日志文件中。\n        - name: mme\n            split: test\n            debug: true\n        - name: mmbench\n            split: test\n            debug: true\n\n        models:\n        - name: gpt4v\n            api_key: [您的API密钥] # 用于调用GPT4V模型。\n    ```  \n   4. 对代码进行了重构，以**通过集成式YAML文件组织多组数据集**，详细内容请参阅[管理MIMIC-IT格式的数据集](docs\u002Fmimicit_format.md)。例如：\n    ```yaml\n        IMAGE_TEXT: # 组名应为[IMAGE_TEXT, TEX","2023-11-18T04:44:36",{"id":161,"version":162,"summary_zh":163,"released_at":164},342873,"v0.2.0","- 🧨 [下载 MIMIC-IT 数据集](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Flibo0013_e_ntu_edu_sg\u002FEo9bgNV5cjtEswfA-HfjNNABiKsjDzSWAl5QYAlRZPiuZA?e=M9isDT)。如需了解更多关于数据集的使用信息，请参阅 [MIMIC-IT 数据集 README](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002Fmain\u002Fmimic-it\u002FREADME.md)。\n\n- 🏎️ [在本地运行 Otter](https:\u002F\u002Fgithub.com\u002FLuodian\u002FOtter\u002Fblob\u002Fmain\u002Fpipeline\u002Fdemo)。您可以在至少配备 16GB 显存的 GPU 上本地运行我们的模型，以执行图像\u002F视频标注、生成描述以及识别有害内容等任务。我们修复了一个与视频推理相关的 bug，该 bug 导致帧张量被错误地压缩到不正确的 vision_x 维度。现在您可以使用更新后的版本再次尝试运行。\n  > 请确保正确调整 `sys.path.append(\"..\u002F..\")`，以便能够访问 `otter.modeling_otter` 模块，从而成功启动模型。","2023-06-24T18:11:45",{"id":166,"version":167,"summary_zh":168,"released_at":169},342874,"v0.1.0","我们很高兴地宣布 🦦 Otter 的首次发布！","2023-04-30T10:38:11"]