[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--markitdown":3,"tool-microsoft--markitdown":61},[4,18,26,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":10,"last_commit_at":58,"category_tags":59,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,60],"视频",{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":32,"env_os":95,"env_gpu":96,"env_ram":96,"env_deps":97,"category_tags":104,"github_topics":106,"view_count":114,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":115,"updated_at":116,"faqs":117,"releases":147},4721,"microsoft\u002Fmarkitdown","markitdown","Python tool for converting files and office documents to Markdown.","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 
的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器分析提供高质量的结构化输入，而非替代专业排版工具。通过简单的命令行操作或 Python 调用，用户即可轻松实现文件格式的统一化处理，大幅提升工作流效率。","# MarkItDown\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmarkitdown.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmarkitdown\u002F)\n![PyPI - Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdd\u002Fmarkitdown)\n[![Built by AutoGen Team](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBuilt%20by-AutoGen%20Team-blue)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fautogen)\n\n> [!TIP]\n> MarkItDown now offers an MCP (Model Context Protocol) server for integration with LLM applications like Claude Desktop. See [markitdown-mcp](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Ftree\u002Fmain\u002Fpackages\u002Fmarkitdown-mcp) for more information.\n\n> [!IMPORTANT]\n> Breaking changes between 0.0.1 to 0.1.0:\n> * Dependencies are now organized into optional feature-groups (further details below). Use `pip install 'markitdown[all]'` to have backward-compatible behavior.\n> * convert\\_stream() now requires a binary file-like object (e.g., a file opened in binary mode, or an io.BytesIO object). 
This is a breaking change from the previous version, which also accepted text file-like objects, like io.StringIO.\n> * The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a plugin, or custom DocumentConverter, you likely need to update your code. Otherwise, if only using the MarkItDown class or CLI (as in these examples), you should not need to change anything.\n\nMarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to [textract](https:\u002F\u002Fgithub.com\u002Fdeanmalmgren\u002Ftextract), but with a focus on preserving important document structure and content as Markdown (including: headings, lists, tables, links, etc.). While the output is often reasonably presentable and human-friendly, it is meant to be consumed by text analysis tools -- and may not be the best option for high-fidelity document conversions for human consumption.\n\nMarkItDown currently supports conversion from:\n\n- PDF\n- PowerPoint\n- Word\n- Excel\n- Images (EXIF metadata and OCR)\n- Audio (EXIF metadata and speech transcription)\n- HTML\n- Text-based formats (CSV, JSON, XML)\n- ZIP files (iterates over contents)\n- YouTube URLs\n- EPUBs\n- ... and more!\n\n## Why Markdown?\n\nMarkdown is extremely close to plain text, with minimal markup or formatting, but still\nprovides a way to represent important document structure. Mainstream LLMs, such as\nOpenAI's GPT-4o, natively \"_speak_\" Markdown, and often incorporate Markdown into their\nresponses unprompted. This suggests that they have been trained on vast amounts of\nMarkdown-formatted text, and understand it well. As a side benefit, Markdown conventions\nare also highly token-efficient.\n\n## Prerequisites\nMarkItDown requires Python 3.10 or higher. 
It is recommended to use a virtual environment to avoid dependency conflicts.\n\nWith the standard Python installation, you can create and activate a virtual environment using the following commands:\n\n```bash\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate\n```\n\nIf using `uv`, you can create a virtual environment with:\n\n```bash\nuv venv --python=3.12 .venv\nsource .venv\u002Fbin\u002Factivate\n# NOTE: Be sure to use 'uv pip install' rather than just 'pip install' to install packages in this virtual environment\n```\n\nIf you are using Anaconda, you can create a virtual environment with:\n\n```bash\nconda create -n markitdown python=3.12\nconda activate markitdown\n```\n\n## Installation\n\nTo install MarkItDown, use pip: `pip install 'markitdown[all]'`. Alternatively, you can install it from the source:\n\n```bash\ngit clone git@github.com:microsoft\u002Fmarkitdown.git\ncd markitdown\npip install -e 'packages\u002Fmarkitdown[all]'\n```\n\n## Usage\n\n### Command-Line\n\n```bash\nmarkitdown path-to-file.pdf > document.md\n```\n\nOr use `-o` to specify the output file:\n\n```bash\nmarkitdown path-to-file.pdf -o document.md\n```\n\nYou can also pipe content:\n\n```bash\ncat path-to-file.pdf | markitdown\n```\n\n### Optional Dependencies\nMarkItDown has optional dependencies for activating various file formats. Earlier in this document, we installed all optional dependencies with the `[all]` option. However, you can also install them individually for more control. 
For example:\n\n```bash\npip install 'markitdown[pdf, docx, pptx]'\n```\n\nwill install only the dependencies for PDF, DOCX, and PPTX files.\n\nAt the moment, the following optional dependencies are available:\n\n* `[all]` Installs all optional dependencies\n* `[pptx]` Installs dependencies for PowerPoint files\n* `[docx]` Installs dependencies for Word files\n* `[xlsx]` Installs dependencies for Excel files\n* `[xls]` Installs dependencies for older Excel files\n* `[pdf]` Installs dependencies for PDF files\n* `[outlook]` Installs dependencies for Outlook messages\n* `[az-doc-intel]` Installs dependencies for Azure Document Intelligence\n* `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files\n* `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription\n\n### Plugins\n\nMarkItDown also supports 3rd-party plugins. Plugins are disabled by default. To list installed plugins:\n\n```bash\nmarkitdown --list-plugins\n```\n\nTo enable plugins use:\n\n```bash\nmarkitdown --use-plugins path-to-file.pdf\n```\n\nTo find available plugins, search GitHub for the hashtag `#markitdown-plugin`. To develop a plugin, see `packages\u002Fmarkitdown-sample-plugin`.\n\n#### markitdown-ocr Plugin\n\nThe `markitdown-ocr` plugin adds OCR support to PDF, DOCX, PPTX, and XLSX converters, extracting text from embedded images using LLM Vision — the same `llm_client` \u002F `llm_model` pattern that MarkItDown already uses for image descriptions. 
No new ML libraries or binary dependencies required.\n\n**Installation:**\n\n```bash\npip install markitdown-ocr\npip install openai  # or any OpenAI-compatible client\n```\n\n**Usage:**\n\nPass the same `llm_client` and `llm_model` you would use for image descriptions:\n\n```python\nfrom markitdown import MarkItDown\nfrom openai import OpenAI\n\nmd = MarkItDown(\n    enable_plugins=True,\n    llm_client=OpenAI(),\n    llm_model=\"gpt-4o\",\n)\nresult = md.convert(\"document_with_images.pdf\")\nprint(result.text_content)\n```\n\nIf no `llm_client` is provided the plugin still loads, but OCR is silently skipped and the standard built-in converter is used instead.\n\nSee [`packages\u002Fmarkitdown-ocr\u002FREADME.md`](packages\u002Fmarkitdown-ocr\u002FREADME.md) for detailed documentation.\n\n### Azure Document Intelligence\n\nTo use Microsoft Document Intelligence for conversion:\n\n```bash\nmarkitdown path-to-file.pdf -o document.md -d -e \"\u003Cdocument_intelligence_endpoint>\"\n```\n\nMore information about how to set up an Azure Document Intelligence Resource can be found [here](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fai-services\u002Fdocument-intelligence\u002Fhow-to-guides\u002Fcreate-document-intelligence-resource?view=doc-intel-4.0.0)\n\n### Python API\n\nBasic usage in Python:\n\n```python\nfrom markitdown import MarkItDown\n\nmd = MarkItDown(enable_plugins=False) # Set to True to enable plugins\nresult = md.convert(\"test.xlsx\")\nprint(result.text_content)\n```\n\nDocument Intelligence conversion in Python:\n\n```python\nfrom markitdown import MarkItDown\n\nmd = MarkItDown(docintel_endpoint=\"\u003Cdocument_intelligence_endpoint>\")\nresult = md.convert(\"test.pdf\")\nprint(result.text_content)\n```\n\nTo use Large Language Models for image descriptions (currently only for pptx and image files), provide `llm_client` and `llm_model`:\n\n```python\nfrom markitdown import MarkItDown\nfrom openai import OpenAI\n\nclient = 
OpenAI()\nmd = MarkItDown(llm_client=client, llm_model=\"gpt-4o\", llm_prompt=\"optional custom prompt\")\nresult = md.convert(\"example.jpg\")\nprint(result.text_content)\n```\n\n### Docker\n\n```sh\ndocker build -t markitdown:latest .\ndocker run --rm -i markitdown:latest \u003C ~\u002Fyour-file.pdf > output.md\n```\n\n## Contributing\n\nThis project welcomes contributions and suggestions. Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https:\u002F\u002Fcla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F).\nFor more information see the [Code of Conduct FAQ](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n### How to Contribute\n\nYou can help by looking at issues or helping review PRs. Any issue or PR is welcome, but we have also marked some as 'open for contribution' and 'open for reviewing' to help facilitate community contributions. 
These are of course just suggestions and you are welcome to contribute in any way you like.\n\n\u003Cdiv align=\"center\">\n\n|            | All                                                          | Especially Needs Help from Community                                                                                                      |\n| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |\n| **Issues** | [All Issues](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues) | [Issues open for contribution](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues?q=is%3Aissue+is%3Aopen+label%3A%22open+for+contribution%22) |\n| **PRs**    | [All PRs](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpulls)     | [PRs open for reviewing](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpulls?q=is%3Apr+is%3Aopen+label%3A%22open+for+reviewing%22)              |\n\n\u003C\u002Fdiv>\n\n### Running Tests and Checks\n\n- Navigate to the MarkItDown package:\n\n  ```sh\n  cd packages\u002Fmarkitdown\n  ```\n\n- Install `hatch` in your environment and run tests:\n\n  ```sh\n  pip install hatch  # Other ways of installing hatch: https:\u002F\u002Fhatch.pypa.io\u002Fdev\u002Finstall\u002F\n  hatch shell\n  hatch test\n  ```\n\n  (Alternative) Use the Devcontainer which has all the dependencies installed:\n\n  ```sh\n  # Reopen the project in Devcontainer and run:\n  hatch test\n  ```\n\n- Run pre-commit checks before submitting a PR: `pre-commit run --all-files`\n\n### Contributing 3rd-party Plugins\n\nYou can also contribute by creating and sharing 3rd party plugins. See `packages\u002Fmarkitdown-sample-plugin` for more details.\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. 
Authorized use of Microsoft\ntrademarks or logos is subject to and must follow\n[Microsoft's Trademark & Brand Guidelines](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos are subject to those third-party's policies.\n","# MarkItDown\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmarkitdown.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmarkitdown\u002F)\n![PyPI - Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdd\u002Fmarkitdown)\n[![由 AutoGen 团队构建](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBuilt%20by-AutoGen%20Team-blue)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fautogen)\n\n> [!TIP]\n> MarkItDown 现在提供了一个 MCP（模型上下文协议）服务器，用于与 Claude Desktop 等 LLM 应用程序集成。更多信息请参阅 [markitdown-mcp](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Ftree\u002Fmain\u002Fpackages\u002Fmarkitdown-mcp)。\n\n> [!IMPORTANT]\n> 0.0.1 到 0.1.0 之间的重大变更：\n> * 依赖项现在被组织为可选的功能组（详情见下文）。若需保持向后兼容性，请使用 `pip install 'markitdown[all]'`。\n> * `convert_stream()` 现在需要一个二进制文件类对象（例如以二进制模式打开的文件或 `io.BytesIO` 对象）。这是对先前版本的重大更改，因为之前它也接受文本文件类对象，如 `io.StringIO`。\n> * `DocumentConverter` 类的接口已更改为从文件类流中读取，而非文件路径。*不再创建临时文件*。如果您是插件或自定义 `DocumentConverter` 的维护者，可能需要更新您的代码。否则，如果仅使用 `MarkItDown` 类或 CLI（如本示例所示），则无需进行任何更改。\n\nMarkItDown 是一款轻量级的 Python 工具，用于将各种文件转换为 Markdown 格式，以便在 LLM 和相关文本分析流水线中使用。在这方面，它与 [textract](https:\u002F\u002Fgithub.com\u002Fdeanmalmgren\u002Ftextract) 最为相似，但更注重以 Markdown 格式保留文档的重要结构和内容（包括：标题、列表、表格、链接等）。虽然输出通常具有较好的可读性和人性化呈现效果，但它主要是供文本分析工具消费的——对于需要高保真度的人类阅读文档转换来说，可能并不是最佳选择。\n\n目前，MarkItDown 支持以下格式的转换：\n\n- PDF\n- PowerPoint\n- Word\n- Excel\n- 图片（EXIF 元数据和 OCR）\n- 音频（EXIF 元数据和语音转录）\n- HTML\n- 文本格式（CSV、JSON、XML）\n- ZIP 文件（遍历其内容）\n- YouTube 视频链接\n- EPUB\n- ……以及更多！\n\n## 为什么选择 
Markdown？\n\nMarkdown 非常接近纯文本，几乎没有标记或格式化，但仍能表示文档的重要结构。主流 LLM，例如 OpenAI 的 GPT-4o，原生支持 Markdown，并且经常会在未被提示的情况下在其响应中加入 Markdown 内容。这表明它们已经接受了大量 Markdown 格式的训练，并对其有很好的理解。此外，Markdown 的语法也非常节省 token。\n\n## 先决条件\nMarkItDown 需要 Python 3.10 或更高版本。建议使用虚拟环境以避免依赖冲突。\n\n使用标准 Python 安装时，可以通过以下命令创建并激活虚拟环境：\n\n```bash\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate\n```\n\n如果使用 `uv`，可以这样创建虚拟环境：\n\n```bash\nuv venv --python=3.12 .venv\nsource .venv\u002Fbin\u002Factivate\n# 注意：请务必使用 'uv pip install' 而不是 'pip install' 来安装此虚拟环境中的包\n```\n\n如果您使用 Anaconda，可以这样创建虚拟环境：\n\n```bash\nconda create -n markitdown python=3.12\nconda activate markitdown\n```\n\n## 安装\n要安装 MarkItDown，可以使用 pip：`pip install 'markitdown[all]'`。或者您也可以从源码安装：\n\n```bash\ngit clone git@github.com:microsoft\u002Fmarkitdown.git\ncd markitdown\npip install -e 'packages\u002Fmarkitdown[all]'\n```\n\n## 使用方法\n\n### 命令行\n```bash\nmarkitdown path-to-file.pdf > document.md\n```\n\n或者使用 `-o` 指定输出文件：\n\n```bash\nmarkitdown path-to-file.pdf -o document.md\n```\n\n您还可以通过管道输入内容：\n\n```bash\ncat path-to-file.pdf | markitdown\n```\n\n### 可选依赖\nMarkItDown 具有用于启用各种文件格式的可选依赖项。在本文前面，我们使用 `[all]` 选项安装了所有可选依赖项。不过，您也可以单独安装它们以获得更高的控制权。例如：\n\n```bash\npip install 'markitdown[pdf, docx, pptx]'\n```\n\n将只安装 PDF、DOCX 和 PPTX 文件所需的依赖项。\n\n目前，可用的可选依赖项如下：\n\n* `[all]` 安装所有可选依赖项\n* `[pptx]` 安装 PowerPoint 文件所需的依赖项\n* `[docx]` 安装 Word 文件所需的依赖项\n* `[xlsx]` 安装 Excel 文件所需的依赖项\n* `[xls]` 安装旧版 Excel 文件所需的依赖项\n* `[pdf]` 安装 PDF 文件所需的依赖项\n* `[outlook]` 安装 Outlook 邮件所需的依赖项\n* `[az-doc-intel]` 安装 Azure Document Intelligence 所需的依赖项\n* `[audio-transcription]` 安装用于 WAV 和 MP3 文件音频转录的依赖项\n* `[youtube-transcription]` 安装用于获取 YouTube 视频转录的依赖项\n\n### 插件\nMarkItDown 还支持第三方插件。默认情况下，插件是禁用的。要列出已安装的插件：\n\n```bash\nmarkitdown --list-plugins\n```\n\n要启用插件，请使用：\n\n```bash\nmarkitdown --use-plugins path-to-file.pdf\n```\n\n要查找可用的插件，可以在 GitHub 上搜索标签 `#markitdown-plugin`。要开发插件，请参阅 `packages\u002Fmarkitdown-sample-plugin`。\n\n#### markitdown-ocr 插件\n`markitdown-ocr` 插件为 
PDF、DOCX、PPTX 和 XLSX 转换器增加了 OCR 支持，利用 LLM Vision 从嵌入式图像中提取文本——这与 MarkItDown 已经用于图像描述的 `llm_client` \u002F `llm_model` 模式相同。无需新的机器学习库或二进制依赖。\n\n**安装：**\n\n```bash\npip install markitdown-ocr\npip install openai  # 或任何与 OpenAI 兼容的客户端\n```\n\n**使用：**\n\n传递您用于图像描述的相同 `llm_client` 和 `llm_model`：\n\n```python\nfrom markitdown import MarkItDown\nfrom openai import OpenAI\n\nmd = MarkItDown(\n    enable_plugins=True,\n    llm_client=OpenAI(),\n    llm_model=\"gpt-4o\",\n)\nresult = md.convert(\"document_with_images.pdf\")\nprint(result.text_content)\n```\n\n如果没有提供 `llm_client`，插件仍然会加载，但 OCR 将被静默跳过，转而使用标准内置转换器。\n\n详细文档请参阅 [`packages\u002Fmarkitdown-ocr\u002FREADME.md`](packages\u002Fmarkitdown-ocr\u002FREADME.md)。\n\n### Azure Document Intelligence\n要使用 Microsoft Document Intelligence 进行转换：\n\n```bash\nmarkitdown path-to-file.pdf -o document.md -d -e \"\u003Cdocument_intelligence_endpoint>\"\n```\n\n有关如何设置 Azure Document Intelligence 资源的更多信息，请参阅 [此处](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fai-services\u002Fdocument-intelligence\u002Fhow-to-guides\u002Fcreate-document-intelligence-resource?view=doc-intel-4.0.0)。\n\n### Python API\n\nPython 中的基本用法：\n\n```python\nfrom markitdown import MarkItDown\n\nmd = MarkItDown(enable_plugins=False) # 设置为 True 以启用插件\nresult = md.convert(\"test.xlsx\")\nprint(result.text_content)\n```\n\nPython 中的文档智能转换：\n\n```python\nfrom markitdown import MarkItDown\n\nmd = MarkItDown(docintel_endpoint=\"\u003Cdocument_intelligence_endpoint>\")\nresult = md.convert(\"test.pdf\")\nprint(result.text_content)\n```\n\n要使用大型语言模型生成图像描述（目前仅支持 pptx 和图像文件），请提供 `llm_client` 和 `llm_model`：\n\n```python\nfrom markitdown import MarkItDown\nfrom openai import OpenAI\n\nclient = OpenAI()\nmd = MarkItDown(llm_client=client, llm_model=\"gpt-4o\", llm_prompt=\"可选自定义提示\")\nresult = md.convert(\"example.jpg\")\nprint(result.text_content)\n```\n\n### Docker\n\n```sh\ndocker build -t markitdown:latest .\ndocker run --rm -i markitdown:latest \u003C 
~\u002Fyour-file.pdf > output.md\n```\n\n## 贡献说明\n\n本项目欢迎各种贡献和建议。大多数贡献都需要您签署贡献者许可协议（CLA），声明您有权且确实授予我们使用您贡献的权利。有关详细信息，请访问 https:\u002F\u002Fcla.opensource.microsoft.com。\n\n当您提交拉取请求时，CLA 机器人会自动判断您是否需要提供 CLA，并相应地标记您的 PR（例如状态检查、评论）。您只需按照机器人提供的指示操作即可。对于所有使用我们 CLA 的仓库，您只需完成一次此步骤。\n\n本项目已采用 [微软开源行为准则](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F)。更多信息请参阅 [行为准则常见问题解答](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F)，或如有任何其他疑问或意见，请联系 [opencode@microsoft.com](mailto:opencode@microsoft.com)。\n\n### 如何贡献\n\n您可以查看问题或帮助审查 PR 来提供帮助。任何问题或 PR 都受欢迎，但我们也将一些标记为“开放贡献”和“开放评审”，以促进社区参与。当然，这些只是建议，您也可以以任何您喜欢的方式进行贡献。\n\n\u003Cdiv align=\"center\">\n\n|            | 全部                                                          | 特别需要社区帮助                                                                                                      |\n| ---------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |\n| **问题** | [全部问题](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues) | [开放贡献的问题](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues?q=is%3Aissue+is%3Aopen+label%3A%22open+for+contribution%22) |\n| **PR**    | [全部 PR](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpulls)     | [开放评审的 PR](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpulls?q=is%3Apr+is%3Aopen+label%3A%22open+for+reviewing%22)              |\n\n\u003C\u002Fdiv>\n\n### 运行测试和检查\n\n- 导航到 MarkItDown 包：\n\n  ```sh\n  cd packages\u002Fmarkitdown\n  ```\n\n- 在您的环境中安装 `hatch` 并运行测试：\n\n  ```sh\n  pip install hatch  # 安装 hatch 的其他方法：https:\u002F\u002Fhatch.pypa.io\u002Fdev\u002Finstall\u002F\n  hatch shell\n  hatch test\n  ```\n\n  （替代方案）使用已安装所有依赖项的 Devcontainer：\n\n  ```sh\n  # 在 Devcontainer 中重新打开项目并运行：\n  hatch test\n  ```\n\n- 在提交 PR 之前运行预提交检查：`pre-commit 
run --all-files`\n\n### 贡献第三方插件\n\n您还可以通过创建和分享第三方插件来做出贡献。更多详情请参阅 `packages\u002Fmarkitdown-sample-plugin`。\n\n## 商标说明\n\n本项目可能包含项目、产品或服务的商标或标识。经授权使用微软商标或标识须遵守并遵循 [微软商标与品牌指南](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral)。在本项目的修改版本中使用微软商标或标识不得造成混淆或暗示微软的赞助。任何第三方商标或标识的使用均应遵守该第三方的相关政策。","# MarkItDown 快速上手指南\n\nMarkItDown 是一个轻量级 Python 工具，专为将各类文件（PDF、Office 文档、图片、音频等）转换为 Markdown 格式而设计，特别适用于大语言模型（LLM）的数据预处理和分析流程。\n\n## 环境准备\n\n- **系统要求**：Python 3.10 或更高版本\n- **推荐环境**：建议使用虚拟环境以避免依赖冲突\n\n创建并激活虚拟环境（任选一种方式）：\n\n```bash\n# 标准 Python 方式\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate  # Windows 使用：.venv\\Scripts\\activate\n\n# 或使用 uv（更快速）\nuv venv --python=3.12 .venv\nsource .venv\u002Fbin\u002Factivate\n\n# 或使用 Conda\nconda create -n markitdown python=3.12\nconda activate markitdown\n```\n\n> 💡 **国内加速建议**：如下载缓慢，可配置国内镜像源\n> ```bash\n> pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n安装完整功能版本（包含所有可选依赖）：\n\n```bash\npip install 'markitdown[all]'\n```\n\n或从源码安装：\n\n```bash\ngit clone git@github.com:microsoft\u002Fmarkitdown.git\ncd markitdown\npip install -e 'packages\u002Fmarkitdown[all]'\n```\n\n如需精简安装，可按需选择特定格式支持：\n\n```bash\n# 仅安装 PDF、Word 和 PowerPoint 支持\npip install 'markitdown[pdf,docx,pptx]'\n```\n\n可用选项包括：`[pdf]`, `[docx]`, `[pptx]`, `[xlsx]`, `[xls]`, `[outlook]`, `[audio-transcription]`, `[youtube-transcription]` 等。\n\n## 基本使用\n\n### 命令行使用\n\n转换单个文件并输出到终端：\n\n```bash\nmarkitdown path-to-file.pdf\n```\n\n转换文件并保存为 `.md` 文件：\n\n```bash\nmarkitdown path-to-file.pdf -o document.md\n```\n\n通过管道处理内容：\n\n```bash\ncat path-to-file.pdf | markitdown > document.md\n```\n\n### Python API 使用\n\n基础转换示例：\n\n```python\nfrom markitdown import MarkItDown\n\nmd = MarkItDown()\nresult = md.convert(\"test.xlsx\")\nprint(result.text_content)\n```\n\n启用插件支持（如第三方扩展）：\n\n```python\nmd = MarkItDown(enable_plugins=True)\nresult = 
md.convert(\"document_with_images.pdf\")\nprint(result.text_content)\n```\n\n结合大模型进行图片描述（需安装 `openai` 或其他兼容客户端）：\n\n```python\nfrom markitdown import MarkItDown\nfrom openai import OpenAI\n\nclient = OpenAI()\nmd = MarkItDown(llm_client=client, llm_model=\"gpt-4o\")\nresult = md.convert(\"example.jpg\")\nprint(result.text_content)\n```\n\n> ✅ 提示：转换结果保留标题、列表、表格、链接等结构信息，适合直接输入 LLM 进行进一步分析。","某数据分析师需要构建一个企业知识库 RAG 系统，必须将散落在各部门的数百份 PDF 报告、PPT 演示文稿和 Excel 财务报表统一转化为大模型可理解的结构化文本。\n\n### 没有 markitdown 时\n- **格式解析困难**：面对混合了复杂表格和多层级标题的 PDF 与 PPT，传统提取脚本往往只能抓取纯文本，导致文档逻辑结构完全丢失。\n- **多格式适配繁琐**：处理图片需单独调用 OCR 接口，处理音频需额外部署语音转写服务，针对不同文件类型编写和维护多套转换代码，开发成本极高。\n- **令牌浪费严重**：转换后的文本缺乏 Markdown 标记，大模型难以识别重点，导致上下文窗口被大量无结构的冗余字符占用，推理成本上升。\n- **流程断裂**：无法直接流式处理云端文件或压缩包内容，往往需要先下载落地为临时文件再处理，增加了 I\u002FO 开销和数据泄露风险。\n\n### 使用 markitdown 后\n- **结构完美保留**：markitdown 自动将原文档的标题、列表、链接及复杂表格精准转换为标准 Markdown 语法，大模型能立即理解文档层级与数据关系。\n- **全能一键转换**：无论是含文字的截图、带语音的会议录音，还是嵌套的 ZIP 包，markitdown 均能内置 OCR 和转录功能一站式输出文本，无需拼凑外部工具。\n- **令牌高效利用**：生成的 Markdown 内容高度紧凑且语义清晰，显著减少无效 Token 消耗，让大模型更专注于核心信息分析，提升回答准确率。\n- **流式无缝集成**：支持直接从二进制流或 URL 读取内容，无需生成临时文件，轻松嵌入现有的 Python 数据处理管道或 AutoGen 智能体工作流中。\n\nmarkitdown 通过将所有异构办公文档标准化为大模型“母语”般的 Markdown，极大地降低了非结构化数据进入 AI  pipelines 的门槛与成本。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_markitdown_53a24659.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.7,{"name":88,"color":89,"percentage":90},"Dockerfile","#384d54",0.3,93400,5645,"2026-04-06T19:52:38","MIT","Linux, macOS, Windows","未说明",{"notes":98,"python":99,"dependencies":100},"建议使用虚拟环境（venv、uv 或 conda）安装以避免依赖冲突。核心功能无需 GPU。若需启用 OCR 插件或使用大模型描述图片，需配置 OpenAI 兼容的 API Key 及客户端。支持通过可选依赖组按需安装特定文件格式的支持（如 pdf, 
docx, xlsx 等）。","3.10+",[101,102,103],"markitdown[all]","openai (可选，用于 OCR 和图像描述)","azure-ai-formrecognizer (可选，用于 Azure Document Intelligence)",[105,14],"插件",[107,108,109,110,111,112,113],"langchain","openai","autogen-extension","autogen","markdown","microsoft-office","pdf",11,"2026-03-27T02:49:30.150509","2026-04-07T07:51:22.246107",[118,123,128,133,138,143],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},21470,"为什么在转换 YouTube 视频时出现 'no element found' 错误？","这通常是因为网络限制导致无法直接访问 YouTube（例如在中国大陆）。解决方案是使用能够直接访问 YouTube 的服务器环境（如 VPS）运行代码。此外，请确保安装了正确版本的依赖库，推荐使用 `youtube-transcript-api==1.1.0`。如果在本地 VS Code 中失败，尝试在远程服务器上测试。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F1383",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},21471,"如何在 Windows + 日语环境下解决导出 Markdown 时的 'UnicodeEncodeError' 编码错误？","在 Windows 命令行中设置环境变量 `PYTHONUTF8` 为 `1` 即可解决该问题。执行命令：`set PYTHONUTF8=1`，然后再运行 markitdown 命令。这将强制 Python 使用 UTF-8 编码处理输出，避免 cp932 编码无法识别特殊字符的问题。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F78",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},21472,"如何将文档（如 DOCX、PDF）中的图片转换为 Base64 格式并保留在 Markdown 中？","该功能已在后续版本（PR #1140）中修复和支持。请确保升级到最新版本的 markitdown。升级后，转换包含图片的 DOCX 或 PDF 文件时，图片会自动转换为 Base64 编码并嵌入到 Markdown 内容中（格式为 `![描述](data:image\u002F...;base64,...)`），无需额外配置。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F58",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},21473,"Markdown 输出中的 Base64 图片链接无法在本地查看怎么办？","这是为了节省大语言模型（LLM）的 Token 消耗而设计的默认行为。Base64 图片数据已包含在 Markdown 文本中，大多数现代 Markdown 编辑器或浏览器可以直接渲染显示。如果需要在本地保存为独立图片文件，目前工具主要侧重于文本提取，建议手动编写脚本解析 Base64 数据并保存图片，或者关注后续版本是否增加自动保存图片到磁盘的功能。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F162",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},21474,"运行代码时遇到 'UnboundLocalError: cannot access local variable res' 错误如何解决？","这是一个已知代码缺陷，通常发生在特定版本的 `_convert` 
方法中。解决方法是更新 markitdown 到最新版本，该问题已在后续的代码提交中修复。如果无法立即升级，可以临时检查输入文件是否存在且可读，或者手动修改源码，在使用 `res` 变量前确保其已被正确赋值（例如初始化为 `None`）。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F40",{"id":144,"question_zh":145,"answer_zh":146,"source_url":122},21475,"如何获取当前环境的依赖包列表以排查 YouTube 转录问题？","可以通过运行 `pip freeze > requirements.txt` 命令将当前 Python 环境中的所有安装包及其版本导出到 `requirements.txt` 文件中。检查该文件中 `youtube-transcript-api` 的版本是否为 `1.1.0` 或其他兼容版本，有助于排查因依赖库版本不一致导致的转录失败问题。",[148,153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233],{"id":149,"version":150,"summary_zh":151,"released_at":152},127452,"v0.1.5","## 变更内容\n* 更新 PDF 表格提取功能，支持对齐的 Markdown 格式，由 @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1499 中实现。\n* 修复：PDF 解析不支持部分编号列表的问题，由 @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1525 中修复。\n* 扩展表格支持，以处理宽表格，由 @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1552 中实现。\n* 在 Accept 头中添加 text\u002Fmarkdown，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1554 中完成。\n* 移除 onnxruntime\u003C=1.20.1 的 Windows 版本锁定，由 @basnijholt 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1551 中完成。\n* 提升版本号以准备发布，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1564 中完成。\n\n## 新贡献者\n* @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1499 中完成了首次贡献。\n* @basnijholt 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1551 中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.4...v0.1.5","2026-02-20T19:49:39",{"id":154,"version":155,"summary_zh":156,"released_at":157},127453,"v0.1.5b1","## 变更内容\n* 更新 PDF 表格提取功能，支持对齐的 Markdown 格式，由 @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1499 中实现\n* 修复：PDF 解析不支持部分编号列表的问题，由 @lesyk 
在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1525 中修复\n\n## 新贡献者\n* @lesyk 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1499 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.4...v0.1.5b1","2026-01-08T23:23:05",{"id":159,"version":160,"summary_zh":161,"released_at":162},127454,"v0.1.4","维护版本更新：\n\n将 mammoth 升级至 1.11.0，以修复 [CVE-2025-11849](https:\u002F\u002Favd.aquasec.com\u002Fnvd\u002F2025\u002Fcve-2025-11849\u002F) 漏洞。\n并将 pdfminer.six 升级至 20251107，以修复 [GHSA-wf5f-4jwr-ppcp](https:\u002F\u002Fgithub.com\u002Fpdfminer\u002Fpdfminer.six\u002Fsecurity\u002Fadvisories\u002FGHSA-wf5f-4jwr-ppcp) 漏洞。","2025-12-01T18:20:17",{"id":164,"version":165,"summary_zh":166,"released_at":167},127455,"v0.1.3","## 变更内容\n* 在 Windows 上固定 `onnxruntime` 版本，由 @t-kalinowski 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1274 中完成\n* 让 MarkItDown MCP 服务器从 ENV 中读取 `MARKITDOWN_ENABLE_PLUGINS` 环境变量，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1273 中完成\n* 解决了 docx 文档中链接图片的问题 [mammoth]，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1405 中完成\n* 确保安全使用 ExifTool：要求版本 >= 12.24，由 @t3tra-dev 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1399 中完成\n* 将 actions\u002Fcheckout 从 4 升级到 5，由 @dependabot[bot] 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1394 中完成\n* HTML| 更新文档智能文件类型处理，由 @safen0s 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1352 中完成\n* 修复：正确传递自定义 LLM 提示参数，由 @stefan-rink 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1319 中完成\n* 添加对 `data-src` 属性的支持，由 @Noah-Zhuhaotian 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1226 中完成\n* 修复 docx 解析错误（docx 测试用例：alt 中包含 `\\n`），由 @BetterAndBetterII 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1163 中完成\n* 处理 PPTX 中位置为 `None` 的形状，由 @richardye101 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1161 中完成\n* 功能：在 `_CustomMarkdownify` 中添加复选框支持，由 @Meirna-kamal 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1208 中完成\n\n## 新贡献者\n* @onefloid 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1278 中完成了首次贡献\n* @JonahDelman 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1393 中完成了首次贡献\n* @safen0s 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1352 中完成了首次贡献\n* @stefan-rink 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1319 中完成了首次贡献\n* @W-DOS0 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1335 中完成了首次贡献\n* @UK0070 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1350 中完成了首次贡献\n* @ebrahimHakimuddin 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1191 中完成了首次贡献\n* @Noah-Zhuhaotian 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1226 中完成了首次贡献\n* @mdqst 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1173 中完成了首次贡献\n* @Meirna-kamal 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1208 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.2...v0.1.3","2025-08-26T22:40:17",{"id":169,"version":170,"summary_zh":171,"released_at":172},127456,"v0.1.2","## 变更内容\n- 功能：通过 @sathinduga 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1160 中实现了在 .docx 文档中渲染数学公式的功能。\n- 使 AzureKeyCredentials 更易于与 Azure 文档智能服务一起使用，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1151 中完成。\n- 添加 CSV 到 Markdown 表格的转换功能——修复了 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fissues\u002F1144，由 @erinshek 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1176 中实现。\n- 修复 README.md 中的拼写错误，由 @lentil32 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1175 中完成。\n- 更新 README.md，由 @createcentury 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1187 中完成。\n- 从 stdlib 的 minidom 解析器切换到 defusedxml，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1259 中完成。\n- 更新 omml.py 以使用 defusedxml [python.lang.security.use-defused-xml-parse.use-defused-xml-parse]，由 @kira-offgrid 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1251 中完成。\n- 杂项：让 linter 满意，由 @t3tra-dev 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1256 中完成。\n- 修复 YouTube 字幕文件中的错误，由 @JoshClark-git 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1241 中完成。\n- 功能：支持为文档智能服务选择 API 版本，由 @kirisame-wang 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1253 中实现。\n- 更新 Python 版本要求，并将 .cursorrules 添加到 .gitignore 文件中，由 @Wuhall 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1249 中完成。\n- 支持可流式传输的 HTTP MCP，由 @Betula-L 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1245 中完成。\n- 文档：修复拼写错误，由 @rtpacks 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1201 中完成。\n\n## 新贡献者\n@sathinduga 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1160 中做出了首次贡献。\n@erinshek 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1176 中做出了首次贡献。\n@lentil32 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1175 中做出了首次贡献。\n@createcentury 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1187 中做出了首次贡献。\n@kira-offgrid 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1251 中做出了首次贡献。\n@t3tra-dev 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1256 中做出了首次贡献。\n@JoshClark-git 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1241 中做出了首次贡献。\n@kirisame-wang 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1253 中做出了首次贡献。\n@Wuhall 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1249 中做出了首次贡献。\n@Betula-L 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1245 中做出了首次贡献。\n@rtpacks 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1201 中做出了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.1...v0.1.2","2025-05-28T17:11:27",{"id":174,"version":175,"summary_zh":176,"released_at":177},127457,"v0.1.2a1","## 变更内容\n* 功能：通过 @sathinduga 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1160 中实现了在 .docx 文档中渲染数学公式的功能。\n* 使 AzureKeyCredentials 更易于与 Azure 文档智能一起使用，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1151 中完成。\n* 添加 CSV 到 Markdown 表格的转换功能——修复了 #1144，由 @erinshek 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1176 中实现。\n* 杂项：修复 README.md 中的拼写错误，由 @lentil32 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1175 中完成。\n* 更新 README.md，由 @createcentury 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1187 中完成。\n* 从 stdlib 的 minidom 解析器切换到 defusedxml，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1259 中完成。\n* 更新 packages\u002Fmarkitdown\u002Fsrc\u002Fmarkitdown\u002Fconverter_utils\u002Fdocx\u002Fmath\u002Fomml.py 文件，以修复安全漏洞 [python.lang.security.use-defused-xml-parse.use-defused-xml-parse]，由 @kira-offgrid 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1251 中完成。\n* 杂项：让 linter 满意，由 @t3tra-dev 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1256 中完成。\n* 修复 YouTube 字幕错误，由 @JoshClark-git 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1241 中完成。\n* 功能：支持文档智能的 API 版本选择，由 @kirisame-wang 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1253 中实现。\n* 更新 Python 版本要求，并将 .cursorrules 添加到 .gitignore 文件中，由 @Wuhall 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1249 中完成。\n* 支持可流式传输的 HTTP mcp，由 @Betula-L 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1245 中完成。\n* 文档：修复拼写错误，由 @rtpacks 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1201 中完成。\n* 准备 0.1.2 的预发布版本，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1260 中完成。\n\n## 新贡献者\n* @sathinduga 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1160 中做出了首次贡献。\n* @erinshek 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1176 中做出了首次贡献。\n* @lentil32 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1175 中做出了首次贡献。\n* @createcentury 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1187 中做出了首次贡献。\n* @kira-offgrid 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1251 中做出了首次贡献。\n* @t3tra-dev 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1256 中做出了首次贡献。\n* @JoshClark-git 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1241 中做出了首次贡献。\n* @kirisame-wang 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1253 中做出了首次贡献。\n* @Wuhall 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1249 中做出了首次贡献。\n* @Betula-L 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1245 中做出了首次贡献。\n* @rtpacks 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1201 中做出了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.1...v0.1.2a1","2025-05-21T22:33:37",{"id":179,"version":180,"summary_zh":181,"released_at":182},127458,"v0.1.1","## 变更内容\n\n`convert_url` 已重命名为 `convert_uri`，现在由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1153 中实现对 data 和 file URI 的处理。\n\n**注意**：为了保持向后兼容性，`convert_url` 仍然是 `convert_uri` 的别名。\n\n两者现在都支持 file URI 和 data URI：\n\n例如：\n```python\nmarkitdown = MarkItDown()\nresult = markitdown.convert_uri(\"file:\u002F\u002F\u002Fpath\u002Fto\u002Ffile.txt\")\nprint(result.markdown)\n```\n\n以及：\n```python\nmarkitdown = MarkItDown()\nresult = markitdown.convert_uri(\"data:text\u002Fplain;base64,SGVsbG8sIFdvcmxkIQ==\")\nprint(result.markdown)\n```\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.0...v0.1.1","2025-03-25T06:32:22",{"id":184,"version":185,"summary_zh":186,"released_at":187},127459,"v0.1.0","## 概述\n版本 0.1.0（此前为 0.1.0a6）是一个重大发布，相比之前的 0.0.2 版本带来了许多改进。\n\n高层次的变更包括：\n\n* 将依赖项组织成功能组——只需安装所需的转换器，或者使用 `pip install markitdown[all]` 一次性获取所有功能。\n* 引入了基于插件的新架构，允许第三方开发者为 MarkItDown 添加功能（参见 [示例插件](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Ftree\u002Fmain\u002Fpackages\u002Fmarkitdown-sample-plugin)）。\n* 所有转换操作均在内存中完成——不再生成临时文件。\n* 新增对 EPUB 等格式的支持。\n* 提供在转换后的 Markdown 中保留 Data URI 的选项。\n* 在命令行界面中支持覆盖 MIME 类型、文件扩展名和字符集（当从管道或标准输入读取内容时非常有用）。\n\n## 破坏性变更\n* 如上所述，依赖项现已组织为可选的功能组。若需保持向后兼容性，请使用 `pip install markitdown[all]`。\n* `convert_stream()` 现在要求传入二进制文件类对象（例如以二进制模式打开的文件，或 `io.BytesIO` 对象）。这与之前版本的行为不同，因为旧版本也接受文本文件类对象，如 `io.StringIO`。\n* `DocumentConverter` 类的接口已更改，改为从文件类流中读取数据，而非文件路径。不再创建临时文件。如果您是插件或自定义 `DocumentConverter` 的维护者，可能需要更新您的代码。否则，如果您仅使用 `MarkItDown` 
类或命令行工具（如这些示例所示），则无需进行任何更改。\n \n## 贡献详细列表\n* 为支持插件而进行的清理和重构。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F318 中完成。\n* 在 `pre` 块中跳过生成 Markdown 链接。由 @t-kalinowski 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F322 中完成。\n* 修复示例 RTF 插件中的拼写错误。由 @rickygao 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F320 中完成。\n* 为所有转换器构造函数添加优先级参数。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F324 中完成。\n* 针对重构代码的文档智能修复。由 @KennyZhang1 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F325 中完成。\n* 添加了 CLI 测试。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F327 中完成。\n* 修复 `MarkItDown._convert` 中的 `UnboundLocalError` 错误。由 @menezesandre 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1038 中完成。\n* 添加必要的导入。由 @tanreinama 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F861 中完成。\n* 修复：实现 YouTube 字幕抓取的重试逻辑，并修复 URL 解码问题。由 @iw4p 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1035 中完成。\n* 添加对 PPTX 形状组的支持（修复代码设计中的缺陷，避免遗漏幻灯片内容）。由 @C0dingMast3r 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F331 中完成。\n* 确保 `MarkItDown` 的转换方法中文件扩展名唯一。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1076 中完成。\n* 禁止 `ZipConverter` 接受 OOXML 文件。由 @","2025-03-22T18:33:42",{"id":189,"version":190,"summary_zh":191,"released_at":192},127460,"v0.1.0a6","## 变更内容\n* 添加了对保留 Base64 编码图片的支持，由 @BetterAndBetterII 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1140 中实现。\n* 提升版本号并修复了一个控制台编码错误，由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1149 中完成。\n\n## 新贡献者\n* @BetterAndBetterII 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1140 
中完成了他们的首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.0a5...v0.1.0a6","2025-03-21T16:29:16",{"id":194,"version":195,"summary_zh":196,"released_at":197},127461,"v0.1.0a5","## 变更内容\n* 将任何带有字符集的文本都视为可转换为纯文本的内容。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1142 中提出。\n* 调整警告过滤器并更新依赖项。由 @afourney 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1143 中完成。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.0a4...v0.1.0a5","2025-03-20T05:13:39",{"id":199,"version":200,"summary_zh":201,"released_at":202},127462,"v0.1.0a4","## Features\r\n* Basic EPub support from @0xRaduan, in collaboration with @afourney \r\n* Switch from puremagic to magika. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1108\r\n* Added CLI options for extension, mime-types, and charset. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1115\r\n* Sort pptx shapes to be parsed in top-to-bottom, left-to-right order by @richardye101 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1104\r\n\r\n## Bug fixes and enhancements\r\n* fix(README): correct pip install command formatting by @Piero24 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1090\r\n* Fixed deepcopy failure when passing llm_client by @scalabreseGD in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1089\r\n* feat(docker): improve dockerfile build by @syaghoubi00 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F220\r\n* Fix exiftool in well-known paths. 
by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1106\r\n* fix typo in well-known path list by @0xmohit in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1109\r\n* Minimize guesses when guesses are compatible. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1114\r\n* Fix string formatting in FileConversionException error message by @yushihang in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1121\r\n* Refactored tests. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1120\r\n* Handle not supported plot type in pptx by @EmanueleMeazzo in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1122\r\n* Fix remaining mypy errors. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1132\r\n* Investigate and silence warnings. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1133\r\n\r\n## New Contributors\r\n* @0xRaduan made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F123 \r\n* @Piero24 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1090\r\n* @scalabreseGD made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1089\r\n* @richardye101 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1104\r\n* @syaghoubi00 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F220\r\n* @0xmohit made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1109\r\n* @yushihang made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1121\r\n* @EmanueleMeazzo made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1122\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.1.0a1...v0.1.0a4","2025-03-17T14:59:44",{"id":204,"version":205,"summary_zh":206,"released_at":207},127463,"v0.0.2","## What's Changed\r\n* Avoids resetting warning filters (addresses #1068) by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1101\r\n* Removes deprecated features from 0.0.1aX (pre-release alphas) by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1105\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.0.1...v0.0.2","2025-03-08T00:24:38",{"id":209,"version":210,"summary_zh":211,"released_at":212},127464,"v0.1.0a1","## What's Changed\r\nThis MarkItDown _alpha_ introduces numerous bug-fixes, and the following major changes:\r\n\r\n* Dependencies are now organized into optional feature-groups (further details below). Use pip install `markitdown[all]` to have backward-compatible behavior.\r\n* The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a DocumentConverter, you likely need to update your code. Otherwise, if only using the MarkItDown class or CLI, you should not need to change anything.\r\n* MarkItDown now supports extension through 3rd-party plugins. See [markitdown-sample-plugin](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Ftree\u002Fmain\u002Fpackages\u002Fmarkitdown-sample-plugin) for more details!","2025-03-06T07:09:05",{"id":214,"version":215,"summary_zh":216,"released_at":217},127465,"v0.0.1","Promoting v0.0.1a5 to a full release. 
\r\n\r\nFor more details see the prior [Release Notes](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Freleases\u002Ftag\u002Fv0.0.1a5).\r\n","2025-03-06T05:42:49",{"id":219,"version":220,"summary_zh":221,"released_at":222},127466,"v0.0.1a5","## What's Changed\r\n* Fixed compatibility with [markdownify v1.0.0](https:\u002F\u002Fgithub.com\u002Fmatthewwithanm\u002Fpython-markdownify\u002Freleases\u002Ftag\u002F1.0.0)\r\n## New Contributors\r\n* @lh0x00 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F1072\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.0.1a4...v0.0.1a5","2025-02-28T15:38:21",{"id":224,"version":225,"summary_zh":226,"released_at":227},127467,"v0.0.1a4","## Some of What's Changed\r\n* feat: Add RSSConverter  by @Soulter in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F97\r\n* feat: Add IpynbConverter by @AumGupta in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F71\r\n* feat(devcontainer): Add DevContainer Configuration for Easier Contribution Setup by @l-lumin in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F64\r\n* feat: add support for conversion via Document Intelligence by @KennyZhang1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F303\r\n* feat: add version option to markitdown CLI by @l-lumin in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F172\r\n* feat: enable Git support in devcontainer by @numekudi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F136\r\n* feat: outlook \".msg\" file converter by @muratcankurtulus in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F196\r\n* feat: Add xls support by @yeungadrian in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F169\r\n* feat: support 
image description with LLM for pptx files by @masquare in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F306\r\n* fix: Safeguard against path traversal for ZipConverter by @finchy in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F129\r\n* fix: support -o param to avoid encoding issues by @Soulter in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F116\r\n* fix(transcription): TRANSCRIPTION_CAPABLE should be iniztialized by @absadiki in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F194\r\n* fix: added a test for leading spaces. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F258\r\n* fix: If puremagic has no guesses, try again after ltrim. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F260\r\n* fix: Recognize json as plain text (if no other handlers are present). by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F261\r\n* fix: Set exiftool path explicitly. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F267\r\n* fix: remove leading and trailing \\n for HtmlConverter by @ZeyuTeng96 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F262\r\n* fix: argparse CLI option ordering, fixes #268 by @slhck in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F290\r\n* fix: for mimetype issue with csv files on windows. 
by @wunde005 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F273\r\n* docs: update README.md by @eltociear in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F182\r\n* docs: Add documentation for docintel  by @KennyZhang1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F312\r\n\r\n## New Contributors\r\n* @AumGupta made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F71\r\n* @diya155 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F80\r\n* @l-lumin made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F64\r\n* @waterimp made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F98\r\n* @finchy made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F129\r\n* @sugatoray made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F130\r\n* @PetrAPConsulting made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F91\r\n* @SigireddyBalasai made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F93\r\n* @dependabot made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F177\r\n* @numekudi made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F136\r\n* @eltociear made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F182\r\n* @absadiki made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F194\r\n* @muratcankurtulus made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F196\r\n* @yeungadrian made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F169\r\n* @KennyZhang1 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F303\r\n* @ZeyuTeng96 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F262\r\n* @jamesmh made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F270\r\n* @masquare made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F306\r\n* @slhck made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F290\r\n* @wunde005 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F273\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.0.1a3...v0.0.1a4","2025-02-11T00:12:57",{"id":229,"version":230,"summary_zh":231,"released_at":232},127468,"v0.0.1a3","## New Features and Formats\r\n\r\n* Add zip handling by @Josh-XT in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F22\r\n* Add PPTX chart support by @nyosegawa in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F33\r\n\r\n## Breaking Changes\r\nRenamed `mlm_client ` and `mlm_model` arguments to `llm_client` and `llm_model`, and added appropriate deprecation warnings.\r\n\r\nSee:\r\n* Fix LLM terminology in code by @CharlesCNorton in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F73\r\n* Fix LLM terms by @CharlesCNorton in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F72\r\n* Added deprecation warnings for mlm_* arguments. 
by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F101\r\n\r\n## Bug fixes and enhancements\r\n* Remove invalid classifiers by @simonw in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F10\r\n* Add installation instructions from haesleinhuepf:patch-1 by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F27\r\n* Update README.md by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F28\r\n* Improve the readme with contributing guidelines by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F7\r\n* Add installation instructions by @haesleinhuepf in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F24\r\n* Update README.md by @pawarbi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F26\r\n* Update README.md by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F29\r\n* CLI usage instructions by @simonw in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F11\r\n* Fix character decoding issues with text-like files by @brc-dd in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F19\r\n* Catching pydub's warning of ffmpeg or avconv missing by @SH4DOW4RE in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F39\r\n* Exclude test files from language statistics using linguist-vendored by @Y-Kim-64 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F44\r\n* Support specifying YouTube transcript language by @narumiruna in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F50\r\n* Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments by @VillePuuska in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F38\r\n* Fix: pass the kwargs to _convert method when 
converting an url file by @Soulter in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F48\r\n* Added Dockerfile  by @madduci in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F60\r\n* fix issue #65 by @DIMAX99 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F67\r\n* Cybernobie\u002Fmain by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F75\r\n* Ensure hatch is installed before running tests by @cybernobie in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F63\r\n* Kevinclb\u002Fmain by @gagb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F77\r\n* feature: add argument parsing for cli tool capability by @kevinclb in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F46\r\n* Added llm tests to the local test set. by @afourney in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F100\r\n\r\n## New Contributors\r\n* @simonw made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F10\r\n* @gagb made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F27\r\n* @haesleinhuepf made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F24\r\n* @pawarbi made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F26\r\n* @brc-dd made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F19\r\n* @Josh-XT made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F22\r\n* @nyosegawa made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F33\r\n* @VillePuuska made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F38\r\n* @SH4DOW4RE made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F39\r\n* @Y-Kim-64 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F44\r\n* @Soulter made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F48\r\n* @narumiruna made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F50\r\n* @madduci made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F60\r\n* @CharlesCNorton made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F73\r\n* @DIMAX99 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F67\r\n* @cybernobie made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F63\r\n* @kevinclb made their first contribution in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fpull\u002F46\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmarkitdown\u002Fcompare\u002Fv0.0.1a2...v0.0.1a3","2024-12-17T22:31:03",{"id":234,"version":235,"summary_zh":236,"released_at":237},127469,"v0.0.1a2","## Initial Release of markitdown\r\n\r\nThe MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)\r\n\r\nIt presently supports:\r\n\r\n* PDF (.pdf)\r\n* PowerPoint (.pptx)\r\n* Word (.docx)\r\n* Excel (.xlsx)\r\n* Images (EXIF metadata, and OCR)\r\n* Audio (EXIF metadata, and speech transcription)\r\n* HTML (special handling of Wikipedia, etc.)\r\n* Various other text-based formats (csv, json, xml, etc.)\r\n\r\nThe API is simple:\r\n\r\n```python\r\nfrom markitdown import 
MarkItDown\r\n\r\nmarkitdown = MarkItDown()\r\nresult = markitdown.convert(\"test.xlsx\")\r\nprint(result.text_content)\r\n```","2024-12-17T22:17:54"]