[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-D4Vinci--Scrapling":3,"tool-D4Vinci--Scrapling":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",155373,2,"2026-04-14T11:34:08",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":76,"owner_twitter":78,"owner_website":76,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":32,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":104,"github_topics":105,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":125,"updated_at":126,"faqs":127,"releases":157},7417,"D4Vinci\u002FScrapling","Scrapling","🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!","Scrapling 是一款专为现代网页设计的自适应网络爬虫框架，旨在让数据抓取变得轻松高效。无论是发起单次请求还是执行大规模全站爬取，它都能游刃有余地应对。\n\n在如今的互联网环境中，网站结构复杂多变，反爬虫机制日益严格，传统工具往往难以稳定运行或需要大量手动调整。Scrapling 正是为了解决这些痛点而生，它能够智能适应网页变化，自动处理常见的反爬障碍，显著降低了维护成本和技术门槛。\n\n这款工具非常适合开发者、数据研究人员以及需要自动化采集公开数据的团队使用。如果你正在构建数据分析管道、训练 AI 模型或监控市场信息，Scrapling 能提供可靠的后端支持。\n\n其技术亮点在于高度灵活的架构：内置多种获取器（Fetchers）以适应不同场景，支持高效的代理轮换以规避封锁，并提供强大的元素选择方法。此外，Scrapling 还特别优化了对 AI 智能体的支持，拥有专门的技能目录，便于与大模型工作流集成。配合友好的命令行界面和详尽的文档，用户无需深陷底层细节，即可快速搭建健壮的爬虫系统，将精力更多集中在数据价值的挖掘上。","\u003C!-- mcp-name: io.github.D4Vinci\u002FScrapling -->\n\n\u003Ch1 align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\">\n        \u003Cpicture>\n          \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fdocs\u002Fassets\u002Fcover_dark.svg?sanitize=true\">\n          \u003Cimg alt=\"Scrapling Poster\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fdocs\u002Fassets\u002Fcover_light.svg?sanitize=true\">\n        \u003C\u002Fpicture>\n    \u003C\u002Fa>\n    \u003Cbr>\n    \u003Csmall>Effortless Web Scraping for the Modern Web\u003C\u002Fsmall>\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F14244\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_4a68feb902da.png\" alt=\"D4Vinci%2FScrapling | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_AR.md\">العربيه\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_ES.md\">Español\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_FR.md\">Français\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_DE.md\">Deutsch\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_CN.md\">简体中文\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_JP.md\">日本語\u003C\u002Fa> |  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_RU.md\">Русский\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_KR.md\">한국어\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Factions\u002Fworkflows\u002Ftests.yml\" alt=\"Tests\">\n        \u003Cimg alt=\"Tests\" src=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FScrapling\" alt=\"PyPI version\">\n        \u003Cimg alt=\"PyPI version\" src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FScrapling.svg\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fclickpy.clickhouse.com\u002Fdashboard\u002Fscrapling\" rel=\"nofollow\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fscrapling\" alt=\"PyPI package downloads\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Ftree\u002Fmain\u002Fagent-skill\" alt=\"AI Agent Skill directory\">\n        \u003Cimg alt=\"Static Badge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSkill-black?style=flat&label=Agent&link=https%3A%2F%2Fgithub.com%2FD4Vinci%2FScrapling%2Ftree%2Fmain%2Fagent-skill\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fclawhub.ai\u002FD4Vinci\u002Fscrapling-official\" alt=\"OpenClaw Skill\">\n        \u003Cimg alt=\"OpenClaw Skill\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FClawhub-darkred?style=flat&label=OpenClaw&link=https%3A%2F%2Fclawhub.ai%2FD4Vinci%2Fscrapling-official\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ\" alt=\"Discord\" target=\"_blank\">\n      \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1360786381042880532?style=social&logo=discord&link=https%3A%2F%2Fdiscord.gg%2FEMgGbDceNQ\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fx.com\u002FScrapling_dev\" alt=\"X (formerly Twitter)\">\n      \u003Cimg alt=\"X (formerly Twitter) Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FScrapling_dev?style=social&logo=x&link=https%3A%2F%2Fx.com%2FScrapling_dev\">\n    \u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fscrapling\u002F\" alt=\"Supported Python versions\">\n        \u003Cimg alt=\"Supported Python versions\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fscrapling.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fparsing\u002Fselection.html\">\u003Cstrong>Selection methods\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ffetching\u002Fchoosing.html\">\u003Cstrong>Fetchers\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Farchitecture.html\">\u003Cstrong>Spiders\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Fproxy-blocking.html\">\u003Cstrong>Proxy Rotation\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fcli\u002Foverview.html\">\u003Cstrong>CLI\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fai\u002Fmcp-server.html\">\u003Cstrong>MCP\u003C\u002Fstrong>\u003C\u002Fa>\n\u003C\u002Fp>\n\nScrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.\n\nIts parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause\u002Fresume and automatic proxy rotation - all in a few lines of Python. One library, zero compromises.\n\nBlazing fast crawls with real-time stats and streaming. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.\n\n```python\nfrom scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher\nStealthyFetcher.adaptive = True\np = StealthyFetcher.fetch('https:\u002F\u002Fexample.com', headless=True, network_idle=True)  # Fetch website under the radar!\nproducts = p.css('.product', auto_save=True)                                        # Scrape data that survives website design changes!\nproducts = p.css('.product', adaptive=True)                                         # Later, if the website structure changes, pass `adaptive=True` to find them!\n```\nOr scale up to full crawls\n```python\nfrom scrapling.spiders import Spider, Response\n\nclass MySpider(Spider):\n  name = \"demo\"\n  start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n\n  async def parse(self, response: Response):\n      for item in response.css('.product'):\n          yield {\"title\": item.css('h2::text').get()}\n\nMySpider().start()\n```\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fdataimpulse.com\u002F?utm_source=scrapling&utm_medium=banner&utm_campaign=scrapling\" target=\"_blank\" style=\"display:flex; justify-content:center; padding:4px 0;\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_3c9d0e631045.png\" alt=\"At DataImpulse, we specialize in developing custom proxy services for your business. 
Make requests from anywhere, collect data, and enjoy fast connections with our premium proxies.\" style=\"max-height:60px;\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n# Platinum Sponsors\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"Bot Protection Bypass API for Akamai, DataDome, Incapsula & Kasada\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_37d452d3d29e.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd> Scrapling handles Cloudflare Turnstile. For enterprise-grade protection, \u003Ca href=\"https:\u002F\u002Fhypersolutions.co?utm_source=github&utm_medium=readme&utm_campaign=scrapling\">\n        \u003Cb>Hyper Solutions\u003C\u002Fb>\n      \u003C\u002Fa> provides API endpoints that generate valid antibot tokens for \u003Cb>Akamai\u003C\u002Fb>, \u003Cb>DataDome\u003C\u002Fb>, \u003Cb>Kasada\u003C\u002Fb>, and \u003Cb>Incapsula\u003C\u002Fb>. Simple API calls, no browser automation required. \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\" target=\"_blank\" title=\"At Bird Proxies, we eliminate your pains such as banned IPs, geo restriction, and high costs so you can focus on your work.\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_7af244ed50bf.jpg\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>Hey, we built \u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\">\n        \u003Cb>BirdProxies\u003C\u002Fb>\n      \u003C\u002Fa> because proxies shouldn't be complicated or overpriced. Fast residential and ISP proxies in 195+ locations, fair pricing, and real support. \u003Cbr \u002F>\n      \u003Cb>Try our FlappyBird game on the landing page for free data!\u003C\u002Fb>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_7fa2b47caf30.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n      \u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\">\n        \u003Cb>Evomi\u003C\u002Fb>\n      \u003C\u002Fa>: residential proxies from $0.49\u002FGB. Scraping browser with fully spoofed Chromium, residential IPs, auto CAPTCHA solving, and anti-bot bypass. \u003C\u002Fbr>\n      \u003Cb>Scraper API for hassle-free results. 
MCP and N8N integrations are available.\u003C\u002Fb>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Ftikhub.io\u002F?utm_source=github.com\u002FD4Vinci\u002FScrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad\" target=\"_blank\" title=\"Unlock the Power of Social Media Data & AI\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_c3e69635b465.jpg\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n      \u003Ca href=\"https:\u002F\u002Ftikhub.io\u002F?utm_source=github.com\u002FD4Vinci\u002FScrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad\" target=\"_blank\">TikHub.io\u003C\u002Fa> provides 900+ stable APIs across 16+ platforms including TikTok, X, YouTube & Instagram, with 40M+ datasets. \u003Cbr \u002F> Also offers \u003Ca href=\"https:\u002F\u002Fai.tikhub.io\u002F?ref=KarimShoair\" target=\"_blank\">DISCOUNTED AI models\u003C\u002Fa> - Claude, GPT, GEMINI & more up to 71% off.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fwww.nsocks.com\u002F?keyword=2p67aivg\" target=\"_blank\" title=\"Scalable Web Data Access for AI Applications\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_c8e83028e0ff.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fwww.nsocks.com\u002F?keyword=2p67aivg\" target=\"_blank\">Nsocks\u003C\u002Fa> provides fast Residential and ISP proxies for developers and scrapers. Global IP coverage, high anonymity, smart rotation, and reliable performance for automation and data extraction. Use \u003Ca href=\"https:\u002F\u002Fwww.xcrawl.com\u002F?keyword=2p67aivg\" target=\"_blank\">Xcrawl\u003C\u002Fa> to simplify large-scale web crawling.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_6716c1e3393c.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    Close your laptop. Your scrapers keep running. \u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\">PetroSky VPS\u003C\u002Fa> - cloud servers built for nonstop automation. Windows and Linux machines with full control. 
From €6.99\u002Fmo.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fscrapling-hands-on-guide?utm_source=github&utm_medium=repo&utm_campaign=scrapling\" target=\"_blank\" title=\"The #1 newsletter dedicated to Web Scraping\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_2a0a55e90413.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    Read a full review of \u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fscrapling-hands-on-guide?utm_source=github&utm_medium=repo&utm_campaign=scrapling\" target=\"_blank\">Scrapling on The Web Scraping Club\u003C\u002Fa> (Nov 2025), the #1 newsletter dedicated to Web Scraping.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fproxy-seller.com\u002F?partner=CU9CAA5TBYFFT2\" target=\"_blank\" title=\"Proxy-Seller provides reliable proxy infrastructure for Web Scraping\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_ffdbda06a50d.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fproxy-seller.com\u002F?partner=CU9CAA5TBYFFT2\" target=\"_blank\">Proxy-Seller\u003C\u002Fa> provides reliable proxy infrastructure for web scraping, offering IPv4, IPv6, ISP, Residential, and Mobile proxies with stable performance, broad geo coverage, and flexible plans for business-scale data collection.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"http:\u002F\u002Fmangoproxy.com\u002F?utm_source=D4Vinci&utm_medium=GitHub&utm_campaign=D4Vinci\" target=\"_blank\" title=\"Proxies You Can Rely On: Residential, Server, and Mobile\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_900a02654e71.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"http:\u002F\u002Fmangoproxy.com\u002F?utm_source=D4Vinci&utm_medium=GitHub&utm_campaign=D4Vinci\" target=\"_blank\">Stable proxies\u003C\u002Fa> for scraping, automation, and multi-accounting. Clean IPs, fast response, and reliable performance under load. Built for scalable workflows.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fwww.swiftproxy.net\u002F?ref=D4Vinci\" target=\"_blank\" title=\"Scalable Solutions for Web Data Access\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_aea40132f2bb.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fwww.swiftproxy.net\u002F?ref=D4Vinci\" target=\"_blank\">Swiftproxy\u003C\u002Fa> provides scalable residential proxies with 80M+ IPs across 195+ countries, delivering fast, reliable connections, automatic rotation, and strong anti-block performance. Free trial available.\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Ci>\u003Csub>Do you want to show your ad here? 
Click [here](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FD4Vinci\u002Fsponsorships?tier_id=586646)\u003C\u002Fsub>\u003C\u002Fi>\n# Sponsors \n\n\u003C!-- sponsors -->\n\n\n\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"Scrape Google and other search engines with SerpApi\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_5c64a8e7d8f5.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_2ed01da1cc6f.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhasdata.com\u002F?utm_source=github&utm_medium=banner&utm_campaign=D4Vinci\" target=\"_blank\" title=\"The web scraping service that actually beats anti-bot systems!\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_ba50a5b0dbed.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fproxyempire.io\u002F?ref=scrapling&utm_source=scrapling\" target=\"_blank\" title=\"Collect The Data Your Project Needs with the Best Residential Proxies\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_e22be457f4e3.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.webshare.io\u002F?referral_code=48r2m2cd5uz1\" target=\"_blank\" title=\"The Most Reliable Proxy with Unparalleled Performance\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_37bec0dca10b.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.crawleo.dev\u002F?utm_source=github&utm_medium=sponsor&utm_campaign=scrapling\" target=\"_blank\" title=\"Supercharge your AI with Real-Time Web Intelligence\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_8b7d37b0a929.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.rapidproxy.io\u002F?ref=d4v\" target=\"_blank\" title=\"Affordable Access to the Proxy World – bypass CAPTCHAs blocks, and avoid additional costs.\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_4fc9773f0901.jpg\">\u003C\u002Fa>\n\n\n\u003C!-- \u002Fsponsors -->\n\n\u003Ci>\u003Csub>Do you want to show your ad here? Click [here](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FD4Vinci) and choose the tier that suites you!\u003C\u002Fsub>\u003C\u002Fi>\n\n---\n\n## Key Features\n\n### Spiders - A Full Crawling Framework\n- 🕷️ **Scrapy-like Spider API**: Define spiders with `start_urls`, async `parse` callbacks, and `Request`\u002F`Response` objects.\n- ⚡ **Concurrent Crawling**: Configurable concurrency limits, per-domain throttling, and download delays.\n- 🔄 **Multi-Session Support**: Unified interface for HTTP requests, and stealthy headless browsers in a single spider - route requests to different sessions by ID.\n- 💾 **Pause & Resume**: Checkpoint-based crawl persistence. 
Press Ctrl+C for a graceful shutdown; restart to resume from where you left off.\n- 📡 **Streaming Mode**: Stream scraped items as they arrive via `async for item in spider.stream()` with real-time stats - ideal for UI, pipelines, and long-running crawls.\n- 🛡️ **Blocked Request Detection**: Automatic detection and retry of blocked requests with customizable logic.\n- 🤖 **Robots.txt Compliance**: Optional `robots_txt_obey` flag that respects `Disallow`, `Crawl-delay`, and `Request-rate` directives with per-domain caching.\n- 🧪 **Development Mode**: Cache responses to disk on the first run and replay them on subsequent runs - iterate on your `parse()` logic without re-hitting the target servers.\n- 📦 **Built-in Export**: Export results through hooks and your own pipeline or the built-in JSON\u002FJSONL with `result.items.to_json()` \u002F `result.items.to_jsonl()` respectively.\n\n### Advanced Websites Fetching with Session Support\n- **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class. Can impersonate browsers' TLS fingerprint, headers, and use HTTP\u002F3.\n- **Dynamic Loading**: Fetch dynamic websites with full browser automation through the `DynamicFetcher` class supporting Playwright's Chromium and Google's Chrome.\n- **Anti-bot Bypass**: Advanced stealth capabilities with `StealthyFetcher` and fingerprint spoofing. Can easily bypass all types of Cloudflare's Turnstile\u002FInterstitial with automation.\n- **Session Management**: Persistent session support with `FetcherSession`, `StealthySession`, and `DynamicSession` classes for cookie and state management across requests.\n- **Proxy Rotation**: Built-in `ProxyRotator` with cyclic or custom rotation strategies across all session types, plus per-request proxy overrides.\n- **Domain & Ad Blocking**: Block requests to specific domains (and their subdomains) or enable built-in ad blocking (~3,500 known ad\u002Ftracker domains) in browser-based fetchers.\n- **DNS Leak Prevention**: Optional DNS-over-HTTPS support to route DNS queries through Cloudflare's DoH, preventing DNS leaks when using proxies.\n- **Async Support**: Complete async support across all fetchers and dedicated async session classes.\n\n### Adaptive Scraping & AI Integration\n- 🔄 **Smart Element Tracking**: Relocate elements after website changes using intelligent similarity algorithms.\n- 🎯 **Smart Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.\n- 🔍 **Find Similar Elements**: Automatically locate elements similar to found elements.\n- 🤖 **MCP Server to be used with AI**: Built-in MCP server for AI-assisted Web Scraping and data extraction. The MCP server features powerful, custom capabilities that leverage Scrapling to extract targeted content before passing it to the AI (Claude\u002FCursor\u002Fetc), thereby speeding up operations and reducing costs by minimizing token usage. 
([demo video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qyFk3ZNwOxE))\n\n### High-Performance & battle-tested Architecture\n- 🚀 **Lightning Fast**: Optimized performance outperforming most Python scraping libraries.\n- 🔋 **Memory Efficient**: Optimized data structures and lazy loading for a minimal memory footprint.\n- ⚡ **Fast JSON Serialization**: 10x faster than the standard library.\n- 🏗️ **Battle tested**: Not only does Scrapling have 92% test coverage and full type hints coverage, but it has been used daily by hundreds of Web Scrapers over the past year.\n\n### Developer\u002FWeb Scraper Friendly Experience\n- 🎯 **Interactive Web Scraping Shell**: Optional built-in IPython shell with Scrapling integration, shortcuts, and new tools to speed up Web Scraping scripts development, like converting curl requests to Scrapling requests and viewing requests results in your browser.\n- 🚀 **Use it directly from the Terminal**: Optionally, you can use Scrapling to scrape a URL without writing a single line of code!\n- 🛠️ **Rich Navigation API**: Advanced DOM traversal with parent, sibling, and child navigation methods.\n- 🧬 **Enhanced Text Processing**: Built-in regex, cleaning methods, and optimized string operations.\n- 📝 **Auto Selector Generation**: Generate robust CSS\u002FXPath selectors for any element.\n- 🔌 **Familiar API**: Similar to Scrapy\u002FBeautifulSoup with the same pseudo-elements used in Scrapy\u002FParsel.\n- 📘 **Complete Type Coverage**: Full type hints for excellent IDE support and code completion. The entire codebase is automatically scanned with **PyRight** and **MyPy** with each change.\n- 🔋 **Ready Docker image**: With each release, a Docker image containing all browsers is automatically built and pushed.\n\n## Getting Started\n\nLet's give you a quick glimpse of what Scrapling can do without deep diving.\n\n### Basic Usage\nHTTP requests with session support\n```python\nfrom scrapling.fetchers import Fetcher, FetcherSession\n\nwith FetcherSession(impersonate='chrome') as session:  # Use latest version of Chrome's TLS fingerprint\n    page = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F', stealthy_headers=True)\n    quotes = page.css('.quote .text::text').getall()\n\n# Or use one-off requests\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\nquotes = page.css('.quote .text::text').getall()\n```\nAdvanced stealth mode\n```python\nfrom scrapling.fetchers import StealthyFetcher, StealthySession\n\nwith StealthySession(headless=True, solve_cloudflare=True) as session:  # Keep the browser open until you finish\n    page = session.fetch('https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare', google_search=False)\n    data = page.css('#padded_content a').getall()\n\n# Or use one-off request style, it opens the browser for this request, then closes it after finishing\npage = StealthyFetcher.fetch('https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare')\ndata = page.css('#padded_content a').getall()\n```\nFull browser automation\n```python\nfrom scrapling.fetchers import DynamicFetcher, DynamicSession\n\nwith DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:  # Keep the browser open until you finish\n    page = session.fetch('https:\u002F\u002Fquotes.toscrape.com\u002F', load_dom=False)\n    data = page.xpath('\u002F\u002Fspan[@class=\"text\"]\u002Ftext()').getall()  # XPath selector if you prefer it\n\n# Or use one-off request style, it opens the browser for this request, then closes it after 
finishing\npage = DynamicFetcher.fetch('https:\u002F\u002Fquotes.toscrape.com\u002F')\ndata = page.css('.quote .text::text').getall()\n```\n\n### Spiders\nBuild full crawlers with concurrent requests, multiple session types, and pause\u002Fresume:\n```python\nfrom scrapling.spiders import Spider, Request, Response\n\nclass QuotesSpider(Spider):\n    name = \"quotes\"\n    start_urls = [\"https:\u002F\u002Fquotes.toscrape.com\u002F\"]\n    concurrent_requests = 10\n    \n    async def parse(self, response: Response):\n        for quote in response.css('.quote'):\n            yield {\n                \"text\": quote.css('.text::text').get(),\n                \"author\": quote.css('.author::text').get(),\n            }\n            \n        next_page = response.css('.next a')\n        if next_page:\n            yield response.follow(next_page[0].attrib['href'])\n\nresult = QuotesSpider().start()\nprint(f\"Scraped {len(result.items)} quotes\")\nresult.items.to_json(\"quotes.json\")\n```\nUse multiple session types in a single spider:\n```python\nfrom scrapling.spiders import Spider, Request, Response\nfrom scrapling.fetchers import FetcherSession, AsyncStealthySession\n\nclass MultiSessionSpider(Spider):\n    name = \"multi\"\n    start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n    \n    def configure_sessions(self, manager):\n        manager.add(\"fast\", FetcherSession(impersonate=\"chrome\"))\n        manager.add(\"stealth\", AsyncStealthySession(headless=True), lazy=True)\n    \n    async def parse(self, response: Response):\n        for link in response.css('a::attr(href)').getall():\n            # Route protected pages through the stealth session\n            if \"protected\" in link:\n                yield Request(link, sid=\"stealth\")\n            else:\n                yield Request(link, sid=\"fast\", callback=self.parse)  # explicit callback\n```\nPause and resume long crawls with checkpoints by running the spider like this:\n```python\nQuotesSpider(crawldir=\".\u002Fcrawl_data\").start()\n```\nPress Ctrl+C to pause gracefully - progress is saved automatically. 
Later, when you start the spider again, pass the same `crawldir`, and it will resume from where it stopped.\n\n### Advanced Parsing & Navigation\n```python\nfrom scrapling.fetchers import Fetcher\n\n# Rich element selection and navigation\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\n\n# Get quotes with multiple selection methods\nquotes = page.css('.quote')  # CSS selector\nquotes = page.xpath('\u002F\u002Fdiv[@class=\"quote\"]')  # XPath\nquotes = page.find_all('div', {'class': 'quote'})  # BeautifulSoup-style\n# Same as\nquotes = page.find_all('div', class_='quote')\nquotes = page.find_all(['div'], class_='quote')\nquotes = page.find_all(class_='quote')  # and so on...\n# Find element by text content\nquotes = page.find_by_text('quote', tag='div')\n\n# Advanced navigation\nquote_text = page.css('.quote')[0].css('.text::text').get()\nquote_text = page.css('.quote').css('.text::text').getall()  # Chained selectors\nfirst_quote = page.css('.quote')[0]\nauthor = first_quote.next_sibling.css('.author::text')\nparent_container = first_quote.parent\n\n# Element relationships and similarity\nsimilar_elements = first_quote.find_similar()\nbelow_elements = first_quote.below_elements()\n```\nYou can use the parser right away if you don't want to fetch websites like below:\n```python\nfrom scrapling.parser import Selector\n\npage = Selector(\"\u003Chtml>...\u003C\u002Fhtml>\")\n```\nAnd it works precisely the same way!\n\n### Async Session Management Examples\n```python\nimport asyncio\nfrom scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession\n\nasync with FetcherSession(http3=True) as session:  # `FetcherSession` is context-aware and can work in both sync\u002Fasync patterns\n    page1 = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\n    page2 = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F', impersonate='firefox135')\n\n# Async session usage\nasync with AsyncStealthySession(max_pages=2) as session:\n    tasks = []\n    urls = ['https:\u002F\u002Fexample.com\u002Fpage1', 'https:\u002F\u002Fexample.com\u002Fpage2']\n    \n    for url in urls:\n        task = session.fetch(url)\n        tasks.append(task)\n    \n    print(session.get_pool_stats())  # Optional - The status of the browser tabs pool (busy\u002Ffree\u002Ferror)\n    results = await asyncio.gather(*tasks)\n    print(session.get_pool_stats())\n```\n\n## CLI & Interactive Shell\n\nScrapling includes a powerful command-line interface:\n\n[![asciicast](https:\u002F\u002Fasciinema.org\u002Fa\u002F736339.svg)](https:\u002F\u002Fasciinema.org\u002Fa\u002F736339)\n\nLaunch the interactive Web Scraping shell\n```bash\nscrapling shell\n```\nExtract pages to a file directly without programming (Extracts the content inside the `body` tag by default). If the output file ends with `.txt`, then the text content of the target will be extracted. 
If it ends in `.md`, it will be a Markdown representation of the HTML content; if it ends in `.html`, it will be the HTML content itself.\n```bash\nscrapling extract get 'https:\u002F\u002Fexample.com' content.md\nscrapling extract get 'https:\u002F\u002Fexample.com' content.txt --css-selector '#fromSkipToProducts' --impersonate 'chrome'  # All elements matching the CSS selector '#fromSkipToProducts'\nscrapling extract fetch 'https:\u002F\u002Fexample.com' content.md --css-selector '#fromSkipToProducts' --no-headless\nscrapling extract stealthy-fetch 'https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare' captchas.html --css-selector '#padded_content a' --solve-cloudflare\n```\n\n> [!NOTE]\n> There are many additional features, but we want to keep this page concise, including the MCP server and the interactive Web Scraping Shell. Check out the full documentation [here](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002F)\n\n## Performance Benchmarks\n\nScrapling isn't just powerful-it's also blazing fast. The following benchmarks compare Scrapling's parser with the latest versions of other popular libraries.\n\n### Text Extraction Speed Test (5000 nested elements)\n\n| # |      Library      | Time (ms) | vs Scrapling | \n|---|:-----------------:|:---------:|:------------:|\n| 1 |     Scrapling     |   2.02    |     1.0x     |\n| 2 |   Parsel\u002FScrapy   |   2.04    |     1.01     |\n| 3 |     Raw Lxml      |   2.54    |    1.257     |\n| 4 |      PyQuery      |   24.17   |     ~12x     |\n| 5 |    Selectolax     |   82.63   |     ~41x     |\n| 6 |  MechanicalSoup   |  1549.71  |   ~767.1x    |\n| 7 |   BS4 with Lxml   |  1584.31  |   ~784.3x    |\n| 8 | BS4 with html5lib |  3391.91  |   ~1679.1x   |\n\n\n### Element Similarity & Text Search Performance\n\nScrapling's adaptive element finding capabilities significantly outperform alternatives:\n\n| Library     | Time (ms) | vs Scrapling |\n|-------------|:---------:|:------------:|\n| Scrapling   |   2.39    |     1.0x     |\n| AutoScraper |   12.45   |    5.209x    |\n\n\n> All benchmarks represent averages of 100+ runs. See [benchmarks.py](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fbenchmarks.py) for methodology.\n\n## Installation\n\nScrapling requires Python 3.10 or higher:\n\n```bash\npip install scrapling\n```\n\nThis installation only includes the parser engine and its dependencies, without any fetchers or commandline dependencies.\n\n### Optional Dependencies\n\n1. If you are going to use any of the extra features below, the fetchers, or their classes, you will need to install fetchers' dependencies and their browser dependencies as follows:\n    ```bash\n    pip install \"scrapling[fetchers]\"\n    \n    scrapling install           # normal install\n    scrapling install  --force  # force reinstall\n    ```\n\n    This downloads all browsers, along with their system dependencies and fingerprint manipulation dependencies.\n\n    Or you can install them from the code instead of running a command like this:\n    ```python\n    from scrapling.cli import install\n    \n    install([], standalone_mode=False)          # normal install\n    install([\"--force\"], standalone_mode=False) # force reinstall\n    ```\n\n2. 
Extra features:\n   - Install the MCP server feature:\n       ```bash\n       pip install \"scrapling[ai]\"\n       ```\n   - Install shell features (Web Scraping shell and the `extract` command): \n       ```bash\n       pip install \"scrapling[shell]\"\n       ```\n   - Install everything: \n       ```bash\n       pip install \"scrapling[all]\"\n       ```\n   Remember that you need to install the browser dependencies with `scrapling install` after any of these extras (if you didn't already)\n\n### Docker\nYou can also install a Docker image with all extras and browsers with the following command from DockerHub:\n```bash\ndocker pull pyd4vinci\u002Fscrapling\n```\nOr download it from the GitHub registry:\n```bash\ndocker pull ghcr.io\u002Fd4vinci\u002Fscrapling:latest\n```\nThis image is automatically built and pushed using GitHub Actions and the repository's main branch.\n\n## Contributing\n\nWe welcome contributions! Please read our [contributing guidelines](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) before getting started.\n\n## Disclaimer\n\n> [!CAUTION]\n> This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international data scraping and privacy laws. The authors and contributors are not responsible for any misuse of this software. Always respect the terms of service of websites and robots.txt files.\n\n## 🎓 Citations\nIf you have used our library for research purposes please quote us with the following reference:\n```text\n  @misc{scrapling,\n    author = {Karim Shoair},\n    title = {Scrapling},\n    year = {2024},\n    url = {https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling},\n    note = {An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!}\n  }\n```\n\n## License\n\nThis work is licensed under the BSD-3-Clause License.\n\n## Acknowledgments\n\nThis project includes code adapted from:\n- Parsel (BSD License)-Used for [translator](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fscrapling\u002Fcore\u002Ftranslator.py) submodule\n\n---\n\u003Cdiv align=\"center\">\u003Csmall>Designed & crafted with ❤️ by Karim Shoair.\u003C\u002Fsmall>\u003C\u002Fdiv>\u003Cbr>\n","\u003C!-- mcp-name: io.github.D4Vinci\u002FScrapling -->\n\n\u003Ch1 align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\">\n        \u003Cpicture>\n          \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fdocs\u002Fassets\u002Fcover_dark.svg?sanitize=true\">\n          \u003Cimg alt=\"Scrapling Poster\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fdocs\u002Fassets\u002Fcover_light.svg?sanitize=true\">\n        \u003C\u002Fpicture>\n    \u003C\u002Fa>\n    \u003Cbr>\n    \u003Csmall>面向现代Web的轻松网页抓取\u003C\u002Fsmall>\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F14244\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_4a68feb902da.png\" alt=\"D4Vinci%2FScrapling | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_AR.md\">العربيه\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_ES.md\">Español\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_FR.md\">Français\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_DE.md\">Deutsch\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_CN.md\">简体中文\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_JP.md\">日本語\u003C\u002Fa> |  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_RU.md\">Русский\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fdocs\u002FREADME_KR.md\">한국어\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Factions\u002Fworkflows\u002Ftests.yml\" alt=\"Tests\">\n        \u003Cimg alt=\"Tests\" src=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FScrapling\" alt=\"PyPI version\">\n        \u003Cimg alt=\"PyPI version\" src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002FScrapling.svg\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fclickpy.clickhouse.com\u002Fdashboard\u002Fscrapling\" rel=\"nofollow\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fscrapling\" alt=\"PyPI package downloads\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Ftree\u002Fmain\u002Fagent-skill\" alt=\"AI Agent Skill directory\">\n        \u003Cimg alt=\"Static Badge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSkill-black?style=flat&label=Agent&link=https%3A%2F%2Fgithub.com%2FD4Vinci%2FScrapling%2Ftree%2Fmain%2Fagent-skill\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fclawhub.ai\u002FD4Vinci\u002Fscrapling-official\" alt=\"OpenClaw Skill\">\n        \u003Cimg alt=\"OpenClaw Skill\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FClawhub-darkred?style=flat&label=OpenClaw&link=https%3A%2F%2Fclawhub.ai%2FD4Vinci%2Fscrapling-official\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ\" alt=\"Discord\" target=\"_blank\">\n      \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1360786381042880532?style=social&logo=discord&link=https%3A%2F%2Fdiscord.gg%2FEMgGbDceNQ\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fx.com\u002FScrapling_dev\" alt=\"X (formerly Twitter)\">\n      \u003Cimg alt=\"X (formerly Twitter) Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FScrapling_dev?style=social&logo=x&link=https%3A%2F%2Fx.com%2FScrapling_dev\">\n    \u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fscrapling\u002F\" alt=\"Supported Python versions\">\n        \u003Cimg alt=\"Supported Python versions\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fscrapling.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fparsing\u002Fselection.html\">\u003Cstrong>选择方法\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ffetching\u002Fchoosing.html\">\u003Cstrong>采集器\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Farchitecture.html\">\u003Cstrong>爬虫\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Fproxy-blocking.html\">\u003Cstrong>代理轮换\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fcli\u002Foverview.html\">\u003Cstrong>命令行工具\u003C\u002Fstrong>\u003C\u002Fa>\n    &middot;\n    \u003Ca href=\"https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fai\u002Fmcp-server.html\">\u003Cstrong>MCP\u003C\u002Fstrong>\u003C\u002Fa>\n\u003C\u002Fp>\n\nScrapling是一个自适应的网页抓取框架，能够处理从单个请求到大规模爬取的各种任务。\n\n它的解析器会根据网站的变化不断学习，并在页面更新时自动重新定位你的元素。其采集器开箱即用，即可绕过Cloudflare Turnstile等反机器人系统。而爬虫框架则允许你通过几行Python代码，轻松实现并发、多会话的爬取，支持暂停\u002F继续以及自动代理轮换——所有这些功能都集成在一个库中，无需任何妥协。\n\n极速爬取，实时统计与流式输出。由网页抓取者为网页抓取者及普通用户打造，适合各类人群使用。\n\n```python\nfrom scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher\nStealthyFetcher.adaptive = True\np = StealthyFetcher.fetch('https:\u002F\u002Fexample.com', headless=True, network_idle=True)  # 在不被察觉的情况下抓取网站！\nproducts = p.css('.product', auto_save=True)                                        # 抓取即使网站设计变更也能保持稳定的数据！\nproducts = p.css('.product', adaptive=True)                                         # 如果网站结构随后发生变化，只需传入`adaptive=True`即可重新找到它们！\n```\n或者扩展到完整的爬取：\n```python\nfrom scrapling.spiders import Spider, Response\n\nclass MySpider(Spider):\n  name = \"demo\"\n  start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n\n  async def parse(self, response: Response):\n      for item in response.css('.product'):\n          yield {\"title\": item.css('h2::text').get()}\n\nMySpider().start()\n```\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fdataimpulse.com\u002F?utm_source=scrapling&utm_medium=banner&utm_campaign=scrapling\" target=\"_blank\" style=\"display:flex; justify-content:center; padding:4px 0;\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_3c9d0e631045.png\" alt=\"在DataImpulse，我们专注于为您的企业开发定制代理服务。无论身在何处，都能发起请求、收集数据，并享受我们优质代理带来的快速连接。\" style=\"max-height:60px;\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n# 白金赞助商\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"用于Akamai、DataDome、Incapsula和Kasada的机器人防护绕过API\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_37d452d3d29e.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd> Scrapling可以处理Cloudflare Turnstile。对于企业级保护，\u003Ca href=\"https:\u002F\u002Fhypersolutions.co?utm_source=github&utm_medium=readme&utm_campaign=scrapling\">\n        \u003Cb>Hyper Solutions\u003C\u002Fb>\n      \u003C\u002Fa> 
提供API端点，可为\u003Cb>Akamai\u003C\u002Fb>、\u003Cb>DataDome\u003C\u002Fb>、\u003Cb>Kasada\u003C\u002Fb>和\u003Cb>Incapsula\u003C\u002Fb>生成有效的反机器人令牌。只需简单的API调用，无需浏览器自动化。\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\" target=\"_blank\" title=\"在Bird Proxies，我们消除您遇到的IP被封禁、地理限制和高成本等问题，让您专注于工作。\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_7af244ed50bf.jpg\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>嘿，我们打造了\u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\">\n        \u003Cb>BirdProxies\u003C\u002Fb>\n      \u003C\u002Fa>,因为代理不应该复杂或价格过高。195+个地区的快速住宅和ISP代理，定价公道，并提供真正的支持。\u003Cbr \u002F>\n      \u003Cb>在着陆页上玩我们的FlappyBird游戏，即可免费获取数据！\u003C\u002Fb>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi是您的瑞士品质代理提供商，起价仅为0.49美元\u002FGB\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_7fa2b47caf30.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n      \u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\">\n        \u003Cb>Evomi\u003C\u002Fb>\n      \u003C\u002Fa>：住宅代理低至0.49美元\u002FGB。配备完全伪造Chromium内核的爬虫浏览器，提供住宅IP、自动解决CAPTCHA以及绕过反机器人机制。 \u003C\u002Fbr>\n      \u003Cb>无 hassle的Scraper API，可轻松获得结果。支持MCP和N8N集成。\u003C\u002Fb>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Ftikhub.io\u002F?utm_source=github.com\u002FD4Vinci\u002FScrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad\" target=\"_blank\" title=\"释放社交媒体数据与AI的潜力\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_c3e69635b465.jpg\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n      \u003Ca href=\"https:\u002F\u002Ftikhub.io\u002F?utm_source=github.com\u002FD4Vinci\u002FScrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad\" target=\"_blank\">TikHub.io\u003C\u002Fa> 提供覆盖TikTok、X、YouTube和Instagram等16+平台的900+条稳定API，拥有超过4000万的数据集。\u003Cbr \u002F> 同时还提供\u003Ca href=\"https:\u002F\u002Fai.tikhub.io\u002F?ref=KarimShoair\" target=\"_blank\">折扣AI模型\u003C\u002Fa>——Claude、GPT、GEMINI等最高可享71%折扣。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fwww.nsocks.com\u002F?keyword=2p67aivg\" target=\"_blank\" title=\"面向AI应用的可扩展Web数据访问\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_c8e83028e0ff.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fwww.nsocks.com\u002F?keyword=2p67aivg\" target=\"_blank\">Nsocks\u003C\u002Fa> 为开发者和爬虫提供快速的住宅和ISP代理。全球IP覆盖、高度匿名性、智能轮换以及可靠的性能，适用于自动化和数据提取。使用\u003Ca href=\"https:\u002F\u002Fwww.xcrawl.com\u002F?keyword=2p67aivg\" target=\"_blank\">Xcrawl\u003C\u002Fa> 可简化大规模网页爬取。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky提供尖端的VPS托管服务\">\n        \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_6716c1e3393c.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    关上你的笔记本电脑吧。你的爬虫会一直运行下去。\u003Cbr \u002F>\n    \u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\">PetroSky VPS\u003C\u002Fa> ——专为不间断自动化而打造的云服务器。Windows和Linux系统，完全可控。每月仅需6.99欧元起。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fscrapling-hands-on-guide?utm_source=github&utm_medium=repo&utm_campaign=scrapling\" target=\"_blank\" title=\"专注于网络爬虫的第一号通讯\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_2a0a55e90413.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    阅读关于\u003Ca href=\"https:\u002F\u002Fsubstack.thewebscraping.club\u002Fp\u002Fscrapling-hands-on-guide?utm_source=github&utm_medium=repo&utm_campaign=scrapling\" target=\"_blank\">Scrapling在The Web Scraping Club\u003C\u002Fa>上的完整评测（2025年11月），这是专注于网络爬虫的第一号通讯。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fproxy-seller.com\u002F?partner=CU9CAA5TBYFFT2\" target=\"_blank\" title=\"Proxy-Seller为网络爬虫提供可靠的代理基础设施\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_ffdbda06a50d.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fproxy-seller.com\u002F?partner=CU9CAA5TBYFFT2\" target=\"_blank\">Proxy-Seller\u003C\u002Fa> 为网络爬虫提供可靠的代理基础设施，涵盖IPv4、IPv6、ISP、住宅和移动代理，性能稳定、地理覆盖广泛，并提供灵活的套餐以满足商业规模的数据采集需求。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"http:\u002F\u002Fmangoproxy.com\u002F?utm_source=D4Vinci&utm_medium=GitHub&utm_campaign=D4Vinci\" target=\"_blank\" title=\"值得信赖的代理：住宅、服务器和移动\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_900a02654e71.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"http:\u002F\u002Fmangoproxy.com\u002F?utm_source=D4Vinci&utm_medium=GitHub&utm_campaign=D4Vinci\" target=\"_blank\">稳定的代理\u003C\u002Fa>适用于爬虫、自动化和多账号操作。干净的IP、快速响应，且在负载下表现可靠。专为可扩展的工作流程而设计。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd width=\"200\">\n      \u003Ca href=\"https:\u002F\u002Fwww.swiftproxy.net\u002F?ref=D4Vinci\" target=\"_blank\" title=\"面向Web数据访问的可扩展解决方案\">\n        \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_aea40132f2bb.png\">\n      \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd>\n    \u003Ca href=\"https:\u002F\u002Fwww.swiftproxy.net\u002F?ref=D4Vinci\" target=\"_blank\">Swiftproxy\u003C\u002Fa> 提供可扩展的住宅代理，拥有195+个国家的8000万+个IP地址，能够提供快速、可靠的连接、自动轮换以及强大的防封堵性能。现提供免费试用。\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Ci>\u003Csub>您想在这里展示您的广告吗？点击[这里](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FD4Vinci\u002Fsponsorships?tier_id=586646)\u003C\u002Fsub>\u003C\u002Fi>\n\n# 赞助商\n\n\u003C!-- sponsors -->\n\n\n\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"使用 SerpApi 抓取 Google 及其他搜索引擎\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_5c64a8e7d8f5.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" 
title=\"免费试用最高效的住宅代理\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_2ed01da1cc6f.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhasdata.com\u002F?utm_source=github&utm_medium=banner&utm_campaign=D4Vinci\" target=\"_blank\" title=\"真正能突破反爬虫系统的网页抓取服务！\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_ba50a5b0dbed.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fproxyempire.io\u002F?ref=scrapling&utm_source=scrapling\" target=\"_blank\" title=\"用最佳住宅代理收集项目所需数据\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_e22be457f4e3.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.webshare.io\u002F?referral_code=48r2m2cd5uz1\" target=\"_blank\" title=\"性能无与伦比的最可靠代理\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_37bec0dca10b.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.crawleo.dev\u002F?utm_source=github&utm_medium=sponsor&utm_campaign=scrapling\" target=\"_blank\" title=\"用实时网络情报为您的 AI 提速\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_8b7d37b0a929.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.rapidproxy.io\u002F?ref=d4v\" target=\"_blank\" title=\"经济实惠地接入代理世界——绕过验证码拦截，避免额外费用。\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_readme_4fc9773f0901.jpg\">\u003C\u002Fa>\n\n\n\u003C!-- \u002Fsponsors -->\n\n\u003Ci>\u003Csub>您想在此处展示广告吗？请点击[这里](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FD4Vinci)，选择适合您的赞助层级吧！\u003C\u002Fsub>\u003C\u002Fi>\n\n---\n\n## 核心功能\n\n### 爬虫——完整的爬取框架\n- 🕷️ **类 Scrapy 的爬虫 API**：通过 `start_urls`、异步 `parse` 回调以及 `Request`\u002F`Response` 对象定义爬虫。\n- ⚡ **并发爬取**：可配置的并发限制、按域名限速及下载延迟。\n- 🔄 **多会话支持**：统一的 HTTP 请求接口，同时在单个爬虫中使用隐身无头浏览器——可根据 ID 将请求路由到不同会话。\n- 💾 **暂停与恢复**：基于检查点的爬取持久化。按下 Ctrl+C 即可优雅关闭；重启后可从上次中断处继续。\n- 📡 **流式模式**：通过 `async for item in spider.stream()` 流式输出抓取到的数据，并提供实时统计信息——非常适合 UI、数据管道和长时间运行的爬取任务。\n- 🛡️ **被阻请求检测**：自动检测并重试被阻断的请求，逻辑可自定义。\n- 🤖 **Robots.txt 合规性**：可选的 `robots_txt_obey` 标志，尊重 `Disallow`、`Crawl-delay` 和 `Request-rate` 指令，并按域名缓存。\n- 🧪 **开发模式**：首次运行时将响应缓存到磁盘，后续运行时可回放——无需再次访问目标服务器即可迭代 `parse()` 逻辑。\n- 📦 **内置导出**：可通过钩子和自定义管道，或直接使用内置的 JSON\u002FJSONL 格式（分别调用 `result.items.to_json()` \u002F `result.items.to_jsonl()`）导出结果。\n\n### 支持会话的高级网站抓取\n- **HTTP 请求**：使用 `Fetcher` 类进行快速且隐蔽的 HTTP 请求。可模拟浏览器的 TLS 指纹、头部信息，并支持 HTTP\u002F3。\n- **动态加载**：通过支持 Playwright 的 Chromium 和 Google Chrome 的 `DynamicFetcher` 类，实现对动态网站的完整浏览器自动化抓取。\n- **反爬虫绕过**：借助 `StealthyFetcher` 和指纹伪造功能，具备高级隐身能力。可轻松绕过 Cloudflare 的 Turnstile\u002FInterstitial 等各类验证机制。\n- **会话管理**：提供 `FetcherSession`、`StealthySession` 和 `DynamicSession` 类，用于跨请求的 Cookie 和状态管理，支持持久化会话。\n- **代理轮换**：内置 `ProxyRotator`，可在所有会话类型中使用循环或自定义轮换策略，还可针对每次请求单独覆盖代理。\n- **域名与广告拦截**：可阻止对特定域名（及其子域名）的请求，或在基于浏览器的抓取器中启用内置广告拦截功能（约 3,500 个已知广告\u002F追踪域名）。\n- **DNS 泄漏防护**：可选的 DNS-over-HTTPS 支持，将 DNS 查询通过 Cloudflare 的 DoH 路由，防止使用代理时发生 DNS 泄漏。\n- **异步支持**：所有抓取器及专用异步会话类均完全支持异步操作。\n\n### 自适应抓取与 AI 集成\n- 🔄 **智能元素跟踪**：利用智能相似度算法，在网站内容更新后重新定位元素。\n- 🎯 **智能灵活选择**：支持 CSS 选择器、XPath 选择器、基于筛选器的搜索、文本搜索、正则表达式搜索等多种方式。\n- 🔍 **查找相似元素**：自动定位与已找到元素相似的其他元素。\n- 🤖 **MCP 服务器配合 AI 使用**：内置 MCP 服务器，用于 AI 辅助的网页抓取和数据提取。该服务器具有强大的自定义功能，能够利用 Scrapling 提取目标内容后再传递给 AI（Claude\u002FCursor 等），从而加快处理速度并减少 token 使用量，降低成本。（[演示视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qyFk3ZNwOxE)）\n\n### 高性能且经过实战检验的架构\n- 🚀 **闪电般快速**：性能优化，超越大多数 Python 抓取库。\n- 🔋 
**内存高效**：优化的数据结构和懒加载设计，占用内存极低。\n- ⚡ **快速 JSON 序列化**：速度是标准库的 10 倍。\n- 🏗️ **经受考验**：Scrapling 不仅测试覆盖率高达 92%，类型注解覆盖全面，而且在过去一年中已被数百名网页抓取者每日使用。\n\n### 面向开发者和网页爬虫的友好体验\n- 🎯 **交互式网页爬虫 Shell**：可选的内置 IPython shell，集成 Scrapling，提供快捷键和新工具，以加速网页爬虫脚本的开发，例如将 curl 请求转换为 Scrapling 请求，并在浏览器中查看请求结果。\n- 🚀 **直接从终端使用**：你也可以不写任何代码，直接用 Scrapling 抓取一个 URL！\n- 🛠️ **丰富的导航 API**：支持父节点、兄弟节点和子节点导航的高级 DOM 遍历方法。\n- 🧬 **增强的文本处理**：内置正则表达式、清理方法以及优化的字符串操作。\n- 📝 **自动选择器生成**：为任意元素生成健壮的 CSS\u002FXPath 选择器。\n- 🔌 **熟悉的 API**：与 Scrapy\u002FBeautifulSoup 类似，使用与 Scrapy\u002FParsel 相同的伪元素。\n- 📘 **完整的类型覆盖**：全面的类型提示，提供出色的 IDE 支持和代码补全。每次代码变更时，整个代码库都会自动使用 **PyRight** 和 **MyPy** 进行扫描。\n- 🔋 **现成的 Docker 镜像**：每次发布时，都会自动构建并推送包含所有浏览器的 Docker 镜像。\n\n## 快速入门\n\n让我们快速了解一下 Scrapling 的功能，而无需深入探讨。\n\n### 基本用法\n支持会话的 HTTP 请求\n```python\nfrom scrapling.fetchers import Fetcher, FetcherSession\n\nwith FetcherSession(impersonate='chrome') as session:  # 使用最新版本的 Chrome TLS 指纹\n    page = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F', stealthy_headers=True)\n    quotes = page.css('.quote .text::text').getall()\n\n# 或者使用一次性请求\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\nquotes = page.css('.quote .text::text').getall()\n```\n高级隐身模式\n```python\nfrom scrapling.fetchers import StealthyFetcher, StealthySession\n\nwith StealthySession(headless=True, solve_cloudflare=True) as session:  # 浏览器会一直保持打开状态直到任务完成\n    page = session.fetch('https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare', google_search=False)\n    data = page.css('#padded_content a').getall()\n\n# 或者采用一次性请求模式，浏览器仅在本次请求期间打开，完成后即关闭\npage = StealthyFetcher.fetch('https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare')\ndata = page.css('#padded_content a').getall()\n```\n完整的浏览器自动化\n```python\nfrom scrapling.fetchers import DynamicFetcher, DynamicSession\n\nwith DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:  # 浏览器会一直保持打开状态直到任务完成\n    page = session.fetch('https:\u002F\u002Fquotes.toscrape.com\u002F', load_dom=False)\n    data = page.xpath('\u002F\u002Fspan[@class=\"text\"]\u002Ftext()').getall()  # 如果你喜欢，也可以使用 XPath 选择器\n\n# 或者采用一次性请求模式，浏览器仅在本次请求期间打开，完成后即关闭\npage = DynamicFetcher.fetch('https:\u002F\u002Fquotes.toscrape.com\u002F')\ndata = page.css('.quote .text::text').getall()\n```\n\n### 爬虫\n构建支持并发请求、多种会话类型以及暂停\u002F恢复功能的完整爬虫：\n```python\nfrom scrapling.spiders import Spider, Request, Response\n\nclass QuotesSpider(Spider):\n    name = \"quotes\"\n    start_urls = [\"https:\u002F\u002Fquotes.toscrape.com\u002F\"]\n    concurrent_requests = 10\n    \n    async def parse(self, response: Response):\n        for quote in response.css('.quote'):\n            yield {\n                \"text\": quote.css('.text::text').get(),\n                \"author\": quote.css('.author::text').get(),\n            }\n            \n        next_page = response.css('.next a')\n        if next_page:\n            yield response.follow(next_page[0].attrib['href'])\n\nresult = QuotesSpider().start()\nprint(f\"已抓取 {len(result.items)} 条 quotes\")\nresult.items.to_json(\"quotes.json\")\n```\n在一个爬虫中使用多种会话类型：\n```python\nfrom scrapling.spiders import Spider, Request, Response\nfrom scrapling.fetchers import FetcherSession, AsyncStealthySession\n\nclass MultiSessionSpider(Spider):\n    name = \"multi\"\n    start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n    \n    def configure_sessions(self, manager):\n        manager.add(\"fast\", FetcherSession(impersonate=\"chrome\"))\n        manager.add(\"stealth\", AsyncStealthySession(headless=True), 
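                    # lazy=True（由 v0.4 发布说明中的“懒加载式会话初始化”推断）：
                    # 该隐身浏览器会话不会在爬虫启动时立即打开，
                    # 而是等到第一个路由到 "stealth" 的请求出现时才初始化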
lazy=True)\n    \n    async def parse(self, response: Response):\n        for link in response.css('a::attr(href)').getall():\n            # 将受保护的页面通过隐身会话处理\n            if \"protected\" in link:\n                yield Request(link, sid=\"stealth\")\n            else:\n                yield Request(link, sid=\"fast\", callback=self.parse)  # 显式回调\n```\n通过检查点实现长时间爬虫的暂停与恢复，只需这样运行爬虫即可：\n```python\nQuotesSpider(crawldir=\".\u002Fcrawl_data\").start()\n```\n按下 Ctrl+C 可以优雅地暂停，进度会自动保存。之后再次启动爬虫时，传入相同的 `crawldir`，它就会从中断的地方继续。\n\n### 高级解析与导航\n```python\nfrom scrapling.fetchers import Fetcher\n\n# 丰富的元素选择与导航\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\n\n# 使用多种方式获取 quotes\nquotes = page.css('.quote')  # CSS 选择器\nquotes = page.xpath('\u002F\u002Fdiv[@class=\"quote\"]')  # XPath\nquotes = page.find_all('div', {'class': 'quote'})  # 类似 BeautifulSoup 的方式\n# 同样也可以说\nquotes = page.find_all('div', class_='quote')\nquotes = page.find_all(['div'], class_='quote')\nquotes = page.find_all(class_='quote')  # 以此类推...\n# 根据文本内容查找元素\nquotes = page.find_by_text('quote', tag='div')\n\n# 高级导航\nquote_text = page.css('.quote')[0].css('.text::text').get()\nquote_text = page.css('.quote').css('.text::text').getall()  # 链式选择器\nfirst_quote = page.css('.quote')[0]\nauthor = first_quote.next_sibling.css('.author::text')\nparent_container = first_quote.parent\n\n# 元素关系与相似性\nsimilar_elements = first_quote.find_similar()\nbelow_elements = first_quote.below_elements()\n```\n如果你不想抓取网站，也可以直接使用解析器：\n```python\nfrom scrapling.parser import Selector\n\npage = Selector(\"\u003Chtml>...\u003C\u002Fhtml>\")\n```\n其使用方式完全相同！\n\n### 异步会话管理示例\n```python\nimport asyncio\nfrom scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession\n\nasync with FetcherSession(http3=True) as session:  # `FetcherSession` 具有上下文感知能力，可在同步和异步模式下工作\n    page1 = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\n    page2 = session.get('https:\u002F\u002Fquotes.toscrape.com\u002F', impersonate='firefox135')\n\n# 异步会话使用\nasync with AsyncStealthySession(max_pages=2) as session:\n    tasks = []\n    urls = ['https:\u002F\u002Fexample.com\u002Fpage1', 'https:\u002F\u002Fexample.com\u002Fpage2']\n    \n    for url in urls:\n        task = session.fetch(url)\n        tasks.append(task)\n    \n    print(session.get_pool_stats())  # 可选 - 浏览器标签池的状态（忙碌\u002F空闲\u002F错误）\n    results = await asyncio.gather(*tasks)\n    print(session.get_pool_stats())\n```\n\n## 命令行界面与交互式 Shell\n\nScrapling 包含一个功能强大的命令行界面：\n\n[![asciicast](https:\u002F\u002Fasciinema.org\u002Fa\u002F736339.svg)](https:\u002F\u002Fasciinema.org\u002Fa\u002F736339)\n\n启动交互式网页抓取 Shell\n```bash\nscrapling shell\n```\n无需编程即可直接将页面内容提取到文件中（默认提取 `body` 标签内的内容）。如果输出文件以 `.txt` 结尾，则会提取目标的文本内容；如果以 `.md` 结尾，则会生成 HTML 内容的 Markdown 表示；如果以 `.html` 结尾，则会保存原始的 HTML 内容。\n```bash\nscrapling extract get 'https:\u002F\u002Fexample.com' content.md\nscrapling extract get 'https:\u002F\u002Fexample.com' content.txt --css-selector '#fromSkipToProducts' --impersonate 'chrome'  # 提取所有匹配 CSS 选择器 '#fromSkipToProducts' 的元素\nscrapling extract fetch 'https:\u002F\u002Fexample.com' content.md --css-selector '#fromSkipToProducts' --no-headless\nscrapling extract stealthy-fetch 'https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare' captchas.html --css-selector '#padded_content a' --solve-cloudflare\n```\n\n> [!NOTE]\n> Scrapling 还有许多其他功能，但我们希望保持本页面简洁，包括 MCP 服务器和交互式网页抓取 Shell。完整的文档请参阅 [这里](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002F)\n\n## 性能基准测试\n\nScrapling 
不仅功能强大，而且速度极快。以下基准测试将 Scrapling 的解析器与其他流行库的最新版本进行了比较。\n\n### 文本提取速度测试（5000 个嵌套元素）\n\n| 序号 |      库       | 时间 (ms) | 相对于 Scrapling |\n|---|:-------------:|:---------:|:----------------:|\n| 1 |     Scrapling     |   2.02    |     1.0x     |\n| 2 |   Parsel\u002FScrapy   |   2.04    |     1.01     |\n| 3 |     Raw Lxml      |   2.54    |    1.257     |\n| 4 |      PyQuery      |   24.17   |     ~12x     |\n| 5 |    Selectolax     |   82.63   |     ~41x     |\n| 6 |  MechanicalSoup   |  1549.71  |   ~767.1x    |\n| 7 |   BS4 with Lxml   |  1584.31  |   ~784.3x    |\n| 8 | BS4 with html5lib |  3391.91  |   ~1679.1x   |\n\n\n### 元素相似度与文本搜索性能\n\nScrapling 的自适应元素查找能力显著优于其他工具：\n\n| 库     | 时间 (ms) | 相对于 Scrapling |\n|-------------|:---------:|:------------:|\n| Scrapling   |   2.39    |     1.0x     |\n| AutoScraper |   12.45   |    5.209x    |\n\n\n> 所有基准测试均基于 100 次以上的平均运行结果。方法论请参阅 [benchmarks.py](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fbenchmarks.py)。\n\n## 安装\n\nScrapling 需要 Python 3.10 或更高版本：\n\n```bash\npip install scrapling\n```\n\n此安装仅包含解析引擎及其依赖项，不包括任何抓取器或命令行依赖。\n\n### 可选依赖\n\n1. 如果您打算使用以下任何附加功能、抓取器或其类，则需要按如下方式安装抓取器及其浏览器依赖：\n    ```bash\n    pip install \"scrapling[fetchers]\"\n    \n    scrapling install           # 正常安装\n    scrapling install  --force  # 强制重新安装\n    ```\n\n    这将下载所有浏览器及其系统依赖和指纹伪装依赖。\n\n    或者，您也可以通过代码而非命令来安装：\n    ```python\n    from scrapling.cli import install\n    \n    install([], standalone_mode=False)          # 正常安装\n    install([\"--force\"], standalone_mode=False) # 强制重新安装\n    ```\n\n2. 附加功能：\n   - 安装 MCP 服务器功能：\n       ```bash\n       pip install \"scrapling[ai]\"\n       ```\n   - 安装 Shell 功能（网页抓取 Shell 和 `extract` 命令）：\n       ```bash\n       pip install \"scrapling[shell]\"\n       ```\n   - 安装所有功能：\n       ```bash\n       pip install \"scrapling[all]\"\n       ```\n   请注意，在安装这些附加功能后，仍需使用 `scrapling install` 来安装浏览器依赖（如果您尚未安装的话）。\n\n### Docker\n您还可以从 Docker Hub 使用以下命令安装包含所有附加功能和浏览器的 Docker 镜像：\n```bash\ndocker pull pyd4vinci\u002Fscrapling\n```\n或者从 GitHub 注册表下载：\n```bash\ndocker pull ghcr.io\u002Fd4vinci\u002Fscrapling:latest\n```\n该镜像由 GitHub Actions 自动构建并推送到仓库的主分支。\n\n## 贡献\n我们欢迎您的贡献！在开始之前，请阅读我们的 [贡献指南](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002FCONTRIBUTING.md)。\n\n## 免责声明\n\n> [!CAUTION]\n> 本库仅供教育和研究目的使用。使用本库即表示您同意遵守当地及国际的数据抓取和隐私法律。作者和贡献者对本软件的任何滥用概不负责。请始终尊重网站的服务条款和 robots.txt 文件。\n\n## 🎓 引用\n如果您在研究中使用了我们的库，请使用以下引用：\n```text\n  @misc{scrapling,\n    author = {Karim Shoair},\n    title = {Scrapling},\n    year = {2024},\n    url = {https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling},\n    note = {一个自适应的网页抓取框架，可处理从单个请求到大规模爬取的所有任务！}\n  }\n```\n\n## 许可证\n本作品采用 BSD-3-Clause 许可证。\n\n## 致谢\n该项目包含改编自以下项目的代码：\n- Parsel（BSD 许可证）——用于 [translator](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fblob\u002Fmain\u002Fscrapling\u002Fcore\u002Ftranslator.py) 子模块\n\n---\n\u003Cdiv align=\"center\">\u003Csmall>由 Karim Shoair 用心设计并制作。\u003C\u002Fsmall>\u003C\u002Fdiv>\u003Cbr>","# Scrapling 快速上手指南\n\nScrapling 是一个自适应的 Web 爬虫框架，专为现代网页设计。它能自动适应网站结构变化，内置绕过 Cloudflare Turnstile 等反机器人系统的功能，并支持从单次请求到大规模并发爬取的全流程。\n\n## 环境准备\n\n- **操作系统**：Windows、macOS 或 Linux\n- **Python 版本**：3.8 及以上（推荐 3.9+）\n- **前置依赖**：\n  - `pip` 包管理工具\n  - 稳定的网络连接（访问目标网站及 PyPI）\n\n> 💡 **国内开发者提示**：若安装过程中下载缓慢，建议使用国内镜像源加速（见安装步骤）。\n\n## 安装步骤\n\n使用 pip 直接安装最新稳定版：\n\n```bash\npip install scrapling\n```\n\n**推荐使用国内镜像源加速安装：**\n\n```bash\npip install scrapling -i 
https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n或使用阿里云镜像：\n\n```bash\npip install scrapling -i https:\u002F\u002Fmirrors.aliyun.com\u002Fpypi\u002Fsimple\u002F\n```\n\n## 基本使用\n\n### 1. 简单页面抓取（自动适应结构变化）\n\n```python\nfrom scrapling.fetchers import StealthyFetcher\n\n# 启用自适应模式\nStealthyFetcher.adaptive = True\n\n#  stealthily 获取页面（无头模式，等待网络空闲）\np = StealthyFetcher.fetch('https:\u002F\u002Fexample.com', headless=True, network_idle=True)\n\n# 提取产品数据，auto_save=True 可保存选择器以便后续自适应匹配\nproducts = p.css('.product', auto_save=True)\n\n# 若网站结构已变，可显式启用 adaptive=True 进行智能重定位\nproducts = p.css('.product', adaptive=True)\n\nfor item in products:\n    print(item.css('h2::text').get())\n```\n\n### 2. 构建简易爬虫（支持异步与并发）\n\n```python\nfrom scrapling.spiders import Spider, Response\n\nclass MySpider(Spider):\n    name = \"demo\"\n    start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n\n    async def parse(self, response: Response):\n        for item in response.css('.product'):\n            yield {\n                \"title\": item.css('h2::text').get()\n            }\n\n# 启动爬虫\nMySpider().start()\n```\n\n此示例将自动处理请求、解析数据，并支持后续扩展代理轮换、断点续爬等功能。\n\n---\n\n现在你已掌握 Scrapling 的核心用法。更多高级功能（如动态渲染、MCP 集成、CLI 工具等）请参考官方文档：https:\u002F\u002Fscrapling.readthedocs.io","某电商数据团队需要每日监控竞争对手在多个动态渲染网站上的价格变动，以调整自身定价策略。\n\n### 没有 Scrapling 时\n- 面对采用 JavaScript 动态加载内容的网站，传统请求库无法获取数据，被迫维护沉重的浏览器自动化脚本，运行缓慢且资源消耗巨大。\n- 目标站点频繁更新 HTML 结构或部署反爬机制，导致硬编码的选择器瞬间失效，开发人员需花费大量时间手动修复代码。\n- 缺乏内置的智能代理轮换和指纹伪装功能，爬虫 IP 极易被封禁，数据采集任务经常中断，难以保证数据的连续性。\n- 处理单页请求与全站抓取需要编写两套完全不同的逻辑架构，代码复用率低，项目扩展和维护成本极高。\n\n### 使用 Scrapling 后\n- 利用其自适应解析引擎，Scrapling 能自动处理动态渲染内容，无需启动浏览器即可提取数据，将采集速度提升数倍并大幅降低服务器负载。\n- 凭借智能选择器修复机制，当网页结构微调时，Scrapling 能自动适应变化，显著减少了因页面更新导致的维护工作和停机时间。\n- 内置的代理轮换与防阻塞策略自动管理请求指纹和 IP 池，有效绕过反爬检测，确保大规模抓取任务的稳定运行和高成功率。\n- 统一的框架设计让从单次 API 请求到复杂的全站爬虫只需一套代码逻辑，极大简化了开发流程，使团队能快速响应新的采集需求。\n\nScrapling 通过自适应能力和一体化架构，将脆弱的定制脚本转变为稳健、高效且易于维护的现代数据采集系统。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FD4Vinci_Scrapling_2545a2dc.png","D4Vinci","Karim shoair","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FD4Vinci_d01f4ed7.jpg","An extremely curious creature who loves to learn. A Computer Science and Information Security enthusiast. 
Web Scraping Expert.",null,"Egypt","D4Vinci1","https:\u002F\u002Fgithub.com\u002FD4Vinci",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",99.9,{"name":86,"color":87,"percentage":88},"Dockerfile","#384d54",0.1,36740,3144,"2026-04-14T02:34:23","BSD-3-Clause","Linux, macOS, Windows","未说明",{"notes":96,"python":97,"dependencies":98},"Scrapling 是一个自适应网页爬虫框架，内置反反爬功能（如绕过 Cloudflare Turnstile）。若使用 StealthyFetcher 或 DynamicFetcher 进行动态渲染，需安装并配置 Playwright 浏览器驱动。支持异步抓取、代理轮换及断点续爬。无重型 AI 模型依赖，常规爬虫任务对硬件要求较低。","3.8+",[99,100,101,102,103],"scrapling","httpx","playwright","beautifulsoup4","cssselect",[52,16,15,13,14],[106,107,108,101,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124],"crawler","crawling","crawling-python","python","scraping","selectors","stealth","web-scraper","web-scraping","web-scraping-python","webscraping","xpath","automation","ai","ai-scraping","data","data-extraction","mcp","mcp-server","2026-03-27T02:49:30.150509","2026-04-14T20:49:51.677512",[128,133,138,143,148,153],{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},33281,"如何在 Scrapling 中为每个请求使用不同的代理（代理轮换）？","要启用代理轮换功能，需要确保浏览器已正确初始化。如果遇到 \"RuntimeError: Browser not initialized for proxy rotation mode\" 错误，通常是因为配置方式不正确。该问题已在后续版本中修复，请重新安装最新版本的补丁。具体使用方法可参考官方文档中的动态抓取部分，确保在初始化 Fetcher 时正确传递代理列表参数。","https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F215",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},33282,"遇到 \"'TextHandlers' object has no attribute 'partition'\" 错误怎么办？","这是一个已知 bug，发生在特定版本的 Scrapling 中。维护者已在 v0.2.96 版本中修复了此问题。解决方法是升级 Scrapling 到最新版本：\npip install --upgrade scrapling\n升级后该错误将不再出现。","https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F41",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},33283,"如何在 MCP 服务器或其他库中使用 Scrapling 而不输出 \"Downloading...\" 等日志信息？","Scrapling 的日志可以通过 Python 的 logging 模块进行控制。要禁止输出下载状态等信息，可以将日志级别设置为 ERROR 或更高。示例代码如下：\nimport logging\nlogging.getLogger(\"scrapling\").setLevel(logging.ERROR)\n\n如果仍需拦截 print 语句输出的内容，可以使用上下文重定向器（context redirector）来重定向 stdout。注意不要将日志级别设置为 DEBUG，否则所有信息都会被输出。","https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F50",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},33284,"如何向 PlayWright 传递初始化脚本（init_script）？","从最新版本开始，Scrapling 支持通过 init_script 参数传递 JS 文件路径。使用方法如下：\nfrom scrapling.fetchers import StealthyFetcher\nStealthyFetcher.fetch('https:\u002F\u002Fexample.com', init_script=\"\u002Fabsolute\u002Fpath\u002Fto\u002Fjs\u002Fscript.js\")\n\n注意：如果启用了 stealth 模式（如使用 StealthyFetcher 或 DynamicFetcher），由于 JS 执行环境与主页面隔离，无法直接从外部调用或获取 init_script 中定义的函数返回值。","https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F56",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},33285,"使用 StealthyFetcher 抓取带有 Cloudflare 验证的页面时一直加载失败怎么办？","如果页面卡在 Cloudflare 验证阶段即使设置了很长的 timeout（如 120000ms），可能是当前环境的指纹被识别或网络问题导致。建议检查以下几点：\n1. 确保使用的是最新版本的 Scrapling 和浏览器驱动；\n2. 尝试更换 IP 或使用住宅代理；\n3. 
检查是否触发了更高级的反爬机制，可能需要调整 fetcher 的配置或结合其他绕过方案。\n该问题有时也与目标站点策略变化有关，需持续观察更新。","https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F47",{"id":154,"question_zh":155,"answer_zh":156,"source_url":147},33286,"为什么在启用 stealth 模式时无法调用 init_script 中定义的 JavaScript 函数？","在 StealthyFetcher 或 DynamicFetcher 中启用 stealth 模式时，所有 JavaScript 执行都在隔离环境中进行，以增强反检测能力。这意味着 init_script 中定义的函数无法被主页面访问，也无法从 Python 端直接调用或获取其返回值。这是设计上的限制，旨在提高隐蔽性。如需交互，可考虑在不启用完整 stealth 的模式下运行，或通过页面注入其他方式实现通信。",[158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253],{"id":159,"version":160,"summary_zh":161,"released_at":162},255445,"v0.4.6","**关于浏览器隐身、隐私保护和开发者体验的聚焦更新 🔒**\n\n> [!NOTE]  \n> **[在 X 上关注我们，获取每日技巧与窍门](https:\u002F\u002Fx.com\u002FScrapling_dev)**\n\n## 🚀 新功能与体验优化\n\n- **为浏览器抓取器新增内置广告拦截功能**。通过传递 `block_ads=True` 参数，可在路由拦截层面阻止对约 3,500 个已知广告及追踪域名的请求——无需解析 DNS、无需建立 TCP 连接，立即中止请求。此功能可与 `blocked_domains` 配合使用，以实现自定义黑名单。MCP 服务器和 CLI 的 `--ai-targeted` 模式会自动启用该功能，从而节省 Token 并加快页面加载速度。\n    ```python\n    page = StealthyFetcher.fetch('https:\u002F\u002Fexample.com', block_ads=True)\n    ```\n- **新增 DNS-over-HTTPS 支持**，防止使用代理时发生 DNS 泄露。通过传递 `dns_over_https=True` 参数，可将 DNS 查询路由至 Cloudflare 的 DoH 服务，即使 HTTP 流量经过代理，您的真实位置也不会因 DNS 解析而暴露。\n    ```python\n    page = StealthyFetcher.fetch('https:\u002F\u002Fexample.com', proxy='http:\u002F\u002Fproxy:8080', dns_over_https=True)\n    ```\n- **为浏览器抓取器新增 `page_setup` 回调函数**。该函数会在 `page.goto()` 执行之前运行，允许您注册事件监听器、路由或脚本，以便在页面导航前完成相关设置。它与 `page_action`（在导航完成后执行）相辅相成。（解决了 [#237](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F237)）\n    ```python\n    def capture_websockets(page):\n        page.on(\"websocket\", lambda ws: print(f\"WS: {ws.url}\"))\n\n    page = DynamicFetcher.fetch('https:\u002F\u002Fexample.com', page_setup=capture_websockets)\n    ```\n- **为 `fetch` 和 `stealthy-fetch` 命令新增 `--block-ads` 和 `--dns-over-https` CLI 参数**。\n\n## 🐛 错误修复\n\n- **修复了 `Seconds` 类型别名拒绝浮点数值的问题**。此前，向浏览器抓取器传递 `wait=1.5` 或 `timeout=500.0` 时，会因类型错误而失败，原因是该类型别名错误地将 `float` 视作元数据而非有效类型。由 @kuishou68 在 [#240](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F240) 中修复。\n- **修复了全路径选择器生成时 ID 段重复的问题**。带有 `id` 属性的元素在生成完整 CSS\u002FXPath 路径时，其 ID 会被重复添加两次，导致生成类似 `body > #main > #main > #target > #target` 的选择器。此外，还修复了全路径 XPath 输出裸露的 `[@id='x']` 谓词（无效 XPath），而非正确的 `*[@id='x']` 格式。由 @sjhddh 在 [#241](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F241) 中修复。\n- **修复了交互式 Shell 缺少参数签名的问题**。交互式 Shell 的函数签名中缺少 `blocked_domains`、`block_ads`、`retries`、`retry_delay`、`capture_xhr`、`executable_path` 和 `dns_over_https` 等参数。\n\n_🙏 特别感谢社区持续的测试与反馈_\n\n---\n\n### 向我们的白金赞助商致以热烈祝贺\n\n\u003Ca href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"适用于 Akamai、DataDome、Incapsula 和 Kasada 的机器人防护绕过 API\">","2026-04-13T13:36:47",{"id":164,"version":165,"summary_zh":166,"released_at":167},255446,"v0.4.5","**一次专注的更新，为爬虫开发者带来一项重磅的体验优化功能，以及几处重要的修复 🎉**\n\n> [!NOTE]\n> **[关注我们在 X 上获取每日技巧与窍门](https:\u002F\u002Fx.com\u002FScrapling_dev)**\n\n\n## 🚀 新功能与体验优化\n\n- **爬虫开发模式**：过去在迭代爬虫的 `parse()` 逻辑时，每次运行都需要重新请求目标服务器，这不仅速度慢、噪音大，还很容易在你还在调试选择器时就被限流。新的开发模式会在首次运行时将所有响应缓存到磁盘，并在后续每次运行时直接从磁盘重放这些响应，因此你可以随意调整回调函数并多次重新运行，而无需发起任何网络请求。只需通过一个类属性即可启用：\n\n    ```python\n    class MySpider(Spider):\n        name = \"my_spider\"\n        start_urls = [\"https:\u002F\u002Fexample.com\"]\n        development_mode = True\n\n        async def parse(self, response):\n            yield 
{\"title\": response.css(\"title::text\").get(\"\")}\n    ```\n\n    缓存默认存储在 `.scrapling_cache\u002F{spider.name}\u002F` 目录下，也可以通过 `development_cache_dir` 参数将其指向其他位置。新增的两个统计计数器 `cache_hits` 和 `cache_misses` 可以帮助你了解缓存的表现。缓存重放会绕过 `download_delay`、限流机制以及被拦截请求的重试流程，因此迭代速度仅受限于磁盘读写性能。请勿将 `development_mode = True` 的爬虫部署到生产环境——它只是一个开发工具，而非生产级缓存。更多详细信息请参阅[文档](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Fadvanced.html#development-mode)。\n\n- **默认更安全的重定向**：现在所有 HTTP 请求器、MCP 服务器和 Shell 中的 `follow_redirects` 默认值已改为 `\"safe\"`。重定向仍然会被遵循，但针对内部\u002F私有 IP 地址（如回环地址、私有网络、链路本地地址）的重定向将被拒绝。这有助于在抓取用户提供的 URL 时防范 SSRF 攻击。若需恢复旧行为，可将 `follow_redirects` 设置为 `\"all\"`；若要完全禁用重定向，则可将其设为 `False`。\n\n## 🐛 Bug 修复\n\n- **强制停止不再丢失检查点**：过去，在启用了 `crawldir` 的爬虫上连续两次按下 Ctrl+C（即强制停止）时，会与检查点写入操作产生竞争——取消作用域会在序列化完成之前销毁任务，导致 `paused=False`，进而触发清理流程并删除之前的检查点。结果就是，强制停止长时间运行的爬虫可能会丢失你原本想要保存的所有进度。现在，引擎会在调用 `cancel_scope.cancel()` 之前先写入检查点，因此强制停止始终能够保留最新的待处理状态。由 @voidborne-d 在 [#230](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F230) 中实现。\n\n\n_🙏 特别感谢社区成员持续的测试与反馈_\n\n---\n\n\u003Cdiv style=\"text-align: center;\">\n  \u003Ca href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"适用于 Akamai、DataDome 的机器人防护绕过 API，","2026-04-07T04:22:07",{"id":169,"version":170,"summary_zh":171,"released_at":172},255447,"v0.4.4","**一项包含重要爬虫改进和错误修复的新更新 🎉**\n\n## 🚀 新功能与体验优化\n- **在爬虫框架中新增了 robots.txt 合规性支持**，引入了 `robots_txt_obey` 选项。启用后，爬虫将在开始抓取前自动获取并遵守 robots.txt 文件中的规则，包括 `Disallow`、`Crawl-delay` 和 `Request-rate` 指令。robots.txt 文件会按域名并发获取并缓存，供整个爬取过程使用。由 @AbdullahY36 在 [#226](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F226) 中实现。\n- **增加了 robots.txt 缓存预热功能**，确保所有起始 URL 的域名在爬取循环开始前就已完成 robots.txt 的获取和解析，从而避免首次请求时的延迟。\n- **在 `CrawlStats` 中新增了 `robots_disallowed_count` 统计项**，用于记录爬取过程中因 robots.txt 规则而被阻止的请求数量。\n\n更多详情请访问官网：[此处](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Fgetting-started.html#robotstxt-compliance)\n\n## 🐛 错误修复\n- **修复了 `ProxyRotator` 中的一个严重 MRO 问题**：`_build_context_with_proxy` 占位方法遮蔽了子类中的实际实现，导致代理轮换始终抛出 `NotImplementedError`（修复自 [#215](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F215)）。感谢 @yetval。\n- **修复了页面池泄漏问题**：在使用基于请求的代理轮换配合浏览器会话时，临时上下文中创建的页面未在清理时从页面池中移除，从而导致过期引用随时间累积。由 @yetval 在 [#223](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F223) 中修复。\n- **修复了静态爬取器中的类型断言缺失问题**：当使用 `curl_cffi` 时，`session.request()` 可能返回 `None`，进而引发下游错误。\n\n## 其他\n- 更新了依赖库，带来了最新的指纹信息及其他改进。\n- 在可选的 `fetchers` 组下新增了 `protego` 作为 robots.txt 解析的新依赖。\n\n_🙏 特别感谢社区持续的测试与反馈_\n\n---\n\n### 致敬我们的白金赞助商\n\n\u003Ca href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"针对 Akamai、DataDome、Incapsula 和 Kasada 的机器人防护绕过 API\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FHyperSolutions.png\" width=\"240\" height=\"100\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\" target=\"_blank\" title=\"在 Bird Proxies，我们帮您消除 IP 被封、地理限制和高昂成本等困扰，让您专注于核心业务。\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FBirdProxies.jpg\" width=\"240\" height=\"100\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" 
target=\"_blank\" title=\"Evomi 是您的瑞士品质代理提供商，流量资费低至每 GB 0.49 美元\">\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\" width","2026-04-05T03:37:22",{"id":174,"version":175,"summary_zh":176,"released_at":177},255448,"v0.4.3","**一项包含多项重要更新的新版本 🎉**\n\n## 🚀 新功能与体验优化\n- **新增了一个MCP工具**，用于打开一个持久化的普通\u002F隐身浏览器窗口，以便与其他工具协同使用；同时新增了另一个工具用于关闭该浏览器。([示例](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fai\u002Fmcp-server.html?h=Using+Persistent+Sessions#examples))\n- **新增了一个MCP工具**，用于列出所有现有的浏览器会话。该工具旨在与新工具配合使用。\n- **为浏览器会话添加了一个新选项**，可自动收集请求过程中发生的所有后台请求（解决了[#159](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F159)问题）[[示例](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ffetching\u002Fdynamic.html#capturing-xhrfetch-requests)]。\n- **新增了一种清理器**，通过移除隐藏或不可见的内容，保护MCP服务器免受常见的提示注入攻击。\n- **为网页抓取命令添加了一个新的命令行选项** `--ai-targeted`，使内容更符合AI需求，并能有效防范类似MCP服务器的常见提示注入攻击。\n- **为浏览器会话添加了一个新选项** `executable_path`，允许设置自定义的浏览器路径（解决了[#202](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F202)问题）。\n- **重构了** MCP服务器代码，使其更易于维护，并将所有工具统一为异步模式。\n- **重构了** CLI命令代码，使其更易于维护，并减少了210行代码。\n\n\n## 🐛 错误修复\n- @karesansui-u 在 [#201](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F201) 中修复了在爬虫会话中跨重试保持HTTP方法的问题。\n- @haosenwang1018 和 @D4Vinci 在 [#197](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F197) 中为获取页面内容增加了最大重试次数限制，以防止无限循环。\n- @haosenwang1018 在 [#196](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F196) 中将 `_restore_from_checkpoint` 函数中的裸 `raise` 替换为 `return False`。\n- 将 `Texthandler` 中的 `get_all` 替换为 `getall`，以与 `Selector` 类保持一致。\n\n## 覆盖率与测试改进\n- @Bortlesboat 在 [#192](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F192) 中增加了 `_normalize_credentials` 边界情况的覆盖率测试。\n- @haosenwang1018 在 [#193](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F193) 中增加了保存\u002F检索往返以及核心存储的覆盖率测试。\n- @haosenwang1018 在 [#194](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F194) 中为 `TextHandler` 的正则路径和 `TextHandlers.re()` 增加了覆盖率测试。\n- @awanawana 在 [#200](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F200) 中为 `filter`、`iterancestors` 和 `find_similar` 添加了边界情况测试。\n\n## 代理技能改进\n- @yetval 在 [#204](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F204) 中修复了技能引用中损坏的Markdown链接。\n- 改进了技能结构，使其更符合 [Clawhub](https:\u002F\u002Fclawhub.ai\u002Fd4vinci\u002Fscrapling-official) 的验证标准。\n- 强制技能在通过命令行命令进行抓取时使用 `--ai-targeted` 命令行选项。\n\n## 文档改进","2026-03-30T03:50:58",{"id":179,"version":180,"summary_zh":181,"released_at":182},255449,"v0.4.2","**一项包含重要变更的新维护更新**\n\n#### 错误修复\n- `get_all_text()` 函数现在会捕获尾部文本节点。这将使 MCP 服务器和相关命令能够识别之前遗漏的文本内容（[#168](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F168)）。感谢 @mhillebrand！\n- Referer 现在返回的是纯 Google URL，而非 Google 搜索链接。之前的逻辑存在错误，可能会泄露指纹信息（[#179](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F179)）。感谢 @Bortlesboat！\n- 修复了所有浏览器中多余标志位拼接的问题。感谢 @rostchri！\n- 修复了 Python 3.12 以下版本中因类型提示问题导致程序崩溃的情况。（解决了 [#163](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F163)）\n\n#### 其他\n- 新增了针对 Claude Code \u002F OpenClaw 及其他 AI 代理工具的 Agent Skill。\n- 将该 Agent Skill 添加至 Clawhub。\n- 将所有浏览器及 Playwright 版本更新至最新版本。\n- 在主 README 文件中添加了法语翻译。\n\n_🙏 特别感谢社区持续不断的测试与反馈_\n\n---\n\n### 向我们的白金赞助商致以热烈祝贺\n\u003Ca 
href=\"https:\u002F\u002Fhypersolutions.co\u002F?utm_source=github&utm_medium=readme&utm_campaign=scrapling\" target=\"_blank\" title=\"用于 Akamai、DataDome、Incapsula 和 Kasada 的机器人防护绕过 API\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FHyperSolutions.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fbirdproxies.com\u002Ft\u002Fscrapling\" target=\"_blank\" title=\"在 Bird Proxies，我们帮您消除 IP 被封、地理限制和高昂成本等困扰，让您专注于业务本身。\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FBirdProxies.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi 是您的瑞士品质代理提供商，价格低至每 GB 0.49 美元\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Ftikhub.io\u002F?ref=KarimShoair\" target=\"_blank\" title=\"释放社交媒体数据与 AI 的强大潜力\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FTikHub.jpg\">\u003C\u002Fa>","2026-03-08T23:37:38",{"id":184,"version":185,"summary_zh":186,"released_at":187},255450,"v0.4.1","**一项包含多项重要更新的新版本**\n\n## 🚀 新功能与体验优化\n- **提升了正则表达式的精确度**，用于检测 Cloudflare 挑战页面（感谢 @Rinz27 [#133](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F133)）\n- **提高了 Cloudflare 解决器的速度和效率**，现在速度几乎提升了一倍。\n- **优化了 Cloudflare 解决器**，使其能够处理网站在重定向到主站之前有时会两次显示 Cloudflare 页面的情况。\n- 通过移除注入的 JS 文件，**提升了无痕浏览器的隐身模式和运行速度**。\n- **改进了 MCP 架构**，使其符合 OpenCode 的要求（感谢 @robin-ede [#137](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F137)）\n- **进一步增强了 MCP 架构的兼容性**，使其能被 VS Code Copilot 及其他严格工具接受。（解决了 #150 问题）\n- 在启用 `main_content_only` 选项时，通过去除无用的 HTML 标签，**大幅降低了 MCP 服务器令牌的消耗**。\n- 修复了 PyPI 页面，并添加了相关文件，以便将 MCP 服务器注册到 MCP 服务器注册表中。\n- 新增了一个代码片段，展示如何通过代码而非命令行来安装浏览器依赖项，从而更方便地实现自动化。\n- 通过使用最新版本的 GitHub Actions，**优化了所有工作流**（感谢 @salmanmkc [#143](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F143)\u002F[#144](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F144)）\n\n_🙏 特别感谢社区持续的测试与反馈_","2026-02-27T04:12:56",{"id":189,"version":190,"summary_zh":191,"released_at":192},255451,"v0.4","**Scrapling迄今为止最大的一次发布——引入蜘蛛框架、代理轮换以及重大的解析器改进**\n\n本次发布带来了全新的异步蜘蛛\u002F爬虫框架、智能代理管理，以及显著的API变更，使Scrapling功能更强大、一致性更高。请在升级前仔细阅读“破坏性变更”部分。\n\n🕷️ 蜘蛛框架\n============\n\n基于`anyio`构建的全新异步爬虫框架，专为结构化的大规模数据抓取而设计：\n\n```python\nfrom scrapling.spiders import Spider, Response\n\nclass MySpider(Spider):\n  name = \"demo\"\n  start_urls = [\"https:\u002F\u002Fexample.com\u002F\"]\n\n  async def parse(self, response: Response):\n      for item in response.css('.product'):\n          yield {\"title\": item.css('h2::text').get()}\n\nMySpider().start()\n```\n\n- **类似Scrapy的蜘蛛API**：通过`start_urls`、异步`parse`回调、`Request`\u002F`Response`对象以及优先级队列来定义蜘蛛。\n- **并发爬取**：可配置的并发限制、按域名限速及下载延迟。\n- **多会话支持**：统一的HTTP请求接口，并可在单个蜘蛛中无缝集成隐身无头浏览器——通过会话ID将请求路由到不同会话。支持懒加载式会话初始化。\n- **暂停与恢复**：基于检查点的爬取持久化。按下Ctrl+C即可优雅地关闭爬虫；随后重启即可从上次中断处继续。\n- **流式模式**：通过`async for item in spider.stream()`实时流式输出抓取到的数据项，并提供实时统计信息——非常适合用于UI界面、数据管道以及长时间运行的爬虫任务。\n- **被拦截请求检测**：自动检测并重试被拦截的请求，且逻辑可自定义。\n- **内置导出功能**：可通过钩子或自定义管道导出结果，亦可使用内置的JSON\u002FJSONL格式，分别调用`result.items.to_json()`和`result.items.to_jsonl()`进行导出。\n- **生命周期钩子**：包括`on_start()`、`on_close()`、`on_error()`、`on_scraped_item()`等，提供对爬虫生命周期的全面控制。\n- 
**详细的爬取统计**：跟踪请求数量、响应、字节数、状态码、使用的代理、按域名\u002F会话的细分数据、日志级别计数等丰富信息。\n- **uvloop支持**：在`spider.start()`时传入`use_uvloop=True`，即可在可用时实现更快的异步执行。\n\n网站已新增专门章节，详细介绍相关内容。请点击[这里](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Fspiders\u002Farchitecture.html)。\n\n🔄 代理轮换\n============\n\n* 新增线程安全的`ProxyRotator`类，适用于所有抓取器和会话：\n  ```python\n  from scrapling import ProxyRotator\n  rotator = ProxyRotator([\"http:\u002F\u002Fproxy1:8080\", \"http:\u002F\u002Fproxy2:8080\"])\n  Fetcher.get(url, proxy_rotator=rotator)\n  ```\n* **自定义轮换策略**：允许用户自定义代理轮换逻辑。\n* **按请求覆盖代理**：在任何单独的`get()`、`post()`或`fetch()`调用中传入`proxy=`参数，即可覆盖该次请求的会话代理。\n\n🌐 浏览器抓取器改进\n============\n\n* **域名 ","2026-02-15T05:13:39",{"id":194,"version":195,"summary_zh":196,"released_at":197},255452,"v0.3.14","**一次小的维护更新，用于修复 v0.3.13 中部分设备出现的问题**\n\n- 禁用了 `StealthyFetcher` 及其会话类中的无痕模式，因为该模式会导致 Windows 设备上的 Cookie 在不同页面之间无法保持持久性。而在 macOS 和 Linux 上并未出现此问题。（修复了 [#123](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F123)，感谢 @frugality4121 提出该问题，以及 @gembleman 指出解决方案）\n- 锁定了 browserforge 的最新版本，以解决那些已使用较旧 browserforge 版本的用户在处理旧版头部模型时遇到的问题。\n\n\n_🙏 特别感谢我们的 [Discord 社区](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ)，感谢大家持续的测试与反馈_\n\n---\n\n### 向我们最大的赞助商致以热烈的感谢\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"为企业和开发者提供的轻松网页抓取工具包\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fwww.thordata.com\u002F?ls=github&lk=github\" target=\"_blank\" title=\"不可阻挡的代理和抓取基础设施，提供实时、可靠的网络数据，为 AI 模型和工作流提供支持。\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fthordata.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi 是您的瑞士品质代理提供商，起价仅为每 GB 0.49 美元\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"使用 SerpApi 抓取 Google 及其他搜索引擎的数据\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FSerpApi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"免费试用最高效的住宅代理\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky 提供尖端的 VPS 托管服务。\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2026-01-03T20:29:30",{"id":199,"version":200,"summary_zh":201,"released_at":202},255453,"v0.3.13","**这是一次重大更新，涉及多处改进，但也包含许多出于合理原因的破坏性变更。请在更新前仔细阅读以下内容。**\n\n* 出于多种考虑，我们决定从现在起完全停止使用 Camoufox，并在未来其开发继续推进的情况下再考虑重新启用。如果您希望在此版本发布前继续沿用 Camoufox，可在本[章节](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ffetching\u002Fstealthy\u002F#using-camoufox-as-an-engine)中找到相关说明。\n\n* 此前，我们在 `DynamicFetcher` 及其会话类的隐身模式中使用 patchright。现在我们已将其隐身模式移除，转而在 `StealthyFetcher` 及其会话类中集成 patchright，并进行了大量优化——正如您将看到的那样，在 patchright 的基础上进一步提升了整体隐身效果。\n\n**这使得 `StealthyFetcher` 及其会话类的速度比之前提升了 101%，内存和存储占用更少，代码量减少了约 400 
行；更重要的是，相比此前使用 Camoufox 时，稳定性显著提高。**\n\n此外，此次调整还将缩短 `scrapling install` 命令的安装时间，减小 Docker 镜像体积，提升 GitHub CI 中的测试流畅度，并让新用户更容易上手 Scrapling。\n\n## 破坏性变更\n1. `DynamicFetcher` 类及其会话类中已移除 `stealth` 参数，而 `hide_canvas` 参数则被移至 `StealthyFetcher` 及其会话类。\n2. `disable_webgl` 参数已从 `DynamicFetcher` 移至 `StealthyFetcher` 类，并更名为 `allow_webgl`，所有会话类亦同。\n3. `StealthyFetcher` 类现已成为 `DynamicFetcher` 的全新隐身版本，因此以下参数已被移除：`block_images`、`humanize`、`addons`、`os_randomize`、`disable_ads` 和 `geoip`。我曾尝试在 Chromium 中实现这些功能，但每项都存在各自的问题。不过，在 v0.4 发布之前，这些问题可能会随着后续版本得到解决。\n\n接下来是好消息：我们对许多方面进行了改进和修复 :)\n\n## 改进\n- 您已经了解到，`StealthyFetcher` 类及其会话类的速度比之前提升了 101%。与此同时，`DynamicFetcher` 类及其会话类的速度也提升了 20%。\n- Cloudflare 解决器算法现已优化，运行速度更快、适用场景更多。得益于新的重构，预计该解决器处理验证码的速度将提升一倍！\n- 所有抓取器的内存占用均有所降低。\n- MCP 服务器现在消耗的令牌更少，从而节省更多成本！\n- Docker 镜像体积缩小了 60%。\n- 整个文档网站已根据最新内容进行全面更新。同时，文档表述更加清晰，许多章节被精简，新增了更多示例，补充了遗漏的参数，并为 API 参考部分添加了图表等，还进行了其他多项改进。如今，网站加载速度提升了 130%，数据消耗更少，且对 SEO 更加友好。\n\n## 修复\n- 添加了 ","2026-01-01T20:07:13",{"id":204,"version":205,"summary_zh":206,"released_at":207},255454,"v0.3.12","## 变更内容\n- 为 `DynamicSession` 和 `AsyncDynamicSession` 类新增了一个名为 `timezone_id` 的参数，允许您设置浏览器的时区，使其与您使用的代理或 VPN 的时区一致。这样一来，网站就无法通过时区不匹配的方式来检测您是否在使用代理。\n- 改进了响应自动转换为 JSON 的功能。\n- 将 fetcher 会话类中的内部函数 `__create__` 重命名为 `start`，以便在 `with` 上下文之外更方便地使用这些类。\n- 更新了 `curl_cffi` 及其他依赖库至最新版本。\n\n_🙏 特别感谢我们的 [Discord 社区](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ)，感谢大家持续的测试与反馈_\n\n---\n\n### 向我们最大的赞助商致以热烈的祝贺\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"为企业和开发者提供的轻松网页抓取工具包\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fwww.thordata.com\u002F?ls=github&lk=github\" target=\"_blank\" title=\"不可拦截的代理和抓取基础设施，提供实时、可靠的网络数据，为 AI 模型和工作流提供支持。\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fthordata.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi 是您的瑞士品质代理提供商，流量资费低至每 GB 0.49 美元\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"使用 SerpApi 抓取 Google 及其他搜索引擎的数据\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FSerpApi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"免费试用最高效的住宅代理\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky 提供尖端的 VPS 托管服务。\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-12-18T00:21:53",{"id":209,"version":210,"summary_zh":211,"released_at":212},255455,"v0.3.11","## What's Changed\r\n- Added a better logic for handling timeout errors when the `network_idle` argument is used on an unstable website (websites with media playing, etc.)\r\n- Fixed the autocompletion for the `stealthy_fetch` shortcut in the Web Scraping Shell\r\n\r\n_🙏 Special thanks to our [Discord 
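针对上文 v0.3.12 中新增的 `timezone_id` 参数，下面是一个最小示例（假设：该参数接受 IANA 时区名称，与 Playwright 的同名选项一致；示例站点仅作演示）：

```python
from scrapling.fetchers import DynamicSession

# 假设 timezone_id 接受 IANA 时区名称；
# 将浏览器时区设置为与所用代理所在地区一致，避免因时区不匹配被识别出使用代理
with DynamicSession(headless=True, timezone_id="Europe/Zurich") as session:
    page = session.fetch("https://example.com")
    print(page.css("title::text").get())
```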
community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_\r\n\r\n---\r\n\r\n### Big shoutout to our biggest Sponsors\r\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"Effortless Web Scraping Toolkit for Business and Developers\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"Scrape Google and other search engines with SerpApi\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FSerpApi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-12-03T01:53:02",{"id":214,"version":215,"summary_zh":216,"released_at":217},255456,"v0.3.10","**A maintenance update with many significant changes and possible breaking changes**\r\n\r\n- **Solved** all encoding issues by using a better approach which will handle web pages where encoding is not correctly declared (Thanks to @Kemsty2's efforts for pointing that out in [#110](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F110) [#111](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F111) )\r\n- **Solved** a logical issue with overriding session-level parameters with request-level parameters in all browser-based fetchers that was present since v0.3\r\n- **Fixed** the signatures of the shortcuts in the interactive web scraping shell, which made a perfect autocompletion experience for the shortcuts in the shell. This issue has been present since v0.3 as well.\r\n- **Pumped up** the version for the Maxmind database, which will improve the `geoip` argument for `StealthyFetcher` and its session classes.\r\n- **Updated** all used browser versions to the latest available ones.\r\n- **BREAKING** - all fetchers had gone through a big refactor, which resulted in some interesting things that might break your code:\r\n  1. Scrapling codebase is now smaller by ~750 lines and many changes which would make maintenance very much easier in the future and use a bit less resources.\r\n  2. The validation for all fetchers and their session classes became much faster, which will reflect on their overall speed.\r\n  3. 
To achieve this, now all fetchers can't accept standard arguments other than the `url` argument; the rest of the arguments must be keyword-arguments so your code must be like `Fetcher.get('https:\u002F\u002Fgoogle.com', stealthy_headers=True)` not `Fetcher.get('https:\u002F\u002Fgoogle.com', True)` if you were doing that for some reason!\r\n  4. An annoying difference between browser-based fetchers and their session classes since v0.3 was that the argument used to pass custom parser settings per request was called `custom_config`, while it was named `selector_config` in the session classes. This refactor allowed us to unify the naming to `selector_config` without breaking your code, so the main one is now `selector_config` with backward compatibility for the `custom_config` argument. The autocompletion support will be available only for the `selector_config` argument.\r\n  5. Also, to achieve all of this, we had to make the type hints of the fetchers' functions dynamically generated, so if you don't get a proper autocompletion in your IDE, make sure you are using a modern version of it. We have tested almost all known IDEs\u002Feditors.\r\n\r\n> We have also updated all benchmark tables with the current numbers against the latest versions of all alternative libraries.\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_\r\n\r\n---\r\n\r\n### Big shoutout to our biggest Sponsors\r\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"Effortless Web Scraping Toolkit for Business and Developers\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fserpapi.com\u002F?utm_source=scrapling\" target=\"_blank\" title=\"Scrape Google and other search engines with SerpApi\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002FSerpApi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-11-26T17:49:35",{"id":219,"version":220,"summary_zh":221,"released_at":222},255457,"v0.3.9","**A new update with many important changes**\r\n\r\n## 🚀 New Stuff and quality of life changes\r\n- Now the `impersonate` argument in `Fetcher` and `FetcherSession` can accept a list of browsers that the library will choose a random browser from them with each request.\r\n```python\r\nfrom scrapling.fetchers import FetcherSession\r\n\r\nwith FetcherSession(impersonate=['chrome', 'firefox', 'safari']) as s:\r\n  
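  # Each request made through this session impersonates a browser picked
  # at random from the list above (per this release note).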
s.get('https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling')\r\n```\r\n- A new argument to the `clean` method in `TextHandler` to remove html entities from the current text easily.\r\n- Huge improvements to the documentation with more precise explanations of many parts and automatic translations of the main `README.md` file.\r\n\r\n## 🐛 Bug Fixes\r\n- Fixed a big issue with retrieving responses from browser-based fetchers. Now, there is intelligent content type detection that ensures `response.body` contains the rendered browser content only if the content is HTML; otherwise, it contains the raw content of the last request made. This allows you to download binary files and text-based files without having to find them wrapped in HTML tags, while being able to retrieve the rendered content you want from the website when fetching it.\r\n\r\n## 🔨 Misc\r\n- Updated the contributing guide to make it clearer and easier.\r\n- Add a new workflow to enforce code quality tools (Same ones used as pre-commit hooks).\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_\r\n\r\n---\r\n\r\n### Big shoutout to our biggest Sponsors\r\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"Effortless Web Scraping Toolkit for Business and Developers\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-11-17T01:38:58",{"id":224,"version":225,"summary_zh":226,"released_at":227},255458,"v0.3.8","**A new update with many important changes**\r\n\r\n# 🚀 New Stuff and quality of life changes\r\n- For all browser-based fetchers: websites that never finish loading their requests won't crash the code now if you used `network_idle` with them.\r\n- The logic for collecting\u002Fchecking for page content in browser-based fetchers has been changed to make browsers more stable on Windows systems now, as Linux\u002FMacOS (All this difference in behaviour is because of Playwright's different implementation on Windows systems).\r\n- Refactored all the validation logic, which made all requests done from all browser-based fetchers faster by 8-15%\r\n- A New option called `extra_flags` has been added to `DynamicFetcher` and its session to allow users to add custom Chrome flags to the existing ones while launching the browser.\r\n- Reverted the route logic for catching responses (changed in the last version) to use the old routing version when `page_action` is used. 
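A minimal sketch of the new `extra_flags` option (assuming it takes a list of extra Chrome command-line flags that are appended to the ones Scrapling already launches with; the exact format is not shown here):

```python
from scrapling.fetchers import DynamicFetcher

# Assumption: extra_flags is a list of additional Chrome flags applied at browser launch
page = DynamicFetcher.fetch(
    "https://quotes.toscrape.com/",
    extra_flags=["--lang=en-US"],
)
print(page.css(".quote .text::text").getall())
```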
This was added to collect the latest version of a page's content in case `page_action` changes it without making a request. (Thanks for @gembleman to pointing it in [#100](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fissues\u002F100) and [#102](https:\u002F\u002Fgithub.com\u002FD4Vinci\u002FScrapling\u002Fpull\u002F102) )\r\n\r\n# 🐛 Bug Fixes\r\n- Fixed a typo in `load_dom` in DynamicSession's async_fetch\r\n- Fixed an issue with Cloudflare solver that made the solver wait forever for embedded captchas that don't disappear after solving. Now it will wait for the captcha to disappear for 30 seconds, then assume it's the type that doesn't disappear (Fixes #100 )\r\n\r\n# 🔨 Misc\r\n- Now the Docker image is automatically pushed to Dockerhub and GitHub's container registry for user convenience.\r\n- Added a [new documentation page](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fexternal\u002F) to show how to use Scrapeless browser with Scrapling.\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_\r\n\r\n---\r\n\r\n### Big shoutout to our biggest Sponsors\r\n\u003Ca href=\"https:\u002F\u002Fwww.scrapeless.com\u002Fen?utm_source=official&utm_term=scrapling\" target=\"_blank\" title=\"Effortless Web Scraping Toolkit for Business and Developers\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fscrapeless.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fapp.cyberyozh.com\u002F?utm_source=github&utm_medium=scrapling\" target=\"_blank\" title=\"We have gathered the best solutions for multi‑accounting and automation in one place.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fcyberyozh.png\">\u003C\u002Fa>","2025-10-27T15:08:38",{"id":229,"version":230,"summary_zh":231,"released_at":232},255459,"v0.3.7","**A new update with many important changes**\r\n\r\n# 🚀 New Stuff and quality of life changes\r\n- Reworked `solve_cloudflare` argument in `StealthyFetcher` to make it able to solve all kinds of custom implementations of Turnstile.\r\n- Refactored the entire codebase to be acceptable by Pyright, so expect a flawless IDE experience now with all software and many bugs solved.\r\n- Refactored the requests logic to be cleaner and faster (Also solves #97 )\r\n- Added a new option `user_data_dir` to all browser-based session classes to allow the user to reuse the browser session data (cookies\u002Fstorage\u002Fetc...) 
from previous sessions. Leaving it will cause Playwright to use a random directory on each run, as was happening before.\r\n- Added a new customization option `additional_args` to `Dynamic fetcher` and its session class to enable the user to pass extra arguments to Playwright's context, as we had with `StealthyFetcher` before.\r\n- The route logic for collecting the last navigation response for all browsers has been improved, which allows the raw responses to be passed to the parser before being processed by the browsers as before. This will be very helpful with text\u002FJSON responses.\r\n\r\n# 🐛 Bug Fixes\r\n- The rework of the route logic solved an issue with retrieving the content of unstable websites on some Windows devices.\r\n- All the refactors that happened in this version solved a lot of bugs along the way that were hard to spot before, and weird autocompletion issues with some IDEs.\r\n- Many fixes to the documentation website\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_\r\n\r\n---\r\n\r\n### Big shoutout to our biggest Sponsors\r\n\u003Ca href=\"https:\u002F\u002Fwww.thordata.com\u002F?ls=github&lk=D4Vinci\" target=\"_blank\" title=\"A global network of over 60M+ residential proxies with 99.7% availability, ensuring stable and reliable web data scraping to support AI, BI, and workflows.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fthordata.jpg\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fevomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling\" target=\"_blank\" title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fvisit.decodo.com\u002FDy6W0b\" target=\"_blank\" title=\"Try the Most Efficient Residential Proxies for Free\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fdecodo.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-10-12T04:35:58",{"id":234,"version":235,"summary_zh":236,"released_at":237},255460,"v0.3.6","# 🚀 New Stuff\r\n- Improved the `solve_cloudflare` argument in `StealthyFetcher` and its session classes to be able to solve all types of both Turnstile and interstitial Cloudflare challenges 🎉 \r\n- Now the MCP server has the option to use `Streamable HTTP`, so you can easily expose the server.\r\n- Added Docker support, so now an image is built and pushed to Docker Hub automatically with each release (contains all browsers)\r\n\r\n# 🐛 Bug Fixes\r\n- Fixed an encoding issue with the parser that happened in some cases (the famous `invalid start byte` error)\r\n- Restructured multiple parts of the library to fix some memory leaks, so now enjoy noticably lower memory usage based on your config (Also solves #92 )\r\n- Improved type annotation in many parts of the code so you can have a better IDE experience (Also solves #93 )\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and 
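A minimal sketch of the `user_data_dir` option described above (assuming it points at a directory that is reused across runs so cookies and storage persist; omit it to keep the previous behaviour of a fresh random directory per run):

```python
from scrapling.fetchers import StealthySession

# Assumption: user_data_dir is a filesystem path reused between runs,
# so cookies/localStorage saved by earlier sessions are available again
with StealthySession(headless=True, user_data_dir="./browser_profile") as session:
    page = session.fetch("https://example.com")
    print(page.css("title::text").get())
```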
# 🐛 Bug Fixes\r\n- Fixed an encoding issue with the parser that happened in some cases (the famous `invalid start byte` error)\r\n- Restructured multiple parts of the library to fix some memory leaks, so you can now enjoy noticeably lower memory usage depending on your config (Also solves #92)\r\n- Improved type annotations in many parts of the code for a better IDE experience (Also solves #93)\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_","2025-10-01T03:40:14",{"id":239,"version":240,"summary_zh":241,"released_at":242},255461,"v0.3.5","**Necessary release that fixes multiple issues**\r\n\r\n# 🚀 New Stuff\r\n- All browser-based fetchers (`DynamicFetcher`\u002F`StealthyFetcher`\u002F...) and their session classes now fetch websites 15-20% faster:\r\n  1. Page management is now much faster due to the logic improvement by @AbdullahY36 in #87\r\n  2. Optimized the validation logic overall and improved page creation for sync fetches, which together introduced a lot of speed improvements\r\n\r\n- Big improvements to the stealth mode in `DynamicFetcher` and its session classes by replacing `rebrowser-playwright` with `PatchRight`:\r\n  1. Before this update, `rebrowser-playwright` was turned off when you enabled `stealth` and `real_chrome` because they didn't work well together; this issue is gone with `PatchRight`\r\n  2. You can now interact with closed shadow roots, since `PatchRight` handles them automatically (see the sketch below)\r\n\r\n
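A rough sketch of what that enables in practice. Treat it as an assumption-laden illustration: the `stealth` flag and `css_first` helper are taken from Scrapling's existing API, and the URL and selector stand in for any page that renders its content inside a closed shadow root:\r\n\r\n```python\r\nfrom scrapling.fetchers import DynamicFetcher\r\n\r\n# Hypothetical target: a page whose widget is rendered inside a closed shadow root.\r\n# With PatchRight underneath, ordinary selection is expected to reach that content directly.\r\npage = DynamicFetcher.fetch('https:\u002F\u002Fexample.com\u002Fshadow-widget', stealth=True)\r\nprint(page.css_first('.widget-title::text'))\r\n```\r\n\r\n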
# 🐛 Bug Fixes\r\n- Fixed a bug that happened when using the `re` method from the `Selectors` class.\r\n- Fixed a bug with the `uncurl` and `curl2fetcher` commands in the Web Scraping Shell that made curl's `--data-raw` flag parse incorrectly.\r\n- Fixed a bug with the `view` command in the Web Scraping Shell that occurred depending on the website's encoding.\r\n- Fixed a bug with content conversion that affected the `mcp` mode and `extract` commands.\r\n\r\n# New Contributors\r\n- @AbdullahY36 made their first contribution in #87\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_","2025-09-20T12:57:19",{"id":244,"version":245,"summary_zh":246,"released_at":247},255462,"v0.3.4","**Necessary release that fixes multiple issues**\r\n\r\n# 🚀 New Stuff\r\n- Added all the fetchers' session classes to the interactive shell so they are available right away without imports.\r\n\r\n# 🐛 Bug Fixes\r\n- Added a workaround for a bug with the Playwright API on Windows that happened while retrieving content during Cloudflare solving.\r\n- Fixed an encoding issue with the `view` command in the interactive shell\r\n- Fixed a bug with the `max_pages` argument in `AsyncStealthySession` that was crashing the code.\r\n- Fixed an issue introduced by the recent updates that made the `html_content` and `prettify` properties in the `Selector` class return bytes, depending on the encoding. Both now return strings again, as they did before (see the quick check below).\r\n\r\n
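A quick way to confirm the fix on your side, assuming the response returned by `Fetcher.get` exposes the `Selector` properties directly:\r\n\r\n```python\r\nfrom scrapling.fetchers import Fetcher\r\n\r\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\r\n# Regardless of the site's declared encoding, both properties should be str again, not bytes.\r\nassert isinstance(page.html_content, str)\r\nassert isinstance(page.prettify, str)\r\n```\r\n\r\n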
_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_","2025-09-16T12:50:34",{"id":249,"version":250,"summary_zh":251,"released_at":252},255463,"v0.3.3","- Removed the logic that removes the default browser tab in browser-based fetchers, since it caused a crash (not happening on macOS; only reproduced on Windows and Linux)","2025-09-15T11:06:36",{"id":254,"version":255,"summary_zh":256,"released_at":257},255464,"v0.3.2","Release Notes for v0.3.2\r\n\r\n## 🚀 New Stuff\r\n\r\n- **Optional fetcher dependencies**: All fetchers are now part of optional dependency groups, reducing the core package size. The base `scrapling` package is now the parser only; to use the fetchers or the command-line options, install it with `pip install \"scrapling[fetchers]\"`. Check out the detailed installation instructions [here](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002F#installation). A minimal fetch-and-select sketch follows this list.\r\n- **Per-page configuration in sessions**: Session classes for browser fetchers now support individual configuration per page within a session. All fetch-level parameters are now validated like session-level ones. More details on the documentation website [here](https:\u002F\u002Fscrapling.readthedocs.io\u002Fen\u002Flatest\u002Ffetching\u002Fdynamic\u002F#full-list-of-arguments)\r\n    \u003Cbr>Example:\r\n    ```python\r\n    with StealthySession(headless=True, solve_cloudflare=True) as session:\r\n        page = session.fetch('https:\u002F\u002Fnopecha.com\u002Fdemo\u002Fcloudflare', google_search=False)\r\n    ```\r\n- **Improved browser-based fetchers**\r\n  - A new option (`load_dom`) to control whether to wait for JavaScript execution to finish in pages or not (it's enabled by default, as before)\r\n    ```python\r\n    with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:\r\n        page = session.fetch('https:\u002F\u002Fquotes.toscrape.com\u002F', load_dom=False)\r\n    ```\r\n  - Stealth mode is now more reliable in `DynamicFetcher` and its session classes.\r\n  - Both `DynamicFetcher` and `StealthyFetcher` now use fewer resources (automatically finding and closing the default tab opened by persistent contexts in the Playwright API)\r\n  - Fixed a vital logic bug in browser-based fetchers' page rotation: previous pages are now replaced with fresh ones (tabs reused in rotation could be contaminated by settings previously applied to them)\r\n  - `StealthyFetcher` and its session classes are now slightly faster (~5%)\r\n\r\n- **Enhanced `.body` property**: Now returns the fetched content as-is without processing, enabling file downloads and handling of non-HTML responses. Below is an example of downloading a photo:\r\n    ```python\r\n    from scrapling.fetchers import Fetcher\r\n\r\n    page = Fetcher.get('https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fposter.png')\r\n    with open(file='poster.png', mode='wb') as f:\r\n        f.write(page.body)\r\n    ```\r\n\r\n
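As promised above, a minimal fetch-and-select sketch once the `fetchers` extra is installed. The demo site and selector are illustrative; the point is that `Fetcher` comes from the optional extra while selection comes from the core parser:\r\n\r\n```python\r\nfrom scrapling.fetchers import Fetcher  # requires: pip install \"scrapling[fetchers]\"\r\n\r\npage = Fetcher.get('https:\u002F\u002Fquotes.toscrape.com\u002F')\r\n# CSS selection is provided by the core parser, which the base package already includes.\r\nfor quote in page.css('.quote .text::text'):\r\n    print(quote)\r\n```\r\n\r\n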
## 🐛 Bug Fixes\r\n\r\n- **Encoding issues resolved**: Fixed multiple encoding problems that happened with some websites in the parser, the mcp mode, and the extract commands (Also solves #80 and #81)\r\n- **Faster parsing**: Due to many changes here and there, the library is now faster, which is reflected in the updated benchmarks\r\n\r\n## 🔨 Misc\r\n\r\n- **Updated benchmarks**: Refreshed the performance benchmarks to compare the current speed improvements against the latest versions of similar libraries\r\n- **Refactored a lot of the code and replaced dead code with better implementations**: Less code, cleaner code, easier maintenance\r\n- **Added YouTube video**: Included video content for the MCP documentation.\r\n- **A new issues template**: An easy new template for users who can't use the current templates.\r\n- **CI workflow optimization**: The tests workflow now skips runs when only documentation or non-code files are changed.\r\n- **Updated dependencies**: Bumped various dependencies to their latest versions.\r\n- **Code style improvements**: Applied new ruff rules across all files.\r\n- **Pre-commit hooks**: Updated the pre-commit configuration.\r\n\r\n## 🎯 Breaking Changes\r\n\r\n- Removed the `max_pages` parameter from the sync `StealthySession` to match `DynamicSession` (it's meaningless in the sync version)\r\n\r\n_🙏 Special thanks to our [Discord community](https:\u002F\u002Fdiscord.gg\u002FEMgGbDceNQ) for all the continuous testing and feedback_
title=\"Evomi is your Swiss Quality Proxy Provider, starting at $0.49\u002FGB\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fevomi.png\">\u003C\u002Fa>\u003Ca href=\"https:\u002F\u002Fpetrosky.io\u002Fd4vinci\" target=\"_blank\" title=\"PetroSky delivers cutting-edge VPS hosting.\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FD4Vinci\u002FScrapling\u002Fmain\u002Fimages\u002Fpetrosky.png\">\u003C\u002Fa>","2025-09-15T01:29:04"]