[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-totalgood--nlpia":3,"tool-totalgood--nlpia":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 
100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":104,"forks":105,"last_commit_at":106,"license":107,"difficulty_score":10,"env_os":108,"env_gpu":109,"env_ram":109,"env_deps":110,"category_tags":121,"github_topics":123,"view_count":10,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":133,"updated_at":134,"faqs":135,"releases":166},3766,"totalgood\u002Fnlpia","nlpia","Examples and libraries for \"Natural Language Processing in Action\" book","nlpia 是开源书籍《Natural Language Processing in Action》的配套代码库，旨在为开发者提供构建自然语言处理（NLP）管道的实用示例与工具集。它主要解决了学习者在将 NLP 理论转化为实际代码时面临的“落地难”问题，通过提供经过验证的代码片段和完整项目案例，帮助用户快速搭建能够解决实际问题的 NLP 系统。\n\n该工具特别适合有一定编程基础的数据科学家、AI 工程师以及希望深入掌握 NLP 实战技术的科研人员。与普通教程不同，nlpia 不仅关注算法本身，更强调构建“具有社会责任感”的 NLP 管道，鼓励开发者思考技术如何回馈社区。其技术亮点在于提供了从环境配置（支持 Anaconda、Git Bash 等多平台）到模型部署的全流程指导，并集成了多种主流 NLP 库的协同用法。无论是想入门 NLP 的新手，还是寻求生产级代码参考的资深开发者，都能从中获得直观的实战经验，轻松上手复杂的语言处理任务。","[![Build Status](https:\u002F\u002Fapi.travis-ci.com\u002Ftotalgood\u002Fnlpia.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Ftotalgood\u002Fnlpia)\n[![Coverage](https:\u002F\u002Fcodecov.io\u002Fgh\u002Ftotalgood\u002Fnlpia\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Ftotalgood\u002Fnlpia)\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Ftotalgood\u002Fnlpia.svg)](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Freleases\u002Flatest)\n[![PyPI version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fnlpia.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnlpia\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fnlpia.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fnlpia\u002F)\n\n# NLPIA\n\nCommunity-driven code for the book [**N**atural **L**anguage **P**rocessing **i**n **A**ction](http:\u002F\u002Fbit.ly\u002Fgh-readme-nlpia-book).\n\n## Description\n\nA community-developed book about building socially responsible NLP pipelines that give back to the communities they interact with.\n\n## Getting Started\n\nYou'll need a bash shell on your machine.\n[Git](https:\u002F\u002Fgit-scm.com\u002Fdownloads) has installers that include bash shell for all three major OSes.\n\nOnce you have Git installed, launch a 
bash terminal.\nIt will usually be found among your other applications with the name `git-bash`.\n\n\n### Step 1. Install [Anaconda3](https:\u002F\u002Fdocs.anaconda.com\u002Fanaconda\u002Finstall\u002F)\n\n* [Linux](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-Linux-x86_64.sh)\n* [MacOSX](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-MacOSX-x86_64.pkg)\n* [Windows](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-Windows-x86_64.exe)\n\nIf you're installing Anaconda3 using a GUI, be sure to check the box that updates your PATH variable.\nAlso, at the end, the Anaconda3 installer will ask if you want to install VSCode.\nMicrosoft's VSCode is a decent Python editor\u002Flinter if you're willing to send your data to Microsoft to enable all the linting features.\n\nSome of us prefer Sublime Text 3 to the open source IDEs like Atom and VSCode. In Sublime you can get complete linting and spellchecking and auto-delinters for free, even in offline mode (no intrusive data slurping or EULA).\n\n### Step 2. Install an Editor\n\nYou can skip this step if you are happy using `jupyter notebook` or `VSCode` or `Spyder` (built into Anaconda).\n\nI like [Sublime Text](https:\u002F\u002Fwww.sublimetext.com\u002F3).\nIt's a lot cleaner and more mature than the alternatives.\nPlus it has more plugins written by individual developers like you.\n\n### Step 3. Install Git and Bash\n\n* Linux -- already installed\n* MacOSX -- already installed\n* [Windows](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n\nIf you're on Linux or Mac OS, you're good to go. Just figure out how to launch a terminal and make sure you can run `ipython` or `jupyter notebook` in it. This is where you'll play around with your own NLP pipeline.\n\n#### Windows\n\nOn Windows you have a bit more work to do. Supposedly Windows 10 will let you install Ubuntu with a terminal and bash. But the terminal and shell that comes with [`git`](https:\u002F\u002Fgit-scm.com\u002Fdownloads) is probably a safer bet. It's maintained by a broader open source community.\n\nYou need to make sure your `PATH` variable includes a path to `conda`, `python` and other command line apps installed by Anaconda. This can sometimes be set with something like this:\n\n```\necho \"PATH=$HOME\u002FAnaconda3\u002Fbin:$PATH\" >> ~\u002F.bashrc\n```\n\nor\n\n```bash\necho \"PATH=\u002Fc\u002FUsers\u002F$USER\u002FAppData\u002FLocal\u002FContinuum\u002FAnaconda3\u002F:$PATH\" >> ~\u002F.bashrc\n```\n\nYou'll need to make sure your new MINGW64 terminal is launched with `winpty` to trick Windows into treating the `MINGW64` terminal (git-bash) like a standards-compliant TTY terminal application. So add these aliases to your `~\u002F.bashrc` from within your git-bash terminal:\n\n```bash\necho \"alias python='winpty python'\" >> ~\u002F.bashrc\necho \"alias jupyter='winpty jupyter'\" >> ~\u002F.bashrc\necho \"alias ipython='winpty ipython'\" >> ~\u002F.bashrc\n```\n\n\n### Step 4. Clone this repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia.git\n```\n\n### Step 5. Install `nlpia`\n\nYou have two alternative package managers you can use to install `nlpia`:\n\n5.1. [`conda`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-51-conda)\n5.2. 
[`pip`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-52-pip)\n\nA helpful [NLPIA](http:\u002F\u002Fbit.ly\u002Fgh-readme-nlpia-book) reader, [Hoang Chung Hien](https:\u002F\u002Fgithub.com\u002Fhoangchunghien), created a Dockerfile you can use for a third way to manage your environment:\n\n5.3. [`docker`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-53-docker)\n\nIn most cases, `conda` will be able to install python packages faster and more reliably than `pip`. Without `conda`, some packages, such as `python-levenshtein`, require you to compile a C library during installation. Windows doesn't have a compiler and python package installer that will \"just work.\"\n\n#### Alternative 5.1. `conda`\n\nUse `conda` (from the Anaconda3 package that you installed in Step 1 above) to create an environment called `nlpiaenv`:\n\n```bash\ncd nlpia  # make sure you're in the nlpia directory that contains `setup.py`\nconda env create -n nlpiaenv -f conda\u002Fenvironment.yml\nconda install -y pip  # to get the latest version of pip\nconda activate nlpiaenv\npip install -e .\n```\n\nWhenever you want to be able to import or run any `nlpia` modules, you'll need to activate this conda environment first:\n\n```bash\n$ conda activate nlpiaenv\n```\n\nOn **Windows** CMD prompt (Anaconda Prompt in Applications) there is no source command so:\n\n```dos\nC:\\ activate nlpiaenv\n```\n\nNow make sure you can import nlpia with:\n\n```bash\npython -c \"import nlpia; print(nlpia)\"\n```\n\nSkip to Step 6 (\"Have fun!\") if you have successfully created and activated an environment containing the `nlpia` package and its dependencies.\n\n#### Alternative 5.2. `pip`\n\nYou can try this first, if you're feeling lucky:\n\n```bash\ncd nlpia\npip install --upgrade pip\npip install -e .\n```\n\nOr if you don't think you'll be editing any of the source code for nlpia and you don't want to contribute to the community here you can just:\n\n```bash\npip install nlpia\n```\n\nLinux-based OSes like Ubuntu, as well as OSX, come with C++ compilers built-in, so you may be able to install the dependencies using pip instead of `conda`.\nBut if you're on Windows and you want to install packages, like `python-levenshtein` that need compiled C++ libraries, you'll need a compiler.\nFortunately Microsoft still lets you [download a compiler for free](https:\u002F\u002Fwiki.python.org\u002Fmoin\u002FWindowsCompilers#Microsoft_Visual_C.2B-.2B-_14.0_standalone:_Visual_C.2B-.2B-_Build_Tools_2015_.28x86.2C_x64.2C_ARM.29), just make sure you follow the links to the Visual Studio \"Build Tools\" and not the entire Visual Studio package.\n\nOnce you have a C\u002FC++ compiler and the python source code files, you can install `nlpia` using pip:\n\n```bash\ncd nlpia  # make sure you're in the nlpia directory that contains `setup.py`\npip install --upgrade pip\nmkvirtualenv nlpiaenv\nsource nlpiaenv\u002Fbin\u002Factivate\npip install -r requirements-test.txt\npip install -e .\npip install -r requirements-deep.txt\n```\n\nThe chatbots (including TTS and STT audio drivers) that come with `nlpia` may not be compatible with Windows due to problems installing `pycrypto`.\nIf you are on a Linux or Darwin (Mac OSX) system or want to try to help us debug the pycrypto problem feel free to install the chatbot requirements:\n\n```bash\n# pip install -r requirements-chat.txt\n# pip install -r requirements-voice.txt\n```\n\n## Alternative 5.3. `docker`\n\n### 5.3.1. 
Build your image\n\nThis might take a few minutes to download the jupyter docker image:\n\n```bash\ndocker build -t nlpia .\n```\n\n### 5.3.2. Run your image\n\n- `docker run -p 8888:8888 nlpia`\n- Copy the `token` obtained from the run log\n- Open Browser and use the link `http:\u002F\u002Flocalhost:8888\u002F?token=...`\n\n### 5.3.3. Play around\n\nIf you want to keep your notebook file or share a folder with the running container then use:\n\n```bash\ndocker run -p 8888:8888 -v ~:\u002Fhome\u002Fjovyan\u002Fwork nlpia\n```\n\nThen open a new notebook and test your code. Make sure to save it inside the `work` directory so it's accessible outside the container.\n\n### 6. Have Fun!\n\nCheck out the code examples from the book in `nlpia\u002Fnlpia\u002Fbook\u002Fexamples` to get ideas:\n\n```bash\ncd nlpia\u002Fbook\u002Fexamples\nls\n```\n\nHelp other NLP practitioners by contributing your code and knowledge.\n\nBelow are some nlpia feature ideas others might find handy. Contribute your own ideas to https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues .\n\n#### 6.1. Feature 1: Glossary Compiler\n\nSkeleton code and APIs that could be added to the https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Ftranscoders.py:`transcoders.py` module.\n\n\n```python\ndef find_acronym(text):\n    \"\"\"Find parenthetical noun phrases in a sentence and return the acronym\u002Fabbreviation\u002Fterm as a pair of strings.\n\n    >>> find_acronym('Support Vector Machine (SVM) is a great tool.')\n    ('SVM', 'Support Vector Machine')\n    \"\"\"\n    return (abbreviation, noun_phrase)\n```\n\n```python\ndef glossary_from_dict(dict, format='asciidoc'):\n    \"\"\" Given a dict of word\u002Facronym: definition compose a Glossary string in ASCIIDOC format \"\"\"\n    return text\n```\n\n```python\ndef glossary_from_file(path, format='asciidoc'):\n    \"\"\" Given an asciidoc file path compose a Glossary string in ASCIIDOC format \"\"\"\n    return text\n\n\ndef glossary_from_dir(path, format='asciidoc'):\n    \"\"\" Given a path to a directory of asciidoc files compose a Glossary string in ASCIIDOC format \"\"\"\n    return text\n```\n\n#### 6.2. Feature 2: Semantic Search\n\nUse a parser to extract only natural language sentences and headings\u002Ftitles from a list of lines\u002Fsentences from an asciidoc book like \"Natural Language Processing in Action\".\nUse a sentence segmenter in https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Ftranscoders.py:[nlpia.transcoders] to split a book, like _NLPIA_, into a sequence of sentences.\n\n#### 6.3. Feature 3: Semantic Spectrograms\n\nA sequence of word vectors or topic vectors forms a 2D array or matrix which can be displayed as an image. I used `word2vec` (`nlpia.loaders.get_data('word2vec')`) to embed the words in the last four paragraphs of Chapter 1 in NLPIA and it produced a spectrogram that was a lot noisier than I expected. 
Nonetheless stripes and blotches of meaning are clearly visible.\n\nFirst, the imports:\n\n```python\n>>> import numpy as np\n>>> from nlpia.loaders import get_data\n>>> from nltk.tokenize import casual_tokenize\n>>> from matplotlib import pyplot as plt\n>>> import seaborn\n```\n\nNext, get the raw text and tokenize it:\n\n```python\n>>> lines = get_data('ch1_conclusion')\n>>> txt = \"\\n\".join(lines)\n>>> tokens = casual_tokenize(txt)\n>>> tokens[-10:]\n['you',\n 'accomplish',\n 'your',\n 'goals',\n 'in',\n 'business',\n 'and',\n 'in',\n 'life',\n '.']\n```\n\nThen you'll have to download a word vector model like word2vec:\n\n```python\n>>> wv = get_data('w2v')  # this could take several minutes\n>>> wordvectors = np.array([wv[tok] for tok in tokens if tok in wv])\n>>> wordvectors.shape\n(307, 300)\n```\n\nNow you can display your 307x300 spectrogram or \"wordogram\":\n\n```python\n>>> plt.imshow(wordvectors)\n>>> plt.show()\n```\n\n[![307x300 spectrogram or \"wordogram\"](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftotalgood_nlpia_readme_aa368edd7f1f.png)](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftotalgood_nlpia_readme_aa368edd7f1f.png)\n\nCan you think of some image processing or deep learning algorithms you could run on images of natural language text?\n\nOnce you've mastered word vectors you can play around with Google's Universal Sentence Encoder and create spectrograms of entire books.\n\n#### 6.4. Feature 4: Build your own Sequence-to-Sequence translator\n\nIf you have pairs of statements or words in two languages, you can build a sequence-to-sequence translator. You could even design your own language like you did in grade school with pig latin or build yourself an L337 translator.\n\nOr you could create a universal sentence embedding using `dfs = [get_data(lang) for lang in nlpia.loaders.ANKI_LANGUAGES]` and then replace the movie character chatbot dataset in Chapter 10 with these translation pairs, one at a time. Start with a fresh clean decoder for each new language. That way you'll have a separate decoder that you can use to translate into any language. But you want to reuse the encoder so that you end up with a Universal thought vector for encoding English sentences. This will be similar to Google's Universal Sentence Encoding, but yours will be character-based so it can handle misspelled English words.\n\n#### Other Ideas\n\nThere are a lot more project ideas mentioned in the \"Resources\" section at the end of the NLPIA Book. 
Here's an early draft of [that resource list](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Fdata\u002Fbook\u002FAppendix%20E%20--%20Resources.asc.md).\n\n\n\n","[![构建状态](https:\u002F\u002Fapi.travis-ci.com\u002Ftotalgood\u002Fnlpia.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Ftotalgood\u002Fnlpia)\n[![覆盖率](https:\u002F\u002Fcodecov.io\u002Fgh\u002Ftotalgood\u002Fnlpia\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Ftotalgood\u002Fnlpia)\n[![GitHub发布](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Ftotalgood\u002Fnlpia.svg)](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Freleases\u002Flatest)\n[![PyPI版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fnlpia.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnlpia\u002F)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fl\u002Fnlpia.svg)](https:\u002F\u002Fpypi.python.org\u002Fpypi\u002Fnlpia\u002F)\n\n# NLPIA\n\n由社区驱动的代码，用于书籍《**N**atural **L**anguage **P**rocessing **i**n **A**ction》（见：http:\u002F\u002Fbit.ly\u002Fgh-readme-nlpia-book）。\n\n## 描述\n\n这是一本由社区共同编写的书籍，内容关于构建具有社会责任感的自然语言处理流水线，使其能够回馈所服务的社区。\n\n## 入门指南\n\n你需要在你的机器上有一个 bash shell。\n[Git](https:\u002F\u002Fgit-scm.com\u002Fdownloads) 提供了包含 bash shell 的安装程序，适用于所有主流操作系统。\n\n安装好 Git 后，打开一个 bash 终端。\n通常它会以 `git-bash` 的名称出现在你的应用程序列表中。\n\n\n### 第一步：安装 [Anaconda3](https:\u002F\u002Fdocs.anaconda.com\u002Fanaconda\u002Finstall\u002F)\n\n* [Linux](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-Linux-x86_64.sh)\n* [MacOSX](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-MacOSX-x86_64.pkg)\n* [Windows](https:\u002F\u002Frepo.anaconda.com\u002Farchive\u002FAnaconda3-5.2.0-Windows-x86_64.exe)\n\n如果你使用图形界面安装 Anaconda3，请确保勾选更新 PATH 环境变量的选项。\n另外，在安装结束时，Anaconda3 安装程序会询问你是否要安装 VSCode。\nMicrosoft 的 VSCode 是一款不错的 Python 编辑器和 linter，不过为了启用所有 lint 功能，你需要将数据发送给 Microsoft。\n\n有些人更喜欢 Sublime Text 3，而不是 Atom 和 VSCode 等开源 IDE。在 Sublime 中，即使离线模式下，你也可以免费获得完整的 lint、拼写检查和自动缩进功能，且不会发生数据窃取或强制接受 EULA 的情况。\n\n### 第二步：安装编辑器\n\n如果你对使用 `jupyter notebook`、`VSCode` 或者 Anaconda 自带的 `Spyder` 感到满意，可以跳过此步骤。\n\n我喜欢 [Sublime Text](https:\u002F\u002Fwww.sublimetext.com\u002F3)。\n相比其他选择，它更加简洁成熟。\n而且拥有更多由像你一样的个人开发者编写的插件。\n\n### 第三步：安装 Git 和 Bash\n\n* Linux — 已经安装\n* MacOSX — 已经安装\n* [Windows](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n\n如果你使用的是 Linux 或 macOS，则可以直接进入下一步。只需学会如何启动终端，并确保可以在其中运行 `ipython` 或 `jupyter notebook`。这里是你用来实践自己的 NLP 流水线的地方。\n\n#### Windows\n\n在 Windows 上，你需要做更多的准备工作。据说 Windows 10 可以让你安装带有终端和 bash 的 Ubuntu。但随 [`git`](https:\u002F\u002Fgit-scm.com\u002Fdownloads) 一起提供的终端和 shell 可能更为安全可靠，因为它由更广泛的开源社区维护。\n\n你需要确保你的 `PATH` 环境变量中包含了指向 `conda`、`python` 以及其他由 Anaconda 安装的命令行工具的路径。可以通过以下方式设置：\n\n```\necho \"PATH=$HOME\u002FAnaconda3\u002Fbin:$PATH\" >> ~\u002F.bashrc\n```\n\n或者\n\n```bash\necho \"PATH=\u002Fc\u002FUsers\u002F$USER\u002FAppData\u002FLocal\u002FContinuum\u002FAnaconda3\u002F:$PATH\" >> ~\u002F.bashrc\n```\n\n此外，你需要确保新的 MINGW64 终端通过 `winpty` 启动，以便欺骗 Windows 将 `MINGW64` 终端（即 git-bash）识别为符合标准的 TTY 终端应用。因此，在 git-bash 终端中将以下别名添加到你的 `~\u002F.bashrc` 文件中：\n\n```bash\necho \"alias python='winpty python'\" >> ~\u002F.bashrc\necho \"alias jupyter='winpty jupyter'\" >> ~\u002F.bashrc\necho \"alias ipython='winpty ipython'\" >> ~\u002F.bashrc\n```\n\n\n### 第四步：克隆本仓库\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia.git\n```\n\n### 第5步：安装 `nlpia`\n\n您有两种可选的包管理器来安装 `nlpia`：\n\n5.1. 
[`conda`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-51-conda)\n5.2. [`pip`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-52-pip)\n\n一位热心的 [NLPIA](http:\u002F\u002Fbit.ly\u002Fgh-readme-nlpia-book) 读者 [Hoang Chung Hien](https:\u002F\u002Fgithub.com\u002Fhoangchunghien) 创建了一个 Dockerfile，您可以使用它作为管理环境的第三种方式：\n\n5.3. [`docker`](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002FREADME.md#alternative-53-docker)\n\n在大多数情况下，`conda` 比 `pip` 能更快、更可靠地安装 Python 包。如果没有 `conda`，某些包（例如 `python-levenshtein`）在安装时需要编译 C 库。而 Windows 系统本身并不自带编译器和能够“开箱即用”的 Python 包安装工具。\n\n#### 替代方案 5.1：`conda`\n\n使用您在上述第 1 步中安装的 Anaconda3 中的 `conda`，创建一个名为 `nlpiaenv` 的环境：\n\n```bash\ncd nlpia  # 确保您位于包含 `setup.py` 的 nlpia 目录下\nconda env create -n nlpiaenv -f conda\u002Fenvironment.yml\nconda install -y pip  # 获取最新版本的 pip\nconda activate nlpiaenv\npip install -e .\n```\n\n每当您需要导入或运行任何 `nlpia` 模块时，都需要先激活这个 conda 环境：\n\n```bash\n$ conda activate nlpiaenv\n```\n\n在 **Windows** 的 CMD 提示符（应用程序中的 Anaconda Prompt）中没有 `source` 命令，因此：\n\n```dos\nC:\\ activate nlpiaenv\n```\n\n现在请确保可以成功导入 `nlpia`：\n\n```bash\npython -c \"import nlpia; print(nlpia)\"\n```\n\n如果您已成功创建并激活包含 `nlpia` 包及其依赖项的环境，请直接跳至第 6 步（“尽情享受吧！”）。\n\n#### 替代方案 5.2：`pip`\n\n如果您运气不错，可以先尝试以下方法：\n\n```bash\ncd nlpia\npip install --upgrade pip\npip install -e .\n```\n\n或者，如果您不打算编辑 `nlpia` 的源代码，也不希望为社区做出贡献，可以直接运行：\n\n```bash\npip install nlpia\n```\n\n基于 Linux 的操作系统（如 Ubuntu）和 macOS 自带 C++ 编译器，因此您可能可以使用 `pip` 而不是 `conda` 来安装依赖项。\n但如果您使用的是 Windows，并且想要安装像 `python-levenshtein` 这样需要编译 C++ 库的包，则必须配备编译器。\n幸运的是，微软仍然允许您 [免费下载编译器](https:\u002F\u002Fwiki.python.org\u002Fmoin\u002FWindowsCompilers#Microsoft_Visual_C.2B-.2B-_14.0_standalone:_Visual_C.2B-.2B-_Build_Tools_2015_.28x86.2C_x64.2C_ARM.29)，只需确保您下载的是 Visual Studio “构建工具”，而不是完整的 Visual Studio 安装包。\n\n一旦您拥有了 C\u002FC++ 编译器和 Python 源代码文件，就可以使用 `pip` 安装 `nlpia`：\n\n```bash\ncd nlpia  # 确保您位于包含 `setup.py` 的 nlpia 目录下\npip install --upgrade pip\nmkvirtualenv nlpiaenv\nsource nlpiaenv\u002Fbin\u002Factivate\npip install -r requirements-test.txt\npip install -e .\npip install -r requirements-deep.txt\n```\n\n由于 `pycrypto` 的安装问题，`nlpia` 自带的聊天机器人（包括 TTS 和 STT 音频驱动程序）可能无法在 Windows 上正常运行。\n如果您使用的是 Linux 或 Darwin（macOS）系统，或者愿意帮助我们调试 `pycrypto` 问题，可以尝试安装聊天机器人的相关依赖：\n\n```bash\n# pip install -r requirements-chat.txt\n# pip install -r requirements-voice.txt\n```\n\n## 替代方案 5.3：`docker`\n\n### 5.3.1：构建您的镜像\n\n下载 Jupyter Docker 镜像可能需要几分钟时间：\n\n```bash\ndocker build -t nlpia .\n```\n\n### 5.3.2：运行您的镜像\n\n- `docker run -p 8888:8888 nlpia`\n- 复制运行日志中获取的 `token`\n- 打开浏览器，访问链接 `http:\u002F\u002Flocalhost:8888\u002F?token=...`\n\n### 5.3.3：开始使用\n\n如果您希望保留笔记本文件或将文件夹共享给正在运行的容器，可以使用以下命令：\n\n```bash\ndocker run -p 8888:8888 -v ~:\u002Fhome\u002Fjovyan\u002Fwork nlpia\n```\n\n然后打开一个新的笔记本并测试您的代码。请务必将文件保存到 `work` 目录下，以便在容器外部也能访问。\n\n### 6. 享受乐趣！\n\n查看本书在 `nlpia\u002Fnlpia\u002Fbook\u002Fexamples` 中的代码示例，获取灵感：\n\n```bash\ncd nlpia\u002Fbook\u002Fexamples\nls\n```\n\n通过贡献你的代码和知识，帮助其他自然语言处理从业者。\n\n以下是一些其他可能觉得有用的 nlpia 功能想法。请将你的想法提交到 https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues 。\n\n#### 6.1. 
功能 1：术语表编译器\n\n可以添加到 https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Ftranscoders.py:`transcoders.py` 模块中的框架代码和 API。\n\n\n```python\ndef find_acronym(text):\n    \"\"\"在句子中查找括号内的名词短语，并以字符串对的形式返回首字母缩略词\u002F缩写\u002F术语。\n\n    >>> find_acronym('支持向量机（SVM）是一个很好的工具。')\n    ('SVM', '支持向量机')\n    \"\"\"\n    return (abbreviation, noun_phrase)\n```\n\n```python\ndef glossary_from_dict(dict, format='asciidoc'):\n    \"\"\"给定一个单词\u002F首字母缩略词: 定义的字典，以 ASCIIDOC 格式生成术语表字符串\"\"\"\n    return text\n```\n\n```python\ndef glossary_from_file(path, format='asciidoc'):\n    \"\"\"给定一个 ASCIIDOC 文件路径，以 ASCIIDOC 格式生成术语表字符串\"\"\"\n    return text\n```\n\n```python\ndef glossary_from_dir(path, format='asciidoc'):\n    \"\"\"给定一个包含 ASCIIDOC 文件的目录路径，以 ASCIIDOC 格式生成术语表字符串\"\"\"\n    return text\n```\n\n#### 6.2. 功能 2：语义搜索\n\n使用解析器从类似《自然语言处理实战》的 ASCIIDOC 书籍的行或句子列表中提取纯自然语言句子以及标题\u002F小标题。利用 https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Ftranscoders.py:[nlpia.transcoders] 中的句子分割器，将像 _NLPIA_ 这样的书籍拆分成一系列句子。\n\n#### 6.3. 功能 3：语义频谱图\n\n一串词向量或主题向量可以组成一个二维数组或矩阵，进而以图像形式展示出来。我使用了 `word2vec` (`nlpia.loaders.get_data('word2vec')`) 对 NLPIA 第一章最后四段的文字进行嵌入，结果生成的频谱图比预期要嘈杂得多。尽管如此，其中仍能清晰地看到一些意义的条纹和斑块。\n\n首先，导入必要的库：\n\n```python\n>>> from nlpia.loaders import get_data\n>>> from nltk.tokenize import casual_tokenize\n>>> from matplotlib import pyplot as plt\n>>> import seaborn\n```\n\n先获取原始文本并进行分词：\n\n```python\n>>> lines = get_data('ch1_conclusion')\n>>> txt = \"\\n\".join(lines)\n>>> tokens = casual_tokenize(txt)\n>>> tokens[-10:]\n['你',\n '实现',\n '你的',\n '目标',\n '在',\n '商业',\n '和',\n '在',\n '生活',\n '中',\n '.']\n```\n\n然后需要下载一个词向量模型，比如 word2vec：\n\n```python\n>>> wv = get_data('w2v')  # 这可能需要几分钟\n>>> wordvectors = np.array([wv[tok] for tok in tokens if tok in wv])\n>>> wordvectors.shape\n(307, 300)\n```\n\n现在你可以显示你的 307×300 的频谱图或“词图”：\n\n```python\n>>> plt.imshow(wordvectors)\n>>> plt.show()\n```\n\n[![307×300 频谱图或“词图”](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftotalgood_nlpia_readme_aa368edd7f1f.png)](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftotalgood_nlpia_readme_aa368edd7f1f.png)\n\n你能想到哪些图像处理或深度学习算法可以应用于自然语言文本的图像吗？\n\n一旦掌握了词向量技术，你就可以尝试使用 Google 的通用句子编码器，为整本书创建频谱图。\n\n#### 6.4. 
功能 4：构建你自己的序列到序列翻译器\n\n如果你有双语的句子或词汇对，就可以构建一个序列到序列的翻译器。你甚至可以像小学时学猪语那样设计一门自己的语言，或者打造一个 L337 翻译器。\n\n又或者，你可以使用 `dfs = [get_data(lang) for lang in nlpia.loaders.ANKI_LANGUAGES]` 创建一个通用句子嵌入，然后逐个用这些翻译对替换第 10 章中的电影角色聊天机器人数据集。每次针对新语言都从全新的解码器开始。这样你就能拥有一个独立的解码器，用于翻译成任何语言。同时尽量复用编码器，最终得到一个用于编码英语句子的通用思想向量。这将类似于 Google 的通用句子编码，但你的版本是基于字符的，因此能够处理拼写错误的英语单词。\n\n#### 其他想法\n\nNLPIA 书末的“资源”部分还提到了许多其他的项目创意。这里是该资源列表的早期草稿 [链接](https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fblob\u002Fmaster\u002Fsrc\u002Fnlpia\u002Fdata\u002Fbook\u002FAppendix%20E%20--%20Resources.asc.md)。","# NLPIA 快速上手指南\n\nNLPIA 是开源书籍《Natural Language Processing in Action》的配套代码库，提供构建负责任的自然语言处理（NLP）流水线所需的社区驱动工具。\n\n## 环境准备\n\n在开始之前，请确保你的系统满足以下要求：\n\n*   **操作系统**：Linux、macOS 或 Windows。\n*   **Shell 环境**：需要 Bash shell。\n    *   Linux\u002FmacOS：系统自带终端即可。\n    *   Windows：推荐安装 [Git for Windows](https:\u002F\u002Fgit-scm.com\u002Fdownloads)，它包含 `git-bash`。\n*   **Python 发行版**：推荐安装 [Anaconda3](https:\u002F\u002Fdocs.anaconda.com\u002Fanaconda\u002Finstall\u002F)，因为它预装了大多数科学计算依赖。\n    *   **注意**：安装时请勾选\"Add Anaconda to my PATH environment variable\"（将 Anaconda 添加到环境变量）。\n*   **编辑器（可选）**：可使用 Jupyter Notebook、VSCode、Spyder 或 Sublime Text。\n\n## 安装步骤\n\n推荐使用 **Conda** 进行安装，因为它能更好地处理如 `python-levenshtein` 等需要编译 C 库的依赖，尤其是在 Windows 上。\n\n### 第一步：克隆仓库\n\n打开终端（Windows 用户使用 `git-bash`），运行以下命令获取源码：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia.git\ncd nlpia\n```\n\n### 第二步：创建并激活环境\n\n使用 Conda 创建名为 `nlpiaenv` 的虚拟环境并安装依赖：\n\n```bash\n# 创建环境\nconda env create -n nlpiaenv -f conda\u002Fenvironment.yml\n\n# 激活环境\n# Linux\u002FmacOS:\nconda activate nlpiaenv\n# Windows (CMD 或 Anaconda Prompt):\nactivate nlpiaenv\n```\n\n### 第三步：安装 NLPIA 包\n\n在激活的环境中，以可编辑模式安装本地包：\n\n```bash\n# 确保 pip 为最新版本\nconda install -y pip\n\n# 安装 nlpia\npip install -e .\n```\n\n> **替代方案 (Pip)**：如果你不使用 Conda 且系统已配备 C++ 编译器（Linux\u002FmacOS 通常自带，Windows 需单独安装 Build Tools），可直接运行 `pip install nlpia`，但可能会遇到编译错误。\n\n### 验证安装\n\n运行以下命令检查是否安装成功：\n\n```bash\npython -c \"import nlpia; print('NLPIA installed successfully')\"\n```\n\n## 基本使用\n\n安装完成后，你可以加载书中示例数据或尝试基础的 NLP 功能。\n\n### 示例：生成语义频谱图 (Semantic Spectrogram)\n\n以下代码演示了如何加载文本、分词、获取词向量并可视化：\n\n```python\nfrom nlpia.loaders import get_data\nfrom nltk.tokenize import casual_tokenize\nfrom matplotlib import pyplot as plt\nimport seaborn\nimport numpy as np\n\n# 1. 获取示例文本并分词\nlines = get_data('ch1_conclusion')\ntxt = \"\\n\".join(lines)\ntokens = casual_tokenize(txt)\n\n# 2. 下载并加载词向量模型 (首次运行需几分钟)\nwv = get_data('w2v')\n\n# 3. 将 tokens 转换为向量矩阵\nwordvectors = np.array([wv[tok] for tok in tokens if tok in wv])\n\n# 4. 
绘制频谱图\nplt.imshow(wordvectors)\nplt.show()\n```\n\n### 探索更多示例\n\n你可以查看仓库中提供的书籍代码示例以获得更多灵感：\n\n```bash\ncd nlpia\u002Fbook\u002Fexamples\nls\n```\n\n现在你可以开始构建自己的 NLP 流水线了！","某初创公司的数据分析师需要快速构建一个能识别用户评论情感倾向并提取关键话题的原型系统，以应对即将到来的产品发布会。\n\n### 没有 nlpia 时\n- 开发者需手动从零搭建完整的 NLP 流水线，花费数天时间配置分词、去停用词和词向量转换等基础组件。\n- 缺乏统一的代码库参考，不同章节的算法示例分散且依赖环境复杂，导致复现书中“负责任的 NLP\"理念极其困难。\n- 在 Windows 环境下配置 Python 与 Bash 交互时常遇兼容性问题，调试环境消耗了大量本应用于模型优化的时间。\n- 难以直接获取经过清洗的社会责任相关数据集，必须自行爬取并处理原始文本，严重拖慢原型迭代速度。\n\n### 使用 nlpia 后\n- 直接调用 nlpia 预置的标准化流水线模块，几分钟内即可完成从文本输入到情感评分的全流程开发。\n- 依托书中配套的社区驱动代码库，轻松复用已验证的伦理 NLP 模式，确保算法在提取观点时兼顾社会影响。\n- 遵循详细的安装指南快速解决 Windows 下的终端配置难题，让团队能立即在统一环境中运行 Jupyter 笔记进行实验。\n- 内置丰富的示例数据集和实用工具函数，让分析师能专注于调整业务逻辑而非陷入底层数据清洗的泥潭。\n\nnlpia 将原本需要数周的基础设施搭建工作压缩至几小时，让团队能迅速将《自然语言处理实战》中的理论转化为具有社会价值的落地应用。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftotalgood_nlpia_f380ba12.png","totalgood","TotalGood","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftotalgood_b1e1c081.jpg","Machine Intelligence for the greater good",null,"github@totalgood.com","http:\u002F\u002Ftotalgood.com","https:\u002F\u002Fgithub.com\u002Ftotalgood",[81,85,89,93,97,101],{"name":82,"color":83,"percentage":84},"HTML","#e34c26",87.3,{"name":86,"color":87,"percentage":88},"Python","#3572A5",6.4,{"name":90,"color":91,"percentage":92},"Jupyter Notebook","#DA5B0B",6,{"name":94,"color":95,"percentage":96},"Shell","#89e051",0.3,{"name":98,"color":99,"percentage":100},"Dockerfile","#384d54",0,{"name":102,"color":103,"percentage":100},"Batchfile","#C1F12E",635,263,"2026-03-09T03:47:25","MIT","Linux, macOS, Windows","未说明",{"notes":111,"python":112,"dependencies":113},"强烈建议使用 Anaconda3 (conda) 进行环境管理，因为在 Windows 上 pip 安装部分依赖（如 python-levenshtein）需要额外的 C++ 编译器。Windows 用户在使用 git-bash 运行 python\u002Fjupyter 时需配置 winpty 别名以兼容终端。聊天机器人功能（含语音驱动）在 Windows 上可能因 pycrypto 兼容性问题无法使用。提供了 Docker 镜像作为替代部署方案。","未说明 (需安装 Anaconda3)",[114,115,116,117,118,119,120],"nltk","matplotlib","seaborn","python-levenshtein","pycrypto","jupyter","word2vec",[35,13,122,15,14],"其他",[124,125,126,127,128,129,130,131,132],"nlp","bot","chatbot","book","natural-language-processing","deep-learning","ai","virtual-assistant","machine-learning","2026-03-27T02:49:30.150509","2026-04-11T18:32:47.609613",[136,141,146,151,156,161],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},17255,"在 macOS 上使用 Conda 安装 nlpia 时遇到 TensorFlow 依赖错误怎么办？","这是因为 nlpia 的某个依赖项（AIML-Bot）需要 Python 3.6，而用户可能使用了 Python 2.7。解决方法是确保在安装前激活正确的 Conda 环境。请在运行 `pip install -e .` 之前，先执行命令 `source activate nlpiaenv`（或新版 Conda 的 `conda activate nlpiaenv`）来进入专门创建的环境。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F16",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},17256,"导入 `get_data()` 时出现 `ImportError: cannot import name 'Mapping' from 'collections'` 错误如何解决？","这是由于 Python 3.10+ 版本中 `Mapping` 类已从 `collections` 移至 `collections.abc`。您需要修改源代码中的导入语句，将 `from collections import OrderedDict, Mapping, Counter` 更改为 `from collections.abc import Mapping`（同时保留其他仍在 collections 中的类，或将 Mapping 单独从 abc 导入）。具体操作是找到报错文件（如 `pugnlp\u002Futil.py`），定位到第 47 行左右，修改导入路径即可。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F49",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},17257,"如何在 Windows 上修复 `constants.FLOAT128` 类型导致的错误？","Windows 系统通常不支持 `numpy.float128` 类型，导致跨平台兼容性错误。该问题已在 `pugnlp` 库的 0.1.11 版本中修复。请升级您的依赖包：运行 `pip install --upgrade pugnlp` 以确保安装版本不低于 0.1.11，或者升级 nlpia 
到最新版本以自动获取修复后的依赖。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F18",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},17258,"是否有 Docker 镜像可以快速开始使用 nlpia？","项目社区提供了 Dockerfile 供用户自行构建镜像。您可以在项目根目录创建名为 `Dockerfile` 的文件，内容如下：\n```\nFROM jupyter\u002Fdatascience-notebook\nUSER root\nWORKDIR \u002Fhome\u002Fjovyan\u002Fnlpia\nCOPY . .\nRUN pip install --upgrade pip\nRUN pip install -r requirements.txt\nRUN pip install -e .\nWORKDIR \u002Fhome\u002Fjovyan\nRUN chown -R jovyan nlpia\nEXPOSE 8888\nVOLUME [ \"~\u002FDocuments\u002FGitHub\u002Fnlpia\u002Fnotebooks:\u002Fhome\u002Fjovyan\u002Fnotebooks\" ]\n```\n然后运行 `docker build -t nlpia` 构建镜像，最后通过 `docker run -p 8888:8888 nlpia` 启动并访问 Jupyter Notebook。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F19",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},17259,"为什么每次调用 `get_data()` 都会重新下载已存在的数据文件？","这是一个已知的大小写敏感性问题，导致程序无法识别已下载的文件。该问题已在 nlpia 0.2.* 版本中修复。如果您遇到此问题，请升级 nlpia 包：`pip install --upgrade nlpia`。升级后，程序将能正确检测本地缓存，避免重复下载。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F14",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},17260,"在 Windows 上运行示例代码时遇到 `ValueError: Unable to configure handler 'logging.handlers.NTEventLogHandler'` 错误怎么办？","这是 Windows 日志处理器配置参数缺失导致的 Bug。该问题已在 nlpia 0.2.8 版本中修复。请运行 `pip install --upgrade nlpia` 将库升级到 0.2.8 或更高版本，即可解决此导入错误。","https:\u002F\u002Fgithub.com\u002Ftotalgood\u002Fnlpia\u002Fissues\u002F23",[167,172,177,182,187,192,197,202,207,212,217,222,227,232,237],{"id":168,"version":169,"summary_zh":170,"released_at":171},99468,"0.1.43","The following examples from the book should now work:\r\n\r\n```python\r\n>>> nlpia.loaders.get_data('deu')  # German-English sentence pairs for `nlpia.translate` LSTM models \r\n>>> nlpia.loaders.get_data('imdb')  # DataFrame of IMDB movie reviews with ratings\r\n```\r\n\r\nAlso, a multi-language international translation character-based LSTM model can be built using the new nlpia.translate module:\r\n\r\n```python\r\n>>> from nlpia.translate import *\r\n>>> model = main('spa', n=10000, epochs=100, batch_size=64, num_neurons=128)\r\nTrain on 9000 samples, validate on 1000 samples\r\nEpoch 1\u002F100\r\n```\r\n\r\nAlso, the base-requirements.txt now includes `Keras`, `tensorflow-gpu`, `SpaCy`, and `regex`. 
\r\n\r\n\r\n","2018-09-05T23:23:21",{"id":173,"version":174,"summary_zh":175,"released_at":176},99454,"0.2.14","- 更新了 load_anki_phrases 函数，以支持用于存储许可证信息的额外列\n- 更新了正则表达式的文档测试，以适应正则匹配结果表示形式的变化","2019-12-01T05:55:57",{"id":178,"version":179,"summary_zh":180,"released_at":181},99455,"0.2.9","- 升级 regex 包及相关的 doctest 测试。\n- 升级 nltk 以修复安全漏洞。\n- 删除 `annoy`（近似最近邻库）依赖，以便 Bob Liu 能在 Windows 系统上使用 `nlpia`（无论是否使用 Docker 容器）。\n- 测试在 Travis CI 的 Python 3.6 Ubuntu 环境中通过。\n- 测试在 macOS 环境的 Python 3.6 和 3.7 版本中通过。\n","2019-09-13T05:22:04",{"id":183,"version":184,"summary_zh":185,"released_at":186},99456,"0.2.8","- 增加更多文档测试\n- 修复因 Anki 语料库更新而失效的文档测试\n- 修复 Abhijit Mustafi 和 Matt 在 Manning 的 LiveBook 评论中指出的 bug\n- 为 Windows 操作系统中运行于 Ubuntu 子系统（会导致 NT 系统日志功能失效）的情况提供回退方案\n\n","2019-07-31T18:08:37",{"id":188,"version":189,"summary_zh":190,"released_at":191},99457,"0.2.2","- 使 conda\u002Fenvironment.yml 和 requirements.txt 更加健壮\n- 根据读者反馈，更新 ch06_nessvectors 示例中的 `get_data('word2vec')` API 调用\n- 在 docs\u002Fproduct_description.HTML 中添加亚马逊商品页面链接\n- 对 ch10 book\u002Fexamples 中的内容进行改进，并新增来自 Chollet（keras-examples）的法语原文翻译示例\n\n","2019-06-11T22:31:15",{"id":193,"version":194,"summary_zh":195,"released_at":196},99458,"0.1.89","修复了 `src\u002Fnlpia\u002Fbook\u002Fexamples\u002Fch10_*.py` 中用于对话聊天和机器翻译的序列到序列模型示例。这些示例面向《自然语言处理实战》（Manning 出版社）的读者。","2019-05-03T06:16:54",{"id":198,"version":199,"summary_zh":200,"released_at":201},99459,"0.1.82","@KyleBanks 修复了那个聊天机器人的 AIML 语法错误。这个聊天机器人有点像约翰·欧文小说《欧文·米尼的祈祷》中的欧文·米尼——书中描述了一种亲社会策略，用来应对霸凌者口出恶言、贬低他人时的做法。\n","2019-01-23T23:46:35",{"id":203,"version":204,"summary_zh":205,"released_at":206},99460,"0.1.81","- 将电影对话添加到 `data\u002Fmoviedialog.csv`\n- 修复 `src\u002Fnlpia\u002Fbook\u002Fexamples\u002Fch10*` 中的示例聊天机器人\n","2019-01-23T23:39:37",{"id":208,"version":209,"summary_zh":210,"released_at":211},99461,"0.1.80","- 从仓库（src\u002Fnlpia\u002Fbook\u002Fexamples）中删除较大的（5MB）Keras H5 模型文件\n- 在 pytest.ini 和 coverage.rc 中忽略 talk.py，因为它需要安装 pyaudio 和 pocketsphinx，而这些依赖项安装起来比较困难\n- 为 loaders.get_data 添加更多 BIG_URLS 别名，例如将 'wv' 替换为 'w2v'，以修复 LiveBook 贡献者 Jettro Coenradie 发现的 bug\n- 修复 get_data 函数，使其不再将 w2v.bin 文件当作文本文件加载\n- book\u002Fexamples\u002Fch05_...py","2019-01-04T06:18:33",{"id":213,"version":214,"summary_zh":215,"released_at":216},99462,"0.1.63","- Hoang Chung Hien @hoangchunghien 贡献了一个 Dockerfile 和 README.md 文档。","2018-11-13T17:43:56",{"id":218,"version":219,"summary_zh":220,"released_at":221},99463,"0.1.62","- Windows 安装包发布\r\n- 发布 docs\u002Fnotes\u002F 下的 *.md 文件，其中包含书中引用的对话资源。","2018-11-13T12:51:41",{"id":223,"version":224,"summary_zh":225,"released_at":226},99464,"0.1.58","- Installation has been tested on Windows 7 and Windows 10\r\n- Fixed (pinned) the versions of some crucial dependencies (tensorflow, keras, numpy)\r\n- 41% doctest coverage","2018-11-04T04:32:57",{"id":228,"version":229,"summary_zh":230,"released_at":231},99465,"0.1.52","- diabetes.py and .csv for simplified appendix on machine learning\r\n- more regexes for url recognition\u002Fextraction","2018-09-29T01:08:11",{"id":233,"version":234,"summary_zh":235,"released_at":236},99466,"0.1.50","- loaders for l33t and netspeak dictionary dataset\r\n- python 2.7 tests on travis\r\n- 41% coverage\r\n- miniconda install on travis instead of anaconda\r\n- conda build recipe doesn't work because not all requirements.txt have conda builds available on conda-forge","2018-09-20T00:39:14",{"id":238,"version":239,"summary_zh":240,"released_at":241},99467,"0.1.44","\r\nAdd simple URL-extracting regex and import all the more complicated ones 
from pugnlp:\r\n\r\n```python\r\n>>> import re\r\n>>> from nlpia.regexes import RE_URL_SIMPLE\r\n>>> re.findall(RE_URL_SIMPLE, '* Sublime Text 3 (https:\u002F\u002Fwww.sublimetext.com\u002F3) is great!')[0][0]\r\n'https:\u002F\u002Fwww.sublimetext.com\u002F3'\r\n>>> re.findall(RE_URL_SIMPLE, 'Google github totalgood [github.com\u002Ftotalgood]!')[0][0]\r\n'github.com\u002Ftotalgood'\r\n```\r\n","2018-09-05T23:58:35"]