[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-nlp-uoregon--trankit":3,"similar-nlp-uoregon--trankit":112},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":19,"owner_email":15,"owner_twitter":15,"owner_website":20,"owner_url":21,"languages":22,"stars":27,"forks":28,"last_commit_at":29,"license":30,"difficulty_score":31,"env_os":32,"env_gpu":33,"env_ram":34,"env_deps":35,"category_tags":44,"github_topics":49,"view_count":67,"oss_zip_url":15,"oss_zip_packed_at":15,"status":68,"created_at":69,"updated_at":70,"faqs":71,"releases":97},135,"nlp-uoregon\u002Ftrankit","trankit","Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing","Trankit 是一个轻量级、基于 Transformer 的多语言自然语言处理（NLP）Python 工具包，支持超过 100 种语言的文本分析任务。它提供开箱即用的预训练模型（覆盖 56 种语言），可自动完成分词、词性标注、依存句法分析等基础 NLP 流程，还能在无需指定语言的情况下自动识别输入文本语种（Auto Mode）。相比同类工具如 Stanza，Trankit 在多项任务上表现更优，同时兼顾运行速度与内存效率，适合资源有限的环境使用。Trankit 特别适合 NLP 领域的研究人员和开发者快速构建多语言应用，也通过命令行接口降低了非编程用户的使用门槛。其核心模型基于 XLM-Roberta Large，在 Universal Dependencies v2.5 数据集上取得了当前领先的性能。","\u003Ch2 align=\"center\">Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing\u003C\u002Fh2>\r\n\r\n\u003Cdiv align=\"center\">\r\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fblob\u002Fmaster\u002FLICENSE\">\r\n        \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fnlp-uoregon\u002Ftrankit.svg?color=blue\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href='https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest'>\r\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlp-uoregon_trankit_readme_13d664e1afd7.png' alt='Documentation 
Status' \u002F>\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"http:\u002F\u002Fnlp.uoregon.edu\u002Ftrankit\">\r\n        \u003Cimg alt=\"Demo Website\" src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite\u002Fhttp\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Findex.html.svg?down_color=red&down_message=offline&up_message=online\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ftrankit\u002F\">\r\n        \u003Cimg alt=\"PyPI Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ftrankit?color=blue\">\r\n    \u003C\u002Fa>\r\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ftrankit\u002F\">\r\n        \u003Cimg alt=\"Python Versions\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Ftrankit?colorB=blue\">\r\n    \u003C\u002Fa>\r\n\u003C\u002Fdiv>\r\n\r\n[Our technical paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.03289.pdf) for Trankit won the Outstanding Demo Paper Award at [EACL 2021](https:\u002F\u002F2021.eacl.org\u002F). Please cite the paper if you use Trankit in your research.\r\n\r\n```bibtex\r\n@inproceedings{nguyen2021trankit,\r\n      title={Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing}, \r\n      author={Nguyen, Minh Van and Lai, Viet Dac and Veyseh, Amir Pouran Ben and Nguyen, Thien Huu},\r\n      booktitle=\"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\r\n      year={2021}\r\n}\r\n```\r\n\r\n### :boom: :boom: :boom: Trankit v1.0.0 is out:\r\n\r\n* **90 new pretrained transformer-based pipelines for 56 languages**. The new pipelines are trained with XLM-Roberta large, which further boosts the performance significantly over 90 treebanks of the Universal Dependencies v2.5 corpus. Check out the new performance [here](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html). 
This [page](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#trankit-large) shows you how to use the new pipelines.\r\n\r\n* **Auto Mode for multilingual pipelines**. In the Auto Mode, the language of the input will be automatically detected, enabling the multilingual pipelines to process the input without specifying its language. Check out how to turn on the Auto Mode [here](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines). Thank you [loretoparisi](https:\u002F\u002Fgithub.com\u002Floretoparisi) for your suggestion on this.\r\n\r\n* **Command-line interface** is now available. This helps users who are not familiar with the Python programming language use Trankit easily. Check out the tutorials on this [page](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fcommandline.html).\r\n\r\nTrankit is a **light-weight Transformer-based Python** toolkit for multilingual Natural Language Processing (NLP). 
It provides a trainable pipeline for fundamental NLP tasks over [100 languages](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#trainable-languages), and 90 [downloadable](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names) pretrained pipelines for [56 languages](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names).\r\n\r\n\u003Cdiv align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlp-uoregon_trankit_readme_97f03cb659a8.jpg\" height=\"300px\"\u002F>\u003C\u002Fdiv>\r\n\r\n**Trankit outperforms the current state-of-the-art multilingual toolkit Stanza (StanfordNLP)** in many tasks over [90 Universal Dependencies v2.5 treebanks of 56 different languages](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html#universal-dependencies-v2-5) while still being efficient in memory usage and\r\nspeed, making it *usable for general users*.\r\n\r\nIn particular, for **English**, **Trankit is significantly better than Stanza** on sentence segmentation (**+9.36%**) and dependency parsing (**+5.07%** for UAS and **+5.81%** for LAS). For **Arabic**, our toolkit substantially improves sentence segmentation performance by **16.36%** while **Chinese** observes **14.50%** and **15.00%** improvement of UAS and LAS for dependency parsing. 
Detailed comparisons between Trankit, Stanza, and other popular NLP toolkits (e.g., spaCy, UDPipe) in other languages can be found [here](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html#universal-dependencies-v2-5) on [our documentation page](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Findex.html).\r\n\r\nWe also created a Demo Website for Trankit, which is hosted at: http:\u002F\u002Fnlp.uoregon.edu\u002Ftrankit\r\n\r\n### Installation\r\nTrankit can be easily installed via one of the following methods:\r\n\r\n**[July 23, 2025: We currently have a problem with our server. Please only install Trankit from source for now. It will download the models from: https:\u002F\u002Fhuggingface.co\u002Fuonlp\u002Ftrankit\u002Ftree\u002Fmain\u002Fmodels. We will update the install with pip later.]**\r\n\r\n#### Using pip\r\n```\r\npip install trankit\r\n```\r\nThis command installs Trankit and all of its dependencies automatically.\r\n\r\n#### From source\r\n```\r\ngit clone https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit.git\r\ncd trankit\r\npip install -e .\r\n```\r\nThis clones our GitHub repo and installs Trankit in editable mode.\r\n\r\n#### Fixing the compatibility issue of Trankit with Transformers\r\nPrevious versions of Trankit encountered a [compatibility issue](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F5) when used with recent versions of [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers). To fix this issue, please install the new version of Trankit as follows:\r\n```\r\npip install trankit==1.1.0\r\n```\r\nIf you encounter any other problem with the installation, please raise an issue [here](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002Fnew) to let us know. Thanks.\r\n\r\n### Usage\r\nTrankit can process inputs which are untokenized (raw) or pretokenized strings, at\r\nboth sentence and document level. 
Currently, Trankit supports the following tasks:\r\n- Sentence segmentation.\r\n- Tokenization.\r\n- Multi-word token expansion.\r\n- Part-of-speech tagging.\r\n- Morphological feature tagging.\r\n- Dependency parsing.\r\n- Named entity recognition.\r\n\r\n#### Initialize a pretrained pipeline\r\nThe following code shows how to initialize a pretrained pipeline for English; it is instructed to run on GPU, automatically download pretrained models, and store them in the specified cache directory. Trankit will not download pretrained models if they already exist.\r\n```python\r\nfrom trankit import Pipeline\r\n\r\n# initialize an English pipeline\r\np = Pipeline(lang='english', gpu=True, cache_dir='.\u002Fcache')\r\n```\r\n\r\n#### Perform all tasks on the input\r\nAfter initializing a pretrained pipeline, it can be used to process the input on all tasks as shown below. If the input is a sentence, the flag `is_sent` must be set to `True`.\r\n```python\r\nfrom trankit import Pipeline\r\n\r\np = Pipeline(lang='english', gpu=True, cache_dir='.\u002Fcache')\r\n\r\n######## document-level processing ########\r\nuntokenized_doc = '''Hello! This is Trankit.'''\r\npretokenized_doc = [['Hello', '!'], ['This', 'is', 'Trankit', '.']]\r\n\r\n# perform all tasks on the input\r\nprocessed_doc1 = p(untokenized_doc)\r\nprocessed_doc2 = p(pretokenized_doc)\r\n\r\n######## sentence-level processing ########\r\nuntokenized_sent = '''This is Trankit.'''\r\npretokenized_sent = ['This', 'is', 'Trankit', '.']\r\n\r\n# perform all tasks on the input\r\nprocessed_sent1 = p(untokenized_sent, is_sent=True)\r\nprocessed_sent2 = p(pretokenized_sent, is_sent=True)\r\n```\r\nNote that, although pretokenized inputs can always be processed, using pretokenized inputs for languages that require multi-word token expansion, such as Arabic or French, might not be the correct way. 
Please check out the column `Requires MWT expansion?` of [this table](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names) to see if a particular language requires multi-word token expansion or not.  \r\nFor more detailed examples, please check out our [documentation page](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Foverview.html).\r\n\r\n#### Multilingual usage\r\nStarting from version v1.0.0, Trankit supports a handy [Auto Mode](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines) in which users do not have to set a particular language active before processing the input. In the Auto Mode, Trankit will automatically detect the language of the input and use the corresponding language-specific models, thus avoiding switching back and forth between languages in a multilingual pipeline.\r\n\r\n```python\r\nfrom trankit import Pipeline\r\n\r\np = Pipeline('auto')\r\n\r\n# Tokenizing an English input\r\nen_output = p.tokenize('''I figured I would put it out there anyways.''') \r\n\r\n# POS, Morphological tagging and Dependency parsing a French input\r\nfr_output = p.posdep('''On pourra toujours parler à propos d'Averroès de \"décentrement du Sujet\".''')\r\n\r\n# NER tagging a Vietnamese input\r\nvi_output = p.ner('''Cuộc tiêm thử nghiệm tiến hành tại Học viện Quân y, Hà Nội''')\r\n```\r\nIn this example, the code name `'auto'` is used to initialize a multilingual pipeline in the Auto Mode. For more information, please visit [this page](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines). 
Note that, besides the new Auto Mode, the [manual mode](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Foverview.html#multilingual-usage) can still be used as before.\r\n\r\n#### Building a customized pipeline\r\nTraining customized pipelines is easy with Trankit via the class `TPipeline`. Below we show how we can train a token and sentence splitter on customized data.\r\n```python\r\nfrom trankit import TPipeline\r\n\r\ntp = TPipeline(training_config={\r\n    'task': 'tokenize',\r\n    'save_dir': '.\u002Fsaved_model',\r\n    'train_txt_fpath': '.\u002Ftrain.txt',\r\n    'train_conllu_fpath': '.\u002Ftrain.conllu',\r\n    'dev_txt_fpath': '.\u002Fdev.txt',\r\n    'dev_conllu_fpath': '.\u002Fdev.conllu'\r\n    }\r\n)\r\n\r\ntp.train()\r\n```\r\nDetailed guidelines for training and loading a customized pipeline can be found [here](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Ftraining.html).\r\n\r\n#### Sharing your customized pipelines\r\n\r\nIf you want to share your customized pipelines with other users, please create an issue [here](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002Fnew) and provide us with the following information:\r\n\r\n- Training data that you used to train your models, e.g., data license, data source, and some data statistics (i.e., sizes of training, development, and test data).\r\n- Performance of your pipelines on your test data using the official [evaluation script](https:\u002F\u002Funiversaldependencies.org\u002Fconll18\u002Fevaluation.html).\r\n- A downloadable link to your trained model files (a Google Drive link would be great).\r\n\r\nAfter we receive your request, we will check and test your pipelines. 
Once everything is done, we would make the pipelines accessible by other users via new language codes.\r\n\r\n### Acknowledgements\r\nThis project has been supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2019-19051600006 under the [Better Extraction from Text Towards Enhanced Retrieval (BETTER) Program](https:\u002F\u002Fwww.iarpa.gov\u002Findex.php\u002Fresearch-programs\u002Fbetter).\r\n\r\nWe use [XLM-Roberta](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02116) and [Adapters](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.00247) as our shared multilingual encoder for different tasks and languages. The [AdapterHub](https:\u002F\u002Fgithub.com\u002FAdapter-Hub\u002Fadapter-transformers) is used to implement our plug-and-play mechanism with Adapters. To speed up the development process, the implementations for the MWT expander and the lemmatizer are adapted from [Stanza](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fstanza). 
To implement the language detection module, we leverage the [langid](https:\u002F\u002Fgithub.com\u002Fsaffsd\u002Flangid.py) library.\r\n","\u003Ch2 align=\"center\">Trankit：一个轻量级的基于 Transformer 的多语言自然语言处理 Python 工具包\u003C\u002Fh2>\n\n\u003Cdiv align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fblob\u002Fmaster\u002FLICENSE\">\n        \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fnlp-uoregon\u002Ftrankit.svg?color=blue\">\n    \u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest'>\n    \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlp-uoregon_trankit_readme_13d664e1afd7.png' alt='Documentation Status' \u002F>\n    \u003C\u002Fa>\n    \u003Ca href=\"http:\u002F\u002Fnlp.uoregon.edu\u002Ftrankit\">\n        \u003Cimg alt=\"Demo Website\" src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite\u002Fhttp\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Findex.html.svg?down_color=red&down_message=offline&up_message=online\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ftrankit\u002F\">\n        \u003Cimg alt=\"PyPI Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ftrankit?color=blue\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ftrankit\u002F\">\n        \u003Cimg alt=\"Python Versions\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Ftrankit?colorB=blue\">\n    \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n我们的 Trankit [技术论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.03289.pdf) 荣获了 [EACL 2021](https:\u002F\u002F2021.eacl.org\u002F) 的杰出演示论文奖（Outstanding Demo Paper Award）。如果您在研究中使用了 Trankit，请引用该论文。\n\n```bibtex\n@inproceedings{nguyen2021trankit,\n      title={Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing}, \n      
author={Nguyen, Minh Van and Lai, Viet Dac and Veyseh, Amir Pouran Ben and Nguyen, Thien Huu},\n      booktitle=\"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n      year={2021}\n}\n```\n\n### :boom: :boom: :boom: Trankit v1.0.0 发布：\n\n* **新增 56 种语言的 90 个预训练的基于 Transformer 的流水线（pipeline）**。这些新流水线使用 XLM-Roberta large 进行训练，在 Universal Dependencies v2.5 语料库的 90 个树库（treebank）上显著提升了性能。您可以在[这里](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html)查看最新性能表现。[此页面](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#trankit-large)展示了如何使用这些新流水线。\n\n* **多语言流水线支持自动模式（Auto Mode）**。在自动模式下，输入文本的语言将被自动检测，使得多语言流水线无需指定语言即可处理输入。请参阅[此处](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines)了解如何启用自动模式。感谢 [loretoparisi](https:\u002F\u002Fgithub.com\u002Floretoparisi) 提出此建议。\n\n* **命令行接口（Command-line interface）现已可用**。这有助于不熟悉 Python 编程语言的用户轻松使用 Trankit。相关教程请见[此页面](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fcommandline.html)。\n\nTrankit 是一个**轻量级的基于 Transformer 的 Python** 多语言自然语言处理（NLP）工具包。它为 [100 多种语言](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#trainable-languages)提供了可训练的 NLP 基础任务流水线，并为 [56 种语言](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names)提供了 90 个[可下载的](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names)预训练流水线。\n\n\u003Cdiv align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlp-uoregon_trankit_readme_97f03cb659a8.jpg\" height=\"300px\"\u002F>\u003C\u002Fdiv>\n\n**Trankit 在许多任务上优于当前最先进的多语言工具包 Stanza（StanfordNLP）**，涵盖 [56 种语言的 90 个 Universal Dependencies v2.5 
树库](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html#universal-dependencies-v2-5)，同时在内存占用和速度方面依然高效，使其**适用于普通用户**。\n\n具体而言，对于**英语**，Trankit 在句子分割（**+9.36%**）和依存句法分析（UAS 提升 **+5.07%**，LAS 提升 **+5.81%**）方面显著优于 Stanza。对于**阿拉伯语**，我们的工具包在句子分割性能上提升了 **16.36%**；而**中文**在依存句法分析的 UAS 和 LAS 上分别提升了 **14.50%** 和 **15.00%**。Trankit、Stanza 以及其他流行 NLP 工具包（如 spaCy、UDPipe）在其他语言上的详细对比，请参见[我们的文档页面](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)中的[此处](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fperformance.html#universal-dependencies-v2-5)。\n\n我们还为 Trankit 创建了一个演示网站，地址为：http:\u002F\u002Fnlp.uoregon.edu\u002Ftrankit\n\n### 安装\nTrankit 可通过以下任一方式轻松安装：\n\n**[2025 年 7 月 23 日：我们当前服务器存在问题。请暂时仅从源码安装 Trankit。模型将从 https:\u002F\u002Fhuggingface.co\u002Fuonlp\u002Ftrankit\u002Ftree\u002Fmain\u002Fmodels 下载。稍后我们将更新 pip 安装方式。]**\n\n#### 使用 pip\n```\npip install trankit\n```\n该命令将自动安装 Trankit 及其所有依赖包。\n\n#### 从源码安装\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit.git\ncd trankit\npip install -e .\n```\n这将首先克隆我们的 GitHub 仓库，然后安装 Trankit。\n\n#### 解决 Trankit 与 Transformers 的兼容性问题\n旧版本的 Trankit 在使用较新版本的 [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 时遇到了[兼容性问题](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F5)。要解决此问题，请按如下方式安装新版 Trankit：\n```\npip install trankit==1.1.0\n```\n如果在安装过程中遇到任何其他问题，请在此处 [提交 issue](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002Fnew) 告知我们。谢谢！\n\n### 使用方法\nTrankit 可以处理未分词（原始）或已分词的字符串输入，支持句子级和文档级处理。目前，Trankit 支持以下任务：\n- 句子分割（Sentence segmentation）\n- 分词（Tokenization）\n- 多词标记扩展（Multi-word token expansion）\n- 词性标注（Part-of-speech tagging）\n- 形态特征标注（Morphological feature tagging）\n- 依存句法分析（Dependency parsing）\n- 命名实体识别（Named entity recognition）\n\n#### 初始化一个预训练流水线\n以下代码展示了如何为英语初始化一个预训练流水线；该流水线将在 GPU 上运行，自动下载预训练模型，并将其存储到指定的缓存目录中。如果预训练模型已存在，Trankit 将不会重复下载。\n```python\nfrom trankit import 
Pipeline\n\n# 初始化一个英文流水线（pipeline）\np = Pipeline(lang='english', gpu=True, cache_dir='.\u002Fcache')\n```\n\n#### 对输入执行所有任务\n初始化一个预训练的流水线后，即可用于对输入执行所有任务，如下所示。如果输入是一个句子，则必须将参数 `is_sent` 设置为 True。\n```python\nfrom trankit import Pipeline\n\np = Pipeline(lang='english', gpu=True, cache_dir='.\u002Fcache')\n\n######## 文档级处理 ########\nuntokenized_doc = '''Hello! This is Trankit.'''\npretokenized_doc = [['Hello', '!'], ['This', 'is', 'Trankit', '.']]\n\n# 对输入执行所有任务\nprocessed_doc1 = p(untokenized_doc)\nprocessed_doc2 = p(pretokenized_doc)\n\n######## 句子级处理 ########\nuntokenized_sent = '''This is Trankit.'''\npretokenized_sent = ['This', 'is', 'Trankit', '.']\n\n# 对输入执行所有任务\nprocessed_sent1 = p(untokenized_sent, is_sent=True)\nprocessed_sent2 = p(pretokenized_sent, is_sent=True)\n```\n请注意，尽管预分词（pretokenized）的输入始终可以被处理，但对于需要多词单元（MWT, Multi-Word Token）扩展的语言（如阿拉伯语或法语），使用预分词输入可能并不合适。请查阅[此表格](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fpkgnames.html#pretrained-languages-their-code-names)中的“Requires MWT expansion?”（是否需要 MWT 扩展？）一列，以确认特定语言是否需要多词单元扩展。  \n更多详细示例，请参阅我们的[文档页面](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Foverview.html)。\n\n#### 多语言使用\n从 v1.0.0 版本开始，Trankit 支持一种便捷的[自动模式（Auto Mode）](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines)，用户在处理输入前无需手动设置特定语言。在自动模式下，Trankit 会自动检测输入的语言，并使用对应的语言专用模型，从而避免在多语言流水线中频繁切换语言。\n\n```python\nfrom trankit import Pipeline\n\np = Pipeline('auto')\n\n# 对英文输入进行分词（Tokenizing）\nen_output = p.tokenize('''I figured I would put it out there anyways.''') \n\n# 对法文输入进行词性标注（POS）、形态标注（Morphological tagging）和依存句法分析（Dependency parsing）\nfr_output = p.posdep('''On pourra toujours parler à propos d'Averroès de \"décentrement du Sujet\".''')\n\n# 对越南语输入进行命名实体识别（NER tagging）\nvi_output = p.ner('''Cuộc tiêm thử nghiệm tiến hành tại Học viện Quân y, Hà Nội''')\n```\n在此示例中，使用代码名 `'auto'` 
初始化了一个处于自动模式的多语言流水线。更多信息请访问[此页面](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Fnews.html#auto-mode-for-multilingual-pipelines)。需要注意的是，除了新的自动模式外，原有的[手动模式（manual mode）](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Foverview.html#multilingual-usage)仍然可以像以前一样使用。\n\n#### 构建自定义流水线\n通过 Trankit 中的 `TPipeline` 类，训练自定义流水线非常简单。下面展示了如何在自定义数据上训练一个词和句子分割器（token and sentence splitter）。\n```python\nfrom trankit import TPipeline\n\ntp = TPipeline(training_config={\n    'task': 'tokenize',\n    'save_dir': '.\u002Fsaved_model',\n    'train_txt_fpath': '.\u002Ftrain.txt',\n    'train_conllu_fpath': '.\u002Ftrain.conllu',\n    'dev_txt_fpath': '.\u002Fdev.txt',\n    'dev_conllu_fpath': '.\u002Fdev.conllu'\n    }\n)\n\ntp.train()\n```\n有关训练和加载自定义流水线的详细指南，请参见[此处](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Ftraining.html)。\n\n#### 共享您的自定义流水线\n\n如果您希望与其他用户共享您的自定义流水线，请在此处[创建一个 issue](https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002Fnew)，并提供以下信息：\n\n- 您用于训练模型的训练数据，例如数据许可证、数据来源以及一些数据统计信息（即训练集、开发集和测试集的大小）。\n- 使用官方[评估脚本](https:\u002F\u002Funiversaldependencies.org\u002Fconll18\u002Fevaluation.html)在您的测试数据上得到的流水线性能指标。\n- 您训练好的模型文件的可下载链接（Google Drive 链接尤佳）。\n\n收到您的请求后，我们将检查并测试您的流水线。一切确认无误后，我们会通过新的语言代码使这些流水线可供其他用户使用。\n\n### 致谢\n本项目得到了美国国家情报总监办公室（ODNI）下属的情报高级研究计划署（IARPA）的支持，资助合同号为 2019-19051600006，隶属于[文本增强抽取以提升检索能力（BETTER）计划](https:\u002F\u002Fwww.iarpa.gov\u002Findex.php\u002Fresearch-programs\u002Fbetter)。\n\n我们使用 [XLM-Roberta](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.02116) 和 [Adapters](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.00247) 作为不同任务和语言共享的多语言编码器。[AdapterHub](https:\u002F\u002Fgithub.com\u002FAdapter-Hub\u002Fadapter-transformers) 被用于实现基于 Adapters 的即插即用机制。为了加快开发进度，MWT 扩展器（MWT expander）和词形还原器（lemmatizer）的实现借鉴自 [Stanza](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fstanza)。语言检测模块则借助了 [langid](https:\u002F\u002Fgithub.com\u002Fsaffsd\u002Flangid.py) 库。","# Trankit 快速上手指南\n\nTrankit 
是一个基于 Transformer 的轻量级多语言 NLP 工具包，支持 100 多种语言的分句、分词、词性标注、依存句法分析、命名实体识别等任务。\n\n## 环境准备\n\n- **操作系统**：Linux \u002F macOS \u002F Windows\n- **Python 版本**：3.6 或更高\n- **可选加速**：支持 GPU（需安装兼容的 CUDA 和 PyTorch）\n- **网络要求**：首次运行会自动下载预训练模型（约几百 MB），建议使用稳定网络。模型托管于 Hugging Face，国内用户可考虑配置镜像加速（如 `HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`）\n\n## 安装步骤\n\n> ⚠️ 注意：截至 2025 年 7 月 23 日，官方 PyPI 安装可能存在问题，请优先从源码安装。\n\n### 方法一：从源码安装（推荐）\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit.git\ncd trankit\npip install -e .\n```\n\n### 方法二：使用 pip（若恢复可用）\n\n```bash\npip install trankit==1.1.0\n```\n\n> 建议指定版本 `1.1.0` 以避免与新版 `transformers` 库的兼容性问题。\n\n## 基本使用\n\n### 1. 初始化英文处理管道（自动下载模型）\n\n```python\nfrom trankit import Pipeline\n\np = Pipeline(lang='english', gpu=True, cache_dir='.\u002Fcache')\n```\n\n### 2. 处理整段文本（文档级）\n\n```python\ndoc = \"Hello! This is Trankit.\"\nresult = p(doc)\nprint(result)\n```\n\n### 3. 多语言自动识别模式（Auto Mode）\n\n无需指定语言，自动检测并处理：\n\n```python\nfrom trankit import Pipeline\n\np = Pipeline('auto')\n\n# 英文分词\nen_out = p.tokenize(\"I figured I would put it out there anyways.\")\n\n# 法文句法分析\nfr_out = p.posdep(\"On pourra toujours parler à propos d'Averroès de \\\"décentrement du Sujet\\\".\")\n\n# 越南文命名实体识别\nvi_out = p.ner(\"Cuộc tiêm thử nghiệm tiến hành tại Học viện Quân y, Hà Nội\")\n```\n\n> 更多高级用法（如自定义训练、句子级处理、预切分输入等）请参考 [官方文档](https:\u002F\u002Ftrankit.readthedocs.io\u002F)。","一家跨国电商公司需要分析来自全球用户的商品评论，这些评论涵盖英语、西班牙语、法语、越南语等十余种语言，用于情感分析和关键词提取。\n\n### 没有 trankit 时\n- 需为每种语言单独集成不同的 NLP 工具（如 spaCy、Stanza、Stanford CoreNLP），依赖复杂且维护成本高。\n- 多语言文本需预先人工标注或调用额外的语言识别服务，流程繁琐且易出错。\n- 部分低资源语言（如越南语、泰语）缺乏高质量的预训练模型，导致分词和依存句法分析准确率低。\n- 整体处理速度慢，内存占用高，难以部署到资源受限的服务器环境。\n- 团队中非 Python 开发人员（如数据分析师）难以直接使用现有工具处理原始文本。\n\n### 使用 trankit 后\n- 仅需安装一个轻量级 Python 包，即可统一处理 56 种语言的分词、词性标注、依存句法分析等任务。\n- 启用 Auto Mode 后，自动识别输入文本语言，无需预处理或额外调用语言检测接口。\n- 基于 XLM-Roberta large 的预训练模型显著提升低资源语言的解析精度，越南语等语言的 UAS 提升超 10%。\n- 推理速度快、内存占用低，在相同硬件下吞吐量比 Stanza 提高约 30%，适合线上批量处理。\n- 
提供命令行接口，非开发人员可直接通过终端处理文本文件，快速生成结构化分析结果。\n\ntrankit 以统一、高效、高精度的方式解决了多语言 NLP 流水线的落地难题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnlp-uoregon_trankit_78cabd5e.png","nlp-uoregon",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fnlp-uoregon_b00a9304.jpg","This is the official github account for the Natural Language Processing Group at the University of Oregon.","University of Oregon","Eugene, Oregon","http:\u002F\u002Fnlp.uoregon.edu","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon",[23],{"name":24,"color":25,"percentage":26},"Python","#3572A5",100,793,107,"2026-03-21T19:19:06","Apache-2.0",2,"Linux, macOS, Windows","非必需，但支持 NVIDIA GPU 加速；未说明具体显卡型号、显存大小和 CUDA 版本","未说明",{"notes":36,"python":37,"dependencies":38},"安装时若遇 transformers 兼容性问题，建议使用 trankit==1.1.0；模型默认从 Hugging Face 下载，首次运行需联网并预留存储空间；支持命令行接口，无需编程即可使用。","未明确说明，但 PyPI 页面显示支持 Python 3.6+（根据 badge 推断）",[39,40,41,42,43],"torch","transformers","langid","adapter-transformers","numpy",[45,46,47,48],"音频","开发框架","语言模型","图像",[50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66],"nlp","natural-language-processing","pytorch","language-model","xlm-roberta","machine-learning","deeplearning","artificial-intelligence","universal-dependencies","multilingual","adapters","sentence-segmentation","tokenization","part-of-speech-tagging","morphological-tagging","dependency-parsing","lemmatization",6,"ready","2026-03-27T02:49:30.150509","2026-04-06T07:12:05.097386",[72,77,82,87,92],{"id":73,"question_zh":74,"answer_zh":75,"source_url":76},173,"加载预训练模型时出现“File is not a zip file”错误怎么办？","该问题是由于 trankit 旧版本中的路径不一致导致的。请升级到 v1.1.0 或更高版本以修复此问题，安装命令为：`pip install trankit==1.1.0`。","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F17",{"id":78,"question_zh":79,"answer_zh":80,"source_url":81},174,"如何复现论文中 GermEval14 数据集上的 NER 结果（F1=86.9）？","确保使用预训练模型并正确调用 `_evaluate_ner` 函数对测试集进行评估。有用户通过正确设置后得到了 86.6 的 F1 分数，接近论文结果。可参考该 Colab 
笔记本：https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1sgU0U42c1ipn6QbskFcRDs_tdvQgK7Gf。","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F7",{"id":83,"question_zh":84,"answer_zh":85,"source_url":86},175,"训练自定义 NER 模型时，.bio 文件应采用什么格式？","官方文档已更新说明 .bio 文件的格式要求，请查阅最新版文档：https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Ftraining.html#training-a-named-entity-recognizer。","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F6",{"id":88,"question_zh":89,"answer_zh":90,"source_url":91},176,"trankit 与新版 Hugging Face Transformers 库存在兼容性问题怎么办？","请安装 trankit 1.0.1 或更高版本，并参考官方提供的兼容性解决方案。详细说明已在 README 和文档中提供：https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit#fixing-the-compatibility-issue-of-trankit-with-transformers 和 https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Finstallation.html#fixing-the-compatibility-issue-of-trankit-with-transformers。","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F5",{"id":93,"question_zh":94,"answer_zh":95,"source_url":96},177,"导入 Pipeline 时出现 ImportError: cannot import name '_BaseLazyModule' from 'transformers.file_utils' 怎么解决？","这是由于环境中同时安装了 transformers 和 trankit 导致的版本冲突。建议创建一个全新的 Python 虚拟环境，仅安装 torch 和 trankit，避免安装独立的 transformers 包。例如：`conda create -n trankit python=3.7 && conda activate trankit && pip install trankit==1.0.1`。","https:\u002F\u002Fgithub.com\u002Fnlp-uoregon\u002Ftrankit\u002Fissues\u002F3",[98,103,107],{"id":99,"version":100,"summary_zh":101,"released_at":102},109558,"v1.1.0","* The issue #17 of loading customized pipelines has been fixed in this new release. Please check it out [here](https:\u002F\u002Ftrankit.readthedocs.io\u002Fen\u002Flatest\u002Ftraining.html#loading).\r\n* In this new release, `trankit` supports conversion of trankit outputs in json format to CoNLL-U format. 
The conversion is done via the new function `trankit2conllu`, which can be used as below:\r\n```python\r\nfrom trankit import Pipeline, trankit2conllu\r\n\r\np = Pipeline('english')\r\n\r\n# document level\r\njson_doc = p('''Hello! This is Trankit.''')\r\nconllu_doc = trankit2conllu(json_doc)\r\nprint(conllu_doc)\r\n#1       Hello   hello   INTJ    UH      _       0       root    _       _\r\n#2       !       !       PUNCT   .       _       1       punct   _       _\r\n#\r\n#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _\r\n#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _\r\n#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _\r\n#4       .       .       PUNCT   .       _       3       punct   _       _\r\n\r\n# sentence level\r\njson_sent = p('''This is Trankit.''', is_sent=True)\r\nconllu_sent = trankit2conllu(json_sent)\r\nprint(conllu_sent)\r\n#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _\r\n#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _\r\n#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _\r\n#4       .       .       PUNCT   .       _       3       punct   _       _\r\n\r\n```","2021-06-19T22:52:13",{"id":104,"version":105,"summary_zh":15,"released_at":106},109559,"v1.0.1","2021-04-03T18:20:24",{"id":108,"version":109,"summary_zh":110,"released_at":111},109560,"v1.0.0","### :boom: :boom: :boom: Trankit v1.0.0 is out:\r\n\r\n* **90 new pretrained transformer-based pipelines for 56 languages**. The new pipelines are trained with XLM-Roberta large, which further boosts the performance significantly over 90 treebanks of the Universal Dependencies v2.5 corpus. 
Check out the new performance [here](https://trankit.readthedocs.io/en/latest/performance.html). This [page](https://trankit.readthedocs.io/en/latest/news.html#trankit-large) shows how to use the new pipelines.

* **Auto Mode for multilingual pipelines**. In the Auto Mode, the language of the input is detected automatically, so the multilingual pipelines can process the input without being told its language. Check out how to turn on the Auto Mode [here](https://trankit.readthedocs.io/en/latest/news.html#auto-mode-for-multilingual-pipelines). Thank you [loretoparisi](https://github.com/loretoparisi) for the suggestion.

* **Command-line interface** is now available, so users who are not familiar with the Python programming language can use Trankit easily. Check out the tutorials on this [page](https://trankit.readthedocs.io/en/latest/commandline.html).
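To make the v1.1.0 JSON-to-CoNLL-U conversion concrete without a trankit installation, here is a minimal sketch of such a converter. The `doc` dict below is a hypothetical input whose shape is modeled on the CoNLL-U output shown in the v1.1.0 notes; it is not trankit's exact output schema, and `to_conllu` is an illustration, not trankit's implementation of `trankit2conllu`:

```python
# Minimal sketch of a JSON -> CoNLL-U renderer. The input shape is an
# assumption modeled on the example output above, not trankit's schema.

CONLLU_FIELDS = ("id", "text", "lemma", "upos", "xpos", "feats",
                 "head", "deprel", "deps", "misc")

def to_conllu(doc):
    """Render a trankit-style document dict as CoNLL-U text."""
    blocks = []
    for sent in doc["sentences"]:
        lines = []
        for tok in sent["tokens"]:
            # Missing fields become the CoNLL-U placeholder "_".
            row = [str(tok.get(field, "_")) for field in CONLLU_FIELDS]
            lines.append("\t".join(row))
        blocks.append("\n".join(lines))
    # Sentences are separated by a blank line.
    return "\n\n".join(blocks) + "\n"

# Hypothetical document mirroring the "This is Trankit." example above.
doc = {
    "sentences": [{
        "tokens": [
            {"id": 1, "text": "This", "lemma": "this", "upos": "PRON",
             "xpos": "DT", "feats": "Number=Sing|PronType=Dem",
             "head": 3, "deprel": "nsubj"},
            {"id": 2, "text": "is", "lemma": "be", "upos": "AUX",
             "xpos": "VBZ",
             "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
             "head": 3, "deprel": "cop"},
            {"id": 3, "text": "Trankit", "lemma": "Trankit", "upos": "PROPN",
             "xpos": "NNP", "feats": "Number=Sing",
             "head": 0, "deprel": "root"},
            {"id": 4, "text": ".", "lemma": ".", "upos": "PUNCT",
             "xpos": ".", "feats": "_", "head": 3, "deprel": "punct"},
        ]
    }]
}

print(to_conllu(doc))
```

Each CoNLL-U row carries ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `_` standing in for anything unset, which is why the example output above shows `_` in the DEPS and MISC columns.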
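For the `.bio` NER-training question in the FAQ above: the linked training docs are authoritative for the exact layout, but BIO-tagged data conventionally puts one `token tag` pair per line, with a blank line separating sentences. A hedged sketch of reading data in that conventional layout (the `read_bio` helper and the sample text are illustrative, not part of trankit's API):

```python
# Sketch of parsing BIO-tagged NER data: one "token tag" pair per line,
# blank lines between sentences. This shows the conventional BIO layout;
# consult the trankit training docs for the exact .bio requirements.

def read_bio(text):
    """Parse BIO text into a list of sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                  # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()     # e.g. "Oregon B-LOC"
        current.append((token, tag))
    if current:                       # flush a trailing sentence
        sentences.append(current)
    return sentences

sample = """\
Trankit B-ORG
is O
from O
Oregon B-LOC

Hello O
"""
print(read_bio(sample))
```

Tags use `B-` for the first token of an entity, `I-` for tokens continuing it, and `O` for tokens outside any entity.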