[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-mozilla--bugbug":3,"tool-mozilla--bugbug":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":114,"forks":115,"last_commit_at":116,"license":117,"difficulty_score":10,"env_os":118,"env_gpu":119,"env_ram":120,"env_deps":121,"category_tags":128,"github_topics":129,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":137,"updated_at":138,"faqs":139,"releases":175},2371,"mozilla\u002Fbugbug","bugbug","Platform for Machine Learning projects on Software Engineering","bugbug 是一个由 Mozilla 开发的开源机器学习平台，专为软件工程领域设计。它致力于利用人工智能技术优化软件缺陷管理和质量保障流程，帮助团队更高效地处理海量问题报告。\n\n在实际开发中，面对成千上万的 Bug 提交，人工分类、指派和筛选往往耗时费力且容易出错。bugbug 通过训练多种分类模型，自动解决这些痛点：它能智能识别真正的程序缺陷而非功能请求，自动推荐合适的修复负责人，预测代码补丁是否会导致测试失败或需要回退，甚至能精准筛选出最需要运行的测试用例，从而大幅提升研发效率。\n\n这款工具特别适合软件开发者、测试工程师（QA）以及从事软件工程研究的科研人员使用。对于维护大型开源项目或复杂商业系统的团队而言，bugbug 能有效减轻重复性劳动，让专业人员将精力集中在核心问题上。\n\n其独特亮点在于提供了丰富的预训练模型场景，涵盖从“缺陷类型识别”、“回归检测”到“垃圾信息过滤”等十多种具体任务。更值得一提的是，bugbug 生成的训练数据可以独立于平台使用，为研究人员提供了宝贵的数据集资源，促进了机器学习在软件工程领域的进一步探索与应用。","# bugbug\n\n[![Task Status](https:\u002F\u002Fcommunity-tc.services.mozilla.com\u002Fapi\u002Fgithub\u002Fv1\u002Frepository\u002Fmozilla\u002Fbugbug\u002Fmaster\u002Fbadge.svg)](https:\u002F\u002Fcommunity-tc.services.mozilla.com\u002Fapi\u002Fgithub\u002Fv1\u002Frepository\u002Fmozilla\u002Fbugbug\u002Fmaster\u002Flatest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmozilla\u002Fbugbug\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmozilla\u002Fbugbug)\n\u003Ca href=\"https:\u002F\u002Fchat.mozilla.org\u002F#\u002Froom\u002F#bugbug:mozilla.org\" target=\"_blank\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fchat%20on%20[m]-%23bugbug%3Amozilla.org-blue\">\n\u003C\u002Fa>\n\nBugbug aims at leveraging machine learning techniques to help with bug and quality management, and other software engineering tasks (such as test selection and defect prediction).\n\nChat with us in the [bugbug](https:\u002F\u002Fchat.mozilla.org\u002F#\u002Froom\u002F#bugbug:mozilla.org) Matrix room.\n\nMore information on the Mozilla Hacks blog:\n\n- https:\u002F\u002Fhacks.mozilla.org\u002F2020\u002F07\u002Ftesting-firefox-more-efficiently-with-machine-learning\u002F\n- https:\u002F\u002Fhacks.mozilla.org\u002F2019\u002F04\u002Fteaching-machines-to-triage-firefox-bugs\u002F\n\nData generated by BugBug to train the models can be used independently from BugBug. See the [docs](docs\u002Fdata.md) for details.\n\n## Classifiers\n\n- **assignee** - The aim of this classifier is to suggest an appropriate assignee for a bug.\n\n- **backout** - The aim of this classifier is to detect patches that might be more likely to be backed-out (because of build or test failures). It could be used for test prioritization\u002Fscheduling purposes.\n\n- **bugtype** - The aim of this classifier is to classify bugs according to their type. The labels are gathered automatically from bugs: right now they are \"crash\u002Fmemory\u002Fperformance\u002Fsecurity\". The plan is to add more types after manual labeling.\n\n- **component** - The aim of this classifier is to assign product\u002Fcomponent to (untriaged) bugs.\n\n- **defect vs enhancement vs task** - Extension of the **defect** classifier to detect differences also between feature requests and development tasks.\n\n- **defect** - Bugs on Bugzilla aren't always bugs. Sometimes they are feature requests, refactorings, and so on. The aim of this classifier is to distinguish between bugs that are actually bugs and bugs that aren't. The dataset currently contains 2110 bugs, the accuracy of the current classifier is ~93% (precision ~95%, recall ~94%).\n\n- **devdocneeded** - The aim of this classifier is to detect bugs that should be documented for developers.\n\n- [**needsdiagnosis**](https:\u002F\u002Fgithub.com\u002Fwebcompat\u002Fwebcompat.com\u002Fblob\u002Fmain\u002Fdocs\u002Fml-process.md) - The aim of this classifier is to detect issues that are likely invalid and don't need to be diagnosed for webcompat use case.\n\n- **qaneeded** - The aim of this classifier is to detect bugs that would need QA verification.\n\n- **regression vs non-regression** - Bugzilla has a `regression` keyword to identify bugs that are regressions. Unfortunately it isn't used consistently. The aim of this classifier is to detect bugs that are regressions.\n\n- **regressionrange** - The aim of this classifier is to detect regression bugs that have a regression range vs those that don't.\n\n- [**regressor**](docs\u002Fmodels\u002Fregressor.md) - The aim of this classifier is to detect patches which are more likely to cause regressions. It could be used to make riskier patches undergo more scrutiny.\n\n- **spam** - The aim of this classifier is to detect bugs which are spam.\n\n- **stepstoreproduce** - The aim of this classifier is to detect bugs that have steps to reproduce vs those that don't.\n\n- **testfailure** - The aim of this classifier is to detect patches that might be more likely to cause test failures.\n\n- **testselect** - The aim of this classifier is to select relevant tests to run for a given patch.\n\n- **tracking** - The aim of this classifier is to detect bugs to track.\n\n- **uplift** - The aim of this classifier is to detect bugs for which uplift should be approved and bugs for which uplift should not be approved.\n\n## Setup and Prerequisites\n\nInstall the Python dependencies:\n\n```\npip3 install -r requirements.txt\n```\n\nYou may also need `pip install -r test-requirements.txt`. Depending on the parts of bugbug you want to run, you might need to install dependencies from other requirement files (find them with `find . -name \"*requirements*\"`).\n\nCurrently, Python 3.10+ is required. You can double check the version we use by looking at setup.py.\n\nAlso, libgit2 (needs [v1.0.0](https:\u002F\u002Fgithub.com\u002Flibgit2\u002Flibgit2\u002Freleases\u002Ftag\u002Fv1.0.0), only in [experimental on Debian](https:\u002F\u002Fwiki.debian.org\u002FDebianExperimental)), **might** be required (if you can't install it, skip this step).\n\n```\nsudo apt-get -t experimental install libgit2-dev\n```\n\n### Auto-formatting\n\nThis project is using [pre-commit](https:\u002F\u002Fpre-commit.com\u002F). Please run `pre-commit install` to install the git pre-commit hooks on your clone.\n\nEvery time you will try to commit, pre-commit will run checks on your files to make sure they follow our style standards and they aren't affected by some simple issues. If the checks fail, pre-commit won't let you commit.\n\n## Usage\n\n### Training\n\nRun the `trainer.py` script with the command `python -m scripts.trainer` (with `--help` to see the required and optional arguments of the command) to perform training (warning this takes 30min+).\n\n### Testing\n\nTo use a model to classify a given bug, you can run `python -m scripts.bug_classifier MODEL_NAME --bug-id ID_OF_A_BUG_FROM_BUGZILLA`. N.B.: If you run the classifier script without training a model first, it will automatically download an already trained model.\n\n### Example for the \"defect\" model\n\n**training** To train the model for mode `defect`:\n\n    python3 -m scripts.trainer defect\n\n**testing** To use the model to classify a given bug, you can run `python -m scripts.bug_classifier defect --bug-id ID_OF_A_BUG_FROM_BUGZILLA`.\n\n### Training on Taskcluster (Mozilla's CI platform)\n\nYou could run the model training task on the CI. To do this, simply include `Train on Taskcluster: \u003Cmodel name>` in the pull request description.\n\n#### Example\n\nTo train the `spambug` model on Taskcluster, you need to add the following line in the pull request description, ideally at the bottom:\n\n```\nTrain on Taskcluster: spambug\n```\n\nThere are a few things to consider when training a model on Taskcluster:\n\n- This is currently only supported in GitHub pull requests.\n- The training task will be re-run every time you push to the branch linked to the pull request. Limiting the number of times you push is wise to avoid unnecessary training and resource wastage. Alternatively, you could temporarily remove the \"Train on Taskcluster\" keyword from the pull request description.\n- Currently, the training task extracts only the model's name and does not consider arguments.\n\n### Running the repository mining script\n\nNote: This section is only necessary if you want to perform changes to the repository mining script. Otherwise, you can simply use the commits data we generate automatically.\n\n1. Clone https:\u002F\u002Fhg.mozilla.org\u002Fmozilla-central\u002F.\n2. Run `.\u002Fmach vcs-setup` in the directory where you have cloned mozilla-central.\n3. Enable the extensions mentioned in [infra\u002Fhgrc](https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fblob\u002Fmaster\u002Finfra\u002Fhgrc). For example, if you are on Linux, you can add `firefoxtree` to the extensions section of the `~\u002F.hgrc` file as:\n   ```\n   firefoxtree = ~\u002F.mozbuild\u002Fversion-control-tools\u002Fhgext\u002Ffirefoxtree\n   ```\n4. Run the `repository.py` script, with the only argument being the path to the mozilla-central repository.\n\nNote: If you run into problems, it's possible the version of Mercurial you are using is not supported. Check the Docker definition at infra\u002Fdockerfile.commit_retrieval to see what we are using in production.\n\nNote: the script will take a long time to run (on my laptop more than 7 hours). If you want to test a simple change and you don't intend to actually mine the data, you can modify the repository.py script to limit the number of analyzed commits. Simply add `limit=1024` to the call to the `log` command.\n\n## Structure of the project\n\n- `bugbug\u002Flabels` contains manually collected labels;\n- `bugbug\u002Fdb.py` is an implementation of a really simple JSON database;\n- `bugbug\u002Fbugzilla.py` contains the functions to retrieve bugs from the Bugzilla tracking system;\n- `bugbug\u002Frepository.py` contains the functions to mine data from the mozilla-central (Firefox) repository;\n- `bugbug\u002Fbug_features.py` contains functions to extract features from bug\u002Fcommit data;\n- `bugbug\u002Fmodel.py` contains the base class that all models derive from;\n- `bugbug\u002Fmodels` contains implementations of specific models;\n- `bugbug\u002Fnn.py` contains utility functions to include Keras models into a scikit-learn pipeline;\n- `bugbug\u002Futils.py` contains misc utility functions;\n- `bugbug\u002Fnlp` contains utility functions for NLP;\n- `bugbug\u002Flabels.py` contains utility functions for handling labels;\n- `bugbug\u002Fbug_snapshot.py` contains a module to play back the history of a bug;\n- `bugbug\u002Fgithub.py` contains functions to retrieve issues from GitHub for a specified owner\u002Frepository.\n\n## Using bugbug for non-Mozilla projects\n\nBugbug is focussing on Mozilla use-cases for Firefox, Bugzilla and GitHub.\nHowever, we will be happy to accept pull requests adding support for other projects or bug trackers.\n","# bugbug\n\n[![任务状态](https:\u002F\u002Fcommunity-tc.services.mozilla.com\u002Fapi\u002Fgithub\u002Fv1\u002Frepository\u002Fmozilla\u002Fbugbug\u002Fmaster\u002Fbadge.svg)](https:\u002F\u002Fcommunity-tc.services.mozilla.com\u002Fapi\u002Fgithub\u002Fv1\u002Frepository\u002Fmozilla\u002Fbugbug\u002Fmaster\u002Flatest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmozilla\u002Fbugbug\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fmozilla\u002Fbugbug)\n\u003Ca href=\"https:\u002F\u002Fchat.mozilla.org\u002F#\u002Froom\u002F#bugbug:mozilla.org\" target=\"_blank\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fchat%20on%20[m]-%23bugbug%3Amozilla.org-blue\">\n\u003C\u002Fa>\n\nBugbug 的目标是利用机器学习技术来辅助 Bug 和质量管理工作，以及其他软件工程任务（例如测试选择和缺陷预测）。\n\n欢迎在 [bugbug](https:\u002F\u002Fchat.mozilla.org\u002F#\u002Froom\u002F#bugbug:mozilla.org) Matrix 频道与我们交流。\n\n更多相关信息请参阅 Mozilla Hacks 博客：\n\n- https:\u002F\u002Fhacks.mozilla.org\u002F2020\u002F07\u002Ftesting-firefox-more-efficiently-with-machine-learning\u002F\n- https:\u002F\u002Fhacks.mozilla.org\u002F2019\u002F04\u002Fteaching-machines-to-triage-firefox-bugs\u002F\n\n由 BugBug 生成用于训练模型的数据可以独立于 BugBug 使用。详情请参阅 [文档](docs\u002Fdata.md)。\n\n## 分类器\n\n- **assignee** - 该分类器旨在为 Bug 建议合适的分配对象。\n\n- **backout** - 该分类器旨在检测可能更容易被回退的补丁（由于构建或测试失败）。可用于测试优先级排序或调度。\n\n- **bugtype** - 该分类器旨在根据 Bug 类型对其进行分类。标签是从 Bug 中自动收集的：目前包括“崩溃\u002F内存\u002F性能\u002F安全”。计划在人工标注后添加更多类型。\n\n- **component** - 该分类器旨在为未分类的 Bug 分配产品\u002F组件。\n\n- **defect vs enhancement vs task** - 这是 **defect** 分类器的扩展，用于区分功能请求和开发任务。\n\n- **defect** - Bugzilla 上的 Bug 并不总是真正的 Bug。有时它们是功能请求、重构等。该分类器旨在区分真正的 Bug 和非 Bug。当前数据集包含 2110 个 Bug，现有分类器的准确率约为 93%（精确率约 95%，召回率约 94%）。\n\n- **devdocneeded** - 该分类器旨在检测需要为开发者记录文档的 Bug。\n\n- [**needsdiagnosis**](https:\u002F\u002Fgithub.com\u002Fwebcompat\u002Fwebcompat.com\u002Fblob\u002Fmain\u002Fdocs\u002Fml-process.md) - 该分类器旨在检测很可能无效且无需诊断的问题，适用于 webcompat 场景。\n\n- **qaneeded** - 该分类器旨在检测需要 QA 验证的 Bug。\n\n- **regression vs non-regression** - Bugzilla 使用 `regression` 关键字来标识回归性 Bug。然而，这一标记并不一致。该分类器旨在检测回归性 Bug。\n\n- **regressionrange** - 该分类器旨在区分存在回归范围的回归性 Bug 和不存在回归范围的回归性 Bug。\n\n- [**regressor**](docs\u002Fmodels\u002Fregressor.md) - 该分类器旨在检测更有可能导致回归的补丁。可用于对风险较高的补丁进行更严格的审查。\n\n- **spam** - 该分类器旨在检测垃圾 Bug。\n\n- **stepstoreproduce** - 该分类器旨在检测具有重现步骤的 Bug 和不具有重现步骤的 Bug。\n\n- **testfailure** - 该分类器旨在检测可能更容易导致测试失败的补丁。\n\n- **testselect** - 该分类器旨在为给定补丁选择相关测试用例。\n\n- **tracking** - 该分类器旨在检测需要跟踪的 Bug。\n\n- **uplift** - 该分类器旨在检测应批准 uplift 的 Bug 和不应批准 uplift 的 Bug。\n\n## 设置与先决条件\n\n安装 Python 依赖项：\n\n```\npip3 install -r requirements.txt\n```\n\n您可能还需要运行 `pip install -r test-requirements.txt`。根据您希望运行的 BugBug 模块，可能还需要安装其他需求文件中的依赖项（可通过 `find . -name \"*requirements*\"` 查找）。\n\n目前需要 Python 3.10 或更高版本。您可以通过查看 setup.py 来确认我们使用的版本。\n\n此外，libgit2（需要 [v1.0.0](https:\u002F\u002Fgithub.com\u002Flibgit2\u002Flibgit2\u002Freleases\u002Ftag\u002Fv1.0.0)，仅在 Debian 的实验版中可用）**可能** 是必需的（如果无法安装，请跳过此步骤）。\n\n```\nsudo apt-get -t experimental install libgit2-dev\n```\n\n### 自动格式化\n\n该项目使用 [pre-commit](https:\u002F\u002Fpre-commit.com\u002F)。请运行 `pre-commit install` 将 Git 预提交钩子安装到您的克隆中。\n\n每次尝试提交时，pre-commit 都会检查您的文件，以确保其符合我们的代码风格标准，并且没有受到一些简单问题的影响。如果检查失败，pre-commit 将阻止您提交。\n\n## 使用方法\n\n### 训练\n\n运行 `trainer.py` 脚本，命令为 `python -m scripts.trainer`（使用 `--help` 可查看命令的必选和可选参数），以执行训练（注意，这需要 30 分钟以上）。\n\n### 测试\n\n要使用模型对特定 Bug 进行分类，可以运行 `python -m scripts.bug_classifier MODEL_NAME --bug-id ID_OF_A_BUG_FROM_BUGZILLA`。请注意：如果您在未先训练模型的情况下运行分类器脚本，它将自动下载一个已训练好的模型。\n\n### “defect” 模型示例\n\n**训练** 要训练 `defect` 模型：\n\n    python3 -m scripts.trainer defect\n\n**测试** 要使用该模型对特定 Bug 进行分类，可以运行 `python -m scripts.bug_classifier defect --bug-id ID_OF_A_BUG_FROM_BUGZILLA`。\n\n### 在 Taskcluster（Mozilla 的 CI 平台）上训练\n\n您可以在 CI 上运行模型训练任务。为此，只需在拉取请求描述中加入“在 Taskcluster 上训练：模型名称”。\n\n#### 示例\n\n要在 Taskcluster 上训练 `spambug` 模型，您需要在拉取请求描述中添加以下内容，最好放在底部：\n\n```\n在 Taskcluster 上训练：spambug\n```\n\n在 Taskcluster 上训练模型时需要注意以下几点：\n\n- 目前仅支持 GitHub 拉取请求。\n- 每次向与拉取请求关联的分支推送时，训练任务都会重新运行。为避免不必要的训练和资源浪费，建议限制推送次数。或者，您可以暂时从拉取请求描述中移除“在 Taskcluster 上训练”关键词。\n- 目前，训练任务仅提取模型名称，不会考虑其他参数。\n\n### 运行仓库挖掘脚本\n\n注意：本节仅在您希望对仓库挖掘脚本进行修改时才需要。否则，您可以直接使用我们自动生成的提交数据。\n\n1. 克隆 https:\u002F\u002Fhg.mozilla.org\u002Fmozilla-central\u002F。\n2. 在克隆 mozilla-central 的目录中运行 `.\u002Fmach vcs-setup`。\n3. 启用 [infra\u002Fhgrc](https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fblob\u002Fmaster\u002Finfra\u002Fhgrc) 中提到的扩展。例如，在 Linux 系统上，您可以将 `firefoxtree` 添加到 `~\u002F.hgrc` 文件的 `extensions` 部分，如下所示：\n   ```\n   firefoxtree = ~\u002F.mozbuild\u002Fversion-control-tools\u002Fhgext\u002Ffirefoxtree\n   ```\n4. 运行 `repository.py` 脚本，其唯一参数为 mozilla-central 仓库的路径。\n\n注意：如果遇到问题，可能是您使用的 Mercurial 版本不受支持。请查看 infra\u002Fdockerfile.commit_retrieval 中的 Docker 定义，以了解我们在生产环境中使用的版本。\n\n注意：该脚本运行时间较长（在我的笔记本电脑上超过 7 小时）。如果您只想测试简单的更改而无需实际挖掘数据，可以修改 repository.py 脚本以限制分析的提交数量。只需在调用 `log` 命令时添加 `limit=1024` 即可。\n\n## 项目结构\n\n- `bugbug\u002Flabels` 包含手动收集的标签；\n- `bugbug\u002Fdb.py` 是一个非常简单的 JSON 数据库实现；\n- `bugbug\u002Fbugzilla.py` 包含从 Bugzilla 问题跟踪系统获取问题的函数；\n- `bugbug\u002Frepository.py` 包含从 mozilla-central（Firefox）仓库中挖掘数据的函数；\n- `bugbug\u002Fbug_features.py` 包含从问题和提交数据中提取特征的函数；\n- `bugbug\u002Fmodel.py` 包含所有模型所继承的基类；\n- `bugbug\u002Fmodels` 包含具体模型的实现；\n- `bugbug\u002Fnn.py` 包含将 Keras 模型集成到 scikit-learn 流程中的实用函数；\n- `bugbug\u002Futils.py` 包含各种实用函数；\n- `bugbug\u002Fnlp` 包含自然语言处理相关的实用函数；\n- `bugbug\u002Flabels.py` 包含标签处理的实用函数；\n- `bugbug\u002Fbug_snapshot.py` 包含用于回放问题历史记录的模块；\n- `bugbug\u002Fgithub.py` 包含从 GitHub 获取指定所有者\u002F仓库下问题的函数。\n\n## 将 bugbug 用于非 Mozilla 项目\n\nBugbug 目前专注于 Firefox、Bugzilla 和 GitHub 的 Mozilla 使用场景。不过，我们非常欢迎针对其他项目或问题跟踪系统的支持请求。","# Bugbug 快速上手指南\n\nBugbug 是一个利用机器学习技术辅助缺陷管理、质量保障及软件工程任务（如测试选择、缺陷预测）的开源工具，由 Mozilla 开发。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux\u002FmacOS\u002FWindows (推荐 Linux 环境)\n*   **Python 版本**：Python 3.10 或更高版本\n*   **可选依赖**：`libgit2` (版本需 >= 1.0.0)。\n    *   如果您在 Debian\u002FUbuntu 系统上且需要完整功能，可尝试安装：\n        ```bash\n        sudo apt-get -t experimental install libgit2-dev\n        ```\n    *   *注：若无法安装此库，可跳过该步骤，部分功能可能受限但不影响核心运行。*\n\n## 安装步骤\n\n1.  **克隆项目代码**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug.git\n    cd bugbug\n    ```\n\n2.  **安装 Python 依赖**\n    建议先创建虚拟环境，然后安装核心依赖：\n    ```bash\n    pip3 install -r requirements.txt\n    ```\n    如果需要运行测试或特定模块，可能还需要安装：\n    ```bash\n    pip3 install -r test-requirements.txt\n    ```\n    *(国内用户如遇下载缓慢，可添加 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple` 参数使用清华镜像源)*\n\n3.  **配置代码规范钩子（可选但推荐）**\n    项目使用 `pre-commit` 进行自动格式化检查：\n    ```bash\n    pre-commit install\n    ```\n\n## 基本使用\n\nBugbug 的核心工作流分为**模型训练**和**模型推理**。为了方便新手，运行推理脚本时若本地无模型，工具会自动下载预训练好的模型。\n\n### 1. 训练模型 (Training)\n\n以 `defect`（缺陷分类）模型为例，运行以下命令开始训练。\n*注意：首次训练可能需要 30 分钟以上，具体取决于数据量和硬件性能。*\n\n```bash\npython3 -m scripts.trainer defect\n```\n\n### 2. 使用模型进行分类 (Testing)\n\n训练完成后（或直接运行以触发自动下载），您可以对 Bugzilla 上的具体缺陷 ID 进行分类预测。\n\n**命令格式：**\n```bash\npython -m scripts.bug_classifier \u003C模型名称> --bug-id \u003CBugzilla 缺陷 ID>\n```\n\n**示例：**\n使用 `defect` 模型判断 ID 为 `123456` 的缺陷是否为真实 Bug：\n```bash\npython -m scripts.bug_classifier defect --bug-id 123456\n```\n\n### 3. 其他可用模型\n\n除了 `defect`，您还可以替换 `\u003C模型名称>` 来使用其他分类器，例如：\n*   `assignee`: 推荐缺陷指派负责人\n*   `component`: 自动分配产品\u002F组件\n*   `spam`: 识别垃圾缺陷报告\n*   `regression`: 识别回归缺陷\n*   `testselect`: 为补丁选择相关的测试用例\n\n只需将上述命令中的 `defect` 替换为对应的模型名称即可。","某大型开源浏览器项目的测试团队每天需处理数百个新提交的代码补丁和缺陷报告，面临巨大的人工筛选压力。\n\n### 没有 bugbug 时\n- **缺陷分类低效**：测试经理需人工阅读每条报告，耗时区分真正的“代码缺陷”与“功能请求”或“垃圾信息”，导致严重阻塞。\n- **回归风险难测**：难以快速识别哪些补丁可能引发回归测试失败，往往只能全量运行测试套件，浪费大量计算资源和时间。\n- **责任分配模糊**：新报出的缺陷无法自动匹配到正确的负责模块或开发人员，需要在多个团队间反复流转确认。\n- **验证优先级混乱**：缺乏数据支撑来判断哪些缺陷急需 QA 验证，重要的高危漏洞常被淹没在普通工单中。\n\n### 使用 bugbug 后\n- **智能自动分类**：利用 `defect` 和 `spam` 分类器，系统自动过滤掉非缺陷报告和垃圾信息，准确率达 93% 以上，让团队专注真实问题。\n- **精准风险预警**：通过 `regressor` 和 `testfailure` 模型预测高风险补丁，仅针对这些变更运行深度测试，大幅缩短反馈周期。\n- **自动路由分派**：`assignee` 和 `component` 分类器根据历史数据将新缺陷直接指派给最合适的开发者和模块，消除流转延迟。\n- **动态优先级排序**：借助 `qaneeded` 和 `bugtype` 模型，自动标记出需要紧急验证的崩溃或安全类缺陷，确保资源投向最关键处。\n\nbugbug 将机器学习深度融入软件工程流程，把原本依赖经验的人工决策转化为高效、精准的自动化闭环。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmozilla_bugbug_e7579a5e.png","mozilla","Mozilla","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmozilla_4f54b559.png","This technology could fall into the right hands.",null,"https:\u002F\u002Fwiki.mozilla.org\u002FGithub","https:\u002F\u002Fgithub.com\u002Fmozilla",[83,87,91,95,99,103,107,110],{"name":84,"color":85,"percentage":86},"Python","#3572A5",91.2,{"name":88,"color":89,"percentage":90},"JavaScript","#f1e05a",5.5,{"name":92,"color":93,"percentage":94},"HTML","#e34c26",1.3,{"name":96,"color":97,"percentage":98},"Jupyter Notebook","#DA5B0B",1,{"name":100,"color":101,"percentage":102},"CSS","#663399",0.5,{"name":104,"color":105,"percentage":106},"Dockerfile","#384d54",0.2,{"name":108,"color":109,"percentage":106},"Shell","#89e051",{"name":111,"color":112,"percentage":113},"Mako","#7e858d",0,559,329,"2026-04-01T23:06:52","MPL-2.0","Linux","未说明","未说明（仓库挖掘脚本运行耗时极长，建议配备充足内存）",{"notes":122,"python":123,"dependencies":124},"1. 主要面向 Mozilla\u002FFirefox 项目，需连接 Bugzilla 和 mozilla-central 仓库。2. 可选依赖 libgit2 (v1.0.0) 仅在部分功能需要，若无法安装可跳过。3. 训练模型耗时较长（30 分钟以上），挖掘仓库数据可能超过 7 小时。4. 支持通过 Taskcluster (Mozilla CI) 进行分布式训练。5. 若需自行挖掘仓库数据，需配置 Mercurial 及特定扩展（如 firefoxtree）。","3.10+",[125,126,127],"requirements.txt 中定义的依赖","test-requirements.txt (可选)","libgit2-dev (v1.0.0+, 可选)",[26,15,14,13],[130,131,132,133,134,135,136],"machine-learning","software-engineering","ai","developer-tools","llm","ml","python","2026-03-27T02:49:30.150509","2026-04-06T05:19:26.170729",[140,145,150,155,160,165,170],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},10899,"如何判断 Mercurial 提交中哪个文件是副本，哪个是原始文件？","可以通过运行 `hg export -r COMMIT_HASH` 命令来检查。在输出结果中，被重命名的文件即为原始文件，而新出现的文件名则是副本。例如：`hg export -r 61b168663a54d4b5da46de34d318da43cc10a71c`。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F113",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},10900,"如何验证某个用户是否是开发者（即是否有提交记录）？","可以使用 `hg log --user '用户名'` 命令查询该用户的提交记录。如果需要提取具体的作者名和邮箱以进行匹配，可以使用模板命令：`hg log --user '用户名' -T '{author|user}\\n{author|email}'`。如果查询到有返回结果，则说明该用户是开发者。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F83",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},10901,"在计算“作者经验”（author experience）特征时，时间范围是从何时开始计算的？","时间范围是从提交发生的那一天（the day the commit was made）开始向前推算，而不是从当前时间（datetime.now()）计算。例如，如果要计算过去 90 天的经验，是指从提交日往前推 90 天内的提交数量。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F105",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},10902,"在进行文本清理时，是否应该移除堆栈跟踪（stack traces）或引用文本？","不建议专门针对堆栈跟踪进行清理规则设定。因为 Bugzilla 评论中的文本是自由的，堆栈跟踪并不总是以 `>` 开头，有些人会用，有些人不会。强行基于 `>` 移除可能会误删代码片段或其他重要信息。因此，目前策略是忽略对堆栈跟踪的特殊清理，仅处理明确的 URL、文件引用和崩溃统计链接等。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F9",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},10903,"如何优化 HTTP 服务中分类提交的执行速度？","可以通过周期性执行某些耗时步骤来优化，因为数据在短时间内变化不大。具体建议包括：\n1. `Downloading the component mapping`（下载组件映射）和 `Computing the experiences`（计算经验，加载 pickle 文件）可以每天只执行一次。\n2. `Getting the log of commits`（获取提交日志）可以通过避免使用多进程来加速，因为在分析少量补丁时，多进程的启动开销过大。\n3. 避免在不需要时重复 dump 数据（save=False 已处理）。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F632",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},10904,"在编写组件模型（component model）的测试时，遇到断言错误提示没有匹配的 Bug 怎么办？","这通常是因为测试使用的固定数据库（fixture DB）太小，无法覆盖模型当前的检查逻辑。解决方法有两种：一是大幅增加 fixture DB 的数据量以包含所需的 Bug 案例；二是暂时减少组件模型中执行的检查项，以便在较小的数据集上通过测试。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F433",{"id":171,"question_zh":172,"answer_zh":173,"source_url":174},10905,"将 Docker 基础镜像切换为 Python Alpine 版本有什么影响？","切换到 Python Alpine 镜像后，构建产生的镜像大小差异并不大。在这种情况下，优先考虑的因素应该是构建速度。如果 Alpine 能显著加快 Docker 构建任务的时间，那么即使文件大小缩减不明显，也是值得推荐的方案。","https:\u002F\u002Fgithub.com\u002Fmozilla\u002Fbugbug\u002Fissues\u002F292",[]]