[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-spotify--basic-pitch":3,"tool-spotify--basic-pitch":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":96,"env_os":97,"env_gpu":98,"env_ram":98,"env_deps":99,"category_tags":108,"github_topics":109,"view_count":120,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":121,"updated_at":122,"faqs":123,"releases":153},680,"spotify\u002Fbasic-pitch","basic-pitch","A lightweight yet powerful audio-to-MIDI converter with pitch bend detection","basic-pitch 是一款由 Spotify 音频智能实验室推出的开源音频转 MIDI 转换库。它能够将音频文件自动转录为 MIDI 格式，并完整保留弯音等演奏细节。对于想要快速将录音转化为乐谱数据，或者需要将音频素材导入数字音频工作站的用户来说，这是一个高效的解决方案。\n\n不同于以往需要庞大计算资源的音乐转录系统，basic-pitch 基于轻量级神经网络构建，在保持高精度的同时显著降低了资源消耗。它支持多音高检测，能够识别多种乐器的复调音乐，且不受特定乐器类型的限制。在技术实现上，basic-pitch 非常灵活，默认会根据运行环境（macOS、Windows 或 Linux）自动选择最优的推理后端，如 CoreML、TensorFlowLite 或 ONNX，无需用户手动配置复杂的依赖。\n\nbasic-pitch 非常适合开发者将其集成到 Python 项目中，也方便研究人员进行学术验证。普通音乐爱好者则可以通过其官方演示网站直接体验效果。通过简单的 pip 命令即可完成安装，跨平台兼容性良好，是音乐科技领域一款实用且强大的开源方案。","![Basic Pitch Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspotify_basic-pitch_readme_07a6382036cb.png)\n\n\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fbasic-pitch)\n![Supported Platforms](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fplatforms-macOS%20%7C%20Windows%20%7C%20Linux-green)\n\n\nBasic Pitch is a Python library for Automatic Music Transcription (AMT), using lightweight neural network developed by [Spotify's Audio Intelligence Lab](https:\u002F\u002Fresearch.atspotify.com\u002Faudio-intelligence\u002F). It's small, easy-to-use, `pip install`-able and `npm install`-able via its [sibling repo](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch-ts).\n\nBasic Pitch may be simple, but it's is far from \"basic\"! `basic-pitch` is efficient and easy to use, and its multipitch support, its ability to generalize across instruments, and its note accuracy competes with much larger and more resource-hungry AMT systems.\n\nProvide a compatible audio file and basic-pitch will generate a MIDI file, complete with pitch bends. 
Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.\n\n### Research Paper\nThis library was released in conjunction with Spotify's publication at [ICASSP 2022](https:\u002F\u002F2022.ieeeicassp.org\u002F). You can read more about this research in the paper, [A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.09893).\n\nIf you use this library in academic research, consider citing it:\n```bibtex\n@inproceedings{2022_BittnerBRME_LightweightNoteTranscription_ICASSP,\n  author= {Bittner, Rachel M. and Bosch, Juan Jos\\'e and Rubinstein, David and Meseguer-Brocal, Gabriel and Ewert, Sebastian},\n  title= {A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation},\n  booktitle= {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},\n  address= {Singapore},\n  year= 2022,\n}\n```\n\n**Note that we have improved Basic Pitch beyond what was presented in this paper. Therefore, if you use the output of Basic Pitch in academic research, we recommend that you cite the version of the code that was used.**\n\n### Demo\nIf, for whatever reason, you're not yet completely inspired, or you're just like so totally over the general vibe and stuff, check out our snappy demo website, [basicpitch.io](https:\u002F\u002Fbasicpitch.io), to experiment with our model on whatever music audio you provide!\n\n\n## Installation\n\n`basic-pitch` is available via PyPI. To install the current release:\n\n    pip install basic-pitch\n\nTo update Basic Pitch to the latest version, add `--upgrade` to the above command.\n\n#### Compatible Environments:\n- MacOS, Windows and Ubuntu operating systems\n- Python versions 3.7, 3.8, 3.9, 3.10, 3.11\n- **For Mac M1 hardware, we currently only support python version 3.10. Otherwise, we suggest using a virtual machine.**\n\n\n### Model Runtime\n\nBasic Pitch comes with the original TensorFlow model and the TensorFlow model converted to [CoreML](https:\u002F\u002Fdeveloper.apple.com\u002Fdocumentation\u002Fcoreml), [TensorFlowLite](https:\u002F\u002Fwww.tensorflow.org\u002Flite), and [ONNX](https:\u002F\u002Fonnx.ai\u002F). By default, Basic Pitch will _not_ install TensorFlow as a dependency *unless you are using Python>=3.11*. Instead, by default, CoreML will be installed on MacOS, TensorFlowLite will be installed on Linux, and ONNX will be installed on Windows. If you want to install TensorFlow along with the default model inference runtime, you can install TensorFlow via `pip install basic-pitch[tf]`.\n\n## Usage\n\n### Model Prediction\n\n#### Model Runtime\n\nBy default, Basic Pitch will attempt to load a model in the following order:\n\n1. TensorFlow\n2. CoreML\n3. TensorFlowLite\n4. ONNX\n\nAdditionally, the module variable ICASSP_2022_MODEL_PATH will default to the first available version in the list.\n\nWe explain how to override this priority list below. Because all other model serializations were converted from TensorFlow, we recommend using TensorFlow when possible. N.B. Basic Pitch does not install TensorFlow by default to save the user time when installing and running Basic Pitch.
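\n\nAs a quick sanity check, you can print which serialization was picked on your machine; the path's file suffix reveals the backend. This is only a small sketch using the documented module variable:\n\n```python\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\n# The suffix shows the selected runtime, e.g. a CoreML .mlpackage on macOS,\n# a .tflite file on Linux, or a .onnx file on Windows.\nprint(ICASSP_2022_MODEL_PATH)\n```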
\n\n#### Command Line Tool\n\nThis library offers a command line tool interface. A basic prediction command will generate and save a MIDI file transcription of the audio at the `\u003Cinput-audio-path>` to the `\u003Coutput-directory>`:\n\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path>\n```\n\nFor example:\n```bash\nbasic-pitch \u002Foutput\u002Fdirectory\u002Fpath \u002Finput\u002Faudio\u002Fpath\n```\n\nTo process more than one audio file at a time:\n\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path-1> \u003Cinput-audio-path-2> \u003Cinput-audio-path-3>\n```\n\nOptionally, you may append any of the following flags to your prediction command to save additional formats of the prediction output to the `\u003Coutput-directory>`:\n\n- `--sonify-midi` to additionally save a `.wav` audio rendering of the MIDI file.\n- `--save-model-outputs` to additionally save raw model outputs as an NPZ file.\n- `--save-note-events` to additionally save the predicted note events as a CSV file.\n\nIf you want to use a non-default model type (e.g., CoreML instead of TF), use the `--model-serialization` argument. The CLI will then load the model type you prefer.
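\n\nFor instance, a run that transcribes a single recording and also writes the sonified WAV and the note-event CSV alongside the MIDI might look like this (both paths are hypothetical):\n\n```bash\nbasic-pitch .\u002Ftranscriptions .\u002Ftake1.wav --sonify-midi --save-note-events\n```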
\n\nTo discover more parameter control, run:\n```bash\nbasic-pitch --help\n```\n\n#### Programmatic\n\n**predict()**\n\nImport `basic-pitch` into your own Python code and run the [`predict`](basic_pitch\u002Finference.py) function directly, providing an `\u003Cinput-audio-path>` and returning the model's prediction results:\n\n```python\nfrom basic_pitch.inference import predict\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\nmodel_output, midi_data, note_events = predict(\u003Cinput-audio-path>)\n```\n\n- `\u003Cminimum-frequency>` & `\u003Cmaximum-frequency>` (*float*s) set the minimum and maximum allowed note frequency, in Hz, returned by the model. Pitch events with frequencies outside of this range will be excluded from the prediction results.\n- `model_output` is the raw model inference output\n- `midi_data` is the transcribed MIDI data derived from the `model_output`\n- `note_events` is a list of note events derived from the `model_output`\n\nNote: As mentioned previously, ICASSP_2022_MODEL_PATH will default to the first supported runtime in the list: TensorFlow, CoreML, TensorFlowLite, ONNX.
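\n\nAs a minimal end-to-end sketch (the file names here are placeholders): the returned `midi_data` is a `pretty_midi.PrettyMIDI` object, so it can be written straight to disk:\n\n```python\nfrom basic_pitch.inference import predict\n\n# Transcribe one recording and save the result as a MIDI file.\nmodel_output, midi_data, note_events = predict(\"guitar-take.wav\")\nmidi_data.write(\"guitar-take.mid\")\nprint(f\"transcribed {len(note_events)} note events\")\n```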
\n\n**predict() in a loop**\n\nTo run prediction within a loop, you'll want to load the model yourself and provide `predict()` with the loaded model object, to be reused across the repeated prediction calls, avoiding redundant and sluggish model loading:\n\n```python\nfrom basic_pitch.inference import predict, Model\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\n# Load the model once, outside the loop.\nbasic_pitch_model = Model(ICASSP_2022_MODEL_PATH)\n\nfor audio_path in \u003Cinput-audio-path-list>:\n    model_output, midi_data, note_events = predict(\n        audio_path,\n        basic_pitch_model,\n    )\n```\n\n**predict_and_save()**\n\nIf you would like `basic-pitch` to orchestrate the generation and saving of our various supported output file types, you may use [`predict_and_save`](basic_pitch\u002Finference.py) instead of using [`predict`](basic_pitch\u002Finference.py) directly:\n\n```python\nfrom basic_pitch.inference import predict_and_save\n\npredict_and_save(\n    \u003Cinput-audio-path-list>,\n    \u003Coutput-directory>,\n    \u003Csave-midi>,\n    \u003Csonify-midi>,\n    \u003Csave-model-outputs>,\n    \u003Csave-notes>,\n    \u003Cmodel-path>\n)\n```\n\nwhere:\n   - `\u003Cinput-audio-path-list>` & `\u003Coutput-directory>`\n        - paths for `basic-pitch` to read from\u002Fwrite to.\n   - `\u003Csave-midi>`\n        - *bool* to control generating and saving a MIDI file to the `\u003Coutput-directory>`\n   - `\u003Csonify-midi>`\n        - *bool* to control saving a WAV audio rendering of the MIDI file to the `\u003Coutput-directory>`\n   - `\u003Csave-model-outputs>`\n        - *bool* to control saving the raw model output as an NPZ file to the `\u003Coutput-directory>`\n   - `\u003Csave-notes>`\n        - *bool* to control saving predicted note events as a CSV file to the `\u003Coutput-directory>`\n   - `\u003Cmodel-path>`\n        - *str* or *pathlib.Path* local path from which to load the model, e.g. the path obtained with `from basic_pitch import ICASSP_2022_MODEL_PATH`
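\n\nA call with concrete values might look like the following sketch (the paths and flag choices are illustrative only; the parameter order is the one listed above):\n\n```python\nfrom basic_pitch.inference import predict_and_save\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\n# Transcribe two recordings, saving MIDI files and a CSV of note events,\n# but skipping the sonified WAV and the raw NPZ model outputs.\npredict_and_save(\n    [\"verse.wav\", \"chorus.wav\"],  # input audio paths (hypothetical)\n    \".\u002Ftranscriptions\",  # output directory\n    True,  # save MIDI\n    False,  # sonify MIDI\n    False,  # save model outputs\n    True,  # save note events\n    ICASSP_2022_MODEL_PATH,  # model path\n)\n```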
\n\n### Model Input\n\n**Supported Audio Codecs**\n\n`basic-pitch` accepts all sound files that are compatible with its version of [`librosa`](https:\u002F\u002Flibrosa.org\u002Fdoc\u002Flatest\u002Findex.html), including:\n\n- `.mp3`\n- `.ogg`\n- `.wav`\n- `.flac`\n- `.m4a`\n\n**Mono Channel Audio Only**\n\nWhile you may use stereo audio as an input to our model, at prediction time the channels of the input will be down-mixed to mono, and then analyzed and transcribed.\n\n**File Size\u002FAudio Length**\n\nThis model can process audio of any size or length, but processing of larger\u002Flonger audio files can be limited by your machine's available disk space. To process these files, we recommend streaming the audio of the file, processing windows of audio at a time, as in the sketch below.
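\n\nThe library does not ship a streaming helper, so the following is only a rough sketch of one way to window a long file yourself; the `soundfile` usage, the 30-second window, and the file names are assumptions, not part of basic-pitch:\n\n```python\nimport soundfile as sf\n\nfrom basic_pitch.inference import predict, Model\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\nWINDOW_SECONDS = 30\nmodel = Model(ICASSP_2022_MODEL_PATH)  # load once, reuse per window\n\nwith sf.SoundFile(\"long-concert.wav\") as f:\n    frames_per_window = WINDOW_SECONDS * f.samplerate\n    for i, block in enumerate(f.blocks(blocksize=frames_per_window)):\n        chunk_path = f\"chunk_{i:04d}.wav\"\n        sf.write(chunk_path, block, f.samplerate)  # window written to disk\n        model_output, midi_data, note_events = predict(chunk_path, model)\n        midi_data.write(chunk_path.replace(\".wav\", \".mid\"))\n```\n\nNotes sustained across a window boundary will be split; overlapping the windows and merging the resulting note events is one possible refinement.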
\n\n**Sample Rate**\n\nInput audio may be of any sample rate; however, all audio will be resampled to 22050 Hz before processing.\n\n### VST\n\nThanks to DamRsn for developing this working VST version of basic-pitch! - https:\u002F\u002Fgithub.com\u002FDamRsn\u002FNeuralNote\n\n\n## Contributing\n\nContributions to `basic-pitch` are welcomed! See [CONTRIBUTING.md](CONTRIBUTING.md) for details.\n\n## Copyright and License\n`basic-pitch` is Copyright 2022 Spotify AB.\n\nThis software is licensed under the Apache License, Version 2.0 (the \"Apache License\"). You may use this software only upon the condition that you accept all of the terms of the Apache License.\n\nYou may obtain a copy of the Apache License at:\n\nhttp:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\n\n\nUnless required by applicable law or agreed to in writing, software distributed under the Apache License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Apache License for the specific language governing permissions and limitations under the Apache License.\n\n","![Basic Pitch Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspotify_basic-pitch_readme_07a6382036cb.png)\n\n\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fbasic-pitch)\n![Supported Platforms](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fplatforms-macOS%20%7C%20Windows%20%7C%20Linux-green)\n\n\nBasic Pitch is a Python library for Automatic Music Transcription (AMT) that uses a lightweight neural network developed by [Spotify's Audio Intelligence Lab](https:\u002F\u002Fresearch.atspotify.com\u002Faudio-intelligence\u002F). It is small and easy to use, installable with `pip install`, and installable with `npm install` via its [sibling repo](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch-ts).\n\nBasic Pitch may look simple, but it is far from \"basic\"! `basic-pitch` is efficient and easy to use; its multipitch support, its ability to generalize across instruments, and its note accuracy rival much larger, more resource-hungry AMT systems.\n\nGiven a compatible audio file, basic-pitch generates a MIDI file complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely transcribe all your favorite music, whatever instrument it uses. Basic Pitch works best on one instrument at a time.\n\n### Research Paper\nThis library was released alongside Spotify's publication at [ICASSP 2022](https:\u002F\u002F2022.ieeeicassp.org\u002F). You can read more about the research in the paper, [A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.09893).\n\nIf you use this library in academic research, please consider citing it:\n```bibtex\n@inproceedings{2022_BittnerBRME_LightweightNoteTranscription_ICASSP,\n  author= {Bittner, Rachel M. and Bosch, Juan Jos\\'e and Rubinstein, David and Meseguer-Brocal, Gabriel and Ewert, Sebastian},\n  title= {A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation},\n  booktitle= {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},\n  address= {Singapore},\n  year= 2022,\n}\n```\n\n**Note that Basic Pitch has been improved beyond what was presented in this paper. Therefore, if you use Basic Pitch's output in academic research, we recommend citing the version of the code that was used.**\n\n### Demo\nIf for any reason you are not yet fully inspired, or just want to get a feel for the results, check out our snappy demo website, [basicpitch.io](https:\u002F\u002Fbasicpitch.io), and try the model on any music audio you provide!\n\n\n## Installation\n\n`basic-pitch` is available via PyPI. To install the current release:\n\n    pip install basic-pitch\n\nTo update Basic Pitch to the latest version, add `--upgrade` to the above command.\n\n#### Compatible Environments:\n- MacOS, Windows, and Ubuntu operating systems\n- Python versions 3.7, 3.8, 3.9, 3.10, 3.11\n- **For Mac M1 hardware, we currently support only Python 3.10. Otherwise, we suggest using a virtual machine.**\n\n\n### Model Runtime\n\nBasic Pitch ships the original TensorFlow model along with conversions of it to [CoreML](https:\u002F\u002Fdeveloper.apple.com\u002Fdocumentation\u002Fcoreml), [TensorFlowLite](https:\u002F\u002Fwww.tensorflow.org\u002Flite), and [ONNX](https:\u002F\u002Fonnx.ai\u002F). By default, Basic Pitch will not install TensorFlow as a dependency unless you are using Python>=3.11. Instead, by default, CoreML is installed on MacOS, TensorFlowLite on Linux, and ONNX on Windows. If you want TensorFlow alongside the default inference runtime, install it via `pip install basic-pitch[tf]`.\n\n## Usage\n\n### Model Prediction\n\n#### Model Runtime\n\nBy default, Basic Pitch attempts to load a model in the following order:\n\n1. TensorFlow (a deep learning framework)\n2. CoreML (Apple's machine learning framework)\n3. TensorFlowLite (a lightweight framework for mobile and embedded ML)\n4. ONNX (the Open Neural Network Exchange format)\n\nAdditionally, the module variable `ICASSP_2022_MODEL_PATH` defaults to the first available version in that list.\n\nWe explain how to override this priority list below. Because all other model serializations were converted from TensorFlow, we recommend using TensorFlow when possible. Note: Basic Pitch does not install TensorFlow by default, to save users time when installing and running Basic Pitch.\n\n#### Command Line Tool\n\nThis library provides a command-line interface. A basic prediction command generates and saves a MIDI (Musical Instrument Digital Interface) transcription of the audio at `\u003Cinput-audio-path>` to the `\u003Coutput-directory>`:\n\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path>\n```\n\nFor example:\n```bash\nbasic-pitch \u002Foutput\u002Fdirectory\u002Fpath \u002Finput\u002Faudio\u002Fpath\n```\n\nTo process more than one audio file at a time:\n\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path-1> \u003Cinput-audio-path-2> \u003Cinput-audio-path-3>\n```\n\nOptionally, you may append any of the following flags to your prediction command to save additional output formats to the `\u003Coutput-directory>`:\n\n- `--sonify-midi` additionally saves a `.wav` audio rendering of the MIDI file.\n- `--save-model-outputs` additionally saves the raw model output as an NPZ (compressed NumPy) file.\n- `--save-note-events` additionally saves the predicted note events as a CSV (comma-separated values) file.\n\nIf you want a non-default model type (e.g., CoreML instead of TF), use the `--model-serialization` argument. The CLI will then load the model type you prefer.\n\nTo discover more parameter control, run:\n```bash\nbasic-pitch --help\n```\n\n#### Programmatic\n\n**predict()**\n\nImport `basic-pitch` into your own Python code and call the [`predict`](basic_pitch\u002Finference.py) function directly, providing an `\u003Cinput-audio-path>`; it returns the model's prediction results:\n\n```python\nfrom basic_pitch.inference import predict\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\nmodel_output, midi_data, note_events = predict(\u003Cinput-audio-path>)\n```\n\n- `\u003Cminimum-frequency>` & `\u003Cmaximum-frequency>` (*float*s) set the minimum and maximum note frequency, in Hz, that the model may return. Pitch events with frequencies outside this range are excluded from the prediction results.\n- `model_output` is the raw model inference output\n- `midi_data` is the transcribed MIDI data derived from `model_output`\n- `note_events` is a list of note events derived from `model_output`\n\nNote: As mentioned above, ICASSP_2022_MODEL_PATH defaults to the first supported runtime in the list TensorFlow, CoreML, TensorFlowLite, ONNX.\n\n**predict() in a loop**\n\nTo run prediction in a loop, load the model yourself and pass the loaded model object to `predict()` for the repeated calls, avoiding redundant, sluggish model loading:\n\n```python\nfrom basic_pitch.inference import predict, Model\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\n# Load the model once, outside the loop.\nbasic_pitch_model = Model(ICASSP_2022_MODEL_PATH)\n\nfor audio_path in \u003Cinput-audio-path-list>:\n    model_output, midi_data, note_events = predict(\n        audio_path,\n        basic_pitch_model,\n    )\n```\n\n**predict_and_save()**\n\nIf you would like `basic-pitch` to orchestrate generating and saving the various supported output file types, use [`predict_and_save`](basic_pitch\u002Finference.py) instead of calling [`predict`](basic_pitch\u002Finference.py) directly:\n\n```python\nfrom basic_pitch.inference import predict_and_save\n\npredict_and_save(\n    \u003Cinput-audio-path-list>,\n    \u003Coutput-directory>,\n    \u003Csave-midi>,\n    \u003Csonify-midi>,\n    \u003Csave-model-outputs>,\n    \u003Csave-notes>,\n    \u003Cmodel-path>\n)\n```\n\nwhere:\n   - `\u003Cinput-audio-path-list>` & `\u003Coutput-directory>`\n        - paths for `basic-pitch` to read from\u002Fwrite to.\n   - `\u003Csave-midi>`\n        - *bool* controlling whether a MIDI file is generated and saved to the `\u003Coutput-directory>`\n   - `\u003Csonify-midi>`\n        - *bool* controlling whether a WAV audio rendering of the MIDI file is saved to the `\u003Coutput-directory>`\n   - `\u003Csave-model-outputs>`\n        - *bool* controlling whether the raw model output is saved as an NPZ file to the `\u003Coutput-directory>`\n   - `\u003Csave-notes>`\n        - *bool* controlling whether predicted note events are saved as a CSV file to the `\u003Coutput-directory>`\n   - `\u003Cmodel-path>`\n        - *str* or *pathlib.Path* local path to load the model from, e.g. the path obtained with `from basic_pitch import ICASSP_2022_MODEL_PATH`\n\n\n\n\n### Model Input\n\n**Supported Audio Codecs**\n\n`basic-pitch` accepts any sound file compatible with its version of [`librosa`](https:\u002F\u002Flibrosa.org\u002Fdoc\u002Flatest\u002Findex.html) (a Python audio-processing library), including:\n\n- `.mp3`\n- `.ogg`\n- `.wav`\n- `.flac`\n- `.m4a`\n\n**Mono Channel Audio Only**\n\nWhile you may use stereo audio as input to the model, at prediction time the input channels are down-mixed to mono and then analyzed and transcribed.\n\n**File Size\u002FAudio Length**\n\nThe model can process audio of any size or length, but processing larger\u002Flonger audio files can be limited by your machine's available disk space. To process such files, we recommend streaming the audio and processing windows of audio at a time.\n\n**Sample Rate**\n\nInput audio may be of any sample rate; however, all audio is resampled to 22050 Hz before processing.\n\n### VST\n\nThanks to DamRsn for developing this working VST (virtual instrument plugin standard) version of basic-pitch! - https:\u002F\u002Fgithub.com\u002FDamRsn\u002FNeuralNote\n\n\n## Contributing\n\nContributions to `basic-pitch` are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for details.\n\n## Copyright and License\n`basic-pitch` is Copyright 2022 Spotify AB.\n\nThis software is licensed under the Apache License, Version 2.0 (the \"Apache License\"). You may use this software only on the condition that you accept all of the terms of the Apache License.\n\nYou may obtain a copy of the Apache License at:\n\nhttp:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\n\n\nUnless required by applicable law or agreed to in writing, software distributed under the Apache License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Apache License for the specific language governing permissions and limitations under the Apache License.","# basic-pitch Quick Start Guide\n\n**basic-pitch** is a lightweight automatic music transcription (AMT) Python library developed by Spotify's Audio Intelligence Lab. It efficiently converts audio files into MIDI files that include pitch-bend information, supports many instruments and polyphonic music, and achieves accurate transcription without heavyweight resources.\n\n## 1. Environment\n\n- **Operating system**: macOS, Windows, Linux (Ubuntu)\n- **Python version**: 3.7, 3.8, 3.9, 3.10, 3.11\n  - **Note**: Mac M1 hardware currently supports only Python 3.10; for other versions, a virtual machine is suggested.\n- **Input formats**: common audio formats such as `.mp3`, `.ogg`, `.wav`, `.flac`, `.m4a`.\n- **Audio requirements**: stereo input is supported (automatically down-mixed to mono); any sample rate works (audio is resampled to 22050 Hz before processing).\n\n## 2. Installation\n\nInstall the current release directly from PyPI:\n\n```bash\npip install basic-pitch\n```\n\nTo upgrade to the latest version, add the `--upgrade` flag:\n\n```bash\npip install --upgrade basic-pitch\n```\n\n**Runtime notes:**\nBy default, basic-pitch automatically selects a model inference backend for your operating system (CoreML on Mac, TensorFlowLite on Linux, ONNX on Windows) and does not force-install TensorFlow. If you need TensorFlow installed explicitly as a dependency, run:\n\n```bash\npip install basic-pitch[tf]\n```\n\n## 3. 
Basic Usage\n\n### Command Line Tool\n\nThe quickest way to transcribe audio to MIDI is the command line.\n\n**Single-file transcription:**\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path>\n```\n\n**Example:**\n```bash\nbasic-pitch \u002Foutput\u002Fdirectory\u002Fpath \u002Finput\u002Faudio\u002Fpath\n```\n\n**Batch processing:**\n```bash\nbasic-pitch \u003Coutput-directory> \u003Cinput-audio-path-1> \u003Cinput-audio-path-2> \u003Cinput-audio-path-3>\n```\n\n**Optional output flags:**\n- `--sonify-midi`: additionally save a `.wav` audio rendering of the MIDI.\n- `--save-model-outputs`: additionally save the raw model output as an NPZ file.\n- `--save-note-events`: additionally save the predicted note events as a CSV file.\n- `--model-serialization`: select a non-default model type (such as CoreML).\n\nFull help:\n```bash\nbasic-pitch --help\n```\n\n### Calling from Python\n\nImport and call the `predict` function directly in your own Python script.\n\n**Basic prediction:**\n```python\nfrom basic_pitch.inference import predict\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\nmodel_output, midi_data, note_events = predict(\u003Cinput-audio-path>)\n```\n\n- `model_output`: the raw model inference output\n- `midi_data`: the transcribed MIDI data\n- `note_events`: the list of predicted note events\n\n**Prediction in a loop:**\nTo avoid the slowdown of reloading the model on every call, load the model object outside the loop:\n\n```python\nfrom basic_pitch.inference import predict, Model\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\n\n# Load the model once, outside the loop.\nbasic_pitch_model = Model(ICASSP_2022_MODEL_PATH)\n\nfor audio_path in \u003Cinput-audio-path-list>:\n    model_output, midi_data, note_events = predict(\n        audio_path,\n        basic_pitch_model,\n    )\n```\n\n**Saving every format in one call:**\nThe `predict_and_save` function manages generating and saving the output files automatically:\n\n```python\nfrom basic_pitch.inference import predict_and_save\n\npredict_and_save(\n    \u003Cinput-audio-path-list>,\n    \u003Coutput-directory>,\n    \u003Csave-midi>,\n    \u003Csonify-midi>,\n    \u003Csave-model-outputs>,\n    \u003Csave-notes>,\n    \u003Cmodel-path>\n)\n```","Indie music producer Xiao Li recorded an improvised piano take on a phone in a coffee shop and urgently needs to convert it into an editable MIDI file, so the arrangement and tempo can be reworked in a digital audio workstation.\n\n### Without basic-pitch\n- Transcribing notes by ear, frame by frame, is extremely time-consuming: a short piece can take hours, seriously slowing the creative process.\n- Online conversion platforms require paid memberships, and uploading audio carries privacy risks, making them unsuitable for commercial projects.\n- Existing heavyweight AMT models are large and can stutter or crash on an ordinary computer; the hardware bar is too high.\n- Traditional algorithms struggle to capture subtle performance techniques such as slides and vibrato, so the generated MIDI lacks expressive nuance.\n\n### With basic-pitch\n- basic-pitch installs via pip and runs locally, producing accurate MIDI files within minutes and greatly shortening the production cycle.\n- With per-platform inference backends, it processes audio in seconds on Mac or Windows without hogging system resources, and runs smoothly even on older machines.\n- Built-in pitch-bend detection preserves the string bends and slides in the recording, keeping the performance intact and raising the fidelity of the result.\n- It works fully offline with no audio uploads, protecting the copyright of original material; a good fit for confidential projects.\n\nWith its lightweight architecture and accurate pitch-bend detection, basic-pitch removes the efficiency and quality bottlenecks in digitizing audio material.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspotify_basic-pitch_07a63820.png","spotify","Spotify","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fspotify_6ec80015.png","",null,"opensource@spotify.com","https:\u002F\u002Fspotify.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fspotify",[84,88],{"name":85,"color":86,"percentage":87},"Python","#3572A5",99.8,{"name":89,"color":90,"percentage":91},"Dockerfile","#384d54",0.2,4841,433,"2026-04-05T09:10:19","Apache-2.0",1,"Linux, macOS, Windows","Not specified",{"notes":100,"python":101,"dependencies":102},"Mac M1 hardware supports Python 3.10 only. The inference backend is selected automatically per operating system (CoreML on Mac, TensorFlowLite on Linux, ONNX on Windows), so a forced TensorFlow install is unnecessary. Input audio is resampled to 22050 Hz and down-mixed to mono; watch available disk space when processing large files.","3.7, 3.8, 3.9, 3.10, 3.11",[103,104,105,106,107],"librosa","tensorflow (optional)","onnxruntime","coremltools","tensorflow-lite",[13,55],[110,111,112,113,114,115,116,117,118,119],"lightweight","machine-learning","midi","music","pitch-detection","polyphonic","transcription","audio","python","typescript",9,"2026-03-27T02:49:30.150509","2026-04-06T05:16:24.312785",[124,129,134,138,143,148],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},2831,"How do I handle an \"illegal hardware instruction\" error when running Basic Pitch on a Mac M1 chip?","This is usually related to the Docker image architecture (x86 vs ARM) or to TensorFlow compatibility. First try importing TensorFlow in a native Python shell to isolate the problem. You can debug with the PYTHONFAULTHANDLER=1 environment variable, or check whether an image with the wrong architecture is being used. The maintainers have tested successfully on M1 devices, so verify your local environment configuration.","https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fissues\u002F52",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},2832,"What should I do when pip reports a dependency conflict (ResolutionImpossible) while installing Basic Pitch?","Different Basic Pitch versions have conflicting TensorFlow version requirements. Clean up the existing environment, then install directly from the GitHub repository rather than from PyPI. On Python 3.11, although a fix has been patched in, Python 3.10 is recommended for now for stability.","https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fissues\u002F63",{"id":135,"question_zh":136,"answer_zh":137,"source_url":133},2833,"What is the recommended Python setup for installing Basic Pitch on Apple Silicon Macs?","Installing Python 3.10 via Homebrew avoids the compatibility problems. The steps are:\nbrew install python@3.10\nalias python=\u002Fopt\u002Fhomebrew\u002FCellar\u002Fpython@3.10\u002F3.10.11\u002Fbin\u002Fpython3.10\nalias pip=\u002Fopt\u002Fhomebrew\u002FCellar\u002Fpython@3.10\u002F3.10.11\u002Fbin\u002Fpip3.10\npip install basic-pitch\nOn Intel Macs the paths may live under \u002Fusr\u002Flocal.",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},2834,"The generated MIDI file opens in MuseScore as many instrument tracks (e.g., 62 pianos) instead of one. How do I fix this?","This problem was fixed in version 0.1.0 and later. Remedies: 1. delete the conda environment, clear the caches, and pip-install the latest version directly from GitHub; 2. when calling predict_and_save from code, set the parameter multiple_pitch_bends = False, for example (file names are placeholders):
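\nfrom basic_pitch.inference import predict_and_save\nfrom basic_pitch import ICASSP_2022_MODEL_PATH\npredict_and_save([\"song.wav\"], \"output\", True, False, False, False, ICASSP_2022_MODEL_PATH, multiple_pitch_bends=False)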
","https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fissues\u002F14",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},2835,"On Windows, the basic-pitch command line fails with \"ModuleNotFoundError: No module named 'basic_pitch.predict'\". What should I do?","Try running it as a Python module instead of executing the exe directly. Use this command in place of the original one:\npython.exe -m basic_pitch .\\input_file.wav\nor specify the predict module explicitly:\npython.exe -m basic_pitch.predict","https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fissues\u002F11",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},2836,"How do I resolve the runtime error \"'_UserObject' object has no attribute 'add_slot'\"?","This is a model-loading incompatibility caused by a TensorFlow version that is too new (such as 2.16). The solution is to downgrade TensorFlow: pin it to \u003C2.15.1, or roll back from 2.16 to 2.15. For example:
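\npip install \"tensorflow\u003C2.15.1\"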
","https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fissues\u002F117",[154,159,164,169,174,179,184,188],{"id":155,"version":156,"summary_zh":157,"released_at":158},102295,"v0.4.0","## What's Changed\r\n\r\nTraining code has been added to Basic Pitch. This includes data preprocessing and the training loop. Thanks @bgenchel!\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.3.3...v0.4.0","2024-08-16T17:16:26",{"id":160,"version":161,"summary_zh":162,"released_at":163},102296,"v0.3.3","## What's Changed\r\n* tensorflow-macos will not be installed by default on mac devices for `Python\u003C=3.11`\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.3.2...v0.3.3","2024-04-23T22:24:30",{"id":165,"version":166,"summary_zh":167,"released_at":168},102297,"v0.3.2","## What's Changed\r\n* predict() uses the default model path from __init__.py, by @drubinstein in https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fpull\u002F123\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.3.1...v0.3.2","2024-04-22T12:52:49",{"id":170,"version":171,"summary_zh":172,"released_at":173},102298,"v0.3.1","## What's Changed\r\n* Fixed issue #120, involving the removal of a scipy function in newer versions of scipy\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.3.0...v0.3.1","2024-04-19T23:42:09",{"id":175,"version":176,"summary_zh":177,"released_at":178},102299,"v0.3.0","## What's Changed\r\n* Add an upper bound of 2.15 (inclusive) for TensorFlow.\r\n* Python 3.7 support has been dropped.\r\n* New serializations added for CoreML, TensorFlowLite and ONNX\r\n* Basic Pitch will _not_ install TensorFlow by default. It will install a smaller inference runtime based on the machine you are installing Basic Pitch on, e.g. CoreML for Mac devices\r\n* Various small changes to the defaults so this repo is aligned with its [Typescript cousin](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch-ts).\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.2.6...v0.3.0","2024-03-25T14:57:55",{"id":180,"version":181,"summary_zh":182,"released_at":183},102300,"v0.2.6","## What's Changed\r\n* More TensorFlow versions are now supported.\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspotify\u002Fbasic-pitch\u002Fcompare\u002Fv0.2.5...v0.2.6","2023-06-27T17:51:35",{"id":185,"version":186,"summary_zh":79,"released_at":187},102301,"v0.2.5","2023-05-19T16:51:58",{"id":189,"version":190,"summary_zh":79,"released_at":191},102302,"v0.2.4","2023-04-10T16:03:42"]