[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-breizhn--DTLN":3,"tool-breizhn--DTLN":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":80,"owner_twitter":80,"owner_website":80,"owner_url":82,"languages":83,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":10,"env_os":92,"env_gpu":93,"env_ram":94,"env_deps":95,"category_tags":104,"github_topics":105,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":122,"updated_at":123,"faqs":124,"releases":153},3288,"breizhn\u002FDTLN","DTLN","Tensorflow 2.x implementation of the DTLN real time speech denoising model. 
With TF-lite, ONNX and real-time audio processing support.","DTLN 是一个基于 TensorFlow 2.x 实现的实时语音降噪开源模型，全称为“双信号变换 LSTM 网络”。它主要解决在嘈杂环境中提取清晰人声的难题，能够有效抑制背景噪声，同时保留语音的自然度与相位信息。\n\n该项目不仅提供了完整的模型训练、推理及服务代码，还预置了多种格式（包括 SavedModel、TF-lite 和 ONNX）的模型文件，方便用户直接调用或作为基线进行二次开发。其独特的技术亮点在于结合了短时傅里叶变换（STFT）与可学习的分析合成基，通过堆叠网络架构，在参数量少于 100 万的情况下实现了卓越的降噪性能。在国际知名的深度降噪挑战赛（DNS-Challenge）中，DTLN 的表现超越了官方基线模型，并在实时赛道中取得了优异成绩。\n\n得益于高效的算法设计，DTLN 支持真正的实时音频处理（逐帧输入输出），甚至能在树莓派等资源受限的设备上流畅运行。这使得它非常适合开发者、研究人员以及需要嵌入式语音处理方案的工程师使用。无论是构建会议软件、助听设备原型，还是研究语音增强算法，DTLN 都是一个轻量且强大的选择。项目遵循 MIT 协议开源，鼓励社区探索更多创新应用","DTLN 是一个基于 TensorFlow 2.x 实现的实时语音降噪开源模型，全称为“双信号变换 LSTM 网络”。它主要解决在嘈杂环境中提取清晰人声的难题，能够有效抑制背景噪声，同时保留语音的自然度与相位信息。\n\n该项目不仅提供了完整的模型训练、推理及服务代码，还预置了多种格式（包括 SavedModel、TF-lite 和 ONNX）的模型文件，方便用户直接调用或作为基线进行二次开发。其独特的技术亮点在于结合了短时傅里叶变换（STFT）与可学习的分析合成基，通过堆叠网络架构，在参数量少于 100 万的情况下实现了卓越的降噪性能。在国际知名的深度降噪挑战赛（DNS-Challenge）中，DTLN 的表现超越了官方基线模型，并在实时赛道中取得了优异成绩。\n\n得益于高效的算法设计，DTLN 支持真正的实时音频处理（逐帧输入输出），甚至能在树莓派等资源受限的设备上流畅运行。这使得它非常适合开发者、研究人员以及需要嵌入式语音处理方案的工程师使用。无论是构建会议软件、助听设备原型，还是研究语音增强算法，DTLN 都是一个轻量且强大的选择。项目遵循 MIT 协议开源，鼓励社区探索更多创新应用。","# Dual-signal Transformation LSTM Network\n\n+ TensorFlow 2.x implementation of the stacked dual-signal transformation LSTM network (DTLN) for real-time noise suppression.\n+ This repository provides the code for training, inferring and serving the DTLN model in Python. It also provides pretrained models in SavedModel, TF-lite and ONNX format, which can be used as a baseline for your own projects. The model can process real-time audio on a Raspberry Pi.\n+ If you are doing cool things with this repo, tell me about it. I am always curious about what you are doing with this code or these models.\n\n---\n\nThe DTLN model was submitted to the Deep Noise Suppression Challenge ([DNS-Challenge](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDNS-Challenge)), and the paper was presented at Interspeech 2020. 
\n\n\nThis approach combines a short-time Fourier transform (STFT) and a learned analysis and synthesis basis in a stacked-network approach with fewer than one million parameters. The model was trained on 500h of noisy speech provided by the challenge organizers. The network is capable of real-time processing (one frame in, one frame out) and reaches competitive results.\nCombining these two types of signal transformations enables the DTLN to robustly extract information from magnitude spectra and to incorporate phase information from the learned feature basis. The method shows state-of-the-art performance and outperforms the DNS-Challenge baseline by 0.24 absolute points in terms of the mean opinion score (MOS).\n\nFor more information, see the [paper](https:\u002F\u002Fwww.isca-speech.org\u002Farchive\u002Finterspeech_2020\u002Fwesthausen20_interspeech.html). The results of the DNS-Challenge are published [here](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Facademic-program\u002Fdeep-noise-suppression-challenge-interspeech-2020\u002F#!results). We reached a competitive 8th place out of 17 teams in the real-time track.\n\n---\n\nFor baseline usage and to reproduce the processing used for the paper, run:\n```bash\n$ python run_evaluation.py -i in\u002Ffolder\u002Fwith\u002Fwav -o target\u002Ffolder\u002Fprocessed\u002Ffiles -m .\u002Fpretrained_model\u002Fmodel.h5\n```\n\n---\n\nThe pretrained DTLN-aec (the DTLN applied to acoustic echo cancellation) can be found in the [DTLN-aec repository](https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN-aec).\n\n---\n\nAuthor: Nils L. Westhausen ([Communication Acoustics](https:\u002F\u002Fuol.de\u002Fen\u002Fkommunikationsakustik), Carl von Ossietzky University, Oldenburg, Germany)\n\nThis code is licensed under the terms of the MIT license.\n\n\n---\n### Citing:\n\nIf you use the DTLN model, please cite:\n\n```BibTex\n@inproceedings{Westhausen2020,\n  author={Nils L. Westhausen and Bernd T. 
Meyer},\n  title={{Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression}},\n  year=2020,\n  booktitle={Proc. Interspeech 2020},\n  pages={2477--2481},\n  doi={10.21437\u002FInterspeech.2020-2631},\n  url={http:\u002F\u002Fdx.doi.org\u002F10.21437\u002FInterspeech.2020-2631}\n}\n```\n\n\n---\n### Contents of the README:\n\n* [Results](#results)\n* [Execution Times](#execution-times)\n* [Audio Samples](#audio-samples)\n* [Contents of the repository](#contents-of-the-repository)\n* [Python dependencies](#python-dependencies)\n* [Training data preparation](#training-data-preparation)\n* [Run a training of the DTLN model](#run-a-training-of-the-dtln-model)\n* [Measuring the execution time of the DTLN model with the SavedModel format](#measuring-the-execution-time-of-the-dtln-model-with-the-savedmodel-format)\n* [Real time processing with the SavedModel format](#real-time-processing-with-the-savedmodel-format)\n* [Real time processing with tf-lite](#real-time-processing-with-tf-lite)\n* [Real time audio with sounddevice and tf-lite](#real-time-audio-with-sounddevice-and-tf-lite)\n* [Model conversion and real time processing with ONNX](#model-conversion-and-real-time-processing-with-onnx)\n\n\n\n---\n### Results:\n\nResults on the DNS-Challenge non-reverberant test set:\nModel | PESQ [MOS] | STOI [%] | SI-SDR [dB] | TF version\n--- | --- | --- | --- | ---\nunprocessed | 2.45 | 91.52 | 9.07 |\nNsNet (Baseline) | 2.70 | 90.56 | 12.57 |\n |  |  |  | \nDTLN (500h) | 3.04 | 94.76 | 16.34 | 2.1\nDTLN (500h) | 2.98 | 94.75 | 16.20 | TF-lite\nDTLN (500h) | 2.95 | 94.47 | 15.71 | TF-lite quantized\n |  |  |  | \nDTLN norm (500h) | 3.04 | 94.47 | 16.10 | 2.2\n |  |  |  | \nDTLN norm (40h) | 3.05 | 94.57 | 16.88 | 2.2\nDTLN norm (40h) | 2.98 | 94.56 | 16.58 | TF-lite\nDTLN norm (40h) | 2.98 | 94.51 | 16.22 | TF-lite quantized\n\n* The conversion to TF-lite slightly reduces the performance. 
\n* The dynamic range quantization of TF-lite also reduces the performance a bit and introduces some quantization noise. But the audio quality is still high, and the quantized model is real-time capable on the Raspberry Pi 3 B+.\n* The normalization of the log magnitude of the STFT does not decrease the model performance and makes it more robust against level variations.\n* With data augmentation during training, it is possible to train the DTLN model on just 40h of noise and speech data. If you have any questions about this, just contact me.\n\n[To contents](#contents-of-the-readme)\n\n---\n\n### Execution Times:\n\nExecution times for SavedModel were measured with TF 2.2 and for TF-lite with the TF-lite runtime:\nSystem | Processor | #Cores | SavedModel | TF-lite | TF-lite quantized\n--- | --- | --- | --- | --- | ---\nUbuntu 18.04         | Intel Core i5-6600K @ 3.5 GHz | 4 | 0.65 ms | 0.36 ms | 0.27 ms\nMacBook Air mid 2012 | Intel Core i7-3667U @ 2.0 GHz | 2 | 1.4 ms | 0.6 ms | 0.4 ms\nRaspberry Pi 3 B+    | ARM Cortex-A53 @ 1.4 GHz | 4 | 15.54 ms | 9.6 ms | 2.2 ms\n\nFor real-time capability, the execution time per block must be below the block shift of 8 ms.\n\n[To contents](#contents-of-the-readme)\n\n---\n\n### Audio Samples:\n\nHere are some audio samples created with the TF-lite model. 
Unfortunately, audio cannot be embedded directly in Markdown.\n\nNoisy | Enhanced | Noise type\n--- | --- | --- \n[Sample 1](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FGFHzmWWJAwgQPLf) | [Sample 1](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002Fp3M48y7cjkJ2ZZg) | Air conditioning\n[Sample 2](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002F4Y2PoSpJf7nXx9T) | [Sample 2](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FQeK4aH5KCELPnko) | Music\n[Sample 3](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FAwc6oBtnTpb5pY7) | [Sample 3](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FyNsmDgxH3MPWMTi) | Bus \n\n\n[To contents](#contents-of-the-readme)\n\n---\n### Contents of the repository:\n\n*  **DTLN_model.py** \\\n  This file contains the model, the data generator and the training routine.\n*  **run_training.py** \\\n  Script to run the training. Before you can start the training with `$ python run_training.py`, you have to set the paths to your training and validation data inside the script. The training script uses a default setup.\n* **run_evaluation.py** \\\n  Script to process a folder (with optional subfolders) of .wav files with a trained DTLN model. With the pretrained model delivered with this repository, a folder can be processed as follows: \\\n  `$ python run_evaluation.py -i \u002Fpath\u002Fto\u002Finput -o \u002Fpath\u002Ffor\u002Fprocessed -m .\u002Fpretrained_model\u002Fmodel.h5` \\\n  The evaluation script creates the output folder with the same structure as the input folder, and the processed files keep the names of the input files.\n* **measure_execution_time.py** \\\n  Script for measuring the execution time with the saved DTLN model in `.\u002Fpretrained_model\u002Fdtln_saved_model\u002F`. For further information see this [section](#measuring-the-execution-time-of-the-dtln-model-with-the-savedmodel-format).\n* **real_time_processing.py** \\\n  Script that demonstrates how real-time processing with the SavedModel works. 
For more information see this [section](#real-time-processing-with-the-savedmodel-format).\n*  **.\u002Fpretrained_model\u002F** \\\n   * `model.h5`: Model weights as used in the DNS-Challenge DTLN model.\n   * `DTLN_norm_500h.h5`: Model weights trained on 500h with normalization of the STFT log magnitudes.\n   * `DTLN_norm_40h.h5`: Model weights trained on 40h with normalization of the STFT log magnitudes.\n   * `.\u002Fdtln_saved_model`: same as `model.h5` but as a stateful model in SavedModel format.\n   * `.\u002FDTLN_norm_500h_saved_model`: same as `DTLN_norm_500h.h5` but as a stateful model in SavedModel format.\n   * `.\u002FDTLN_norm_40h_saved_model`: same as `DTLN_norm_40h.h5` but as a stateful model in SavedModel format.\n   * `model_1.tflite` together with `model_2.tflite`: same as `model.h5` but as a TF-lite model with external state handling.\n   * `model_quant_1.tflite` together with `model_quant_2.tflite`: same as `model.h5` but as a TF-lite model with external state handling and dynamic range quantization.\n   * `model_1.onnx` together with `model_2.onnx`: same as `model.h5` but as an ONNX model with external state handling.\n   \n[To contents](#contents-of-the-readme)\n   \n---\n### Python dependencies:\n\nThe following packages are required for this repository:\n* TensorFlow (2.x)\n* librosa\n* wavinfo \n\n\nAll additional packages (numpy, soundfile, etc.) should be installed automatically when using conda or pip. I recommend using conda environments or [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv) with [pyenv-virtualenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv-virtualenv) for the Python environment. For training, a GPU with at least 5 GB of memory is required. I recommend at least TensorFlow 2.1 with Nvidia driver 418 and CUDA 10.1. If you use conda, CUDA is installed automatically and you only need the driver. For evaluation only, the CPU version of TensorFlow is sufficient. 
Everything was tested on Ubuntu 18.04.\n\nConda environments for training (with CUDA) and for evaluation (CPU only) can be created as follows:\n\nFor the training environment:\n```shell\n$ conda env create -f train_env.yml\n```\nFor the evaluation environment:\n```shell\n$ conda env create -f eval_env.yml\n```\nFor the TF-lite environment:\n```shell\n$ conda env create -f tflite_env.yml\n```\nThe TF-lite runtime must be downloaded from [here](https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fguide\u002Fpython).\n\n[To contents](#contents-of-the-readme)\n\n---\n### Training data preparation:\n\n1. Clone the forked DNS-Challenge [repository](https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDNS-Challenge). Before cloning, make sure `git-lfs` is installed and that your disk has enough free space. I recommend downloading the data to an SSD for faster dataset creation.\n\n2. Run `noisyspeech_synthesizer_multiprocessing.py` to create the dataset. `noisyspeech_synthesizer.cfg` was adapted to the training setup I used for the DNS-Challenge. \n\n3. Run `split_dns_corpus.py` to divide the dataset into training and validation data. The classic 80:20 split is applied. I added this file to the forked repository.\n\n[To contents](#contents-of-the-readme)\n\n---\n### Run a training of the DTLN model:\n\n1. Make sure all dependencies are installed in your Python environment.\n\n2. Change the paths to your training and validation datasets in `run_training.py`.\n\n3. Run `$ python run_training.py`. \n\nOne epoch takes around 21 minutes on an Nvidia RTX 2080 Ti when loading the training data from an SSD. 
\n\n[To contents](#contents-of-the-readme)\n\n---\n### Measuring the execution time of the DTLN model with the SavedModel format:\n\nIn total, there are three ways to measure the execution time for one block of the model: running a sequence in Keras and dividing by the number of blocks in the sequence, building a stateful model in Keras and running it block by block, or saving the stateful model in TensorFlow's SavedModel format and calling it block by block. In the following, I explain how to run the model in the SavedModel format, because it is the most portable version and can also be served with TensorFlow Serving.\n\nA Keras model can be saved in the SavedModel format:\n```python\nimport tensorflow as tf\n\n# your_keras_model: a Keras model built beforehand (stateful for block-wise processing)\ntf.saved_model.save(your_keras_model, 'name_save_path')\n```\nFor real-time block-by-block processing, it is important to make the LSTM layers stateful, so that they remember their states from the previous block.\n\nThe model can be loaded with:\n```python\nimport tensorflow as tf\n\nmodel = tf.saved_model.load('name_save_path')\n```\n\nFor inference, first look up the function of the default serving signature:\n```python\ninfer = model.signatures['serving_default']\n```\n\nThen process a block `x` by calling:\n```python\n# x: one input block as a float32 array shaped like the model input\ny = infer(tf.constant(x))['conv1d_1']\n```\nThis command returns the output of the node `'conv1d_1'`, which is our output node for real-time processing. For more information on using the SavedModel format and obtaining the output node, see this [Guide](https:\u002F\u002Fwww.tensorflow.org\u002Fguide\u002Fsaved_model).\n\nTo make things easier, this repository provides a stateful DTLN SavedModel. \nTo measure the execution time, run:\n```shell\n$ python measure_execution_time.py\n```\n\n[To contents](#contents-of-the-readme)\n\n---\n\n### Real time processing with the SavedModel format:\n\nFor an explanation, see `real_time_processing.py`. 
\n\nHere are some considerations for integrating this model into your project:\n* The sampling rate of this model is fixed at 16 kHz. It will not work properly with other sampling rates.\n* The block length of 32 ms and the block shift of 8 ms are also fixed. Changing these values requires retraining the model.\n* The delay introduced by the model equals the block length, so the input-output delay is 32 ms.\n* For real-time capability on your system, the execution time per block must be below the block shift, i.e. below 8 ms. \n* I cannot provide support on the hardware side (sound cards, drivers and so on). Be aware that a lot of artifacts can originate there.\n\n[To contents](#contents-of-the-readme)\n\n---\n### Real time processing with tf-lite:\n\nWith TF 2.3 it is finally possible to convert LSTMs to TF-lite. It is still not perfect: for a stateful model the states must be handled separately, and TF-lite does not support complex numbers. Therefore, the model is split into two submodels when converting it to TF-lite, and the FFT and iFFT are computed outside the model. I provide an example script that explains how real-time processing with the TF-lite model works (`real_time_processing_tf_lite.py`). This script uses the TF-lite runtime, which can be downloaded [here](https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fguide\u002Fpython). Quantization now works as well.\n\nUsing the TF-lite DTLN model and the TF-lite runtime, the execution time on an old MacBook Air (mid 2012) drops to **0.6 ms**.\n\n[To contents](#contents-of-the-readme)\n\n---\n### Real time audio with sounddevice and tf-lite:\n\nThe file `real_time_dtln_audio.py` is an example of how real-time audio with the TF-lite model and the [sounddevice](https:\u002F\u002Fgithub.com\u002Fspatialaudio\u002Fpython-sounddevice) library can be implemented. The script is based on the `wire.py` example. 
It works fine on an old MacBook Air (mid 2012), so it will probably run on most newer devices. The quantized version was successfully tested on a Raspberry Pi 3 B+.\n\nFirst, list your audio devices:\n```shell\n$ python real_time_dtln_audio.py --list-devices\n```\nChoose the indices of an input and an output device and call:\n```shell\n$ python real_time_dtln_audio.py -i in_device_idx -o out_device_idx\n```\nIf the script reports too many `input underflow` messages, restart the script. If that does not help, increase the latency with the `--latency` option. The default value is 0.2.\n\n[To contents](#contents-of-the-readme)\n\n---\n### Model conversion and real time processing with ONNX:\n\nFinally, I got the ONNX model working. \nConverting the model requires TF 2.1 and keras2onnx. keras2onnx can be downloaded [here](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fkeras-onnx) and must be installed from source as described in its README. When all dependencies are installed, call:\n```shell\n$ python convert_weights_to_onnx.py -m \u002Fname\u002Fof\u002Fthe\u002Fmodel.h5 -t onnx_model_name\n```\nto convert the model to the ONNX format. As with the TF-lite model, the model is split into two parts. 
The conversion does not work on macOS.\nReal-time processing works similarly to the TF-lite model; see `real_time_processing_onnx.py` for details.\nThe ONNX runtime required for this script can be installed with:\n```shell\n$ pip install onnxruntime\n```\nThe execution time on the MacBook Air (mid 2012) is around 1.13 ms per block.\n","# 双信号变换LSTM网络\n\n+ 基于TensorFlow 2.x实现的堆叠式双信号变换LSTM网络（DTLN），用于实时噪声抑制。\n+ 本仓库提供了使用Python训练、推理和部署DTLN模型的代码。同时，还提供了SavedModel、TF-lite和ONNX格式的预训练模型，可作为您自己项目的基线。该模型能够在树莓派上对实时音频进行处理。\n+ 如果您使用本仓库开发了有趣的应用，请告诉我！我非常好奇您是如何利用这些代码或模型的。\n\n---\n\nDTLN模型曾提交至深度噪声抑制挑战赛（DNS-Challenge，[GitHub链接](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDNS-Challenge)），相关论文已在2020年Interspeech会议上发表。\n\n\n该方法在一个参数量少于一百万的堆叠网络中，结合了短时傅里叶变换（STFT）与学习得到的分析和综合基。模型基于挑战赛主办方提供的500小时带噪语音数据进行训练。该网络能够进行实时处理（一帧输入，一帧输出），并取得了具有竞争力的结果。通过结合这两种信号变换方式，DTLN能够稳健地从幅度谱中提取信息，并从学习得到的特征基中引入相位信息。该方法表现出当前最先进的性能，在平均意见得分（MOS）方面比DNS-Challenge的基线高出0.24分。\n\n更多信息请参阅[论文](https:\u002F\u002Fwww.isca-speech.org\u002Farchive\u002Finterspeech_2020\u002Fwesthausen20_interspeech.html)。DNS-Challenge的比赛结果已发布在[这里](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Facademic-program\u002Fdeep-noise-suppression-challenge-interspeech-2020\u002F#!results)。我们在实时赛道的17支队伍中获得了第8名，成绩具有竞争力。\n\n---\n\n如需使用基线或复现论文中的处理流程，请运行以下命令：\n```bash\n$ python run_evaluation.py -i in\u002Ffolder\u002Fwith\u002Fwav -o target\u002Ffolder\u002Fprocessed\u002Ffiles -m .\u002Fpretrained_model\u002Fmodel.h5\n```\n\n---\n\n预训练的DTLN-aec模型（即应用于声学回声消除的DTLN）可在[DTLN-aec仓库](https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN-aec)中找到。\n\n---\n\n作者：尼尔斯·L·韦斯特豪森（德国奥尔登堡卡尔·冯·奥西茨基大学“通信声学”研究组）\n\n本代码采用MIT许可证授权。\n\n\n---\n### 引用：\n\n如果您使用了DTLN模型，请引用以下文献：\n\n```BibTex\n@inproceedings{Westhausen2020,\n  author={Nils L. Westhausen and Bernd T. Meyer},\n  title={{Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression}},\n  year=2020,\n  booktitle={Proc. 
Interspeech 2020},\n  pages={2477--2481},\n  doi={10.21437\u002FInterspeech.2020-2631},\n  url={http:\u002F\u002Fdx.doi.org\u002F10.21437\u002FInterspeech.2020-2631}\n}\n```\n\n\n---\n### README内容：\n\n* [结果](#results)\n* [执行时间](#execution-times)\n* [音频样本](#audio-samples)\n* [仓库内容](#contents-of-the-repository)\n* [Python依赖](#python-dependencies)\n* [训练数据准备](#training-data-preparation)\n* [运行DTLN模型训练](#run-a-training-of-the-dtln-model)\n* [使用SavedModel格式测量DTLN模型的执行时间](#measuring-the-execution-time-of-the-dtln-model-with-the-savedmodel-format)\n* [使用SavedModel格式进行实时处理](#real-time-processing-with-the-savedmodel-format)\n* [使用TF-lite进行实时处理](#real-time-processing-with-tf-lite)\n* [使用sounddevice和TF-lite进行实时音频处理](#real-time-audio-with-sounddevice-and-tf-lite)\n* [模型转换及使用ONNX进行实时处理](#model-conversion-and-real-time-processing-with-onnx)\n\n\n\n---\n### 结果：\n\nDNS-Challenge无混响测试集上的结果：\n模型 | PESQ [mos] | STOI [%] | SI-SDR [dB] | TF版本\n--- | --- | --- | --- | ---\n未处理 | 2.45 | 91.52 | 9.07 |\nNsNet（基线） | 2.70 | 90.56 | 12.57 |\n |  |  |  | \nDTLN（500h） | 3.04 | 94.76 | 16.34 | 2.1\nDTLN（500h）| 2.98 | 94.75 | 16.20 | TF-light\nDTLN（500h） | 2.95 | 94.47 | 15.71 | TF-light量化版\n |  |  |  | \nDTLN归一化（500h） | 3.04 | 94.47 | 16.10 | 2.2\n |  |  |  | \nDTLN归一化（40h） | 3.05 | 94.57 | 16.88 | 2.2\nDTLN归一化（40h） | 2.98 | 94.56 | 16.58 | TF-light\nDTLN归一化（40h） | 2.98 | 94.51 | 16.22 | TF-light量化版\n\n* 转换为TF-light会略微降低性能。\n* TF-light的动态范围量化也会小幅降低性能，并引入一些量化噪声。但音频质量仍然较高，且该模型可以在树莓派3 B+上实现实时处理。\n* 对STFT对数幅度进行归一化不会降低模型性能，反而使其对音量变化更具鲁棒性。\n* 通过训练时的数据增强，仅需40小时的噪声和语音数据即可训练出DTLN模型。如有任何疑问，请随时联系我。\n\n[返回目录](#contents-of-the-readme)\n\n---\n\n### 执行时间：\n\nSavedModel的执行时间使用TF 2.2测量，TF-lite的执行时间则使用TF-lite运行时环境测量：\n系统 | 处理器 | 核心数 | SavedModel | TF-lite | TF-lite量化版\n--- | --- | --- | --- | --- | ---\nUbuntu 18.04         | Intel I5 6600k @ 3.5 GHz | 4 | 0.65 ms | 0.36 ms | 0.27 ms\nMacbook Air mid 2012 | Intel I7 3667U @ 2.0 GHz | 2 | 1.4 ms | 0.6 ms | 0.4 ms\nRaspberry Pi 3 B+    | ARM Cortex A53 @ 1.4 GHz 
| 4 | 15.54 ms | 9.6 ms | 2.2 ms\n\n要实现实时处理，执行时间必须低于8毫秒。\n\n[返回目录](#contents-of-the-readme)\n\n---\n\n### 音频样本：\n\n以下是一些使用TF-lite模型生成的音频样本。遗憾的是，Markdown无法直接嵌入音频文件。\n\n带噪音频 | 增强后音频 | 噪声类型\n--- | --- | --- \n[样本1](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FGFHzmWWJAwgQPLf) | [样本1](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002Fp3M48y7cjkJ2ZZg) | 空调噪声\n[样本2](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002F4Y2PoSpJf7nXx9T) | [样本2](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FQeK4aH5KCELPnko) | 音乐\n[样本3](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FAwc6oBtnTpb5pY7) | [样本3](https:\u002F\u002Fcloudsync.uol.de\u002Fs\u002FyNsmDgxH3MPWMTi) | 公交车噪音\n\n\n[返回目录](#contents-of-the-readme)\n\n---\n\n### 仓库内容：\n\n*  **DTLN_model.py** \\\n  该文件包含模型、数据生成器以及训练流程。\n*  **run_training.py** \\\n  用于运行训练的脚本。在使用 `$ python run_training.py` 开始训练之前，您需要在脚本中设置训练数据和验证数据的路径。训练脚本使用默认配置。\n* **run_evaluation.py** \\\n  用于处理包含 `.wav` 文件的文件夹（可选子文件夹）的脚本，需配合已训练好的 DTLN 模型使用。使用本仓库提供的预训练模型，可以按如下方式处理文件夹：\\\n  `$ python run_evaluation.py -i \u002Fpath\u002Fto\u002Finput -o \u002Fpath\u002Ffor\u002Fprocessed -m .\u002Fpretrained_model\u002Fmodel.h5` \\\n  评估脚本会创建与输入文件夹结构相同的输出文件夹，并且输出文件的名称与输入文件相同。\n* **measure_execution_time.py** \\\n  用于测量保存为 `.\u002Fpretrained_model\u002Fdtln_saved_model\u002F` 格式的 DTLN 模型执行时间的脚本。更多信息请参见[此处](#measuring-the-execution-time-of-the-dtln-model-with-the-savedmodel-format)。\n* **real_time_processing.py** \\\n  该脚本说明如何使用 SavedModel 格式进行实时处理。更多信息请参见[此处](#real-time-processing-with-the-savedmodel-format)。\n+  **.\u002Fpretrained_model\u002F** \\\n   * `model.h5`: DNS-Challenge DTLN 模型中使用的模型权重。\n   * `DTLN_norm_500h.h5`: 在 500 小时数据上训练、并对 STFT 对数幅度进行归一化的模型权重。\n   * `DTLN_norm_40h.h5`: 在 40 小时数据上训练、并对 STFT 对数幅度进行归一化的模型权重。\n   * `.\u002Fdtln_saved_model`: 与 `model.h5` 相同，但以 SavedModel 格式保存的状态化模型。\n   * `.\u002FDTLN_norm_500h_saved_model`: 与 `DTLN_norm_500h.h5` 相同，但以 SavedModel 格式保存的状态化模型。\n   * `.\u002FDTLN_norm_40h_saved_model`: 与 `DTLN_norm_40h.h5` 相同，但以 SavedModel 格式保存的状态化模型。\n   
* `model_1.tflite` 和 `model_2.tflite`: 与 `model.h5` 相同，但采用 TF-Lite 模型格式，并使用外部状态管理。\n   * `model_quant_1.tflite` 和 `model_quant_2.tflite`: 与 `model.h5` 相同，但采用 TF-Lite 模型格式，结合外部状态管理和动态范围量化。\n   * `model_1.onnx` 和 `model_2.onnx`: 与 `model.h5` 相同，但采用 ONNX 模型格式，并使用外部状态管理。\n\n[返回目录](#contents-of-the-readme)\n   \n---\n### Python 依赖项：\n\n本仓库需要以下软件包：\n* TensorFlow (2.x)\n* librosa\n* wavinfo \n\n其他所有依赖包（如 numpy、soundfile 等）在使用 conda 或 pip 时应自动安装。建议使用 conda 环境或 [pyenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv) [virtualenv](https:\u002F\u002Fgithub.com\u002Fpyenv\u002Fpyenv-virtualenv) 来管理 Python 环境。训练时需要至少配备 5 GB 显存的 GPU。推荐使用 Tensorflow 2.1 及以上版本，搭配 Nvidia 驱动程序 418 和 CUDA 10.1。如果使用 conda，CUDA 会自动安装，您只需安装驱动程序即可。仅进行评估时，使用 CPU 版本的 Tensorflow 即可。所有测试均在 Ubuntu 18.04 上完成。\n\n可用于训练（带 CUDA）和评估（仅 CPU）的 conda 环境可按如下方式创建：\n\n训练环境：\n```shell\n$ conda env create -f train_env.yml\n```\n评估环境：\n```\n$ conda env create -f eval_env.yml\n```\nTF-Lite 环境：\n```\n$ conda env create -f tflite_env.yml\n```\nTF-Lite 运行时库需从[此处](https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fguide\u002Fpython)下载。\n\n[返回目录](#contents-of-the-readme)\n\n---\n### 训练数据准备：\n\n1. 克隆分叉的 DNS-Challenge [仓库](https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDNS-Challenge)。克隆前请确保已安装 `git-lfs`，并确认磁盘空间充足。建议将数据下载到 SSD 上，以加快数据集的构建速度。\n\n2. 运行 `noisyspeech_synthesizer_multiprocessing.py` 以创建数据集。`noisyspeech_synthesizer.cfg` 已根据我在 DNS-Challenge 中使用的训练配置进行了修改。\n\n3. 运行 `split_dns_corpus.py` 将数据集划分为训练集和验证集。采用经典的 80:20 划分比例。该文件由我添加到分叉的仓库中。\n\n[返回目录](#contents-of-the-readme)\n\n---\n### 运行 DTLN 模型训练：\n\n1. 确保您的 Python 环境中已安装所有依赖项。\n\n2. 修改 `run_training.py` 中的训练集和验证集路径。\n\n3. 
运行 `$ python run_training.py`。\n\n在使用 SSD 加载训练数据的情况下，单个 epoch 大约需要 21 分钟，硬件为 Nvidia RTX 2080 Ti。\n\n[返回目录](#contents-of-the-readme)\n\n---\n### 使用 SavedModel 格式测量 DTLN 模型的执行时间：\n\n总共有三种方法可以测量模型单个块的执行时间：在 Keras 中运行整个序列并除以序列中的块数；在 Keras 中构建一个状态化模型并逐块运行；或者将状态化模型保存为 TensorFlow 的 SavedModel 格式，然后逐块调用。接下来我将介绍如何使用 SavedModel 格式运行模型，因为它是最便携的版本，也可以通过 TensorFlow Serving 调用。\n\nKeras 模型可以保存为 SavedModel 格式：\n```python\nimport tensorflow as tf\n'''\n在此构建模型\n'''\ntf.saved_model.save(your_keras_model, 'name_save_path')\n```\n对于实时逐块处理而言，重要的是将 LSTM 层设置为状态化，以便它们能够记住前一个块的状态。\n\n模型可以通过以下方式导入：\n```python\nmodel = tf.saved_model.load('name_save_path')\n```\n\n进行推理时，首先需要调用以下代码来映射签名名称到函数：\n```python\ninfer = model.signatures['serving_default']\n```\n\n然后，对第 `x` 个块进行推理：\n```python\ny = infer(tf.constant(x))['conv1d_1']\n```\n\n此命令会返回节点 `'conv1d_1'` 的结果，该节点是我们用于实时处理的输出节点。有关如何使用 SavedModel 格式及获取输出节点的更多信息，请参阅[指南](https:\u002F\u002Fwww.tensorflow.org\u002Fguide\u002Fsaved_model)。\n\n为了简化操作，本仓库提供了一个状态化的 DTLN SavedModel。要测量执行时间，请运行：\n```\n$ python measure_execution_time.py\n```\n\n[返回目录](#contents-of-the-readme)\n\n---\n\n### 使用 SavedModel 格式的实时处理：\n\n有关说明，请参阅 `real_time_processing.py`。\n\n在将此模型集成到您的项目中时，需要注意以下几点：\n* 该模型的采样率固定为 16 kHz。如果使用其他采样率，可能无法顺利运行。\n* 块长度为 32 毫秒，块移位为 8 毫秒，这两者也是固定的。若要更改这些值，必须重新训练模型。\n* 模型引入的延迟等于块长度，因此输入输出延迟为 32 毫秒。\n* 为了在您的系统上实现实时处理，执行时间必须低于块移位的长度，即低于 8 毫秒。\n* 我们无法在硬件方面提供支持，例如声卡、驱动程序等。请注意，许多音频质量问题可能源于硬件端。\n\n[返回目录](#readme-contents)\n\n---\n### 使用 TF Lite 的实时处理：\n\n借助 TensorFlow 2.3，现在终于可以将 LSTM 转换为 TF Lite 格式。不过目前仍不完美：对于有状态模型，其状态需要单独管理；此外，TF Lite 不支持复数运算。这意味着在转换为 TF Lite 时，模型会被拆分为两个子模型，而 FFT 和 iFFT 的计算将在模型外部进行。我提供了一个示例脚本 (`real_time_processing_tf_lite.py`) 来说明如何使用 TF Lite 模型进行实时处理。该脚本使用了 TF Lite 运行时，可从 [这里](https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fguide\u002Fpython) 下载。量化功能现已可用。\n\n使用 TF Lite 版 DTLN 模型和 TF Lite 运行时，在一台 2012 年中期的旧款 MacBook Air 上，执行时间可降低至 **0.6 毫秒**。\n\n[返回目录](#readme-contents)\n\n---\n### 使用 sounddevice 和 TF Lite 的实时音频：\n\n文件 
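`real_time_dtln_audio.py` is an example of real-time audio processing with the TF Lite model and the [sounddevice](https:\u002F\u002Fgithub.com\u002Fspatialaudio\u002Fpython-sounddevice) toolbox. The script is based on the `wire.py` example. It runs fine on an old mid-2012 MacBook Air, so it should work on most newer devices. The quantized version was successfully tested on a Raspberry Pi 3B+.\n\nFirst check your audio devices:\n```shell\n$ python real_time_dtln_audio.py --list-devices\n```\nChoose the indices of the input and output devices and run:\n```shell\n$ python real_time_dtln_audio.py -i in_device_idx -o out_device_idx\n```\nIf the script reports too many `input underflow` errors, restart it. If that does not help, increase the latency with the `--latency` option. The default value is 0.2.\n\n[To contents](#contents-of-the-readme)\n\n---\n### Model conversion and real-time processing with ONNX:\n\nThe ONNX model is finally working as well.\nThe conversion requires TensorFlow 2.1 and keras2onnx. keras2onnx can be downloaded [here](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fkeras-onnx) and must be installed from source as described in its README. Once all dependencies are installed, convert the model to ONNX with:\n```shell\n$ python convert_weights_to_onnx.py -m \u002Fname\u002Fof\u002Fthe\u002Fmodel.h5 -t onnx_model_name\n```\nAs with the TF Lite model, the model is split into two parts. Note that the conversion does not work on macOS.\nReal-time processing works similarly to TF Lite and is shown in `real_time_processing_onnx.py`.\nThe ONNX runtime required by the script can be installed with:\n```shell\n$ pip install onnxruntime\n```\nOn a mid-2012 MacBook Air, the execution time for one block is about 1.13 ms.","# DTLN Quick Start Guide\n\nDTLN (Dual-signal Transformation LSTM Network) is a deep learning model for real-time noise suppression. It combines the short-time Fourier transform (STFT) with a learned analysis and synthesis basis, has fewer than one million parameters, and can process audio in real time on edge devices such as the Raspberry Pi.\n\n## Environment setup\n\n### System requirements\n- **Operating system**: Linux (Ubuntu 18.04 recommended), macOS, Windows\n- **Hardware**:\n  - **Training**: an NVIDIA GPU with at least 5 GB of memory is required; an RTX 2080 Ti or better is recommended.\n  - **Inference\u002Fevaluation**: a CPU is sufficient. For real-time processing on a Raspberry Pi 3 B+, the quantized TF-Lite version is recommended.\n- **Software dependencies**:\n  - Python 3.x\n  - TensorFlow 2.1+ (for training, TF 2.1 + CUDA 10.1 + Nvidia driver 418+ is recommended)\n  - `librosa`, `wavinfo`\n\n### Installing prerequisites\nUsing `conda` to manage the environment is recommended. The project ships predefined environment files.\n\n**1. Clone the repository**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN.git\ncd DTLN\n# make sure git-lfs is installed to fetch large files (e.g. the pretrained models)\ngit lfs install\ngit lfs pull\n```\n\n**2. 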
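Create the environment**\nChoose one of the following commands depending on your needs:\n\n*   **Evaluation\u002Finference only (CPU)**:\n    ```bash\n    conda env create -f eval_env.yml\n    conda activate eval_env\n    ```\n*   **Training (GPU)**:\n    ```bash\n    conda env create -f train_env.yml\n    conda activate train\n    ```\n*   **TF-Lite inference**:\n    ```bash\n    conda env create -f tflite_env.yml\n    conda activate tflite\n    # note: the TF-Lite runtime must be downloaded separately, see the official docs or the repository README\n    ```\n\n> **Tip**: if conda packages download slowly (e.g. in mainland China), you can configure the Tsinghua (TUNA) or USTC mirror:\n> ```bash\n> conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain\u002F\n> conda config --add channels https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Ffree\u002F\n> ```\n\n## Installation\n\nTo install manually with `pip` instead of using the conda environment files, run:\n\n```bash\npip install 'tensorflow>=2.1' librosa wavinfo numpy soundfile\n```\n\n*Note: the version specifier is quoted so that the shell does not treat `>` as an output redirection. Since TensorFlow 2.1 the `tensorflow` pip package includes GPU support; for GPU use, make sure CUDA and cuDNN are installed correctly.*\n\n## Basic usage\n\nThe repository ships pretrained models that can be used directly to denoise noisy audio.\n\n### 1. Batch-processing audio files\nThe provided `run_evaluation.py` script denoises all `.wav` files in a folder.\n\n**Command format**:\n```bash\npython run_evaluation.py -i \u003Cinput_folder> -o \u003Coutput_folder> -m \u003Cmodel_file>\n```\n\n**Example**:\nSuppose you have a folder `noisy_audio` with noisy recordings and want to save the processed files to `clean_audio` using the pretrained model shipped with the repository:\n\n```bash\npython run_evaluation.py -i .\u002Fnoisy_audio -o .\u002Fclean_audio -m .\u002Fpretrained_model\u002Fmodel.h5\n```\n\n*   The script preserves the subdirectory structure of the input folder.\n*   Output files keep the same names as the input files.\n\n### 2. Available pretrained models\nThe `.\u002Fpretrained_model\u002F` directory contains models in several configurations:\n\n| Model file | Description | Use case |\n| :--- | :--- | :--- |\n| `model.h5` | Original DNS-Challenge submission model | General baseline |\n| `DTLN_norm_500h.h5` | Trained on 500 h of data + magnitude normalization | More robust; recommended first choice |\n| `DTLN_norm_40h.h5` | Trained on 40 h of data + magnitude normalization | Alternative trained on a smaller dataset |\n| `*.tflite` | TF-Lite format | Real-time inference on mobile devices or Raspberry Pi |\n| `*.onnx` | ONNX format | Cross-platform deployment |\n\n### 3. 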
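Real-time processing (brief)\nFor streaming real-time processing in Python, see `real_time_processing.py`. The core idea is to load the SavedModel and call it block by block:\n\n```python\nimport numpy as np\nimport tensorflow as tf\n\n# load the stateful model\nmodel = tf.saved_model.load('.\u002Fpretrained_model\u002Fdtln_saved_model')\ninfer = model.signatures['serving_default']\n\n# block-wise inference (frame is one block of audio samples)\n# in_block = np.expand_dims(frame, axis=0).astype('float32')  # add a batch dimension\n# out_block = infer(tf.constant(in_block))['conv1d_1']\n# see real_time_processing.py for the full implementation\n```\n\nFor embedded devices such as the Raspberry Pi, `model_quant_1.tflite` and `model_quant_2.tflite` with external state handling are recommended, so that the execution time per block stays below the 8 ms block shift.","A developer at a telemedicine platform is integrating real-time voice calls into the doctors' app, but noisy home environments get in the way of communication.\n\n### Without DTLN\n- Background noise (keyboard clicks, pets, etc.) severely disrupts doctor-patient conversations, so key medical information is hard to hear.\n- Traditional noise suppression has high latency and cannot provide smooth streaming real-time processing on edge devices such as the Raspberry Pi.\n- Large models are hard to deploy on mobile or low-power IoT devices, which limits the reach of the service.\n- Phase information of the speech is largely lost, so the processed voice sounds robotic and distorted, which makes the doctor's assessment harder.\n\n### With DTLN\n- The stacked dual-signal transformation LSTM network separates speech from background noise, so instructions come through clearly even in noisy environments.\n- With a lightweight architecture of fewer than one million parameters, low-latency real-time audio streaming is possible even on a Raspberry Pi, keeping conversations smooth.\n- Export to TF-lite and ONNX makes it easy to embed the denoising model in mobile and edge hardware.\n- Combining the short-time Fourier transform with a learned analysis and synthesis basis preserves phase details of the speech and produces natural-sounding, clean voice output.\n\nDTLN delivers effective real-time noise suppression at very low computational cost, making high-quality voice interaction on edge devices possible.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbreizhn_DTLN_2c476ad0.png","breizhn","Nils L. Westhausen","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fbreizhn_5608acae.jpg","Researcher @ Bose.\r\nFormer PhD candidate in the Communication Acoustics group at the University of Oldenburg. 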
Working on speech enhancement and separation.",null,"Oldenburg ","https:\u002F\u002Fgithub.com\u002Fbreizhn",[84],{"name":85,"color":86,"percentage":87},"Python","#3572A5",100,703,172,"2026-04-01T04:44:07","MIT","Linux, macOS","Required for training: NVIDIA GPU with at least 5 GB of memory; CUDA 10.1 and NVIDIA driver 418+ recommended. For evaluation and inference only, the CPU version can be used","Not specified",{"notes":96,"python":97,"dependencies":98},"Using conda or pyenv\u002Fvirtualenv to manage the environment is recommended. Storing the training data on an SSD is recommended for faster loading. The quantized TF-lite model runs in real time on a Raspberry Pi 3 B+. The TF-lite runtime must be downloaded separately.","Not specified (must support TensorFlow 2.x)",[99,100,101,102,103],"tensorflow>=2.1","librosa","wavinfo","numpy","soundfile",[55,13],[106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121],"noise-reduction","deep-learning","audio","real-time-audio","audio-processing","noise-suppression","tensorflow","dns-challenge","dtln-model","speech-denoising","speech-processing","speech-enhancement","keras","tf-lite","raspberry-pi","onnx","2026-03-27T02:49:30.150509","2026-04-06T07:12:50.791284",[125,130,135,140,145,149],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},15108,"How do I convert the DTLN model to TensorFlow.js format? What should I do if converting the SavedModel directly fails?","Do not convert the SavedModel directly; this leads to type incompatibility errors. Instead:\n1. Instantiate the model class from `DTLN_model.py` and build the model with `build_DTLN_model_stateful`.\n2. Load the weights file (.h5) from the `.\u002Fpretrained_model` folder.\n3. Save the model in TFJS format with `tfjs.converters.save_keras_model`.\n\nReference code:\n```python\nimport tensorflowjs as tfjs\nfrom DTLN_model import DTLN_model\n\n# build the stateful DTLN model\nmodel_class = DTLN_model()\nmodel_class.build_DTLN_model_stateful()\n# load the pretrained weights\nmodel_class.model.load_weights('.\u002Fpretrained_model\u002Fmodel.h5')\n# write the TFJS model to the DTLN_js directory\ntfjs.converters.save_keras_model(model_class.model, 'DTLN_js')\n```\nNote: if prediction after loading the model fails with `expected ndim=3, found ndim=2`, make sure the input dimensions match the model definition (usually a batch or time dimension must be added).","https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN\u002Fissues\u002F4",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},15109,"During real-time operation the speech flickers or sounds choppy. How can this be fixed?","Flickering during real-time processing is usually caused by incorrect block processing and overlap-add logic. The maintainer provides `real_time_dtln_audio.py` as a reference.\nIf the official code still causes problems, you can implement the overlap-add logic manually:\n1. 
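Initialize an output buffer `out_file`.\n2. Loop over the audio blocks, extract each segment and add a batch dimension (`tf.expand_dims`).\n3. Run inference to obtain the output block.\n4. Add the output block to the matching position in the output buffer (offset by `block_shift`).\n\nExample of the core logic:\n```python\n# output buffer with the same length as the input signal\nself.out_file = np.zeros(len(audio), dtype='float32')\nfor idx in range(self.num_blocks):\n    in_block = audio[idx*self.block_shift:(idx*self.block_shift)+self.block_len]\n    # add a batch dimension and cast to float32 for the SavedModel signature\n    in_block = tf.expand_dims(in_block.astype('float32'), axis=0)\n    # drop the batch dimension so the block can be added to the 1-D buffer\n    out_block = np.squeeze(self.infer(in_block)['conv1d_1'].numpy())\n    # overlap-add\n    self.out_file[idx*self.block_shift:(idx*self.block_shift)+self.block_len] += out_block\n```\nAdjusting the block size and shift parameters may also help.","https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN\u002Fissues\u002F3",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},15110,"Training stops early at around epoch 80-90 and the validation loss no longer decreases. How can performance be improved?","To avoid premature convergence and improve performance, adjust the data generation and training configuration:\n1. **Data mixing**: for each sample, randomly pick noise and speech files and mix them at a random SNR between -5 and 25 dB.\n2. **Sample length**: generate 4-second audio samples.\n3. **Batch size**: set the batch size to 16.\n4. **Input level**: vary the input level within a reasonable range to increase robustness.\n\nCheck the following settings in the code:\n- In `DTLN_model.py`, set `self.batchsize = 16` and `self.len_samples = 4`.\n- In `noisyspeech_synthesizer.cfg`, confirm that the SNR range is set to -5 to 25 dB.\nIf training diverges as early as epoch 10 after these changes, check the learning rate and the data normalization.","https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN\u002Fissues\u002F30",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},15111,"How can pretrained weights be used for transfer learning to remove the block shift? Can the model be changed from stateful to stateless?","You can try transfer learning from the pretrained weights to remove the dependence on block processing at inference time. The approach is:\n1. Load the pretrained weights.\n2. Replace `stftLayer` in the network with `fftLayer` (if the frequency-domain processing needs to change).\n3. When retraining, set the `stateful` parameter of the two separation kernels to `False`.\n4. Train for about 20 epochs.\n\nNote: if the baseline architecture lacks Dropout layers between the LSTM layers, this may affect the results, so adapt the steps to your specific architecture. This approach is intended for moving from block-wise processing to streaming or non-block-wise processing.","https:\u002F\u002Fgithub.com\u002Fbreizhn\u002FDTLN\u002Fissues\u002F11",{"id":146,"question_zh":147,"answer_zh":148,"source_url":129},15112,"What causes the error `Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2` when loading the converted TFJS model?","The error means that the dimensions of the input data do not match what the LSTM layer expects. An LSTM layer usually expects an input of shape `(batch_size, time_steps, features)`, i.e. 3 dimensions, but 2-dimensional data was passed.\n\nSolution:\n1. 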
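Check the model summary and confirm the shape defined for the input layer (for example, `[1, 512]` may imply special handling of the batch and time dimensions).\n2. Before calling `model.predict()`, make sure the input data has the right number of dimensions. If the input is a single frame or a single channel, add the missing time or batch dimension with `tf.expand_dims`.\n3. If loading the model from JSON fails, rebuild the model structure from the source code and then load the `.h5` weights file directly, which avoids metadata loss during JSON serialization.",{"id":150,"question_zh":151,"answer_zh":152,"source_url":144},15113,"With only 40 hours of data, how can data augmentation or preprocessing be used to train the network effectively?","When training on a small dataset (e.g. 40 hours), strong data augmentation is essential:\n1. **Dynamic mixing**: instead of always using the same few noise files, randomly pick noise and speech for each training sample and mix them.\n2. **Random SNR**: sample the signal-to-noise ratio from a wide range (e.g. -5 to 25 dB) so the model adapts to different noise levels.\n3. **Fixed-length segments**: train on fixed-length segments (e.g. 4 seconds) with a suitable batch size (e.g. 16) to increase the number of iterations per epoch.\n4. **Gain variation**: randomly change the level (gain) of the input audio during preprocessing to simulate different recording volumes.\nThese measures considerably increase data diversity and reduce overfitting, which improves generalization with limited data.",[]]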