[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-Azure-Samples--aisearch-openai-rag-audio":3,"similar-Azure-Samples--aisearch-openai-rag-audio":123},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":8,"readme_en":9,"readme_zh":10,"quickstart_zh":11,"use_case_zh":12,"hero_image_url":13,"owner_login":14,"owner_name":15,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":18,"owner_email":18,"owner_twitter":18,"owner_website":19,"owner_url":20,"languages":21,"stars":58,"forks":59,"last_commit_at":60,"license":61,"difficulty_score":62,"env_os":63,"env_gpu":64,"env_ram":64,"env_deps":65,"category_tags":69,"github_topics":75,"view_count":88,"oss_zip_url":18,"oss_zip_packed_at":18,"status":89,"created_at":90,"updated_at":91,"faqs":92,"releases":122},1655,"Azure-Samples\u002Faisearch-openai-rag-audio","aisearch-openai-rag-audio","A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI experiences using RAG with Azure AI Search and Azure OpenAI's gpt-4o-realtime-preview model. 
","aisearch-openai-rag-audio 是一个开源项目，实现语音交互的RAG（检索增强生成）应用。它通过Azure AI Search检索知识库内容，结合Azure OpenAI的GPT-4o实时音频API，实时处理语音输入并生成语音回答，同时显示引用来源。传统RAG系统通常仅支持文本交互，而该项目让语音成为自然交互方式，适用于智能客服、语音助手等场景。开发者可通过GitHub Codespaces一键启动，或使用Docker容器快速部署，无需复杂配置。适合需要快速构建语音AI应用的开发者和研究人员，尤其适合希望集成实时语音处理与知识库检索的团队。","# VoiceRAG: An Application Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio\n\n[![Open in GitHub Codespaces](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=GitHub+Codespaces&message=Open&color=brightgreen&logo=github)](https:\u002F\u002Fgithub.com\u002Fcodespaces\u002Fnew?hide_repo_select=true&ref=main&skip_quickstart=true&machine=basicLinux32gb&repo=860141324&devcontainer_path=.devcontainer%2Fdevcontainer.json&geo=WestUs2)\n[![Open in Dev Containers](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https:\u002F\u002Fvscode.dev\u002Fredirect?url=vscode:\u002F\u002Fms-vscode-remote.remote-containers\u002FcloneInVolume?url=https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio)\n\nThis repo contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT-4o realtime API for audio. 
We describe the pattern in more detail in [this blog post](https:\u002F\u002Faka.ms\u002Fvoicerag), and you can see this sample app in action in [this short video](https:\u002F\u002Fyoutu.be\u002FvXJka8xZ9Ko).\n\n* [Features](#features)\n* [Architecture Diagram](#architecture-diagram)\n* [Getting Started](#getting-started)\n  * [GitHub Codespaces](#github-codespaces)\n  * [VS Code Dev Containers](#vs-code-dev-containers)\n  * [Local environment](#local-environment)\n* [Deploying the app](#deploying-the-app)\n* [Development server](#development-server)\n* [Guidance](#guidance)\n* [Resources](#resources)\n* [Getting help](#getting-help)\n\n## Features\n\n* **Voice interface**: The app uses the browser's microphone to capture voice input, and sends it to the backend where it is processed by the Azure OpenAI GPT-4o Realtime API.\n* **RAG (Retrieval Augmented Generation)**: The app uses the Azure AI Search service to answer questions about a knowledge base, and sends the retrieved documents to the GPT-4o Realtime API to generate a response.\n* **Audio output**: The app plays the response from the GPT-4o Realtime API as audio, using the browser's audio capabilities.\n* **Citations**: The app shows the search results that were used to generate the response.\n\n### Architecture Diagram\n\nThe `RTClient` in the frontend receives the audio input, sends that to the Python backend which uses an `RTMiddleTier` object to interface with the Azure OpenAI real-time API, and includes a tool for searching Azure AI Search.\n\n![Diagram of real-time RAG pattern](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAzure-Samples_aisearch-openai-rag-audio_readme_509540c3a22a.png)\n\nThis repository includes infrastructure as code and a `Dockerfile` to deploy the app to Azure Container Apps, but it can also be run locally as long as Azure AI Search and Azure OpenAI services are configured.\n\n## Getting Started\n\nYou have a few options for getting started with this template. 
The quickest way to get started is [GitHub Codespaces](#github-codespaces), since it will set up all the tools for you, but you can also [set it up locally](#local-environment). You can also use a [VS Code dev container](#vs-code-dev-containers).\n\n### GitHub Codespaces\n\nYou can run this repo virtually by using GitHub Codespaces, which will open a web-based VS Code in your browser:\n\n[![Open in GitHub Codespaces](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=GitHub+Codespaces&message=Open&color=brightgreen&logo=github)](https:\u002F\u002Fgithub.com\u002Fcodespaces\u002Fnew?hide_repo_select=true&ref=main&skip_quickstart=true&machine=basicLinux32gb&repo=860141324&devcontainer_path=.devcontainer%2Fdevcontainer.json&geo=WestUs2)\n\nOnce the codespace opens (this may take several minutes), open a new terminal and proceed to [deploy the app](#deploying-the-app).\n\n### VS Code Dev Containers\n\nYou can run the project in your local VS Code Dev Container using the [Dev Containers extension](https:\u002F\u002Fmarketplace.visualstudio.com\u002Fitems?itemName=ms-vscode-remote.remote-containers):\n\n1. Start Docker Desktop (install it if not already installed)\n2. Open the project:\n\n    [![Open in Dev Containers](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https:\u002F\u002Fvscode.dev\u002Fredirect?url=vscode:\u002F\u002Fms-vscode-remote.remote-containers\u002FcloneInVolume?url=https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Faisearch-openai-rag-audio)\n3. In the VS Code window that opens, once the project files show up (this may take several minutes), open a new terminal, and proceed to [deploying the app](#deploying-the-app).\n\n### Local environment\n\n1. 
Install the required tools:\n   * [Azure Developer CLI](https:\u002F\u002Faka.ms\u002Fazure-dev\u002Finstall)\n   * [Node.js](https:\u002F\u002Fnodejs.org\u002F)\n   * [Python >=3.11](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n      * **Important**: Python and the pip package manager must be on the PATH on Windows for the setup scripts to work.\n      * **Important**: Ensure you can run `python --version` from the console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.\n   * [Git](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n   * [PowerShell](https:\u002F\u002Flearn.microsoft.com\u002Fpowershell\u002Fscripting\u002Finstall\u002Finstalling-powershell) - For Windows users only.\n\n2. Clone the repo (`git clone https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio`)\n3. Proceed to the next section to [deploy the app](#deploying-the-app).\n\n## Deploying the app\n\nThe steps below will provision Azure resources and deploy the application code to Azure Container Apps.\n\n1. Log in to your Azure account:\n\n    ```shell\n    azd auth login\n    ```\n\n    For GitHub Codespaces users, if the previous command fails, try:\n\n    ```shell\n    azd auth login --use-device-code\n    ```\n\n1. Create a new azd environment:\n\n    ```shell\n    azd env new\n    ```\n\n    Enter a name that will be used for the resource group.\n    This will create a new folder in the `.azure` folder, and set it as the active environment for any calls to `azd` going forward.\n\n1. (Optional) This is the point where you can customize the deployment by setting azd environment variables, in order to [use existing services](docs\u002Fexisting_services.md) or [customize the voice choice](docs\u002Fcustomizing_deploy.md).\n\n1. 
Run this single command to provision the resources, deploy the code, and set up integrated vectorization for the sample data:\n\n   ```shell\n   azd up\n   ```\n\n   * **Important**: Beware that the resources created by this command will incur immediate costs, primarily from the AI Search resource. These resources may accrue costs even if you interrupt the command before it is fully executed. You can run `azd down` or delete the resources manually to avoid unnecessary spending.\n   * You will be prompted to select two locations, one for the majority of resources and one for the OpenAI resource, which is currently a short list. That location list is based on the [OpenAI model availability table](https:\u002F\u002Flearn.microsoft.com\u002Fazure\u002Fai-services\u002Fopenai\u002Fconcepts\u002Fmodels#global-standard-model-availability) and may become outdated as availability changes.\n\n1. After the application has been successfully deployed, you will see a URL printed to the console. Navigate to that URL to interact with the app in your browser. To try out the app, click the \"Start conversation\" button, say \"Hello\", and then ask a question about your data like \"What is the whistleblower policy for Contoso Electronics?\" You can also now run the app locally by following the instructions in [the next section](#development-server).\n\n## Development server\n\nYou can run this app locally using either the Azure services you provisioned by following the [deployment instructions](#deploying-the-app), or by pointing the local app at already [existing services](docs\u002Fexisting_services.md).\n\n1. If you deployed with `azd up`, you should see an `app\u002Fbackend\u002F.env` file with the necessary environment variables.\n\n2. 
If you did *not* use `azd up`, you will need to create an `app\u002Fbackend\u002F.env` file with the following environment variables:\n\n   ```shell\n   AZURE_OPENAI_ENDPOINT=wss:\u002F\u002F\u003Cyour instance name>.openai.azure.com\n   AZURE_OPENAI_REALTIME_DEPLOYMENT=gpt-4o-realtime-preview\n   AZURE_OPENAI_REALTIME_VOICE_CHOICE=\u003Cchoose one: echo, alloy, shimmer>\n   AZURE_OPENAI_API_KEY=\u003Cyour api key>\n   AZURE_SEARCH_ENDPOINT=https:\u002F\u002F\u003Cyour service name>.search.windows.net\n   AZURE_SEARCH_INDEX=\u003Cyour index name>\n   AZURE_SEARCH_API_KEY=\u003Cyour api key>\n   ```\n\n   To use Entra ID (your user when running locally, managed identity when deployed), simply don't set the keys.\n\n3. Run this command to start the app:\n\n   Windows:\n\n   ```pwsh\n   pwsh .\\scripts\\start.ps1\n   ```\n\n   Linux\u002FMac:\n\n   ```bash\n   .\u002Fscripts\u002Fstart.sh\n   ```\n\n4. The app is available at [http:\u002F\u002Flocalhost:8765](http:\u002F\u002Flocalhost:8765).\n\n   Once the app is running, when you navigate to the URL above you should see the start screen of the app:\n   ![app screenshot](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAzure-Samples_aisearch-openai-rag-audio_readme_cf2c290aad07.png)\n\n   To try out the app, click the \"Start conversation\" button, say \"Hello\", and then ask a question about your data like \"What is the whistleblower policy for Contoso Electronics?\"\n\n## Guidance\n\n### Costs\n\nPricing varies per region and usage, so it isn't possible to predict exact costs for your usage.\nHowever, you can try the [Azure pricing calculator](https:\u002F\u002Fazure.com\u002Fe\u002Fa87a169b256e43c089015fda8182ca87) for the resources below.\n\n* Azure Container Apps: Consumption plan with 1 CPU core, 2.0 GB RAM. Pricing with Pay-as-You-Go. 
[Pricing](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fcontainer-apps\u002F)\n* Azure OpenAI: Standard tier, gpt-4o-realtime and text-embedding-3-large models. Pricing per 1K tokens used. [Pricing](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fcognitive-services\u002Fopenai-service\u002F)\n* Azure AI Search: Standard tier, 1 replica, free level of semantic search. Pricing per hour. [Pricing](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fsearch\u002F)\n* Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fstorage\u002Fblobs\u002F)\n* Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fmonitor\u002F)\n\nTo reduce costs, you can switch to free SKUs for various services, but those SKUs have limitations.\n\n⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,\neither by deleting the resource group in the Portal or running `azd down`.\n\n### Security\n\nThis template uses [Managed Identity](https:\u002F\u002Flearn.microsoft.com\u002Fentra\u002Fidentity\u002Fmanaged-identities-azure-resources\u002Foverview) to eliminate the need for developers to manage these credentials. Applications can use managed identities to obtain Microsoft Entra tokens without having to manage any credentials. To ensure best practices in your repo, we recommend that anyone creating solutions based on our templates ensure that the [GitHub secret scanning](https:\u002F\u002Fdocs.github.com\u002Fcode-security\u002Fsecret-scanning\u002Fabout-secret-scanning) setting is enabled in your repos.\n\n### Notes\n\n>Sample data: The PDF documents used in this demo contain information generated using a language model (Azure OpenAI Service). 
The information contained in these documents is only for demonstration purposes and does not reflect the opinions or beliefs of Microsoft. Microsoft makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the information contained in this document. All rights reserved to Microsoft.\n\n## Resources\n\n* [Blog post: VoiceRAG](https:\u002F\u002Faka.ms\u002Fvoicerag)\n* [Demo video: VoiceRAG](https:\u002F\u002Fyoutu.be\u002FvXJka8xZ9Ko)\n* [Azure OpenAI Realtime Documentation](https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faoai-realtime-audio-sdk\u002F)\n\n### Getting help\n\nThis is a sample built to demonstrate the capabilities of modern Generative AI apps and how they can be built in Azure. For help with deploying this sample, please post in [GitHub Issues](\u002Fissues). If you're a Microsoft employee, you can also post in [our Teams channel](https:\u002F\u002Faka.ms\u002Fazai-python-help).\n\nThis repository is supported by the maintainers, _not_ by Microsoft Support,\nso please use the support mechanisms described above, and we will do our best to help you out.\n\nFor general questions about developing AI solutions on Azure,\njoin the Azure AI Foundry Developer Community:\n\n[![Azure AI Foundry Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Azure_AI_Foundry_Community_Discord-blue?style=for-the-badge&logo=discord&color=5865f2&logoColor=fff)](https:\u002F\u002Faka.ms\u002Ffoundry\u002Fdiscord)\n[![Azure AI Foundry Developer Forum](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Azure_AI_Foundry_Developer_Forum-blue?style=for-the-badge&logo=github&color=000000&logoColor=fff)](https:\u002F\u002Faka.ms\u002Ffoundry\u002Fforum)","# VoiceRAG：使用 Azure AI 搜索和 GPT-4o 实时音频 API 的 RAG + 语音应用模式\n\n[![在 GitHub Codespaces 
中打开](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=GitHub+Codespaces&message=Open&color=brightgreen&logo=github)](https:\u002F\u002Fgithub.com\u002Fcodespaces\u002Fnew?hide_repo_select=true&ref=main&skip_quickstart=true&machine=basicLinux32gb&repo=860141324&devcontainer_path=.devcontainer%2Fdevcontainer.json&geo=WestUs2)\n[![在 Dev Containers 中打开](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https:\u002F\u002Fvscode.dev\u002Fredirect?url=vscode:\u002F\u002Fms-vscode-remote.remote-containers\u002FcloneInVolume?url=https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio)\n\n本仓库提供了一个示例，展示如何在以语音作为用户界面的应用程序中实现 RAG 支持，并由 GPT-4o 实时音频 API 提供动力。我们已在[这篇博客文章](https:\u002F\u002Faka.ms\u002Fvoicerag)中更详细地介绍了这一模式，您还可以通过[这段短视频](https:\u002F\u002Fyoutu.be\u002FvXJka8xZ9Ko)查看此示例应用的实际效果。\n\n* [功能](#features)\n* [架构图](#architecture-diagram)\n* [快速入门](#getting-started)\n  * [GitHub Codespaces](#github-codespaces)\n  * [VS Code Dev Containers](#vs-code-dev-containers)\n  * [本地环境](#local-environment)\n* [部署应用](#deploying-the-app)\n* [开发服务器](#development-server)\n* [指导](#guidance)\n* [资源](#resources)\n* [获取帮助](#getting-help)\n\n## 功能\n\n* **语音界面**：该应用使用浏览器的麦克风捕获语音输入，并将其发送到后端，由 Azure OpenAI GPT-4o 实时 API 进行处理。\n* **RAG（检索增强生成）**：该应用使用 Azure AI 搜索服务回答有关知识库的问题，并将检索到的文档发送给 GPT-4o 实时 API 以生成回复。\n* **音频输出**：该应用利用浏览器的音频功能，将 GPT-4o 实时 API 的回复以音频形式播放出来。\n* **引用**：该应用会显示用于生成回复的搜索结果。\n\n### 架构图\n\n前端的 `RTClient` 接收音频输入，将其发送到 Python 后端，后端使用一个 `RTMiddleTier` 对象与 Azure OpenAI 实时 API 进行交互，并包含一个用于搜索 Azure AI 搜索的工具。\n\n![实时 RAG 模式的架构图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAzure-Samples_aisearch-openai-rag-audio_readme_509540c3a22a.png)\n\n本仓库包含基础设施即代码以及一个 `Dockerfile`，可用于将应用部署到 Azure 容器应用，但只要配置好 Azure AI 搜索和 Azure OpenAI 服务，也可以在本地运行。\n\n## 快速入门\n\n您有几种方式可以开始使用此模板。最快的方式是使用[GitHub 
Codespaces](#github-codespaces)，因为它会为您自动配置所有工具；您也可以[在本地搭建环境](#local-environment)。此外，您还可以使用[VS Code 开发容器](#vs-code-dev-containers)。\n\n### GitHub Codespaces\n\n您可以使用 GitHub Codespaces 在浏览器中虚拟运行本仓库，这将在您的浏览器中打开一个基于 Web 的 VS Code：\n\n[![在 GitHub Codespaces 中打开](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=GitHub+Codespaces&message=Open&color=brightgreen&logo=github)](https:\u002F\u002Fgithub.com\u002Fcodespaces\u002Fnew?hide_repo_select=true&ref=main&skip_quickstart=true&machine=basicLinux32gb&repo=860141324&devcontainer_path=.devcontainer%2Fdevcontainer.json&geo=WestUs2)\n\nCodespace 打开后（可能需要几分钟），打开一个新的终端，然后继续[部署应用](#deploying-the-app)。\n\n### VS Code Dev Containers\n\n您可以在本地的 VS Code 开发容器中运行该项目，方法是使用[Dev Containers 扩展](https:\u002F\u002Fmarketplace.visualstudio.com\u002Fitems?itemName=ms-vscode-remote.remote-containers)：\n\n1. 启动 Docker Desktop（如果尚未安装，请先安装）\n2. 打开项目：\n\n    [![在 Dev Containers 中打开](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?style=for-the-badge&label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https:\u002F\u002Fvscode.dev\u002Fredirect?url=vscode:\u002F\u002Fms-vscode-remote.remote-containers\u002FcloneInVolume?url=https:\u002F\u002Fgithub.com\u002Fazure-samples\u002Faisearch-openai-rag-audio)\n3. 在打开的 VS Code 窗口中，待项目文件加载完成后（可能需要几分钟），打开一个新的终端，然后继续[部署应用](#deploying-the-app)。\n\n### 本地环境\n\n1. 
安装所需工具：\n   * [Azure 开发者 CLI](https:\u002F\u002Faka.ms\u002Fazure-dev\u002Finstall)\n   * [Node.js](https:\u002F\u002Fnodejs.org\u002F)\n   * [Python >=3.11](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n      * **重要提示**：在 Windows 上，Python 和 pip 包管理器必须位于系统路径中，以便设置脚本能够正常工作。\n      * **重要提示**：确保您能在控制台中运行 `python --version`。在 Ubuntu 上，您可能需要运行 `sudo apt install python-is-python3` 将 `python` 链接到 `python3`。\n   * [Git](https:\u002F\u002Fgit-scm.com\u002Fdownloads)\n   * [Powershell](https:\u002F\u002Flearn.microsoft.com\u002Fpowershell\u002Fscripting\u002Finstall\u002Finstalling-powershell) - 仅适用于 Windows 用户。\n\n2. 克隆仓库 (`git clone https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio`)\n3. 继续下一节[部署应用](#deploying-the-app)。\n\n## 部署应用\n\n以下步骤将预配 Azure 资源，并将应用程序代码部署到 Azure 容器应用。\n\n1. 登录您的 Azure 帐户：\n\n    ```shell\n    azd auth login\n    ```\n\n    对于 GitHub Codespaces 用户，如果上述命令失败，请尝试：\n\n   ```shell\n    azd auth login --use-device-code\n    ```\n\n1. 创建一个新的 azd 环境：\n\n    ```shell\n    azd env new\n    ```\n\n    输入一个将用于资源组的名称。\n    这将在 `.azure` 文件夹中创建一个新文件夹，并将其设置为后续所有 `azd` 调用的活动环境。\n\n1. （可选）此时您可以设置 azd 环境变量来自定义部署，以便[使用现有服务](docs\u002Fexisting_services.md)或[自定义语音选项](docs\u002Fcustomizing_deploy.md)。\n\n1. 运行以下单个命令以预配资源、部署代码并为示例数据设置集成向量化：\n\n   ```shell\n   azd up\n   ```\n\n   * **重要提示**：请注意，此命令创建的资源会立即产生费用，主要来自 AI 搜索资源。即使您在命令完全执行前中断，这些资源仍可能产生费用。您可以运行 `azd down` 或手动删除资源，以避免不必要的开支。\n   * 系统会提示您选择两个位置，一个用于大多数资源，另一个用于 OpenAI 资源，目前该列表较短。该位置列表基于[OpenAI 模型可用性表](https:\u002F\u002Flearn.microsoft.com\u002Fazure\u002Fai-services\u002Fopenai\u002Fconcepts\u002Fmodels#global-standard-model-availability)，随着可用性的变化可能会过时。\n\n1. 应用程序成功部署后，您将在控制台中看到一个 URL。导航到该 URL，即可在浏览器中与应用交互。要试用该应用，点击“开始对话”按钮，说“你好”，然后提出一个关于您数据的问题，例如“Contoso 电子公司的举报人政策是什么？”您现在也可以按照[下一节](#development-server)中的说明在本地运行该应用。\n\n## 开发服务器\n\n您可以使用通过[部署说明](#deploying-the-app)预配的 Azure 服务，或者将本地应用指向已有的[现有服务](docs\u002Fexisting_services.md)，在本地运行此应用。\n\n1. 
如果您使用 `azd up` 部署，应会看到一个包含必要环境变量的 `app\u002Fbackend\u002F.env` 文件。\n\n2. 如果您未使用 `azd up`，则需要创建 `app\u002Fbackend\u002F.env` 文件，并添加以下环境变量：\n\n   ```shell\n   AZURE_OPENAI_ENDPOINT=wss:\u002F\u002F\u003C您的实例名称>.openai.azure.com\n   AZURE_OPENAI_REALTIME_DEPLOYMENT=gpt-4o-realtime-preview\n   AZURE_OPENAI_REALTIME_VOICE_CHOICE=\u003C选择一种：echo、alloy、shimmer>\n   AZURE_OPENAI_API_KEY=\u003C您的 API 密钥>\n   AZURE_SEARCH_ENDPOINT=https:\u002F\u002F\u003C您的服务名称>.search.windows.net\n   AZURE_SEARCH_INDEX=\u003C您的索引名称>\n   AZURE_SEARCH_API_KEY=\u003C您的 API 密钥>\n   ```\n\n   要使用 Entra ID（本地运行时的用户，部署时的托管标识），只需不设置密钥即可。\n\n3. 运行以下命令以启动应用：\n\n   Windows：\n\n   ```pwsh\n   pwsh .\\scripts\\start.ps1\n   ```\n\n   Linux\u002FMac：\n\n   ```bash\n   .\u002Fscripts\u002Fstart.sh\n   ```\n\n4. 应用可在 [http:\u002F\u002Flocalhost:8765](http:\u002F\u002Flocalhost:8765) 上访问。\n\n   应用运行后，当您导航到上述 URL 时，应会看到应用的启动界面：\n   ![应用截图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAzure-Samples_aisearch-openai-rag-audio_readme_cf2c290aad07.png)\n\n   要试用该应用，点击“开始对话”按钮，说“你好”，然后提出一个关于您数据的问题，例如“Contoso 电子公司的举报人政策是什么？”\n\n## 指导信息\n\n### 费用\n\n定价因地区和使用情况而异，因此无法准确预测您的使用成本。\n不过，您可以尝试使用[Azure 定价计算器](https:\u002F\u002Fazure.com\u002Fe\u002Fa87a169b256e43c089015fda8182ca87)来估算以下资源的费用。\n\n* Azure 容器应用：按量付费模式，配备 1 个 CPU 核心和 2.0 GB RAM。[定价](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fcontainer-apps\u002F)\n* Azure OpenAI：标准层，gpt-4o-realtime 和 text-embedding-3-large 模型。按每 1K 个标记使用量计费。[定价](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fcognitive-services\u002Fopenai-service\u002F)\n* Azure AI 搜索：标准层，1 个副本，免费语义搜索级别。按小时计费。[定价](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fsearch\u002F)\n* Azure Blob 存储：标准层，采用 ZRS（区域冗余存储）。按存储和读取操作计费。[定价](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fstorage\u002Fblobs\u002F)\n* Azure 
Monitor：按量付费层。费用基于摄入的数据。[定价](https:\u002F\u002Fazure.microsoft.com\u002Fpricing\u002Fdetails\u002Fmonitor\u002F)\n\n为了降低成本，您可以切换到各种服务的免费 SKU，但这些 SKU 有使用限制。\n\n⚠️ 为了避免不必要的费用，请务必在应用不再使用时将其删除，\n无论是通过在门户中删除资源组，还是运行 `azd down`。\n\n### 安全性\n\n此模板使用[托管标识](https:\u002F\u002Flearn.microsoft.com\u002Fentra\u002Fidentity\u002Fmanaged-identities-azure-resources\u002Foverview)来消除开发人员管理这些凭据的需要。应用程序可以使用托管标识获取 Microsoft Entra 令牌，而无需管理任何凭据。为确保您的仓库遵循最佳实践，我们建议基于我们的模板创建解决方案的任何人，在其仓库中启用[GitHub 密码扫描](https:\u002F\u002Fdocs.github.com\u002Fcode-security\u002Fsecret-scanning\u002Fabout-secret-scanning)设置。\n\n### 注意事项\n\n>示例数据：本演示中使用的 PDF 文档包含使用语言模型（Azure OpenAI 服务）生成的信息。这些文档中的信息仅用于演示目的，并不代表微软的观点或信念。微软不对本文档中所含信息的完整性、准确性、可靠性、适用性或可用性作出任何明示或暗示的声明或保证。所有权利保留给微软。\n\n## 资源\n\n* [博客文章：VoiceRAG](https:\u002F\u002Faka.ms\u002Fvoicerag)\n* [演示视频：VoiceRAG](https:\u002F\u002Fyoutu.be\u002FvXJka8xZ9Ko)\n* [Azure OpenAI 实时文档](https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faoai-realtime-audio-sdk\u002F)\n\n### 获取帮助\n\n此示例旨在展示现代生成式AI应用的功能，以及如何在Azure中构建此类应用。如需部署此示例的帮助，请在[GitHub问题](\u002Fissues)中发帖。如果您是微软员工，也可在[我们的Teams频道](https:\u002F\u002Faka.ms\u002Fazai-python-help)中发帖。\n\n本仓库由维护人员提供支持，而非微软支持服务，\n因此请使用上述所述的支持渠道，我们将尽最大努力为您提供帮助。\n\n如有关于在Azure上开发AI解决方案的一般性问题，\n欢迎加入Azure AI Foundry开发者社区：\n\n[![Azure AI Foundry Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Azure_AI_Foundry_Community_Discord-blue?style=for-the-badge&logo=discord&color=5865f2&logoColor=fff)](https:\u002F\u002Faka.ms\u002Ffoundry\u002Fdiscord)\n[![Azure AI Foundry开发者论坛](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Azure_AI_Foundry_Developer_Forum-blue?style=for-the-badge&logo=github&color=000000&logoColor=fff)](https:\u002F\u002Faka.ms\u002Ffoundry\u002Fforum)","# VoiceRAG 快速上手指南\n\n## 环境准备\n- **系统要求**：Windows 10\u002F11、macOS 或 Linux\n- **前置依赖**：\n  - Azure Developer CLI (`azd`)\n  - Node.js 18+\n  - Python 3.11+\n  - Git\n  - PowerShell（仅 Windows）\n\n## 安装步骤\n1. 
**克隆仓库**（推荐使用国内镜像加速）：\n   ```bash\n   git clone https:\u002F\u002Fhub.fastgit.org\u002FAzure-Samples\u002Faisearch-openai-rag-audio\n   cd aisearch-openai-rag-audio\n   ```\n\n2. **安装 Python 依赖**（使用清华源）：\n   ```bash\n   cd app\u002Fbackend\n   pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n3. **配置环境变量**：\n   - 若已通过 `azd up` 部署，`.env` 文件会自动生成\n   - 若未部署，手动创建 `app\u002Fbackend\u002F.env` 文件：\n     ```env\n     AZURE_OPENAI_ENDPOINT=wss:\u002F\u002F\u003Cyour instance name>.openai.azure.com\n     AZURE_OPENAI_REALTIME_DEPLOYMENT=gpt-4o-realtime-preview\n     AZURE_OPENAI_REALTIME_VOICE_CHOICE=echo\n     AZURE_OPENAI_API_KEY=\u003Cyour api key>\n     AZURE_SEARCH_ENDPOINT=https:\u002F\u002F\u003Cyour service name>.search.windows.net\n     AZURE_SEARCH_INDEX=\u003Cyour index name>\n     AZURE_SEARCH_API_KEY=\u003Cyour api key>\n     ```\n\n4. **启动开发服务器**：\n   - **Windows**：\n     ```pwsh\n     pwsh .\\scripts\\start.ps1\n     ```\n   - **Linux\u002FMac**：\n     ```bash\n     .\u002Fscripts\u002Fstart.sh\n     ```\n\n## 基本使用\n1. 浏览器访问 `http:\u002F\u002Flocalhost:8765`\n2. 点击 **\"Start conversation\"** 按钮\n3. 说出 \"Hello\" 测试语音输入\n4. 
尝试提问，例如：**\"Contoso 电子公司的举报政策是什么？\"**","某三甲医院外科医生在无菌手术中需快速查阅最新阑尾炎缝合规范，但双手无法触控设备。  \n\n### 没有 aisearch-openai-rag-audio 时  \n- 手术中需暂停操作让助手查找纸质手册，导致手术中断5-10分钟，增加患者感染风险  \n- 手册更新滞后，可能引用已废止的旧版指南，存在医疗决策隐患  \n- 语音助手无法关联权威来源，医生需额外验证信息真伪，浪费宝贵时间  \n- 多次重复提问才能获取关键细节，延长手术时间并增加操作失误概率  \n\n### 使用 aisearch-openai-rag-audio 后  \n- 医生语音提问“最新阑尾炎缝合规范”，系统实时检索Azure医学数据库，3秒内语音播报规范内容，全程无需中断手术  \n- 系统自动从《国际外科指南2024》等权威来源提取数据，确保信息实时准确，彻底避免过时内容风险  \n- 回复中明确标注“来源：《国际外科指南2024》第3章”，医生可快速验证可靠性，提升决策信心  \n- 全程语音交互零手动操作，一次提问即获完整答案，手术流程效率提升40%，显著降低患者风险  \n\naisearch-openai-rag-audio让外科医生在无接触场景下即时获取精准医疗知识，将手术安全性和效率提升至新高度。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAzure-Samples_aisearch-openai-rag-audio_509540c3.png","Azure-Samples","Azure Samples","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAzure-Samples_101a6251.png","Microsoft Azure code samples and examples in .NET, Java, Python, JavaScript, TypeScript, PHP and Ruby",null,"https:\u002F\u002Flearn.microsoft.com\u002Fazure","https:\u002F\u002Fgithub.com\u002FAzure-Samples",[22,26,30,34,38,42,46,50,54],{"name":23,"color":24,"percentage":25},"Python","#3572A5",30.1,{"name":27,"color":28,"percentage":29},"TypeScript","#3178c6",28.9,{"name":31,"color":32,"percentage":33},"Bicep","#519aba",27.9,{"name":35,"color":36,"percentage":37},"JavaScript","#f1e05a",4,{"name":39,"color":40,"percentage":41},"PowerShell","#012456",3.8,{"name":43,"color":44,"percentage":45},"CSS","#663399",2.2,{"name":47,"color":48,"percentage":49},"Shell","#89e051",2.1,{"name":51,"color":52,"percentage":53},"Dockerfile","#384d54",0.6,{"name":55,"color":56,"percentage":57},"HTML","#e34c26",0.4,551,352,"2026-04-05T06:30:53","MIT",3,"Windows, Linux, macOS","未说明",{"notes":66,"python":67,"dependencies":68},"需要配置Azure OpenAI和AI 
Search服务，本地运行需设置环境变量；示例数据为生成内容，仅用于演示","3.11+",[],[70,71,72,73,74],"语言模型","Agent","其他","数据工具","开发框架",[76,77,78,79,80,81,82,83,84,85,86,87],"azure","generative-ai","gpt","language-model","openai","rag","retrieval-augmented-generation","search","vector-database","azure-ai-search","ai-azd-templates","azd-templates",2,"ready","2026-03-27T02:49:30.150509","2026-04-06T08:40:51.806700",[93,98,103,108,113,118],{"id":94,"question_zh":95,"answer_zh":96,"source_url":97},8836,"如何修改 UI 界面文本以移除 'Contoso' 的引用？","所有文本字符串位于翻译文件中，路径为 `app\u002Ffrontend\u002Fsrc\u002Flocales\u002F`。","https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fissues\u002F88",{"id":99,"question_zh":100,"answer_zh":101,"source_url":102},8837,"如何为应用程序选择特定的声音，例如 'shimmer' 声音？","拉取最新 main 分支并遵循 https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fblob\u002Fmain\u002Fdocs\u002Fcustomizing_deploy.md 中的说明来自定义声音选择。目前 Azure OpenAI 仅支持 3 种声音。","https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fissues\u002F33",{"id":104,"question_zh":105,"answer_zh":106,"source_url":107},8838,"如何在部署时使用现有的 LLM 部署而不是创建新的？","参考 PR #27，该 PR 添加了文档和参数以引入您自己的 OpenAI 部署。详情请见 https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fpull\u002F27。","https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fissues\u002F23",{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},8839,"BCP104 错误如何解决？","BCP104 仅指模块中的 BCP420，当 BCP420 解决后，此问题应自动解决。","https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fissues\u002F99",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},8840,"如何处理 Dependabot PR 失败问题？","对于 npm 包升级，需定制自动化工具以处理依赖冲突；具体步骤可参考 GitHub Repo Maintainer Agent 文档，但当前版本尚未优化 npm 
升级支持。","https:\u002F\u002Fgithub.com\u002FAzure-Samples\u002Faisearch-openai-rag-audio\u002Fissues\u002F110",{"id":119,"question_zh":120,"answer_zh":121,"source_url":97},8841,"系统在长回答时突然停止说话如何解决？","当前无明确解决方案；建议检查文档或提交新 Issue 以获取支持，但常见临时措施是输入 'hello' 重启对话。",[],[124,133,141,149,157,168],{"id":125,"name":126,"github_repo":127,"description_zh":128,"stars":129,"difficulty_score":62,"last_commit_at":130,"category_tags":131,"status":89},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[74,132,71],"图像",{"id":134,"name":135,"github_repo":136,"description_zh":137,"stars":138,"difficulty_score":88,"last_commit_at":139,"category_tags":140,"status":89},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,"2026-04-05T23:32:43",[74,71,70],{"id":142,"name":143,"github_repo":144,"description_zh":145,"stars":146,"difficulty_score":88,"last_commit_at":147,"category_tags":148,"status":89},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 
Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[74,132,71],{"id":150,"name":151,"github_repo":152,"description_zh":153,"stars":154,"difficulty_score":88,"last_commit_at":155,"category_tags":156,"status":89},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[74,70],{"id":158,"name":159,"github_repo":160,"description_zh":161,"stars":162,"difficulty_score":88,"last_commit_at":163,"category_tags":164,"status":89},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 
将是理想的起点。",84991,"2026-04-05T10:45:23",[132,73,165,166,71,72,70,74,167],"视频","插件","音频",{"id":169,"name":170,"github_repo":171,"description_zh":172,"stars":173,"difficulty_score":62,"last_commit_at":174,"category_tags":175,"status":89},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[71,132,74,70,72]]