[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-llmware-ai--llmware":3,"tool-llmware-ai--llmware":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":23,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":113,"github_topics":114,"view_count":124,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":125,"updated_at":126,"faqs":127,"releases":157},3064,"llmware-ai\u002Fllmware","llmware","Unified framework for building enterprise RAG pipelines with small, specialized models","llmware 是一个专为构建企业级检索增强生成（RAG）应用而设计的统一框架，特别擅长利用小型化、专业化的模型在本地设备上运行。它主要解决了企业在部署 AI 时面临的数据隐私安全、高昂算力成本以及大模型“幻觉”等痛点，让敏感数据无需上传云端即可在本地完成处理。\n\n这款工具非常适合希望在笔记本电脑、边缘设备或私有服务器上快速搭建安全 AI 应用的开发者和技术团队。即使没有昂贵的 GPU 集群，用户也能轻松实现从文档解析、知识库构建到模型推理的全流程。\n\nllmware 的核心亮点在于其庞大的模型目录与高效的本地推理能力。它预集成了 300 多个经过量化优化的模型，包括专为金融、法律等行业任务微调的 SLIM、Bling 和 Dragon 系列，支持 GGUF、OpenVINO 等多种格式，能充分调用本地 CPU、NPU 及显卡性能。同时，它提供了一站式的 RAG 流水线，支持 PDF、Office 文档等多种格式的自动解析与索引，帮助开发者以最小的计算足迹，快速构建准确、可持续且低成本的知识驱动型 AI 应用。","# llmware\n![Static Badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10_%7C_3.11%7C_3.12%7C_3.13%7C_3.14-blue?color=blue)\n![PyPI - 
Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fllmware?color=blue)\n[![members](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fllmware-ai_llmware_readme_c41105f405f2.png)](https:\u002F\u002Fdiscord.gg\u002FbphreFK4NJ)\n[![Documentation](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Factions\u002Fworkflows\u002Fpages.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Factions\u002Fworkflows\u002Fpages.yml)  \n\n## 🧰🛠️ Unified framework for building knowledge-based local, private, secure LLM-based applications       \n\n`llmware` is optimized for AI PC and local laptop, edge and self-hosted deployment across a wide range of Windows, Mac and Linux platforms, with support for GGUF, OpenVINO, ONNXRuntime, ONNXRuntime-QNN (Qualcomm), WindowsLocalFoundry, and Pytorch, providing a high-level interface that makes it easy to leverage the right inferencing technology optimized for the target platform.  \n\n `llmware` has two main components:  \n\n 1.  **Model catalog with 300+ models** - models prepackaged in quantized, optimized formats, to leverage on device GPU and NPU capabilities, with support for major open source model families and 50+ llmware finetuned SLIM, Bling, Dragon and Industry-Bert models specialized for key tasks in enterprise process automation.  Also supports leading cloud models from OpenAI, Anthropic and Google.  \n \n 2.  **RAG Pipeline** - integrated components for the full lifecycle of connecting knowledge sources to generative AI models with wide range of document parsing and ingestion capabilities, and the ability to create scalable knowledge bases.\n\nBy bringing together both of these components,  `llmware` offers a comprehensive set of tools to rapidly build knowledge-based enterprise LLM applications.  \n\nOur vision is that AI should be sustainable, accurate, and cost-effective, using the smallest possible compute footprint to get the job done.  
\n\nVirtually all of our examples and models can be run on device - get started right away on your laptop.   \n\n[Join us on Discord](https:\u002F\u002Fdiscord.gg\u002FMhZn5Nc39h)   |  [Watch Youtube Tutorials](https:\u002F\u002Fwww.youtube.com\u002F@llmware)  | [Explore our Model Families on Huggingface](https:\u002F\u002Fwww.huggingface.co\u002Fllmware)   \n\n\n## 🎯  Key features \nWriting code with `llmware` is based on a few main concepts:\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Model Catalog\u003C\u002Fb>: Access all models the same way with easy lookup, regardless of underlying implementation. \n\u003C\u002Fsummary>  \n\n\n```python\n#   300+ Models in Catalog with 50+ RAG-optimized BLING, DRAGON and Industry BERT models\n#   Full support for GGUF, OpenVINO, Onnxruntime, HuggingFace, Sentence Transformers and major API-based models\n#   Easy to extend to add custom models - see examples\n\nfrom llmware.models import ModelCatalog\nfrom llmware.prompts import Prompt\n\n#   all models accessed through the ModelCatalog\nmodels = ModelCatalog().list_all_models()\n\n#   to use any model in the ModelCatalog - \"load_model\" method and pass the model_name parameter\nmy_model = ModelCatalog().load_model(\"llmware\u002Fbling-phi-3-gguf\")\n\n#   call model with: inference \noutput = my_model.inference(\"what is the future of AI?\", add_context=\"Here is the article to read\")\n\n#   call model with: stream\nfor token in my_model.stream(\"What is the future of AI?\"):\n    print(token, end=\"\")\n\n#   to integrate model into a Prompt\nprompter = Prompt().load_model(\"llmware\u002Fbling-tiny-llama-v0\")\nresponse = prompter.prompt_main(\"what is the future of AI?\", context=\"Insert Sources of information\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>  \n\u003Csummary>\u003Cb>Library\u003C\u002Fb>:  ingest, organize and index a collection of knowledge at scale - Parse, Text Chunk and Embed. 
\u003C\u002Fsummary>  \n\n```python\n\nfrom llmware.library import Library\n\n#   to parse and text chunk a set of documents (pdf, pptx, docx, xlsx, txt, csv, md, json\u002Fjsonl, wav, png, jpg, html)  \n\n#   step 1 - create a library, which is the 'knowledge-base container' construct\n#          - libraries have both text collection (DB) resources, and file resources (e.g., llmware_data\u002Faccounts\u002F{library_name})\n#          - embeddings and queries are run against a library\n\nlib = Library().create_new_library(\"my_library\")\n\n#    step 2 - add_files is the universal ingestion function - point it at a local file folder with mixed file types\n#           - files will be routed by file extension to the correct parser, parsed, text chunked and indexed in text collection DB\n\nlib.add_files(\"\u002Ffolder\u002Fpath\u002Fto\u002Fmy\u002Ffiles\")\n\n#   to install an embedding on a library - pick an embedding model and vector_db\nlib.install_new_embedding(embedding_model_name=\"mini-lm-sbert\", vector_db=\"milvus\", batch_size=500)\n\n#   to add a second embedding to the same library (mix-and-match models + vector db)  \nlib.install_new_embedding(embedding_model_name=\"industry-bert-sec\", vector_db=\"chromadb\", batch_size=100)\n\n#   easy to create multiple libraries for different projects and groups\n\nfinance_lib = Library().create_new_library(\"finance_q4_2023\")\nfinance_lib.add_files(\"\u002Ffinance_folder\u002F\")\n\nhr_lib = Library().create_new_library(\"hr_policies\")\nhr_lib.add_files(\"\u002Fhr_folder\u002F\")\n\n#    pull library card with key metadata - documents, text chunks, images, tables, embedding record\nlib_card = Library().get_library_card(\"my_library\")\n\n#   see all libraries\nall_my_libs = Library().get_all_library_cards()\n\n```\n\u003C\u002Fdetails>  \n\n\u003Cdetails> \n\u003Csummary>\u003Cb>Query\u003C\u002Fb>: query libraries with mix of text, semantic, hybrid, metadata, and custom filters. 
\u003C\u002Fsummary>\n\n```python\n\nfrom llmware.retrieval import Query\nfrom llmware.library import Library\n\n#   step 1 - load the previously created library \nlib = Library().load_library(\"my_library\")\n\n#   step 2 - create a query object and pass the library\nq = Query(lib)\n\n#    step 3 - run lots of different queries  (many other options in the examples)\n\n#    basic text query\nresults1 = q.text_query(\"text query\", result_count=20, exact_mode=False)\n\n#    semantic query\nresults2 = q.semantic_query(\"semantic query\", result_count=10)\n\n#    combining a text query restricted to only certain documents in the library and \"exact\" match to the query\nresults3 = q.text_query_with_document_filter(\"new query\", {\"file_name\": \"selected file name\"}, exact_mode=True)\n\n#   to apply a specific embedding (if multiple on library), pass the names when creating the query object\nq2 = Query(lib, embedding_model_name=\"mini_lm_sbert\", vector_db=\"milvus\")\nresults4 = q2.semantic_query(\"new semantic query\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Prompt with Sources\u003C\u002Fb>: the easiest way to combine knowledge retrieval with a LLM inference. 
\u003C\u002Fsummary>\n\n```python\n\nfrom llmware.prompts import Prompt\nfrom llmware.retrieval import Query\nfrom llmware.library import Library\n\n#   build a prompt\nprompter = Prompt().load_model(\"llmware\u002Fbling-tiny-llama-v0\")\n\n#   add a file -> file is parsed, text chunked, filtered by query, and then packaged as model-ready context,\n#   including in batches, if needed, to fit the model context window\n\nsource = prompter.add_source_document(\"\u002Ffolder\u002Fto\u002Fone\u002Fdoc\u002F\", \"filename\", query=\"fast query\")\n\n#   attach query results (from a Query) into a Prompt\nmy_lib = Library().load_library(\"my_library\")\nresults = Query(my_lib).query(\"my query\")\nsource2 = prompter.add_source_query_results(results)\n\n#   run a new query against a library and load directly into a prompt\nsource3 = prompter.add_source_new_query(my_lib, query=\"my new query\", query_type=\"semantic\", result_count=15)\n\n#   to run inference with 'prompt with sources'\nresponses = prompter.prompt_with_source(\"my query\")\n\n#   to run fact-checks - post inference\nfact_check = prompter.evidence_check_sources(responses)\n\n#   to view source materials (batched 'model-ready' and attached to prompt)\nsource_materials = prompter.review_sources_summary()\n\n#   to see the full prompt history\nprompt_history = prompter.get_current_history()\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails> \n\u003Csummary>\u003Cb>RAG-Optimized Models\u003C\u002Fb> -  1-7B parameter models designed for RAG workflow integration and running locally. \u003C\u002Fsummary>  \n\n```\n\"\"\" This 'Hello World' example demonstrates how to get started using local BLING models with provided context, using both\nPytorch and GGUF versions. \"\"\"\n\nimport time\nfrom llmware.prompts import Prompt\n\n\ndef hello_world_questions():\n\n    test_list = [\n\n    {\"query\": \"What is the total amount of the invoice?\",\n     \"answer\": \"$22,500.00\",\n     \"context\": \"Services Vendor Inc. 
\\n100 Elm Street Pleasantville, NY \\nTO Alpha Inc. 5900 1st Street \"\n                \"Los Angeles, CA \\nDescription Front End Engineering Service $5000.00 \\n Back End Engineering\"\n                \" Service $7500.00 \\n Quality Assurance Manager $10,000.00 \\n Total Amount $22,500.00 \\n\"\n                \"Make all checks payable to Services Vendor Inc. Payment is due within 30 days.\"\n                \"If you have any questions concerning this invoice, contact Bia Hermes. \"\n                \"THANK YOU FOR YOUR BUSINESS!  INVOICE INVOICE # 0001 DATE 01\u002F01\u002F2022 FOR Alpha Project P.O. # 1000\"},\n\n    {\"query\": \"What was the amount of the trade surplus?\",\n     \"answer\": \"62.4 billion yen ($416.6 million)\",\n     \"context\": \"Japan’s September trade balance swings into surplus, surprising expectations\"\n                \"Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, \"\n                \"beating expectations from economists polled by Reuters for a trade deficit of 42.5 \"\n                \"billion yen. Data from Japan’s customs agency revealed that exports in September \"\n                \"increased 4.3% year on year, while imports slid 16.3% compared to the same period \"\n                \"last year. According to FactSet, exports to Asia fell for the ninth straight month, \"\n                \"which reflected ongoing China weakness. Exports were supported by shipments to \"\n                \"Western markets, FactSet added. — Lim Hui Jie\"},\n\n    {\"query\": \"When did the LISP machine market collapse?\",\n     \"answer\": \"1987.\",\n     \"context\": \"The attendees became the leaders of AI research in the 1960s.\"\n                \"  They and their students produced programs that the press described as 'astonishing': \"\n                \"computers were learning checkers strategies, solving word problems in algebra, \"\n                \"proving logical theorems and speaking English.  
By the middle of the 1960s, research in \"\n                \"the U.S. was heavily funded by the Department of Defense and laboratories had been \"\n                \"established around the world. Herbert Simon predicted, 'machines will be capable, \"\n                \"within twenty years, of doing any work a man can do'.  Marvin Minsky agreed, writing, \"\n                \"'within a generation ... the problem of creating 'artificial intelligence' will \"\n                \"substantially be solved'. They had, however, underestimated the difficulty of the problem.  \"\n                \"Both the U.S. and British governments cut off exploratory research in response \"\n                \"to the criticism of Sir James Lighthill and ongoing pressure from the US Congress \"\n                \"to fund more productive projects. Minsky's and Papert's book Perceptrons was understood \"\n                \"as proving that artificial neural networks approach would never be useful for solving \"\n                \"real-world tasks, thus discrediting the approach altogether.  The 'AI winter', a period \"\n                \"when obtaining funding for AI projects was difficult, followed.  In the early 1980s, \"\n                \"AI research was revived by the commercial success of expert systems, a form of AI \"\n                \"program that simulated the knowledge and analytical skills of human experts. By 1985, \"\n                \"the market for AI had reached over a billion dollars. At the same time, Japan's fifth \"\n                \"generation computer project inspired the U.S. and British governments to restore funding \"\n                \"for academic research. 
However, beginning with the collapse of the Lisp Machine market \"\n                \"in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.\"},\n\n    {\"query\": \"What is the current rate on 10-year treasuries?\",\n     \"answer\": \"4.58%\",\n     \"context\": \"Stocks rallied Friday even after the release of stronger-than-expected U.S. jobs data \"\n                \"and a major increase in Treasury yields.  The Dow Jones Industrial Average gained 195.12 points, \"\n                \"or 0.76%, to close at 31,419.58. The S&P 500 added 1.59% at 4,008.50. The tech-heavy \"\n                \"Nasdaq Composite rose 1.35%, closing at 12,299.68. The U.S. economy added 438,000 jobs in \"\n                \"August, the Labor Department said. Economists polled by Dow Jones expected 273,000 \"\n                \"jobs. However, wages rose less than expected last month.  Stocks posted a stunning \"\n                \"turnaround on Friday, after initially falling on the stronger-than-expected jobs report. \"\n                \"At its session low, the Dow had fallen as much as 198 points; it surged by more than \"\n                \"500 points at the height of the rally. The Nasdaq and the S&P 500 slid by 0.8% during \"\n                \"their lowest points in the day.  Traders were unclear of the reason for the intraday \"\n                \"reversal. Some noted it could be the softer wage number in the jobs report that made \"\n                \"investors rethink their earlier bearish stance. Others noted the pullback in yields from \"\n                \"the day’s highs. Part of the rally may just be to do a market that had gotten extremely \"\n                \"oversold with the S&P 500 at one point this week down more than 9% from its high earlier \"\n                \"this year.  Yields initially surged after the report, with the 10-year Treasury rate trading \"\n                \"near its highest level in 14 years. 
The benchmark rate later eased from those levels, but \"\n                \"was still up around 6 basis points at 4.58%.  'We’re seeing a little bit of a give back \"\n                \"in yields from where we were around 4.8%. [With] them pulling back a bit, I think that’s \"\n                \"helping the stock market,' said Margaret Jones, chief investment officer at Vibrant Industries \"\n                \"Capital Advisors. 'We’ve had a lot of weakness in the market in recent weeks, and potentially \"\n                \"some oversold conditions.'\"},\n\n    {\"query\": \"Is the expected gross margin greater than 70%?\",\n     \"answer\": \"Yes, between 71.5% and 72.5%\",\n     \"context\": \"Outlook NVIDIA’s outlook for the third quarter of fiscal 2024 is as follows:\"\n                \"Revenue is expected to be $16.00 billion, plus or minus 2%. GAAP and non-GAAP \"\n                \"gross margins are expected to be 71.5% and 72.5%, respectively, plus or minus \"\n                \"50 basis points.  GAAP and non-GAAP operating expenses are expected to be \"\n                \"approximately $2.95 billion and $2.00 billion, respectively.  GAAP and non-GAAP \"\n                \"other income and expense are expected to be an income of approximately $100 \"\n                \"million, excluding gains and losses from non-affiliated investments. GAAP and \"\n                \"non-GAAP tax rates are expected to be 14.5%, plus or minus 1%, excluding any discrete items.\"\n                \"Highlights NVIDIA achieved progress since its previous earnings announcement \"\n                \"in these areas:  Data Center Second-quarter revenue was a record $10.32 billion, \"\n                \"up 141% from the previous quarter and up 171% from a year ago. 
Announced that the \"\n                \"NVIDIA® GH200 Grace™ Hopper™ Superchip for complex AI and HPC workloads is shipping \"\n                \"this quarter, with a second-generation version with HBM3e memory expected to ship \"\n                \"in Q2 of calendar 2024. \"},\n\n    {\"query\": \"What is Bank of America's rating on Target?\",\n     \"answer\": \"Buy\",\n     \"context\": \"Here are some of the tickers on my radar for Thursday, Oct. 12, taken directly from \"\n                \"my reporter’s notebook: It’s the one-year anniversary of the S&P 500′s bear market bottom \"\n                \"of 3,577. Since then, as of Wednesday’s close of 4,376, the broad market index \"\n                \"soared more than 22%.  Hotter than expected September consumer price index, consumer \"\n                \"inflation. The Social Security Administration issues announced a 3.2% cost-of-living \"\n                \"adjustment for 2024.  Chipotle Mexican Grill (CMG) plans price increases. Pricing power. \"\n                \"Cites consumer price index showing sticky retail inflation for the fourth time \"\n                \"in two years. Bank of America upgrades Target (TGT) to buy from neutral. Cites \"\n                \"risk\u002Freward from depressed levels. Traffic could improve. Gross margin upside. \"\n                \"Merchandising better. Freight and transportation better. Target to report quarter \"\n                \"next month. In retail, the CNBC Investing Club portfolio owns TJX Companies (TJX), \"\n                \"the off-price juggernaut behind T.J. Maxx, Marshalls and HomeGoods. Goldman Sachs \"\n                \"tactical buy trades on Club names Wells Fargo (WFC), which reports quarter Friday, \"\n                \"Humana (HUM) and Nvidia (NVDA). 
BofA initiates Snowflake (SNOW) with a buy rating.\"\n                \"If you like this story, sign up for Jim Cramer’s Top 10 Morning Thoughts on the \"\n                \"Market email newsletter for free. Barclays cuts price targets on consumer products: \"\n                \"UTZ Brands (UTZ) to $16 per share from $17. Kraft Heinz (KHC) to $36 per share from \"\n                \"$38. Cyclical drag. J.M. Smucker (SJM) to $129 from $160. Secular headwinds. \"\n                \"Coca-Cola (KO) to $59 from $70. Barclays cut PTs on housing-related stocks: Toll Brothers\"\n                \"(TOL) to $74 per share from $82. Keeps underweight. Lowers Trex (TREX) and Azek\"\n                \"(AZEK), too. Goldman Sachs (GS) announces sale of fintech platform and warns on \"\n                \"third quarter of 19-cent per share drag on earnings. The buyer: investors led by \"\n                \"private equity firm Sixth Street. Exiting a mistake. Rise in consumer engagement for \"\n                \"Spotify (SPOT), says Morgan Stanley. The analysts hike price target to $190 per share \"\n                \"from $185. Keeps overweight (buy) rating. JPMorgan loves elf Beauty (ELF). Keeps \"\n                \"overweight (buy) rating but lowers price target to $139 per share from $150. \"\n                \"Sees “still challenging” environment into third-quarter print. The Club owns shares \"\n                \"in high-end beauty company Estee Lauder (EL). Barclays upgrades First Solar (FSLR) \"\n                \"to overweight from equal weight (buy from hold) but lowers price target to $224 per \"\n                \"share from $230. Risk reward upgrade. Best visibility of utility scale names.\"},\n\n    {\"query\": \"What was the rate of decline in 3rd quarter sales?\",\n     \"answer\": \"20% year-on-year.\",\n     \"context\": \"Nokia said it would cut up to 14,000 jobs as part of a cost cutting plan following \"\n                \"third quarter earnings that plunged. 
The Finnish telecommunications giant said that \"\n                \"it will reduce its cost base and increase operation efficiency to “address the \"\n                \"challenging market environment. The substantial layoffs come after Nokia reported \"\n                \"third-quarter net sales declined 20% year-on-year to 4.98 billion euros. Profit over \"\n                \"the period plunged by 69% year-on-year to 133 million euros.\"},\n\n    {\"query\": \"What is a list of the key points?\",\n     \"answer\": \"•Stocks rallied on Friday with stronger-than-expected U.S jobs data and increase in \"\n               \"Treasury yields;\\n•Dow Jones gained 195.12 points;\\n•S&P 500 added 1.59%;\\n•Nasdaq Composite rose \"\n               \"1.35%;\\n•U.S. economy added 438,000 jobs in August, better than the 273,000 expected;\\n\"\n               \"•10-year Treasury rate trading near the highest level in 14 years at 4.58%.\",\n     \"context\": \"Stocks rallied Friday even after the release of stronger-than-expected U.S. jobs data \"\n               \"and a major increase in Treasury yields.  The Dow Jones Industrial Average gained 195.12 points, \"\n               \"or 0.76%, to close at 31,419.58. The S&P 500 added 1.59% at 4,008.50. The tech-heavy \"\n               \"Nasdaq Composite rose 1.35%, closing at 12,299.68. The U.S. economy added 438,000 jobs in \"\n               \"August, the Labor Department said. Economists polled by Dow Jones expected 273,000 \"\n               \"jobs. However, wages rose less than expected last month.  Stocks posted a stunning \"\n               \"turnaround on Friday, after initially falling on the stronger-than-expected jobs report. \"\n               \"At its session low, the Dow had fallen as much as 198 points; it surged by more than \"\n               \"500 points at the height of the rally. The Nasdaq and the S&P 500 slid by 0.8% during \"\n               \"their lowest points in the day.  
Traders were unclear of the reason for the intraday \"\n               \"reversal. Some noted it could be the softer wage number in the jobs report that made \"\n               \"investors rethink their earlier bearish stance. Others noted the pullback in yields from \"\n               \"the day’s highs. Part of the rally may just be to do a market that had gotten extremely \"\n               \"oversold with the S&P 500 at one point this week down more than 9% from its high earlier \"\n               \"this year.  Yields initially surged after the report, with the 10-year Treasury rate trading \"\n               \"near its highest level in 14 years. The benchmark rate later eased from those levels, but \"\n               \"was still up around 6 basis points at 4.58%.  'We’re seeing a little bit of a give back \"\n               \"in yields from where we were around 4.8%. [With] them pulling back a bit, I think that’s \"\n               \"helping the stock market,' said Margaret Jones, chief investment officer at Vibrant Industries \"\n               \"Capital Advisors. 'We’ve had a lot of weakness in the market in recent weeks, and potentially \"\n               \"some oversold conditions.'\"}\n\n    ]\n\n    return test_list\n\n\n# this is the main script to be run\n\ndef bling_meets_llmware_hello_world (model_name):\n\n    t0 = time.time()\n\n    # load the questions\n    test_list = hello_world_questions()\n\n    print(f\"\\n > Loading Model: {model_name}...\")\n\n    # load the model \n    prompter = Prompt().load_model(model_name)\n\n    t1 = time.time()\n    print(f\"\\n > Model {model_name} load time: {t1-t0} seconds\")\n \n    for i, entries in enumerate(test_list):\n\n        print(f\"\\n{i+1}. 
Query: {entries['query']}\")\n     \n        # run the prompt\n        output = prompter.prompt_main(entries[\"query\"],context=entries[\"context\"]\n                                      , prompt_name=\"default_with_context\",temperature=0.30)\n\n        # print out the results\n        llm_response = output[\"llm_response\"].strip(\"\\n\")\n        print(f\"LLM Response: {llm_response}\")\n        print(f\"Gold Answer: {entries['answer']}\")\n        print(f\"LLM Usage: {output['usage']}\")\n\n    t2 = time.time()\n\n    print(f\"\\nTotal processing time: {t2-t1} seconds\")\n\n    return 0\n\n\nif __name__ == \"__main__\":\n\n    # list of 'rag-instruct' laptop-ready small bling models on HuggingFace\n\n    pytorch_models = [\"llmware\u002Fbling-1b-0.1\",                    #  most popular\n                      \"llmware\u002Fbling-tiny-llama-v0\",             #  fastest \n                      \"llmware\u002Fbling-1.4b-0.1\",\n                      \"llmware\u002Fbling-falcon-1b-0.1\",\n                      \"llmware\u002Fbling-cerebras-1.3b-0.1\",\n                      \"llmware\u002Fbling-sheared-llama-1.3b-0.1\",    \n                      \"llmware\u002Fbling-sheared-llama-2.7b-0.1\",\n                      \"llmware\u002Fbling-red-pajamas-3b-0.1\",\n                      \"llmware\u002Fbling-stable-lm-3b-4e1t-v0\",\n                      \"llmware\u002Fbling-phi-3\"                      # most accurate (and newest)  \n                      ]\n\n    #  Quantized GGUF versions generally load faster and run nicely on a laptop with at least 16 GB of RAM\n    gguf_models = [\"bling-phi-3-gguf\", \"bling-stablelm-3b-tool\", \"dragon-llama-answer-tool\", \"dragon-yi-answer-tool\", \"dragon-mistral-answer-tool\"]\n\n    #   try model from either pytorch or gguf model list\n    #   the newest (and most accurate) is 'bling-phi-3-gguf'  \n\n    bling_meets_llmware_hello_world(gguf_models[0])  \n\n    #   check out the model card on Huggingface for RAG benchmark test 
performance results and other useful information\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Simple-to-Scale Database Options \u003C\u002Fb> - integrated data stores from laptop to parallelized cluster. \u003C\u002Fsummary>\n\n```python\n\nfrom llmware.configs import LLMWareConfig\n\n#   to set the collection database - mongo, sqlite, postgres  \nLLMWareConfig().set_active_db(\"mongo\")  \n\n#   to set the vector database (or declare when installing)  \n#   --options: milvus, pg_vector (postgres), redis, qdrant, faiss, pinecone, mongo atlas  \nLLMWareConfig().set_vector_db(\"milvus\")  \n\n#   for fast start - no installations required  \nLLMWareConfig().set_active_db(\"sqlite\")  \nLLMWareConfig().set_vector_db(\"chromadb\")   # try also faiss and lancedb  \n\n#   for single postgres deployment  \nLLMWareConfig().set_active_db(\"postgres\")  \nLLMWareConfig().set_vector_db(\"postgres\")  \n\n#   to install mongo, milvus, postgres - see the docker-compose scripts as well as examples\n\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n\u003Csummary> \u003Cb> Agents with Function Calls and SLIM Models \u003C\u002Fb> \u003C\u002Fsummary>  \n\n```python\n\nfrom llmware.agents import LLMfx\n\ntext = (\"Tesla stock fell 8% in premarket trading after reporting fourth-quarter revenue and profit that \"\n        \"missed analysts’ estimates. The electric vehicle company also warned that vehicle volume growth in \"\n        \"2024 'may be notably lower' than last year’s growth rate. Automotive revenue, meanwhile, increased \"\n        \"just 1% from a year earlier, partly because the EVs were selling for less than they had in the past. \"\n        \"Tesla implemented steep price cuts in the second half of the year around the world. 
In a Wednesday \"\n        \"presentation, the company warned investors that it’s 'currently between two major growth waves.'\")\n\n#   create an agent using LLMfx class\nagent = LLMfx()\n\n#   load text to process\nagent.load_work(text)\n\n#   load 'models' as 'tools' to be used in analysis process\nagent.load_tool(\"sentiment\")\nagent.load_tool(\"extract\")\nagent.load_tool(\"topics\")\nagent.load_tool(\"boolean\")\n\n#   run function calls using different tools\nagent.sentiment()\nagent.topics()\nagent.extract(params=[\"company\"])\nagent.extract(params=[\"automotive revenue growth\"])\nagent.xsum()\nagent.boolean(params=[\"is 2024 growth expected to be strong? (explain)\"])\n\n#   at end of processing, show the report that was automatically aggregated by key\nreport = agent.show_report()\n\n#   displays a summary of the activity in the process\nactivity_summary = agent.activity_summary()\n\n#   list of the responses gathered\nfor i, entries in enumerate(agent.response_list):\n    print(\"update: response analysis: \", i, entries)\n\noutput = {\"report\": report, \"activity_summary\": activity_summary, \"journal\": agent.journal}  \n\n```\n\n\u003C\u002Fdetails>\n\u003Cdetails>\n\n\u003Csummary> 🚀 \u003Cb>Start coding - Quick Start for RAG \u003C\u002Fb> \u003C\u002Fsummary>\n\n```python\n# This example illustrates a simple contract analysis\n# using a RAG-optimized LLM running locally\n\nimport os\nimport re\nfrom llmware.prompts import Prompt, HumanInTheLoop\nfrom llmware.setup import Setup\nfrom llmware.configs import LLMWareConfig\n\ndef contract_analysis_on_laptop (model_name):\n\n    #  In this scenario, we will:\n    #  -- download a set of sample contract files\n    #  -- create a Prompt and load a BLING LLM model\n    #  -- parse each contract, extract the relevant passages, and pass questions to a local LLM\n\n    #  Main loop - Iterate thru each contract:\n    #\n    #      1.  
parse the document in memory (convert from PDF file into text chunks with metadata)
    #      2.  filter the parsed text chunks with a "topic" (e.g., "governing law") to extract relevant passages
    #      3.  package and assemble the text chunks into a model-ready context
    #      4.  ask three key questions for each contract to the LLM
    #      5.  print the results to the screen
    #      6.  save the results in both json and csv for further processing and review

    #  Load the llmware sample files

    print(f"\n > Loading the llmware sample files...")

    sample_files_path = Setup().load_sample_files()
    contracts_path = os.path.join(sample_files_path, "Agreements")

    #  Query list - these are the 3 main topics and questions that we would like the LLM to analyze for each contract

    query_list = {"executive employment agreement": "What are the names of the two parties?",
                  "base salary": "What is the executive's base salary?",
                  "vacation": "How many vacation days will the executive receive?"}

    #  Load the selected model by name that was passed into the function

    print(f"\n > Loading model {model_name}...")

    prompter = Prompt().load_model(model_name, temperature=0.0, sample=False)

    #  Main loop

    for i, contract in enumerate(os.listdir(contracts_path)):

        #   excluding Mac file artifact (annoying, but fact of life in demos)
        if contract != ".DS_Store":

            print("\nAnalyzing contract: ", str(i+1), contract)

            print("LLM Responses:")

            for key, value in query_list.items():

                # step 1 + 2 + 3 above - contract is parsed, text-chunked, filtered by topic key,
                # ... 
and then packaged into the prompt

                source = prompter.add_source_document(contracts_path, contract, query=key)

                # step 4 above - calling the LLM with 'source' information already packaged into the prompt

                responses = prompter.prompt_with_source(value, prompt_name="default_with_context")

                # step 5 above - print out to screen

                for r, response in enumerate(responses):
                    print(key, ":", re.sub("[\n]", " ", response["llm_response"]).strip())

                # we're done with this contract - clear the source from the prompt
                prompter.clear_source_materials()

    # step 6 above - saving the analysis to jsonl and csv

    # save jsonl report to the /prompt_history folder
    print("\nPrompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(), prompter.prompt_id))
    prompter.save_state()

    # save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("csv output saved at: ", csv_output)


if __name__ == "__main__":

    # use a local cpu model - try the newest - a RAG fine-tune of Phi-3, quantized and packaged in GGUF
    model = "bling-phi-3-gguf"

    contract_analysis_on_laptop(model)

```
</details>

## 🔥 Latest Enhancements and Features   

### ONNXRuntime-QNN - run models on Snapdragon NPU (Windows Arm64) 

- 7 NPU-optimized models 'ready to run' in Model Catalog - see [using-qnn-npu-models example](https://github.com/llmware-ai/llmware/tree/main/examples/Models/using-qnn-npu-models.py)  

### OpenVINO Encoders - new examples (Windows Intel x86)  

- 20 OV-optimized encoding models with the OVEmbeddingModel class - supports a wide range of embedding, reranker and classifier models, e.g.,  
- 
[using_openvino_embedding_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_embedding_model.py)  \n- [using_openvino_reranker_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_reranker_model.py)  \n- [using_openvino_classifier_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_classifier_model.py)  \n\n### ONNXRuntime Reranker - use rerankers optimized for Onnxruntime deployment   \n\n- [using_onnx_reranker_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_onnx_reranker_models.py)   \n  \n### WindowsLocalFoundry integration - use WindowsLocalFoundry models in llmware  \n\n- [using_local_foundry_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_local_foundry_models.py)  \n\n### Model Depot - largest out-of-the-box collection of OpenVINO-based LLMs (95+)   \n\n- [Model Depot on Huggingface](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fllmware\u002Fmodel-depot)   \n- [using_stream_generation_with_openvino](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_streamer.py)     \n- [getting_started_with_openvino](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_models.py)  \n\n### Image Generation - Multimedia Bot  \n\n- [multimedia-bot example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUI\u002Fmultimedia_bot.py)  \n\n [**Multi-Model Agents with SLIM Models**](examples\u002FSLIM-Agents\u002F) - [**Intro-Video**](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY)    \n\n### New 
Use Cases & Applications  \n\n- **BizBot: RAG + SQL Local Chatbot**  \n  Implement a local chatbot for business intelligence using RAG and SQL.\n  - [Code example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fbiz_bot.py) | [Demo video](https:\u002F\u002Fyoutu.be\u002F4nBYDEjxxTE?si=o6PDPbu0PVcT-tYd)\n\n- **Lecture Tool**  \n  Enables Q&A on voice recordings for education and lecture analysis.\n  - [Lecture tool code](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUse_Cases\u002Flecture_tool\u002F)\n\n- **Web Services for Financial Research**  \n  An end-to-end example demonstrating web services with agent calls for financial research.\n  - [Demo video](https:\u002F\u002Fyoutu.be\u002Fl0jzsg1_Ik0?si=hmLhpT1iv_rxpkHo) | [Code example](examples\u002FUse_Cases\u002Fweb_services_slim_fx.py)\n\n### Audio & Text Processing\n\n- **Voice Transcription with WhisperCPP**  \n  Start transcription projects with WhisperCPP, featuring tools for sample file usage and famous speeches.\n  - [Getting started guide](examples\u002FModels\u002Fusing-whisper-cpp-getting-started.py) | [Parsing great speeches](examples\u002FUse_Cases\u002Fparsing_great_speeches.py) | [Demo video](https:\u002F\u002Fyoutu.be\u002F5y0ez5ZBpPE?si=KVxsXXtX5TzvlEws)\n\n- **Natural Language Query to CSV**  \n  Convert natural language queries to CSV with Slim-SQL, supporting custom Postgres tables.\n  - [Demo video](https:\u002F\u002Fyoutu.be\u002Fz48z5XOXJJg?si=V-CX1w-7KRioI4Bi) | [End-to-end example](examples\u002FSLIM-Agents\u002Ftext2sql-end-to-end-2.py) | [Custom table usage](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fagent_with_custom_tables.py)\n\n### Multi-Model Agents\n\n- **Multi-Model Agents with SLIM**  \n  Use SLIM models on CPU for multi-step agents in complex workflows.\n  - [Demo 
video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY) | [Example directory](examples\u002FSLIM-Agents)\n\n### Document & OCR Processing\n\n- **OCR Embedded Document Images**  \n  Extract text systematically from images embedded in documents for enhanced document processing.\n  - [OCR example](examples\u002FParsing\u002Focr_embedded_doc_images.py)\n\n- **Enhanced Document Parsing for PDFs, Word, PowerPoint, and Excel**  \n  Improved text-chunking controls, table extraction, and content parsing.\n  - [Parsing example](examples\u002FParsing\u002Fpdf_parser_new_configs.py)\n\n- **Optimizing Accuracy of RAG Prompts**  \n  Tutorials for tuning RAG prompt settings for increased accuracy.\n  - [Settings example](examples\u002FModels\u002Fadjusting_sampling_settings.py) | Videos: [Part I](https:\u002F\u002Fyoutu.be\u002F7oMTGhSKuNY?si=14mS2pftk7NoKQbC), [Part II](https:\u002F\u002Fyoutu.be\u002FiXp1tj-pPjM?si=T4teUAISnSWgtThu)  \n  \nNew to RAG?  [Check out the Fast Start video series](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB)  \n\n[Intro to SLIM Function Call Models](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_function_calls.py)  \nCan't wait?  
Get SLIMs right away:  \n\n```python \nfrom llmware.models import ModelCatalog\n\nModelCatalog().get_llm_toolkit()  # get all SLIM models, delivered as small, fast quantized tools \nModelCatalog().tool_test_run(\"slim-sentiment-tool\") # see the model in action with test script included  \n```\n\n## 🌱 Getting Started\n\n**Step 1 - Install llmware** -  `pip3 install llmware` or `pip3 install 'llmware[full]'`  \n\n- [core install](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fllmware\u002Frequirements.txt) (minimal set of dependencies)  \n- [full install](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fllmware\u002Frequirements_extras.txt) (adds to the core with wider set of related python libraries).  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Step 2- Go to Examples\u003C\u002Fb> - Get Started Fast with 100+ 'Cut-and-Paste' Recipes \u003C\u002Fsummary>\n\n## 🔥 Top New Examples 🔥  \n\nEnd-to-End Scenario - [**Function Calls with SLIM Extract and Web Services for Financial Research**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fweb_services_slim_fx.py)  \nAnalyzing Voice Files - [**Great Speeches with LLM Query and Extract**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fparsing_great_speeches.py)  \nNew to LLMWare - [**Fast Start tutorial series**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Ffast_start)  \nGetting Setup - [**Getting Started**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FGetting_Started)  \nSLIM Examples -  [**SLIM Models**](examples\u002FSLIM-Agents\u002F)  \n\n| Example     |  Detail      |\n|-------------|--------------|\n| 1.   
BLING models fast start ([code](examples\u002FModels\u002Fbling_fast_start.py) \u002F [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JjgqOZ2v5oU)) | Get started with fast, accurate, CPU-based models - question-answering, key-value extraction, and basic summarization.  |\n| 2.   Parse and Embed 500 PDF Documents ([code](examples\u002FEmbedding\u002Fdocs2vecs_with_milvus-un_resolutions.py))  | End-to-end example for Parsing, Embedding and Querying UN Resolution documents with Milvus  |\n| 3.  Hybrid Retrieval - Semantic + Text ([code](examples\u002FRetrieval\u002Fdual_pass_with_custom_filter.py)) | Using 'dual pass' retrieval to combine best of semantic and text search |  \n| 4.   Multiple Embeddings with PG Vector ([code](examples\u002FEmbedding\u002Fusing_multiple_embeddings.py) \u002F [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Bncvggy6m5Q)) | Comparing Multiple Embedding Models using Postgres \u002F PG Vector |\n| 5.   DRAGON GGUF Models ([code](examples\u002FModels\u002Fdragon_gguf_fast_start.py) \u002F [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BI1RlaIJcsc&t=130s)) | State-of-the-Art 7B RAG GGUF Models.  | \n| 6.   RAG with BLING ([code](examples\u002FUse_Cases\u002Fcontract_analysis_on_laptop_with_bling_models.py) \u002F [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8aV5p3tErP0)) | Using contract analysis as an example, experiment with RAG for complex document analysis and text extraction using `llmware`'s BLING ~1B parameter GPT model running on your laptop. |  \n| 7.   Master Service Agreement Analysis with DRAGON ([code](examples\u002FUse_Cases\u002Fmsa_processing.py) \u002F [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Cf-07GBZT68&t=2s)) | Analyzing MSAs using DRAGON YI 6B Model.   |                                                                                                                         \n| 8.   
Streamlit Example ([code](examples/UI/simple_rag_ui_with_streamlit.py))  | Ask questions of invoices in a simple UI with local inference.  |  
| 9.   Integrating LM Studio ([code](examples/Models/using-open-chat-models.py) / [video](https://www.youtube.com/watch?v=h2FDjUyvsKE&t=101s)) | Integrating LM Studio Models with LLMWare  |
| 10.  Prompts With Sources ([code](examples/Prompts/prompt_with_sources.py))  | Attach a wide range of knowledge sources directly to Prompts.   |   
| 11.  Fact Checking ([code](examples/Prompts/fact_checking.py))  | Explore the full set of evidence methods in this example script that analyzes a set of contracts.   |
| 12.  Using 7B GGUF Chat Models ([code](examples/Models/chat_models_gguf_fast_start.py)) | Run 4 state-of-the-art 7B chat models locally in minutes.  |  


Check out:  [llmware examples](https://github.com/llmware-ai/llmware/blob/main/examples/README.md)  

</details>  

<details>
<summary><b>Step 3 - Tutorial Videos</b> - check out our YouTube channel for high-impact 5-10 minute tutorials on the latest examples.   
\u003C\u002Fsummary>\n\n🎬 Check out these videos to get started quickly:  \n- [Document Summarization](https:\u002F\u002Fyoutu.be\u002FPs3W-P9A1m8?si=Rxvst3RJv8ZaOk0L)  \n- [Bling-3-GGUF Local Chatbot](https:\u002F\u002Fyoutu.be\u002FgzzEVK8p3VM?si=8cNn_do0oxSzCEnM)  \n- [Agent-based Complex Research Analysis](https:\u002F\u002Fyoutu.be\u002Fy4WvwHqRR60?si=jX3KCrKcYkM95boe)  \n- [Getting Started with SLIMs (with code)](https:\u002F\u002Fyoutu.be\u002FaWZFrTDmMPc?si=lmo98_quo_2Hrq0C)  \n- [Are you prompting wrong for RAG - Stochastic Sampling-Part I](https:\u002F\u002Fyoutu.be\u002F7oMTGhSKuNY?si=_KSjuBnqArvWzYbx)  \n- [Are you prompting wrong for RAG - Stochastic Sampling-Part II- Code Experiments](https:\u002F\u002Fyoutu.be\u002FiXp1tj-pPjM?si=3ZeMgipY0vJDHIMY)  \n- [SLIM Models Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY)  \n- [Text2SQL Intro](https:\u002F\u002Fyoutu.be\u002FBKZ6kO2XxNo?si=tXGt63pvrp_rOlIP)  \n- [RAG with BLING on your laptop](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JjgqOZ2v5oU)    \n- [DRAGON-7B-Models](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d_u7VaKu6Qk&t=37s)  \n- [Install and Compare Multiple Embeddings with Postgres and PGVector](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Bncvggy6m5Q)  \n- [Background on GGUF Quantization & DRAGON Model Example](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZJyQIZNJ45E)  \n- [Using LM Studio Models](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=h2FDjUyvsKE)  \n- [Using Ollama Models](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qITahpVDuV0)  \n- [Use any GGUF Model](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9wXJgld7Yow)  \n- [Use small LLMs for RAG for Contract Analysis (feat. 
LLMWare)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8aV5p3tErP0)\n- [Invoice Processing with LLMware](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VHZSaBBG-Bo&t=10s)\n- [Ingest PDFs at Scale](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=O0adUfrrxi8&t=10s)\n- [Evaluate LLMs for RAG with LLMWare](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=s0KWqYg5Buk&t=105s)\n- [Fast Start to RAG with LLMWare Open Source Library](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0naqpH93eEU)\n- [Use Retrieval Augmented Generation (RAG) without a Database](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tAGz6yR14lw)\n- [Pop up LLMWare Inference Server](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qiEmLnSRDUA&t=20s)\n\n\n\u003C\u002Fdetails>  \n\n## ✍️ Working with the llmware Github repository  \n\nThe llmware repo can be pulled locally to get access to all the examples, or to work directly with the latest version of the llmware code.  \n\n```bash\ngit clone git@github.com:llmware-ai\u002Fllmware.git\n```  \n\nWe have provided a **welcome_to_llmware** automation script in the root of the repository folder.  After cloning:  \n- On Windows command line:  `.\\welcome_to_llmware_windows.sh`  \n- On Mac \u002F Linux command line:  `sh .\u002Fwelcome_to_llmware.sh`  \n\nAlternatively, if you prefer to complete setup without the welcome automation script, then the next steps include:  \n\n1.  **install requirements.txt** - inside the \u002Fllmware path - e.g., ```pip3 install -r llmware\u002Frequirements.txt```  \n\n2.  **install requirements_extras.txt** - inside the \u002Fllmware path - e.g., ```pip3 install -r llmware\u002Frequirements_extras.txt```(Depending upon your use case, you may not need all or any of these installs, but some of these will be used in the examples.)\n\n3.  **run examples** - copy one or more of the example .py files into the root project path.   
(We have seen several IDEs that will attempt to run interactively from the nested \u002Fexample path, and then not have access to the \u002Fllmware module - the easy fix is to just copy the example you want to run into the root path).  \n\n4.  **install vector db** - no-install vector db options include milvus lite, chromadb, faiss and lancedb - which do not require a server install, but do require that you install the python sdk library for that vector db, e.g., `pip3 install pymilvus`, or `pip3 install chromadb`.  If you look in [examples\u002FEmbedding](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding), you will see examples for getting started with various vector DB, and in the root of the repo, you will see easy-to-get-started docker compose scripts for installing milvus, postgres\u002Fpgvector, mongo, qdrant, neo4j, and redis.  \n\n\n## Data Store Options\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Fast Start\u003C\u002Fb>:  use SQLite3 and ChromaDB (File-based) out-of-the-box - no install required \u003C\u002Fsummary>  \n\n```python\nfrom llmware.configs import LLMWareConfig \nLLMWareConfig().set_active_db(\"sqlite\")   \nLLMWareConfig().set_vector_db(\"chromadb\")  \n```\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Speed + Scale\u003C\u002Fb>:  use MongoDB (text collection) and Milvus (vector db) - install with Docker Compose \u003C\u002Fsummary> \n\n```bash\ncurl -o docker-compose.yaml https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose.yaml\ndocker compose up -d\n```\n\n```python\nfrom llmware.configs import LLMWareConfig\nLLMWareConfig().set_active_db(\"mongo\")\nLLMWareConfig().set_vector_db(\"milvus\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Postgres\u003C\u002Fb>:  use Postgres for both text collection and vector DB - install with Docker Compose \u003C\u002Fsummary> \n\n```bash\ncurl -o 
docker-compose.yaml https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose-pgvector.yaml\ndocker compose up -d\n```\n\n```python\nfrom llmware.configs import LLMWareConfig\nLLMWareConfig().set_active_db(\"postgres\")\nLLMWareConfig().set_vector_db(\"postgres\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Mix-and-Match\u003C\u002Fb>: LLMWare supports 3 text collection databases (Mongo, Postgres, SQLite) and \n10 vector databases (Milvus, PGVector-Postgres, Neo4j, Redis, Mongo-Atlas, Qdrant, Faiss, LanceDB, ChromaDB and Pinecone)  \u003C\u002Fsummary>\n\n```bash\n# scripts to deploy other options\ncurl -o docker-compose.yaml https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose-redis-stack.yaml\n```\n\n\u003C\u002Fdetails>  \n\n## Meet our Models   \n\n- **SLIM model series:** small, specialized models fine-tuned for function calling and multi-step, multi-model Agent workflows.  \n- **DRAGON model series:**  Production-grade RAG-optimized 6-9B parameter models - \"Delivering RAG on ...\" the leading foundation base models.  \n- **BLING model series:**  Small CPU-based RAG-optimized, instruct-following 1B-5B parameter models.  \n- **Industry BERT models:**  out-of-the-box custom trained sentence transformer embedding models fine-tuned for the following industries:  Insurance, Contracts, Asset Management, SEC.  \n- **GGUF Quantization:** we provide 'gguf' and 'tool' versions of many SLIM, DRAGON and BLING models, optimized for CPU deployment.  \n\n\nInterested in contributing to llmware? Information on ways to participate can be found in our [Contributors Guide](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Frepo_docs\u002FCONTRIBUTING.md#contributing-to-llmware).  
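The model families above are all delivered through the Model Catalog, so any of them can be pulled down and tried in a few lines. A minimal sketch, assuming the `ModelCatalog().load_model()` / `inference()` pattern used in the examples (the model name comes from the GGUF list earlier in this README, and the context string here is an invented illustration; first use downloads the model weights from HuggingFace):

```python
from llmware.models import ModelCatalog

#   load any model from the catalog by name - e.g., the quantized BLING RAG model
#   (first use will download the model weights locally)
model = ModelCatalog().load_model("bling-phi-3-gguf")

#   BLING / DRAGON models are RAG-optimized - pass the source passage as context
context = "The base salary of the executive shall be $350,000 per year, paid monthly."
response = model.inference("What is the executive's base salary?", add_context=context)

print("llm response: ", response["llm_response"])
```

The same pattern applies to the DRAGON and SLIM families - swap in another catalog name, e.g., `dragon-mistral-answer-tool` from the GGUF list above.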
As with all aspects of this project, contributing is governed by our [Code of Conduct](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Frepo_docs\u002FCODE_OF_CONDUCT.md).\n\nQuestions and discussions are welcome in our [github discussions](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fdiscussions).  \n\n## 📣  Release notes and Change Log  \n\nFor complete history of release notes, please open the Change log tab.  \n\n**Supported Operating Systems**: Windows (x86 and Arm64), MacOS (Metal - M1-M5), Linux (x86, aarch64)    \n- Linux - support Ubuntu 20+  (glibc 2.31+)   \n- If you need support for another Linux version, please raise an issue - we will prioritize testing and ensure support.    \n\n**Supported Vector Databases**: Milvus, Postgres (PGVector), Neo4j, Redis, LanceDB, ChromaDB, Qdrant, FAISS, Pinecone, Mongo Atlas Vector Search\n\n**Supported Text Index Databases**: MongoDB, Postgres, SQLite  \n\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Optional\u003C\u002Fb>\u003C\u002Fsummary>\n\n- [Docker](https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F)\n  \n- To enable the OCR parsing capabilities, install [Tesseract v5.3.3](https:\u002F\u002Ftesseract-ocr.github.io\u002Ftessdoc\u002FInstallation.html) and [Poppler v23.10.0](https:\u002F\u002Fpoppler.freedesktop.org\u002F) native packages.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>\u003Cb>🚧 Change Log\u003C\u002Fb>\u003C\u002Fsummary>\n\n**Thursday, January 1 - v0.4.3 - WIP**  \n - Updated BaseModel and PromptCatalog classes \n - Updated cloud model versions and support for OpenAI, Gemini and Anthropic latest models   \n - Removed deprecated model classes \n - Removed deprecated modules (Dataset Builder and Graph)  \n - Work-in-Progress for other Model Class and Card updates  \n - Repo is up-to-date, but not in pip install release - targeted week of January 12, 2026  \n   \n**Monday, March 3 - v0.4.0**  \n - Updates in GGUF implementation, 
configs and libs  \n - Updates in ONNXRuntime implementation and configs  \n - New Models added to ModelCatalog, including phi-4, Deepseek-Qwen-7B, Deepseek-Qwen-14B, and many others  \n - Added support for Windows ARM64  \n - Changed default active_db to \"sqlite\" (both mongo and postgres available for production)  \n - Streamlined dependencies in core requirements.txt and pip install  \n - 'Extra\u002Foptional' dependencies available in requirements_extras.txt and through configurations passed in the pip install process (see setup.py for options)\n   \n**Friday, November 8 - v0.3.9**  \n - Enhanced Azure OpenAI configuration, including streaming generation  \n - Removed deprecated parser binaries for Linux aarch64 and Mac x86  \n - Added generator option for CustomTable insert rows to provide progress on larger table builds  \n   \n**Sunday, October 27 - v0.3.8**\n - Integrating Model Depot collection of 100+ OpenVino and ONNX Models into LLMWare default model catalog  \n - Supporting changes in model classes, model catalog and model configs  \n   \n**Sunday, October 6 - v0.3.7**  \n- Added new model class - OVGenerativeModel - to support the use of models packaged in OpenVino format  \n- Added new model class - ONNXGenerativeModel - to support use of models packaged in ONNX format  \n- Getting started with [OpenVino example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_models.py)  \n- Getting started with [ONNX example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_onnx_models.py)  \n  \n**Tuesday, October 1 - v0.3.6**  \n- Added new prompt and chat templates  \n- Improved and updated model configurations    \n- New utility functions for locating and highlighting text matches in search results  \n- Improved hashing check utility functions  \n  \n**Monday, August 26 - v0.3.5**  \n- Added 10 new BLING+SLIM models to 
Model Catalog - featuring Qwen2, Phi-3 and Phi-3.5  
- Launched new DRAGON models on Qwen-7B, Yi-9B, Mistral-v0.3, and Llama-3.1  
- New Qwen2 Models (and RAG + function-calling fine-tunes) - [using-qwen2-models](https://github.com/llmware-ai/llmware/blob/main/examples/Models/using-qwen2-models.py)  
- New Phi-3 function calling models - [using-phi-3-function-calls](https://github.com/llmware-ai/llmware/blob/main/examples/Models/using-phi-3-function-calls.py)  
- New use case example - [lecture_tool](https://github.com/llmware-ai/llmware/blob/main/examples/Use_Cases/lecture_tool/)   
- Improved GGUF Configs to expand context window  
- Added model benchmark performance data to model configs  
- Enhanced Utilities hashing functions  

**Monday, July 29 - v0.3.4**  
- Enhanced safety protections for text2sql db reads for LLMfx agents   
- New examples - see [example](https://github.com/llmware-ai/llmware/blob/main/examples/UI/dueling_chatbot.py)    
- More Notebook examples - see [notebook examples](https://github.com/llmware-ai/llmware/blob/main/examples/Notebooks)      
  
**Monday, July 8 - v0.3.3**  
- Improvements in model configuration options, logging, and various small fixes  
- Improved Azure OpenAI configs - see [example](https://github.com/llmware-ai/llmware/blob/main/examples/Models/using-azure-openai.py)  
  
**Saturday, June 29 - v0.3.2**  
- Update to PDF and Office parsers - improvements to configurations in logging and text chunking options  
  
**Saturday, June 22 - v0.3.1**  
- Added module 3 to Fast Start example series [examples 7-9 on Agents & Function Calls](https://github.com/llmware-ai/llmware/tree/main/fast_start)  
- 
Added reranker Jina model for in-memory semantic similarity RAG - see [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding\u002Fusing_semantic_reranker_with_rag.py)  \n- Enhanced model fetching parameterization in model loading process  \n- Added new 'tiny' versions of slim-extract and slim-summary in both Pytorch and GGUF versions - check out 'slim-extract-tiny-tool' and 'slim-summary-tiny-tool'  \n- [Biz Bot] use case - see [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fbiz_bot.py) and [video](https:\u002F\u002Fyoutu.be\u002F4nBYDEjxxTE?si=o6PDPbu0PVcT-tYd)  \n- Updated numpy reqs \u003C2 and updated yfinance version minimum (>=0.2.38)     \n\n**Tuesday, June 4 - v0.3.0**  \n- Added support for new Milvus Lite embedded 'no-install' database - see [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding\u002Fusing_milvus_lite.py).   
\n- Added two new SLIM models to catalog and agent processes - ['q-gen'](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fusing-slim-q-gen.py) and ['qa-gen'](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fusing-slim-qa-gen.py)    \n- Updated model class instantiation to provide more extensibility to add new classes in different modules  \n- New welcome_to_llmware.sh and welcome_to_llmware_windows.sh fast install scripts  \n- Enhanced Model class base with new configurable post_init and register methods  \n- Created InferenceHistory to track global state of all inferences completed  \n- Multiple improvements and updates to logging at module level  \n- Note: starting with v0.3.0, pip install provides two options - a base minimal install `pip3 install llmware` which will support most use cases, and a larger install `pip3 install 'llmware[full]'` with other commonly-used libraries.  \n  \n**Wednesday, May 22 - v0.2.15**  \n- Improvements in Model class handling of Pytorch and Transformers dependencies (just-in-time loading, if needed)  \n- Expanding API endpoint options and inference server functionality - see new [client access options](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fllmware_inference_api_client.py)  and [server_launch](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fllmware_inference_server.py)  \n\n**Saturday, May 18 - v0.2.14**  \n- New OCR image parsing methods with [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fslicing_and_dicing_office_docs.py)  \n- Adding first part of logging improvements (WIP) in Configs and Models.    \n- New embedding model added to catalog - industry-bert-loans.  
\n- Updates to model import methods and configurations.  \n\n**Sunday, May 12 - v0.2.13**  \n- New GGUF streaming method with [basic example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fgguf_streaming.py) and [phi3 local chatbot](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUI\u002Fgguf_streaming_chatbot.py)  \n- Significant cleanups in ancillary imports and dependencies to reduce install complexity - note: the updated requirements.txt and setup.py files.  \n- Defensive code to provide informative warning of any missing dependencies in specialized parts of the code, e.g., OCR, Web Parser.  \n- Updates of tests, notice and documentation.   \n- OpenAIConfigs created to support Azure OpenAI.   \n  \n**Sunday, May 5 - v0.2.12 Update**  \n- Launched [\"bling-phi-3\"](https:\u002F\u002Fhuggingface.co\u002Fllmware\u002Fbling-phi-3) and [\"bling-phi-3-gguf\"](https:\u002F\u002Fhuggingface.co\u002Fllmware\u002Fbling-phi-3-gguf) in ModelCatalog - newest and most accurate BLING\u002FDRAGON model  \n- New long document summarization method using slim-summary-tool [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FPrompts\u002Fdocument_summarizer.py)  \n- New Office (Powerpoint, Word, Excel) sample files [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Fparsing_microsoft_ir_docs.py)  \n- Added support for Python 3.12  \n- Deprecated faiss and replaced with 'no-install' chromadb in Fast Start examples  \n- Refactored Datasets, Graph and Web Services classes  \n- Updated Voice parsing with WhisperCPP into Library  \n  \n**Monday, April 29 - v0.2.11 Update**  \n- Updates to gguf libs for Phi-3 and Llama-3  \n- Added Phi-3 
[example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-microsoft-phi-3.py)  and Llama-3 [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-llama-3.py) and Quantized Versions to Model Catalog  \n- Integrated WhisperCPP Model class and prebuilt shared libraries - [getting-started-example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-whisper-cpp-getting-started.py)  \n- New voice sample files for testing - [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-whisper-cpp-sample-files.py)  \n- Improved CUDA detection on Windows and safety checks for older Mac OS versions  \n\n**Monday, April 22 - v0.2.10 Update**  \n- Updates to Agent class to support Natural Language queries of Custom Tables on Postgres [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fagent_with_custom_tables.py)  \n- New Agent API endpoint implemented with LLMWare Inference Server and new Agent capabilities [example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fagent_api_endpoint.py)  \n  \n**Tuesday, April 16 - v0.2.9 Update**  \n- New CustomTable class to rapidly create custom DB tables in conjunction with LLM-based workflows.  \n- Enhanced methods for converting CSV and JSON\u002FJSONL files into DB tables.  \n- See new examples [Creating Custom Table example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FStructured_Tables\u002Fcreate_custom_table-1.py)\n    \n**Tuesday, April 9 - v0.2.8 Update**  \n- Office Parser (Word Docx, Powerpoint PPTX, and Excel XLSX) - multiple improvements - new libs + Python method.  
\n- Includes: several fixes, improved text chunking controls, header text extraction and configuration options.  \n- Generally, new office parser options conform with the new PDF parser options.  \n- Please see [Office Parsing Configs example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Foffice_parser_new_configs.py)  \n\n**Wednesday, April 3 - v0.2.7 Update**  \n- PDF Parser - multiple improvements - new libs + Python methods.  \n- Includes: UTF-8 encoding for European languages.  \n- Includes: Better text chunking controls, header text extraction and configuration options.  \n- Please see [PDF Parsing Configs example](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Fpdf_parser_new_configs.py) for more details.  \n- Note:  deprecating support for aarch64-linux (will use 0.2.6 parsers).  Full support going forward for Linux Ubuntu20+ on x86_64 with CUDA.  \n  \n**Friday, March 22 - v0.2.6 Update**  \n- New SLIM models:  summary, extract, xsum, boolean, tags-3b, and combo sentiment-ner.  \n- New logit and sampling analytics.  \n- New SLIM examples showing how to use the new models.  \n  \n**Thursday, March 14 - v0.2.5 Update**  \n- Improved support for GGUF on CUDA (Windows and Linux), with new prebuilt binaries and exception handling.  \n- Enhanced model configuration options (sampling, temperature, top logit capture).  \n- Added full back-level support for Ubuntu 20+ with parsers and GGUF engine.  \n- Support for new Anthropic Claude 3 models.  \n- New retrieval methods: document_lookup and aggregate_text.  \n- New model:  bling-stablelm-3b-tool - fast, accurate 3b quantized question-answering model - one of our new favorites.  
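Several of the parser releases above mention improved text chunking controls. At its core, character-level chunking with overlap reduces to a sliding window; the sketch below illustrates the idea in plain Python — `chunk_text`, `chunk_size`, and `overlap` are illustrative names for this sketch, not llmware's actual parser configuration keys (the real parsers chunk in native code with richer, metadata-aware rules).

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into fixed-size character chunks, with `overlap` characters
    shared between neighboring chunks so context is not cut mid-passage."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 1000 characters of cycling a-z, standing in for a parsed document
sample = "".join(chr(97 + i % 26) for i in range(1000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])   # 3 [400, 400, 300]
```

Each chunk shares its last `overlap` characters with the start of the next one, which is why chunked retrieval rarely loses a sentence that straddles a boundary.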
\n\n**Wednesday, February 28 - v0.2.4 Update**  \n- Major upgrade of GGUF Generative Model class - support for Stable-LM-3B, CUDA build options, and better control over sampling strategies.\n- Note: new GGUF llama.cpp built libs packaged with build starting in v0.2.4.  \n- Improved GPU support for HF Embedding Models.   \n  \n**Friday, February 16 - v0.2.3 Update**  \n- Added 10+ embedding models to ModelCatalog - nomic, jina, bge, gte, ember and uae-large.   \n- Updated OpenAI support >=1.0 and new text-3 embedding models.    \n- SLIM model keys and output_values now accessible in ModelCatalog.  \n- Updating encodings to 'utf-8-sig' to better handle txt\u002Fcsv files with bom.  \n\n**Latest Updates - 19 Jan 2024 - llmware v0.2.0**\n  - Added new database integration options - Postgres and SQlite\n  - Improved status update and parser event logging options for parallelized parsing\n  - Significant enhancements to interactions between Embedding + Text collection databases\n  - Improved error exception handling in loading dynamic modules\n\n**Latest Updates - 15 Jan 2024: llmware v0.1.15**\n  - Enhancements to dual pass retrieval queries\n  - Expanded configuration objects and options for endpoint resources\n    \n**Latest Updates - 30 Dec 2023: llmware v0.1.14**\n  - Added support for Open Chat inference servers (compatible with OpenAI API)\n  - Improved capabilities for multiple embedding models and vector DB configurations\n  - Added docker-compose install scripts for PGVector and Redis vector databases\n  - Added 'bling-tiny-llama' to model catalog\n         \n**Latest Updates - 22 Dec 2023: llmware v0.1.13**\n  - Added 3 new vector databases - Postgres (PG Vector), Redis, and Qdrant\n  - Improved support for integrating sentence transformers directly in the model catalog\n  - Improvements in the model catalog attributes\n  - Multiple new Examples in Models & Embeddings, including GGUF, Vector database, and model catalog\n\n- **17 Dec 2023: llmware v0.1.12**\n  
- dragon-deci-7b added to catalog - RAG-finetuned model on high-performance new 7B model base from Deci\n  - New GGUFGenerativeModel class for easy integration of GGUF Models\n  - Adding prebuilt llama_cpp \u002F ctransformer shared libraries for Mac M1, Mac x86, Linux x86 and Windows\n  - 3 DRAGON models packaged as Q4_K_M GGUF models for CPU laptop use (dragon-mistral-7b, dragon-llama-7b, dragon-yi-6b)\n  - 4 leading open source chat models added to default catalog with Q4_K_M\n  \n- **8 Dec 2023: llmware v0.1.11**\n  - New fast start examples for high volume Document Ingestion and Embeddings with Milvus.\n  - New LLMWare 'Pop up' Inference Server model class and example script.\n  - New Invoice Processing example for RAG.\n  - Improved Windows stack management to support parsing larger documents.\n  - Enhancing debugging log output mode options for PDF and Office parsers.\n\n- **30 Nov 2023: llmware v0.1.10**\n  - Windows added as a supported operating system.\n  - Further enhancements to native code for stack management. 
\n  - Minor defect fixes.\n\n- **24 Nov 2023: llmware v0.1.9**\n  - Markdown (.md) files are now parsed and treated as text files.\n  - PDF and Office parser stack optimizations which should avoid the need to set ulimit -s.\n  - New llmware_models_fast_start.py example that allows discovery and selection of all llmware HuggingFace models.\n  - Native dependencies (shared libraries and dependencies) now included in repo to facilitate local development.\n  - Updates to the Status class to support PDF and Office document parsing status updates.\n  - Minor defect fixes including image block handling in library exports.\n\n- **17 Nov 2023: llmware v0.1.8**\n  - Enhanced generation performance by allowing each model to specify the trailing space parameter.\n  - Improved handling for eos_token_id for llama2 and mistral.\n  - Improved support for Hugging Face dynamic loading.\n  - New examples with the new llmware DRAGON models.\n    \n- **14 Nov 2023: llmware v0.1.7**\n  - Moved to Python Wheel package format for PyPi distribution to provide seamless installation of native dependencies on all supported platforms.  \n  - ModelCatalog enhancements:\n    - OpenAI update to include newly announced ‘turbo’ 4 and 3.5 models.\n    - Cohere embedding v3 update to include new Cohere embedding models.\n    - BLING models as out-of-the-box registered options in the catalog. 
They can be instantiated like any other model, even without the \"hf=True\" flag.\n    - Ability to register new model names, within existing model classes, with the register method in ModelCatalog.\n  - Prompt enhancements:\n    - \"evidence_metadata\" added to prompt_main output dictionaries allowing prompt_main responses to be plugged into the evidence and fact-checking steps without modification.\n    - API key can now be passed directly in a prompt.load_model(model_name, api_key=\"[my-api-key]\")\n  - LLMWareInference Server - Initial delivery:\n    - New LLMWareModel class, which is a wrapper on a custom HF-style API-based model.    \n    - LLMWareInferenceServer is a new class that can be instantiated on a remote (GPU) server to create a testing API-server that can be integrated into any Prompt workflow.    \n \n- **03 Nov 2023: llmware v0.1.6**\n  - Updated packaging to require mongo-c-driver 1.24.4 to temporarily work around a segmentation fault with mongo-c-driver 1.25.\n  - Updates in python code needed in anticipation of future Windows support.  
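The evidence and fact-checking flow referenced above (evidence metadata carried in prompt_main outputs so responses feed the checking steps without modification) ultimately rests on comparing a response against its source passages. The function below is a deliberately crude stand-alone approximation of that idea — it is not llmware's actual evidence_check_sources implementation, and the sample strings are invented for illustration:

```python
def evidence_overlap(llm_response, evidence_text):
    """Fraction of response tokens that also appear in the evidence text.
    A low score flags a response that may not be grounded in its source."""
    resp = {t.strip(".,").lower() for t in llm_response.split()}
    ev = {t.strip(".,").lower() for t in evidence_text.split()}
    return len(resp & ev) / len(resp) if resp else 0.0

evidence = "The invoice total is $22,500.00 payable within 30 days."
good = "The invoice total is $22,500.00"          # fully supported by the evidence
bad = "The shipment arrives tomorrow morning"     # almost no overlap with the evidence

print(round(evidence_overlap(good, evidence), 2))
print(round(evidence_overlap(bad, evidence), 2))
```

Real fact-checking is of course stronger than token overlap (it aligns numbers, names, and spans back to specific source chunks), but the grounded-vs-ungrounded signal is the same shape.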
\n\n- **27 Oct 2023: llmware v0.1.5**\n  - Four new example scripts focused on RAG workflows with small, fine-tuned instruct models that run on a laptop (`llmware` [BLING](https:\u002F\u002Fhuggingface.co\u002Fllmware) models).\n  - Expanded options for setting temperature inside a prompt class.\n  - Improvement in post processing of Hugging Face model generation.\n  - Streamlined loading of Hugging Face generative models into prompts.\n  - Initial delivery of a central status class: read\u002Fwrite of embedding status with a consistent interface for callers.\n  - Enhanced in-memory dictionary search support for multi-key queries.\n  - Removed trailing space in human-bot wrapping to improve generation quality in some fine-tuned models.\n  - Minor defect fixes, updated test scripts, and version update for Werkzeug to address [dependency security alert](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fsecurity\u002Fdependabot\u002F2).\n- **20 Oct 2023: llmware v0.1.4**\n  - GPU support for Hugging Face models.\n  - Defect fixes and additional test scripts.\n- **13 Oct 2023: llmware v0.1.3**\n  - MongoDB Atlas Vector Search support.\n  - Support for authentication using a MongoDB connection string.\n  - Document summarization methods.\n  - Improvements in capturing the model context window automatically and passing changes in the expected output length.  \n  - Dataset card and description with lookup by name.\n  - Processing time added to model inference usage dictionary.\n  - Additional test scripts, examples, and defect fixes.\n- **06 Oct 2023: llmware v0.1.1**\n  - Added test scripts to the github repository for regression testing.\n  - Minor defect fixes and version update of Pillow to address [dependency security alert](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fsecurity\u002Fdependabot\u002F1).\n- **02 Oct 2023: llmware v0.1.0**  🔥 Initial release of llmware to open source!! 
🔥\n\n\n\u003C\u002Fdetails>\n\u003Cp align=\"center\">\n  \u003Ca href=\"#top\">⬆️ Back to Top\u003C\u002Fa>\n\u003C\u002Fp>\n\n## 🤓 Read our White Papers\n\n\n- **Revolutionizing AI Deployment: Unleashing AI Acceleration with Intel's AI PCs and Model HQ by LLMWare** [AI PC Model HQ.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18024139\u002FAI.PC.Model.HQ.pdf)\n- **Revolutionizing AI Deployment (Intel Abstract Version)**  [LNL White paper (Abstract Version) final.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18281644\u002FLNL.White.paper.Abstract.Version.final.pdf)\n\n- **Accelerating AI Powered Productivity with AI PCs** [Laptop.Performance.WP.Final (10).pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18024294\u002FLaptop.Performance.WP.Final.10.pdf)\n\n## Intel Joint Solutions\n\n- **Arrow Lake** \n[IPA.Optimization.Summary.LLMWare (1).pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18292873\u002FIPA.Optimization.Summary.LLMWare.1.pdf)\n\n## About Model HQ\n- **Privacy Policy** [AI.BLOKS.PRIVACY.POLICY.1.3.25.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F19289355\u002FAI.BLOKS.PRIVACY.POLICY.1.3.25.pdf)\n\n- **Terms of Service** [AI.Bloks.Terms.of.Service.3.3.25.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F19289545\u002FAI.Bloks.Terms.of.Service.3.3.25.pdf)\n\n- **Acceptable Use Policy** [Acceptable Use Policy for Model HQ by AI BLOKS LLC.docx](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18291481\u002FAcceptable.Use.Policy.for.Model.HQ.by.AI.BLOKS.LLC.docx)\n\n","# llmware\n![静态徽章](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10_%7C_3.11%7C_3.12%7C_3.13%7C_3.14-blue?color=blue)\n![PyPI - 
版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fllmware?color=blue)\n[![成员](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fllmware-ai_llmware_readme_c41105f405f2.png)](https:\u002F\u002Fdiscord.gg\u002FbphreFK4NJ)\n[![文档](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Factions\u002Fworkflows\u002Fpages.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Factions\u002Fworkflows\u002Fpages.yml)  \n\n## 🧰🛠️ 用于构建基于知识的本地、私有、安全的LLM应用的统一框架       \n\n`llmware` 针对AI PC和本地笔记本电脑、边缘设备以及自托管部署进行了优化，可在广泛的Windows、Mac和Linux平台上运行，支持GGUF、OpenVINO、ONNXRuntime、ONNXRuntime-QNN（高通）、WindowsLocalFoundry和Pytorch等技术，提供一个高级接口，使用户能够轻松地利用针对目标平台优化的推理技术。  \n\n`llmware` 包含两个主要组件：  \n\n 1.  **包含300多种模型的模型目录** - 模型以量化、优化的格式预打包，以便充分利用设备上的GPU和NPU性能，支持主流开源模型家族以及50多种llmware微调的SLIM、Bling、Dragon和Industry-Bert模型，这些模型专为企业流程自动化中的关键任务而设计。同时，也支持来自OpenAI、Anthropic和Google的领先云端模型。  \n \n 2.  **RAG管道** - 集成化的组件，覆盖从连接知识源到生成式AI模型的完整生命周期，具备广泛的文档解析和摄取能力，并可创建可扩展的知识库。\n\n通过将这两个组件结合起来，`llmware` 提供了一套全面的工具，可快速构建基于知识的企业级LLM应用。  \n\n我们的愿景是：人工智能应当可持续、准确且具有成本效益，以尽可能小的计算资源完成任务。  \n\n我们几乎所有的示例和模型都可以在本地设备上运行——您现在就可以在自己的笔记本电脑上开始使用。   \n\n[加入我们的Discord](https:\u002F\u002Fdiscord.gg\u002FMhZn5Nc39h)   |  [观看YouTube教程](https:\u002F\u002Fwww.youtube.com\u002F@llmware)  | [在Huggingface上探索我们的模型系列](https:\u002F\u002Fwww.huggingface.co\u002Fllmware)   \n\n\n## 🎯  主要特性 \n使用`llmware`编写代码基于以下几个核心概念：\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>模型目录\u003C\u002Fb>: 无论底层实现如何，都可以通过简单的查找以相同的方式访问所有模型。 \n\u003C\u002Fsummary>  \n\n\n```python\n#   目录中包含300多种模型，其中50多种经过RAG优化的BLING、DRAGON和Industry BERT模型\n#   完全支持GGUF、OpenVINO、Onnxruntime、HuggingFace、Sentence Transformers以及主流的基于API的模型\n#   易于扩展以添加自定义模型——请参阅示例\n\nfrom llmware.models import ModelCatalog\nfrom llmware.prompts import Prompt\n\n#   所有模型都通过ModelCatalog访问\nmodels = ModelCatalog().list_all_models()\n\n#   要使用ModelCatalog中的任何模型，只需调用“load_model”方法并传入model_name参数\nmy_model = 
ModelCatalog().load_model(\"llmware\u002Fbling-phi-3-gguf\")\n\n#   调用模型进行推理：\noutput = my_model.inference(\"什么是AI的未来？\", add_context=\"这是需要阅读的文章\")\n\n#   调用模型进行流式输出：\nfor token in my_model.stream(\"什么是AI的未来？\"):\n    print(token, end=\"\")\n\n#   将模型集成到Prompt中：\nprompter = Prompt().load_model(\"llmware\u002Fbling-tiny-llama-v0\")\nresponse = prompter.prompt_main(\"什么是AI的未来？\", context=\"插入信息来源\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>  \n\u003Csummary>\u003Cb>Library\u003C\u002Fb>: 大规模摄取、组织和索引知识集合——解析、文本分块和嵌入。 \u003C\u002Fsummary>  \n\n```python\n\nfrom llmware.library import Library\n\n#   解析并分块处理一组文档（pdf、pptx、docx、xlsx、txt、csv、md、json\u002Fjsonl、wav、png、jpg、html）  \n\n#   第一步——创建一个library，即“知识库容器”结构\n#          - libraries既包含文本集合（DB）资源，也包含文件资源（例如，llmware_data\u002Faccounts\u002F{library_name}）\n#          - 嵌入和查询都是针对library进行的\n\nlib = Library().create_new_library(\"my_library\")\n\n#    第二步——add_files是通用的摄取函数——指向包含多种文件类型的本地文件夹\n#           - 文件会根据文件扩展名被路由到相应的解析器，进行解析、文本分块并索引到文本集合数据库中\n\nlib.add_files(\"\u002Ffolder\u002Fpath\u002Fto\u002Fmy\u002Ffiles\")\n\n#   在library上安装嵌入——选择嵌入模型和向量数据库\nlib.install_new_embedding(embedding_model_name=\"mini-lm-sbert\", vector_db=\"milvus\", batch_size=500)\n\n#   向同一library添加第二个嵌入（混合搭配模型和向量数据库）  \nlib.install_new_embedding(embedding_model_name=\"industry-bert-sec\", vector_db=\"chromadb\", batch_size=100)\n\n#   可以轻松为不同的项目和团队创建多个library\n\nfinance_lib = Library().create_new_library(\"finance_q4_2023\")\nfinance_lib.add_files(\"\u002Ffinance_folder\u002F\")\n\nhr_lib = Library().create_new_library(\"hr_policies\")\nhr_lib.add_files(\"\u002Fhr_folder\u002F\")\n\n#    获取library卡片，其中包含关键元数据——文档、文本块、图像、表格、嵌入记录\nlib_card = Library().get_library_card(\"my_library\")\n\n#   查看所有library\nall_my_libs = Library().get_all_library_cards()\n\n```\n\u003C\u002Fdetails>  \n\n\u003Cdetails> \n\u003Csummary>\u003Cb>Query\u003C\u002Fb>: 使用文本、语义、混合、元数据以及自定义过滤器查询libraries。 \u003C\u002Fsummary>\n\n```python\n\nfrom llmware.retrieval 
import Query\nfrom llmware.library import Library\n\n#   第一步——加载之前创建的library \nlib = Library().load_library(\"my_library\")\n\n#   第二步——创建一个query对象并传入library\nq = Query(lib)\n\n#    第三步——运行各种不同的query（更多选项请参阅示例）\n\n#    基本文本query\nresults1 = q.text_query(\"text query\", result_count=20, exact_mode=False)\n\n#    语义query\nresults2 = q.semantic_query(\"semantic query\", result_count=10)\n\n#    结合文本query，仅限于library中的特定文档，并要求与query完全匹配\nresults3 = q.text_query_with_document_filter(\"new query\", {\"file_name\": \"selected file name\"}, exact_mode=True)\n\n#   如果library上有多个嵌入，可以在创建query对象时指定具体的嵌入名称\nq2 = Query(lib, embedding_model_name=\"mini_lm_sbert\", vector_db=\"milvus\")\nresults4 = q2.semantic_query(\"new semantic query\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Prompt with Sources\u003C\u002Fb>: 将知识检索与LLM推理相结合的最简单方式。 \u003C\u002Fsummary>\n\n```python\n\nfrom llmware.prompts import Prompt\nfrom llmware.retrieval import Query\nfrom llmware.library import Library\n\n#   构建一个prompt\nprompter = Prompt().load_model(\"llmware\u002Fbling-tiny-llama-v0\")\n\n#   添加一个文件——文件会被解析、文本分块、按query筛选，然后被打包成适合模型使用的上下文，\n\n#   包括在必要时分批处理，以适应模型的上下文窗口\n\nsource = prompter.add_source_document(\"\u002Ffolder\u002Fto\u002Fone\u002Fdoc\u002F\", \"filename\", query=\"fast query\")\n\n#   将查询结果（来自 Query）附加到 Prompt 中\nmy_lib = Library().load_library(\"my_library\")\nresults = Query(my_lib).query(\"my query\")\nsource2 = prompter.add_source_query_results(results)\n\n#   对库执行新查询，并直接加载到提示中\nsource3 = prompter.add_source_new_query(my_lib, query=\"my new query\", query_type=\"semantic\", result_count=15)\n\n#   使用“带来源的提示”进行推理\nresponses = prompter.prompt_with_source(\"my query\")\n\n#   在推理之后进行事实核查\nfact_check = prompter.evidence_check_sources(responses)\n\n#   查看来源材料（已分批处理并准备好供模型使用，且已附加到提示中）\nsource_materials = prompter.review_sources_summary()\n\n#   查看完整的提示历史\nprompt_history = prompter.get_current_history()\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails> 
\n\u003Csummary>\u003Cb>RAG优化模型\u003C\u002Fb> - 1-70亿参数模型，专为RAG工作流集成和本地运行而设计。\u003C\u002Fsummary>  \n\n```\n\"\"\" 这个“Hello World”示例展示了如何使用提供的上下文，通过PyTorch和GGUF版本，开始使用本地BLING模型。 \"\"\"\n\nimport time\nfrom llmware.prompts import Prompt\n\n\ndef hello_world_questions():\n\n    test_list = [\n\n    {\"query\": \"发票总额是多少？\",\n     \"answer\": \"$22,500.00\",\n     \"context\": \"服务供应商公司 \\n100榆树街 普莱森特维尔, 纽约州 \\n致阿尔法公司 5900第一街 洛杉矶, 加利福尼亚州 \\n描述 前端工程服务 $5000.00 \\n 后端工程服务 $7500.00 \\n 质量保证经理 $10,000.00 \\n 总额 $22,500.00 \\n 所有支票请支付给服务供应商公司。付款应在30天内完成。\\n如果您对本发票有任何疑问，请联系比亚·赫尔墨斯。\\n感谢您的惠顾！ 发票 发票编号0001 日期2022年1月1日 针对阿尔法项目 P.O. 编号1000\"},\n\n    {\"query\": \"贸易顺差金额是多少？\",\n     \"answer\": \"624亿日元（4.166亿美元）\",\n     \"context\": \"日本9月份贸易收支转为顺差，超出预期\\n日本9月份录得624亿日元（4.166亿美元）的贸易顺差，这一数据好于路透社调查的经济学家预测的425亿日元贸易逆差。日本海关部门的数据显示，9月份出口同比增长4.3%，而进口则同比下降16.3%。据FactSet称，对亚洲的出口已连续第九个月下滑，这反映出中国经济持续疲软。不过，FactSet还指出，对西方市场的出口有所增长，从而支撑了整体出口表现。——林慧洁\"},\n\n    {\"query\": \"LISP机器市场何时崩溃？\",\n     \"answer\": \"1987年。\",\n     \"context\": \"与会者在20世纪60年代成为了人工智能研究的领军人物。\\n他们及其学生开发出的程序被媒体形容为‘令人惊叹’：计算机能够学习跳棋策略、解决代数应用题、证明逻辑定理，甚至能用英语对话。到了20世纪60年代中期，美国国防部对人工智能研究给予了大量资助，世界各地也纷纷建立了相关实验室。赫伯特·西蒙曾预言：‘在二十年内，机器将能够完成人类所能做的任何工作。’马文·明斯基也表示赞同，他写道：‘再过一代人的时间……创造“人工智能”的问题将基本得到解决。’然而，他们低估了问题的复杂性。由于詹姆斯·莱特希尔爵士的批评以及美国国会不断施加的压力，要求资助更具生产力的项目，美英两国政府相继停止了探索性研究。明斯基和帕珀特合著的《感知器》一书被解读为证明了人工神经网络方法永远无法用于解决现实世界的问题，从而彻底否定了这一技术路线。随后进入了‘AI寒冬’时期，即很难获得人工智能项目资金支持的阶段。到了20世纪80年代初，专家系统的商业成功重新点燃了人工智能研究的热潮——这种人工智能程序能够模拟人类专家的知识和分析能力。到1985年，人工智能市场规模已超过10亿美元。与此同时，日本的第五代计算机计划促使美英两国政府恢复了对学术研究的资金支持。然而，自1987年LISP机器市场崩溃以来，人工智能再次陷入低谷，第二次更漫长的寒冬也随之到来。\"},\n\n    {\"query\": \"目前10年期美国国债的收益率是多少？\",\n     \"answer\": \"4.58%\",\n     \"context\": \"尽管美国公布了好于预期的就业数据，且美国国债收益率大幅上升，但股市周五仍出现反弹。道琼斯工业平均指数上涨195.12点，涨幅0.76%，收于31,419.58点。标准普尔500指数上涨1.59%，报4,008.50点。以科技股为主的纳斯达克综合指数上涨1.35%，收于12,299.68点。美国劳工部表示，8月份美国经济新增就业岗位438,000个。接受道琼斯调查的经济学家此前预计新增岗位273,000个。不过，上月薪资增幅低于预期。  
在最初因就业报告强于预期而下跌后，股市在周五出现了惊人的反转。盘中最低点时，道指一度下挫198点；而在涨势最猛时，又飙升超过500点。纳斯达克和标准普尔500指数当日最低点时分别下跌了0.8%。  交易员们对日内反转的原因尚不明确。一些人指出，可能是就业报告中薪资数据不及预期，促使投资者重新评估此前的看跌立场。另一些人则认为，收益率从当日高位回落也是原因之一。此次反弹或许还与市场此前过度超卖有关——本周早些时候，标准普尔500指数曾较今年高点下跌逾9%。  报告发布后，收益率一度飙升，10年期美国国债收益率逼近14年来的最高水平。随后该基准利率有所回落，但仍较前一日上涨约6个基点，报4.58%。  “我们看到收益率从此前的4.8%左右有所回落。随着收益率小幅回撤，我认为这对股市有所帮助，”Vibrant Industries Capital Advisors首席投资官玛格丽特·琼斯表示，“近几周市场一直表现疲软，可能存在超卖的情况。”\"},\n\n    {\"query\": \"预计毛利率是否高于70%？\",\n     \"answer\": \"是的，介于71.5%到72%之间。\",\n     \"context\": \"英伟达公司对2024财年第三季度的展望如下：  预计营收为160亿美元，上下浮动2%。GAAP和非GAAP毛利率预计分别为71.5%和72.5%，上下浮动50个基点。  GAAP和非GAAP营业费用预计分别为约29.5亿美元和20亿美元。  GAAP和非GAAP其他收入及支出预计为约1亿美元的收益，不包括非关联投资的损益。GAAP和非GAAP税率预计均为14.5%，上下浮动1%，不包括任何一次性项目。  亮点：自上次财报发布以来，英伟达在以下领域取得了进展：  数据中心：第二季度营收创历史新高，达到103.2亿美元，较上一季度增长141%，较去年同期增长171%。宣布用于复杂人工智能和高性能计算工作负载的NVIDIA® GH200 Grace™ Hopper™超级芯片已于本季度开始出货，第二代配备HBM3e内存的版本预计将于2024年第二季度出货。\"},\n\n    {\"query\": \"美国银行对塔吉特的评级是什么？\",\n     \"answer\": \"买入\",\n     \"context\": \"以下是我在10月12日（星期四）重点关注的一些股票代码，摘自我的记者笔记本：  标准普尔500指数熊市底部3,577点已满一周年。自那以来，截至周三收盘的4,376点，这一宽基指数已飙升逾22%。  9月份消费者价格指数高于预期，消费者通胀居高不下。美国社会保障管理局宣布，2024年的生活成本调整幅度为3.2%。  墨西哥卷饼连锁店Chipotle Mexican Grill (CMG)计划提价，显示出其定价能力。该公司援引消费者价格指数显示，零售通胀在过去两年内第四次呈现粘性特征。  美国银行将塔吉特 (TGT) 的评级从“中性”上调至“买入”，理由是当前股价处于低位，风险回报比具有吸引力。预计客流量有望改善，毛利率存在上行空间，商品陈列和货运运输状况也在好转。塔吉特将于下月公布季度财报。  在零售板块，CNBC投资俱乐部的投资组合持有TJX Companies (TJX)，这家折扣巨头旗下拥有T.J. Maxx、Marshalls和HomeGoods等品牌。高盛策略性买入的俱乐部标的包括富国银行 (WFC)——该公司将于周五公布财报——以及Humana (HUM)和英伟达 (NVDA)。美国银行首次给予Snowflake (SNOW)“买入”评级。  如果你喜欢这篇报道，请免费订阅吉姆·克莱默的《市场十大晨间思考》邮件通讯。  巴克莱下调了消费品公司的目标股价：UTZ Brands (UTZ)由17美元下调至16美元；卡夫亨氏 (KHC)由38美元下调至36美元。受周期性因素拖累。J.M. 
Smucker (SJM)由160美元下调至129美元，受长期不利因素影响。可口可乐 (KO)由70美元下调至59美元。  巴克莱还下调了与住房相关的股票目标价：Toll Brothers (TOL)由82美元下调至74美元，维持“低配”评级；同时下调了Trex (TREX)和Azek (AZEK)的目标价。  高盛 (GS)宣布出售其金融科技平台，并警告称这将在第三季度每股摊薄19美分的盈利。收购方是由私募股权公司Sixth Street牵头的投资集团。此举被视为纠正过去的失误。  摩根士丹利表示，Spotify (SPOT)的用户参与度正在提升，因此将目标股价由185美元上调至190美元，维持“超配（买入）”评级。  摩根大通看好美妆品牌elf Beauty (ELF)，维持“超配（买入）”评级，但将目标股价由150美元下调至139美元。该公司认为，第三季度的经营环境仍将“充满挑战”。该俱乐部持有高端美妆公司雅诗兰黛 (EL)的股份。  巴克莱将First Solar (FSLR)的评级由“同等权重（持有）”上调至“超配”，但将其目标股价由230美元下调至224美元。此次评级上调基于更好的风险回报比，且该公司在公用事业规模项目中的前景最为清晰。\"},\n\n{\"query\": \"第三季度销售额的降幅是多少？\",\n     \"answer\": \"同比下降20%。\",\n     \"context\": \"诺基亚表示，将在第三季度财报大幅下滑后，作为成本削减计划的一部分，裁减多达14,000个工作岗位。这家芬兰电信巨头称，将降低其成本基础并提高运营效率，以“应对充满挑战的市场环境”。此次大规模裁员是在诺基亚报告第三季度净销售额同比下降20%，降至49.8亿欧元之后进行的。该季度利润同比暴跌69%，至1.33亿欧元。\"},\n\n    {\"query\": \"关键要点有哪些？\",\n     \"answer\": \"•周五股市上涨，得益于强于预期的美国就业数据以及国债收益率上升；\\n•道琼斯指数上涨195.12点；\\n•标普500指数上涨1.59%；\\n•纳斯达克综合指数上涨1.35%；\\n•美国经济8月份新增就业岗位438,000个，高于预期的273,000个；\\n•10年期美国国债收益率接近14年来的最高水平，报4.58%。\",\n     \"context\": \"尽管公布了强于预期的美国就业数据且国债收益率大幅上升，但周五股市仍出现反弹。道琼斯工业平均指数上涨195.12点，涨幅0.76%，收于31,419.58点。标普500指数上涨1.59%，收于4,008.50点。以科技股为主的纳斯达克综合指数上涨1.35%，收于12,299.68点。美国劳工部表示，美国经济8月份新增就业岗位438,000个。接受道琼斯调查的经济学家此前预计新增就业岗位273,000个。然而，上月薪资涨幅低于预期。在最初因强于预期的就业报告而下跌后，股市在周五出现了惊人的反转。盘中最低点时，道指一度下跌198点；但在涨势最猛时，却飙升了500多点。纳斯达克和标普500指数当日最低点时分别下跌0.8%。交易员们对盘中逆转的原因尚不明确。一些人认为，可能是就业报告中薪资增幅不及预期，促使投资者重新评估此前的看跌立场。另一些人则指出，收益率从当日高位回落也起到了作用。此次反弹的部分原因可能在于市场此前过度抛售——本周早些时候，标普500指数曾较今年高点下跌超过9%。报告发布后，收益率一度飙升，10年期美国国债收益率逼近14年来的最高水平。随后该基准收益率有所回落，但仍较前一日高出约6个基点，报4.58%。Vibrant Industries Capital Advisors首席投资官玛格丽特·琼斯表示：“我们看到收益率从之前的约4.8%有所回落。随着收益率小幅回撤，我认为这对股市有所帮助。”她补充说：“近几周市场一直表现疲软，可能存在超卖的情况。”\"}\n\n    ]\n\n    return test_list\n\n\n\n\n# 这是需要运行的主脚本\n\ndef bling_meets_llmware_hello_world (model_name):\n\n    t0 = time.time()\n\n    # 加载问题\n    test_list = hello_world_questions()\n\n    print(f\"\\n > 加载模型: {model_name}...\")\n\n    # 加载模型 \n    prompter = Prompt().load_model(model_name)\n\n    t1 = time.time()\n    
print(f\"\\n > 模型 {model_name} 加载时间: {t1-t0} 秒\")\n \n    for i, entries in enumerate(test_list):\n\n        print(f\"\\n{i+1}. 查询: {entries['query']}\")\n     \n        # 运行提示\n        output = prompter.prompt_main(entries[\"query\"], context=entries[\"context\"],\n                                      prompt_name=\"default_with_context\", temperature=0.30)\n\n        # 打印结果\n        llm_response = output[\"llm_response\"].strip(\"\\n\")\n        print(f\"LLM 回答: {llm_response}\")\n        print(f\"黄金答案: {entries['answer']}\")\n        print(f\"LLM 使用情况: {output['usage']}\")\n\n    t2 = time.time()\n\n    print(f\"\\n总处理时间: {t2-t1} 秒\")\n\n    return 0\n\n\nif __name__ == \"__main__\":\n\n    # HuggingFace 上适用于笔记本电脑的小型 BLING 模型列表（RAG 指令版）\n\n    pytorch_models = [\"llmware\u002Fbling-1b-0.1\",                    # 最受欢迎\n                      \"llmware\u002Fbling-tiny-llama-v0\",             # 最快 \n                      \"llmware\u002Fbling-1.4b-0.1\",\n                      \"llmware\u002Fbling-falcon-1b-0.1\",\n                      \"llmware\u002Fbling-cerebras-1.3b-0.1\",\n                      \"llmware\u002Fbling-sheared-llama-1.3b-0.1\",    \n                      \"llmware\u002Fbling-sheared-llama-2.7b-0.1\",\n                      \"llmware\u002Fbling-red-pajamas-3b-0.1\",\n                      \"llmware\u002Fbling-stable-lm-3b-4e1t-v0\",\n                      \"llmware\u002Fbling-phi-3\"                      # 最准确（也是最新）  \n                      ]\n\n    # 量化后的 GGUF 版本通常加载更快，并且在至少配备16 GB内存的笔记本电脑上运行良好\n    gguf_models = [\"bling-phi-3-gguf\", \"bling-stablelm-3b-tool\", \"dragon-llama-answer-tool\", \"dragon-yi-answer-tool\", \"dragon-mistral-answer-tool\"]\n\n    #   尝试使用 PyTorch 或 GGUF 模型列表中的任意一个模型\n    #   最新（也是最准确）的是 'bling-phi-3-gguf'  \n\n    bling_meets_llmware_hello_world(gguf_models[0])  \n\n    #   请在 HuggingFace 上查看模型卡片，了解 RAG 基准测试性能结果及其他有用信息\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>简单可扩展的数据库选项 \u003C\u002Fb> - 
从笔记本电脑到并行化集群的一体化数据存储。 \u003C\u002Fsummary>\n\n```python\n\nfrom llmware.configs import LLMWareConfig\n\n#   设置集合数据库 - MongoDB、SQLite、PostgreSQL  \nLLMWareConfig().set_active_db(\"mongo\")  \n\n#   设置向量数据库（或在安装时声明）  \n#   --选项：Milvus、pg_vector（PostgreSQL）、Redis、Qdrant、Faiss、Pinecone、Mongo Atlas  \nLLMWareConfig().set_vector_db(\"milvus\")  \n\n#   快速启动 - 无需安装  \nLLMWareConfig().set_active_db(\"sqlite\")  \nLLMWareConfig().set_vector_db(\"chromadb\")   # 也可以尝试 Faiss 和 LanceDB  \n\n#   单一 PostgreSQL 部署  \nLLMWareConfig().set_active_db(\"postgres\")  \nLLMWareConfig().set_vector_db(\"postgres\")\n\n#   要安装MongoDB、Milvus和PostgreSQL，请参阅docker-compose脚本以及示例\n\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\n\u003Csummary> \u003Cb> 具有函数调用和SLIM模型的智能体 \u003C\u002Fb> \u003C\u002Fsummary>  \n\n```python\n\nfrom llmware.agents import LLMfx\n\ntext = (\"特斯拉股价在盘前交易中下跌8%，此前该公司公布的第四季度营收和利润均低于分析师预期。这家电动汽车公司还警告称，2024年的汽车销量增速‘可能明显低于’去年的增速。与此同时，汽车业务收入仅同比增长1%，部分原因是其电动汽车的售价较以往有所下降。特斯拉在今年下半年在全球范围内实施了大幅降价。在周三的一次演示中，该公司提醒投资者，目前正处于‘两波重大增长之间’。\")\n\n#   使用LLMfx类创建一个智能体\nagent = LLMfx()\n\n#   加载待处理文本\nagent.load_work(text)\n\n#   将“模型”作为“工具”加载到分析流程中\nagent.load_tool(\"sentiment\")\nagent.load_tool(\"extract\")\nagent.load_tool(\"topics\")\nagent.load_tool(\"boolean\")\n\n#   使用不同工具运行函数调用\nagent.sentiment()\nagent.topics()\nagent.extract(params=[\"company\"])\nagent.extract(params=[\"automotive revenue growth\"])\nagent.xsum()\nagent.boolean(params=[\"预计2024年的增长会强劲吗？（请解释）\"])\n\n#   处理结束后，展示由关键信息自动汇总而成的报告\nreport = agent.show_report()\n\n#   显示整个处理过程的活动摘要\nactivity_summary = agent.activity_summary()\n\n#   收集到的所有响应列表\nfor i, entries in enumerate(agent.response_list):\n    print(\"更新：响应分析：\", i, entries)\n\noutput = {\"report\": report, \"activity_summary\": activity_summary, \"journal\": agent.journal}  \n\n```\n\n\u003C\u002Fdetails>\n\u003Cdetails>\n\n\u003Csummary> 🚀 \u003Cb>开始编码 - RAG快速入门\u003C\u002Fb> \u003C\u002Fsummary>\n\n```python\n# 本示例展示了使用本地运行的RAG优化LLM进行简单合同分析的过程\n\nimport 
os\nimport re\nfrom llmware.prompts import Prompt, HumanInTheLoop\nfrom llmware.setup import Setup\nfrom llmware.configs import LLMWareConfig\n\ndef contract_analysis_on_laptop (model_name):\n\n    # 在这个场景中，我们将：\n    # -- 下载一组示例合同文件\n    # -- 创建一个Prompt并加载BLING LLM模型\n    # -- 解析每份合同，提取相关段落，并将问题传递给本地LLM\n\n    # 主循环 - 遍历每份合同：\n    #\n    #      1.  在内存中解析文档（将PDF文件转换为带有元数据的文本块）\n    #      2.  使用“主题”筛选解析后的文本块（例如，“管辖法律”），以提取相关段落\n    #      3.  将文本块打包并组装成适合模型输入的上下文\n    #      4.  对每份合同向LLM提出三个关键问题\n    #      5.  打印到屏幕\n    #      6.  将结果分别保存为json和csv格式，以便后续处理和审查。\n\n    #  加载llmware示例文件\n\n    print (f\"\\n > 正在加载llmware示例文件...\")\n\n    sample_files_path = Setup().load_sample_files()\n    contracts_path = os.path.join(sample_files_path,\"Agreements\")\n \n    #  查询列表 - 这是我们希望LLM针对每份合同分析的3个主要主题和问题\n\n    query_list = {\"executive employment agreement\": \"双方当事人的名称是什么？\",\n                  \"base salary\": \"高管的基本工资是多少？\",\n                  \"vacation\": \"该高管将获得多少天的带薪假期？\"}\n\n    #  根据传入函数的名称加载选定的模型\n\n    print (f\"\\n > 正在加载模型 {model_name}...\")\n\n    prompter = Prompt().load_model(model_name, temperature=0.0, sample=False)\n\n    #  主循环\n\n    for i, contract in enumerate(os.listdir(contracts_path)):\n\n        #   排除Mac系统生成的文件（虽然烦人，但在演示中是不可避免的）\n        if contract != \".DS_Store\":\n\n            print(\"\\n正在分析合同：\", str(i+1), contract)\n\n            print(\"LLM的回答：\")\n\n            for key, value in query_list.items():\n\n                # 上述步骤1 + 2 + 3 - 合同被解析、分块、按主题键过滤，\n                # ... 
然后被打包进prompt中\n\n                source = prompter.add_source_document(contracts_path, contract, query=key)\n\n                # 上述步骤4 - 使用已打包在prompt中的“source”信息调用LLM\n\n                responses = prompter.prompt_with_source(value, prompt_name=\"default_with_context\")  \n\n                # 上述步骤5 - 打印到屏幕\n\n                for r, response in enumerate(responses):\n                    print(key, \":\", re.sub(\"[\\n]\",\" \", response[\"llm_response\"]).strip())\n\n                # 完成这份合同后，清除prompt中的源材料\n                prompter.clear_source_materials()\n\n    # 上述步骤6 - 将分析结果保存为jsonl和csv格式\n\n    # 将jsonl报告保存到\u002Fprompt_history文件夹\n    print(\"\\nPrompt状态已保存至：\", os.path.join(LLMWareConfig.get_prompt_path(),prompter.prompt_id))\n    prompter.save_state()\n\n    # 保存包含模型、响应、prompt及证据的csv报告，供人工审核\n    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()\n    print(\"csv输出已保存至：\", csv_output)\n\n\nif __name__ == \"__main__\":\n\n    # 使用本地CPU模型 - 尝试最新的 - 经过RAG微调并以GGUF格式封装的Phi-3模型  \n    model = \"bling-phi-3-gguf\"\n\n    contract_analysis_on_laptop(model)\n\n```\n\u003C\u002Fdetails>\n\n## 🔥 最新增强与功能   \n\n### ONNXRuntime-QNN - 在Snapdragon NPU上运行模型（Windows Arm64） \n\n- 模型目录中有7个NPU优化模型“即刻可用” - 请参阅[using-qnn-npu-models示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-qnn-npu-models.py)  \n\n### OpenVINO编码器 - 新示例（Windows Intel x86）  \n\n- 20个经过OV优化的编码模型，使用OVEmbeddingModel类 - 支持广泛的嵌入、重排序器和分类器，例如，  \n- [using_openvino_embedding_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_embedding_model.py)  \n- [using_openvino_reranker_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_reranker_model.py)  \n- 
[using_openvino_classifier_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_classifier_model.py)  \n\n### ONNXRuntime重排序器 - 使用为Onnxruntime部署优化的重排序器   \n\n- [using_onnx_reranker_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_onnx_reranker_models.py)   \n  \n### WindowsLocalFoundry集成 - 在llmware中使用WindowsLocalFoundry模型  \n\n- [using_local_foundry_model](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_local_foundry_models.py)\n\n### 模型库 - 规模最大的基于 OpenVINO 的预构建大语言模型集合（95+ 个）  \n\n- [Hugging Face 上的模型库](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fllmware\u002Fmodel-depot)   \n- [使用 OpenVINO 进行流式生成](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_streamer.py)     \n- [OpenVINO 模型入门](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_models.py)  \n\n### 图像生成 - 多媒体机器人  \n\n- [多媒体机器人示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUI\u002Fmultimedia_bot.py)  \n\n[**基于 SLIM 模型的多模型智能体**](examples\u002FSLIM-Agents\u002F) - [**介绍视频**](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY)    \n\n### 新用例与应用  \n\n- **BizBot：RAG + SQL 本地聊天机器人**  \n  使用 RAG 和 SQL 实现用于商业智能的本地聊天机器人。\n  - [代码示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fbiz_bot.py) | [演示视频](https:\u002F\u002Fyoutu.be\u002F4nBYDEjxxTE?si=o6PDPbu0PVcT-tYd)\n\n- **讲座工具**  \n  支持对语音录音进行问答，适用于教育和讲座分析。\n  - [讲座工具代码](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUse_Cases\u002Flecture_tool\u002F)\n\n- **金融研究 Web 服务**  \n  一个端到端示例，展示结合智能体调用的金融研究 Web 服务。\n  - 
[演示视频](https:\u002F\u002Fyoutu.be\u002Fl0jzsg1_Ik0?si=hmLhpT1iv_rxpkHo) | [代码示例](examples\u002FUse_Cases\u002Fweb_services_slim_fx.py)\n\n### 音频与文本处理\n\n- **使用 WhisperCPP 进行语音转录**  \n  从 WhisperCPP 开始转录项目，提供样本文件使用工具和著名演讲内容。\n  - [入门指南](examples\u002FModels\u002Fusing-whisper-cpp-getting-started.py) | [解析伟大演讲](examples\u002FUse_Cases\u002Fparsing_great_speeches.py) | [演示视频](https:\u002F\u002Fyoutu.be\u002F5y0ez5ZBpPE?si=KVxsXXtX5TzvlEws)\n\n- **自然语言查询转 CSV**  \n  使用 Slim-SQL 将自然语言查询转换为 CSV，支持自定义 Postgres 表。\n  - [演示视频](https:\u002F\u002Fyoutu.be\u002Fz48z5XOXJJg?si=V-CX1w-7KRioI4Bi) | [端到端示例](examples\u002FSLIM-Agents\u002Ftext2sql-end-to-end-2.py) | [自定义表使用](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fagent_with_custom_tables.py)\n\n### 多模型智能体\n\n- **基于 SLIM 的多模型智能体**  \n  在复杂工作流中使用 CPU 上的 SLIM 模型构建多步智能体。\n  - [演示视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY) | [示例目录](examples\u002FSLIM-Agents)\n\n### 文档与 OCR 处理\n\n- **嵌入文档中的 OCR 图像**  \n  系统性地从文档中嵌入的图像中提取文本，以增强文档处理能力。\n  - [OCR 示例](examples\u002FParsing\u002Focr_embedded_doc_images.py)\n\n- **PDF、Word、PowerPoint 和 Excel 的增强文档解析**  \n  改进了文本分块控制、表格提取和内容解析功能。\n  - [解析示例](examples\u002FParsing\u002Fpdf_parser_new_configs.py)\n\n- **优化 RAG 提示词的准确性**  \n  教程帮助调整 RAG 提示词设置以提高准确性。\n  - [设置示例](examples\u002FModels\u002Fadjusting_sampling_settings.py) | 视频：[第一部分](https:\u002F\u002Fyoutu.be\u002F7oMTGhSKuNY?si=14mS2pftk7NoKQbC), [第二部分](https:\u002F\u002Fyoutu.be\u002FiXp1tj-pPjM?si=T4teUAISnSWgtThu)  \n  \n刚接触 RAG？ [观看快速入门系列视频](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB)  \n\n[SLIM 函数调用模型简介](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_function_calls.py)  \n迫不及待了？立即获取 SLIM 模型：  \n\n```python \nfrom llmware.models import ModelCatalog\n\nModelCatalog().get_llm_toolkit()  # 获取所有 SLIM 模型，以小型、快速量化工具形式交付  
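\n# 补充示例（假设性写法，仅供参考：假定模型已在本地下载，且 load_model \u002F function_call 接口以官方 examples 中的 using_function_calls.py 为准）\nmodel = ModelCatalog().load_model(\"slim-sentiment-tool\")\nresponse = model.function_call(\"The quarterly earnings beat expectations.\")  # 返回结构化的情感分类结果（字典形式）  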
\nModelCatalog().tool_test_run(\"slim-sentiment-tool\") # 查看模型的实际运行效果，并附带测试脚本  \n```\n\n## 🌱 入门指南\n\n**步骤 1 - 安装 llmware** -  `pip3 install llmware` 或 `pip3 install 'llmware[full]'`  \n\n- [核心安装](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fllmware\u002Frequirements.txt)（最小依赖集）  \n- [完整安装](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fllmware\u002Frequirements_extras.txt)（在核心基础上添加更广泛的 Python 相关库）。  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>步骤 2 - 查看示例\u003C\u002Fb> - 通过 100 多个“即插即用”教程快速上手 \u003C\u002Fsummary>\n\n## 🔥 热门新示例 🔥  \n\n端到端场景 - [**使用 SLIM 提取和 Web 服务进行金融研究的函数调用**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fweb_services_slim_fx.py)  \n分析语音文件 - [**利用 LLM 查询与提取的伟大演讲**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fparsing_great_speeches.py)  \nLLMWare 新手入门 - [**快速入门教程系列**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Ffast_start)  \n环境搭建 - [**入门指南**](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FGetting_Started)  \nSLIM 示例 - [**SLIM 模型**](examples\u002FSLIM-Agents\u002F)  \n\n| 示例     |  详情      |\n|-------------|--------------|\n| 1.   BLING 模型快速入门（[代码](examples\u002FModels\u002Fbling_fast_start.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JjgqOZ2v5oU)) | 使用快速、精准的基于 CPU 的模型开始——问答、键值对提取及基础摘要。  |\n| 2.   解析并嵌入 500 份 PDF 文档（[代码](examples\u002FEmbedding\u002Fdocs2vecs_with_milvus-un_resolutions.py)）  | 使用 Milvus 对联合国决议文档进行解析、嵌入与查询的端到端示例 |\n| 3.  混合检索——语义+文本（[代码](examples\u002FRetrieval\u002Fdual_pass_with_custom_filter.py)） | 利用“双通道”检索结合语义搜索与文本搜索的优势 |\n| 4.   
多种嵌入模型与 PG Vector（[代码](examples\u002FEmbedding\u002Fusing_multiple_embeddings.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Bncvggy6m5Q)) | 使用 Postgres \u002F PG Vector 比较多种嵌入模型 |\n| 5.   DRAGON GGUF 模型（[代码](examples\u002FModels\u002Fdragon_gguf_fast_start.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=BI1RlaIJcsc&t=130s)) | 当前最先进的 7B RAG GGUF 模型。  | \n| 6.   使用 BLING 进行 RAG（[代码](examples\u002FUse_Cases\u002Fcontract_analysis_on_laptop_with_bling_models.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8aV5p3tErP0)) | 以合同分析为例，尝试使用 `llmware` 的 BLING ~1B 参数 GPT 模型在您的笔记本电脑上进行复杂文档分析和文本提取。 |\n| 7.   使用 DRAGON 分析主服务协议（[代码](examples\u002FUse_Cases\u002Fmsa_processing.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Cf-07GBZT68&t=2s)) | 利用 DRAGON YI 6B 模型分析 MSA 协议。   |\n| 8.   Streamlit 示例（[代码](examples\u002FUI\u002Fsimple_rag_ui_with_streamlit.py)）  | 通过 UI 接口向发票提问并运行推理。  |\n| 9.   集成 LM Studio（[代码](examples\u002FModels\u002Fusing_open_chat_models.py) \u002F [视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=h2FDjUyvsKE&t=101s)) | 将 LM Studio 模型与 LLMWare 集成 |\n| 10.  带来源的提示词（[代码](examples\u002FPrompts\u002Fprompt_with_sources.py)）  | 将广泛的知识来源直接附加到提示词中。   |\n| 11.  事实核查（[代码](examples\u002FPrompts\u002Ffact_checking.py)）  | 在此示例脚本中探索完整的证据方法，用于分析一组合同。   |\n| 12.  
使用 7B GGUF 聊天模型（[代码](examples\u002FModels\u002Fchat_models_gguf_fast_start.py)） | 在本地几分钟内使用 4 种最先进的 7B 聊天模型 |  \n\n\n查看：[llmware 示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FREADME.md)  \n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>步骤 3 - 教程视频\u003C\u002Fb> —— 请访问我们的 YouTube 频道，观看关于最新示例的 5–10 分钟高影响力教程。\u003C\u002Fsummary>\n\n🎬 快速上手，请观看以下视频：  \n- [文档摘要](https:\u002F\u002Fyoutu.be\u002FPs3W-P9A1m8?si=Rxvst3RJv8ZaOk0L)  \n- [Bling-3-GGUF 本地聊天机器人](https:\u002F\u002Fyoutu.be\u002FgzzEVK8p3VM?si=8cNn_do0oxSzCEnM)  \n- [基于代理的复杂研究分析](https:\u002F\u002Fyoutu.be\u002Fy4WvwHqRR60?si=jX3KCrKcYkM95boe)  \n- [SLIM 入门（附代码）](https:\u002F\u002Fyoutu.be\u002FaWZFrTDmMPc?si=lmo98_quo_2Hrq0C)  \n- [您是否为 RAG 使用了错误的提示方式？——随机采样·第一部分](https:\u002F\u002Fyoutu.be\u002F7oMTGhSKuNY?si=_KSjuBnqArvWzYbx)  \n- [您是否为 RAG 使用了错误的提示方式？——随机采样·第二部分——代码实验](https:\u002F\u002Fyoutu.be\u002FiXp1tj-pPjM?si=3ZeMgipY0vJDHIMY)  \n- [SLIM 模型简介](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cQfdaTcmBpY)  \n- [Text2SQL 简介](https:\u002F\u002Fyoutu.be\u002FBKZ6kO2XxNo?si=tXGt63pvrp_rOlIP)  \n- [在笔记本电脑上使用 BLING 进行 RAG](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JjgqOZ2v5oU)    \n- [DRAGON-7B-模型](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d_u7VaKu6Qk&t=37s)  \n- [安装并比较多种嵌入模型——Postgres 和 PGVector](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Bncvggy6m5Q)  \n- [GGUF 量化背景及 DRAGON 模型示例](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZJyQIZNJ45E)  \n- [使用 LM Studio 模型](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=h2FDjUyvsKE)  \n- [使用 Ollama 模型](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qITahpVDuV0)  \n- [使用任意 GGUF 模型](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9wXJgld7Yow)  \n- [使用小型 LLM 进行 RAG 合同分析（特邀 LLMWare）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8aV5p3tErP0)\n- [LLMWare 处理发票](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VHZSaBBG-Bo&t=10s)\n- [大规模导入 PDF 
文件](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=O0adUfrrxi8&t=10s)\n- [使用 LLMWare 评估 LLM 用于 RAG](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=s0KWqYg5Buk&t=105s)\n- [使用 LLMWare 开源库快速启动 RAG](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0naqpH93eEU)\n- [无需数据库即可使用检索增强生成（RAG）](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tAGz6yR14lw)\n- [启动 LLMWare 推理服务器](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=qiEmLnSRDUA&t=20s)\n\n\n\u003C\u002Fdetails>\n\n## ✍️ 使用 llmware GitHub 仓库  \n\n您可以将 llmware 仓库克隆到本地，以访问所有示例，或直接使用 llmware 的最新代码版本。  \n\n```bash\ngit clone git@github.com:llmware-ai\u002Fllmware.git\n```  \n\n我们在仓库根目录下提供了一个名为 **welcome_to_llmware** 的自动化脚本。克隆完成后：  \n- 在 Windows 命令行中：`.\\welcome_to_llmware_windows.sh`  \n- 在 Mac \u002F Linux 命令行中：`sh .\u002Fwelcome_to_llmware.sh`  \n\n或者，如果您更倾向于不使用欢迎自动化脚本完成设置，则后续步骤包括：  \n\n1. **安装 requirements.txt** - 在 \u002Fllmware 路径内执行，例如：`pip3 install -r llmware\u002Frequirements.txt`  \n\n2. **安装 requirements_extras.txt** - 在 \u002Fllmware 路径内执行，例如：`pip3 install -r llmware\u002Frequirements_extras.txt`（根据您的使用场景，您可能不需要全部或部分安装，但其中一些将在示例中用到。）  \n\n3. **运行示例** - 将一个或多个示例 .py 文件复制到项目根路径。（我们发现有些 IDE 会尝试从嵌套的 \u002Fexample 路径中交互式运行，但无法访问 \u002Fllmware 模块——简单的解决方法是将您想运行的示例直接复制到根路径。）  \n\n4. 
**安装向量数据库** - Milvus Lite、ChromaDB、FAISS 和 LanceDB 等向量数据库无需单独安装服务器，只需安装相应的 Python SDK 库即可使用，例如 `pip3 install pymilvus` 或 `pip3 install chromadb`。如果您查看 [examples\u002FEmbedding](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding)，可以看到各种向量数据库的入门示例；而在仓库根目录下，您会找到易于使用的 Docker Compose 脚本，用于安装 Milvus、Postgres\u002Fpgvector、Mongo、Qdrant、Neo4j 和 Redis。  \n\n\n## 数据存储选项\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>快速入门\u003C\u002Fb>：使用 SQLite3 和 ChromaDB（基于文件）开箱即用——无需安装\u003C\u002Fsummary>  \n\n```python\nfrom llmware.configs import LLMWareConfig \nLLMWareConfig().set_active_db(\"sqlite\")   \nLLMWareConfig().set_vector_db(\"chromadb\")  \n```\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>速度 + 扩展性\u003C\u002Fb>：使用 MongoDB（文本集合）和 Milvus（向量数据库）——通过 Docker Compose 安装\u003C\u002Fsummary> \n\n```bash\ncurl -o docker-compose.yaml https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose.yaml\ndocker compose up -d\n```\n\n```python\nfrom llmware.configs import LLMWareConfig\nLLMWareConfig().set_active_db(\"mongo\")\nLLMWareConfig().set_vector_db(\"milvus\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Postgres\u003C\u002Fb>：同时使用 Postgres 作为文本集合和向量数据库——通过 Docker Compose 安装\u003C\u002Fsummary> \n\n```bash\ncurl -o docker-compose.yaml https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose-pgvector.yaml\ndocker compose up -d\n```\n\n```python\nfrom llmware.configs import LLMWareConfig\nLLMWareConfig().set_active_db(\"postgres\")\nLLMWareConfig().set_vector_db(\"postgres\")\n```\n\n\u003C\u002Fdetails>  \n\n\u003Cdetails>\n\u003Csummary>\u003Cb>混合搭配\u003C\u002Fb>：LLMWare 支持 3 种文本集合数据库（Mongo、Postgres、SQLite）和 10 种向量数据库（Milvus、PGVector-Postgres、Neo4j、Redis、Mongo-Atlas、Qdrant、FAISS、LanceDB、ChromaDB 和 Pinecone）\u003C\u002Fsummary>\n\n```bash\n# 部署其他选项的脚本\ncurl -o docker-compose.yaml 
https:\u002F\u002Fraw.githubusercontent.com\u002Fllmware-ai\u002Fllmware\u002Fmain\u002Fdocker-compose-redis-stack.yaml\n```\n\n\u003C\u002Fdetails>  \n\n## 我们的模型介绍   \n\n- **SLIM 模型系列**：小型、专用模型，针对函数调用及多步、多模型 Agent 工作流进行了微调。  \n- **DRAGON 模型系列**：生产级 RAG 优化的 6-9B 参数模型——“在……之上实现 RAG”，基于领先的预训练基础模型。  \n- **BLING 模型系列**：小型 CPU 型 RAG 优化、指令遵循的 1B-5B 参数模型。  \n- **行业 BERT 模型**：开箱即用的自定义训练句子转换器嵌入模型，专为保险、合同、资产管理、SEC 等行业进行微调。  \n- **GGUF 量化**：我们提供了许多 SLIM、DRAGON 和 BLING 模型的 'gguf' 和 'tool' 版本，专为 CPU 部署优化。  \n\n\n有兴趣参与 llmware 的开发吗？有关参与方式的信息，请参阅我们的 [贡献者指南](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Frepo_docs\u002FCONTRIBUTING.md#contributing-to-llmware)。与该项目的各个方面一样，参与也受我们的 [行为准则](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Frepo_docs\u002FCODE_OF_CONDUCT.md)约束。\n\n欢迎在我们的 [GitHub 讨论区](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fdiscussions)提出问题和参与讨论。  \n\n## 📣 发布说明和变更日志  \n\n如需完整的发布说明历史，请打开“变更日志”选项卡。  \n\n**支持的操作系统**：Windows（x86 和 Arm64）、MacOS（Metal - M1-M5）、Linux（x86、aarch64）  \n- Linux —— 支持 Ubuntu 20+（glibc 2.31+）  \n- 如果您需要其他 Linux 版本的支持，请提交问题——我们将优先测试并确保支持。    \n\n**支持的向量数据库**：Milvus、Postgres（PGVector）、Neo4j、Redis、LanceDB、ChromaDB、Qdrant、FAISS、Pinecone、Mongo Atlas 向量搜索  \n\n**支持的文本索引数据库**：MongoDB、Postgres、SQLite  \n\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>可选\u003C\u002Fb>\u003C\u002Fsummary>\n\n- [Docker](https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F)\n  \n- 若要启用 OCR 解析功能，请安装 [Tesseract v5.3.3](https:\u002F\u002Ftesseract-ocr.github.io\u002Ftessdoc\u002FInstallation.html) 和 [Poppler v23.10.0](https:\u002F\u002Fpoppler.freedesktop.org\u002F) 的原生软件包。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>\u003Cb>🚧 变更日志\u003C\u002Fb>\u003C\u002Fsummary>\n\n**周四，1月1日 - v0.4.3 - 进行中**  \n - 更新了 BaseModel 和 PromptCatalog 类  \n - 更新了云端模型版本，并支持 OpenAI、Gemini 和 Anthropic 的最新模型  \n - 移除了已弃用的模型类  \n - 移除了已弃用的模块（Dataset Builder 和 Graph）  \n - 其他模型类和卡片的更新仍在进行中  \n - 
代码库已更新至最新状态，但尚未发布到 pip 安装版本——计划于 2026 年 1 月第 2 周发布  \n\n**周一，3月3日 - v0.4.0**  \n - 更新了 GGUF 实现、配置和库  \n - 更新了 ONNXRuntime 实现及配置  \n - ModelCatalog 中新增了 phi-4、Deepseek-Qwen-7B、Deepseek-Qwen-14B 等多款模型  \n - 新增对 Windows ARM64 的支持  \n - 将默认 active_db 更改为“sqlite”（生产环境中仍可使用 mongo 和 postgres）  \n - 简化了核心 requirements.txt 文件中的依赖项，并优化了 pip 安装流程  \n - “额外\u002F可选”依赖项可在 requirements_extras.txt 文件中找到，也可通过 pip 安装时传递的配置来指定（具体选项参见 setup.py）  \n\n**周五，11月8日 - v0.3.9**  \n - 优化了 Azure OpenAI 配置，包括流式生成功能  \n - 移除了针对 Linux aarch64 和 Mac x86 的已弃用解析器二进制文件  \n - 为 CustomTable 插入行操作新增了生成器选项，以便在构建大型表格时显示进度  \n\n**周日，10月27日 - v0.3.8**  \n - 将 Model Depot 收藏的 100 多种 OpenVino 和 ONNX 模型集成到 LLMWare 默认模型目录中  \n - 支持对模型类、模型目录和模型配置的调整  \n\n**周日，10月6日 - v0.3.7**  \n - 新增了 OVGenerativeModel 模型类，用于支持以 OpenVino 格式封装的模型  \n - 新增了 ONNXGenerativeModel 模型类，用于支持以 ONNX 格式封装的模型  \n - 开始使用 [OpenVino 示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_openvino_models.py)  \n - 开始使用 [ONNX 示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing_onnx_models.py)  \n\n**周二，10月1日 - v0.3.6**  \n - 新增了新的提示词和聊天模板  \n - 改进了模型配置并进行了更新  \n - 新增了用于在搜索结果中定位和高亮文本匹配的实用函数  \n - 改进了哈希校验相关工具函数  \n\n**周一，8月26日 - v0.3.5**  \n - 向 Model Catalog 中添加了 10 款 BLING+SLIM 模型，涵盖 Qwen2、Phi-3 和 Phi-3.5 系列  \n - 推出了基于 Qwen-7B、Yi-9B、Mistral-v0.3 和 Llama-3.1 的 DRAGON 系列新模型  \n - 新的 Qwen2 模型（包含 RAG 和函数调用微调版本）——[使用 Qwen2 模型示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing-qwen2-models.py)  \n - 新的 Phi-3 函数调用模型——[使用 Phi-3 函数调用示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing-phi-3-function-calls.py)  \n - 新的用例示例——[讲座工具](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUse_Cases\u002Flecture_tool\u002F)  \n - 优化了 GGUF 配置以扩展上下文窗口  \n - 
在模型配置中加入了模型基准性能数据  \n - 增强了工具函数中的哈希功能  \n\n**周一，7月29日 - v0.3.4**  \n - 加强了 LLMfx 代理在 text2sql 数据库读取操作中的安全防护  \n - 新增示例——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FUI\u002Fdueling_chatbot.py)  \n - 更多 Notebook 示例——参见 [Notebook 示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FNotebooks)  \n\n**周一，7月8日 - v0.3.3**  \n - 改进了模型配置选项、日志记录，并修复了一些小问题  \n - 优化了 Azure OpenAI 配置——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fblob\u002Fmain\u002Fexamples\u002FModels\u002Fusing-azure-openai.py)  \n\n**周六，6月29日 - v0.3.2**  \n - 更新了 PDF 和 Office 解析器，改进了日志记录和文本分块选项的配置  \n\n**周六，6月22日 - v0.3.1**  \n - 为 Fast Start 示例系列新增了第 3 模块——[关于代理与函数调用的示例 7-9](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Ffast_start)  \n - 新增了用于内存内语义相似度 RAG 的 Jina 重排序模型——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding\u002Fusing_semantic_reranker_with_rag.py)  \n - 优化了模型加载过程中的参数化获取方式  \n - 新增了 Pytorch 和 GGUF 版本的“tiny”精简提取和精简摘要工具——请查看“slim-extract-tiny-tool”和“slim-summary-tiny-tool”  \n - Biz Bot 用例——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fbiz_bot.py) 和 [视频](https:\u002F\u002Fyoutu.be\u002F4nBYDEjxxTE?si=o6PDPbu0PVcT-tYd)  \n - 更新了 numpy 要求 \u003C2，并将 yfinance 版本最低要求提升至 >=0.2.38  \n\n**周二，6月4日 - v0.3.0**  \n - 新增了对 Milvus Lite 内嵌“无需安装”数据库的支持——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FEmbedding\u002Fusing_milvus_lite.py)。  \n - 向目录和代理流程中添加了两款新的 SLIM 模型——['q-gen'](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fusing-slim-q-gen.py) 和 
['qa-gen'](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fusing-slim-qa-gen.py)    \n - 更新了模型类实例化方式，以提高扩展性，便于在不同模块中添加新类  \n - 新增了 welcome_to_llmware.sh 和 welcome_to_llmware_windows.sh 快速安装脚本  \n - 在模型基类中新增了可配置的 post_init 和 register 方法  \n - 创建了 InferenceHistory 来跟踪所有已完成推理的全局状态  \n - 对模块级别的日志记录进行了多项改进和更新  \n - 注意：从 v0.3.0 开始，pip 安装提供两种选项——基础最小安装 `pip3 install llmware` 可满足大多数用例需求，而完整安装 `pip3 install 'llmware[full]'` 则包含了其他常用库。  \n\n**周三，5月22日 - v0.2.15**  \n - 改进了模型类对 Pytorch 和 Transformers 依赖项的处理方式（按需即时加载）  \n - 扩展了 API 端点选项和推理服务器功能——参见新的 [客户端访问选项](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fllmware_inference_api_client.py) 和 [服务器启动示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fllmware_inference_server.py)  \n\n**周六，5月18日 - v0.2.14**  \n - 新的 OCR 图像解析方法——参见 [示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fslicing_and_dicing_office_docs.py)  \n - 开始实施日志改进的第一部分（进行中），涉及 Configs 和 Models 模块。  \n - 向目录中新增了一款嵌入模型——industry-bert-loans。  \n - 更新了模型导入方法和配置。\n\n**星期日，5月12日 - v0.2.13**  \n- 新增GGUF流式传输方法，附带[基础示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fgguf_streaming.py)和[phi3本地聊天机器人](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUI\u002Fgguf_streaming_chatbot.py)  \n- 大幅清理了辅助导入和依赖项，以降低安装复杂度——请注意更新后的requirements.txt和setup.py文件。  \n- 添加了防御性代码，在代码的特定部分（如OCR、网页解析器）中，若缺少依赖项会给出详细的警告信息。  \n- 更新了测试、公告和文档。  \n- 创建了OpenAIConfigs配置类，以支持Azure OpenAI服务。  \n  \n**星期日，5月5日 - v0.2.12更新**  \n- 
在ModelCatalog中上线了[\"bling-phi-3\"](https:\u002F\u002Fhuggingface.co\u002Fllmware\u002Fbling-phi-3)和[\"bling-phi-3-gguf\"](https:\u002F\u002Fhuggingface.co\u002Fllmware\u002Fbling-phi-3-gguf)模型——这是最新且最精准的BLING\u002FDRAGON系列模型。  \n- 新增长文档摘要方法，使用slim-summary-tool工具，[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FPrompts\u002Fdocument_summarizer.py)  \n- 新增Office（PowerPoint、Word、Excel）样本文件，[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Fparsing_microsoft_ir_docs.py)  \n- 增加了对Python 3.12的支持。  \n- 废弃了faiss库，并在快速入门示例中用“无需安装”的chromadb替代。  \n- 重构了Datasets、Graph和Web Services类。  \n- 将WhisperCPP语音解析功能整合进Library库中。  \n  \n**星期一，4月29日 - v0.2.11更新**  \n- 更新了Phi-3和Llama-3的gguf库。  \n- 在ModelCatalog中新增了Phi-3[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-microsoft-phi-3.py)和Llama-3[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-llama-3.py)，以及量化版本。  \n- 集成了WhisperCPP模型类和预编译的共享库——[入门示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-whisper-cpp-getting-started.py)。  \n- 新增用于测试的语音样本文件——[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FModels\u002Fusing-whisper-cpp-sample-files.py)。  \n- 改进了Windows系统上的CUDA检测功能，并增加了对旧版Mac OS的安全检查。  \n  \n**星期一，4月22日 - v0.2.10更新**  \n- 更新了Agent类，支持使用自然语言查询PostgreSQL中的自定义表——[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FUse_Cases\u002Fagent_with_custom_tables.py)。  \n- 实现了新的Agent API端点，结合LLMWare推理服务器和全新的Agent功能——[示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FSLIM-Agents\u002Fagent_api_endpoint.py)。  \n  \n**星期二，4月16日 - v0.2.9更新**  \n- 
新增CustomTable类，可与基于LLM的工作流结合，快速创建自定义数据库表。  \n- 优化了将CSV和JSON\u002FJSONL文件转换为数据库表的方法。  \n- 参见新示例：[创建自定义表示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FStructured_Tables\u002Fcreate_custom_table-1.py)。  \n  \n**星期二，4月9日 - v0.2.8更新**  \n- Office解析器（Word Docx、PowerPoint PPTX和Excel XLSX）进行了多项改进——新增库和Python方法。  \n- 包括：多处修复、改进的文本分块控制、页眉文本提取及配置选项。  \n- 总体而言，新的Office解析器选项与新的PDF解析器选项保持一致。  \n- 请参阅[Office解析配置示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Foffice_parser_new_configs.py)。  \n  \n**星期三，4月3日 - v0.2.7更新**  \n- PDF解析器进行了多项改进——新增库和Python方法。  \n- 包括：针对欧洲语言的UTF-8编码支持。  \n- 包括：更完善的文本分块控制、页眉文本提取及配置选项。  \n- 更多细节请参阅[PDF解析配置示例](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Ftree\u002Fmain\u002Fexamples\u002FParsing\u002Fpdf_parser_new_configs.py)。  \n- 注意：将逐步停止对aarch64-linux平台的支持（后续将使用v0.2.6版本的解析器）。今后将全面支持x86_64架构且配备CUDA的Linux Ubuntu 20及以上版本。  \n  \n**星期五，3月22日 - v0.2.6更新**  \n- 新增SLIM模型：summary、extract、xsum、boolean、tags-3b以及combo sentiment-ner。  \n- 新增logit和采样分析功能。  \n- 提供了展示如何使用这些新模型的新示例。  \n  \n**星期四，3月14日 - v0.2.5更新**  \n- 改进了CUDA环境下（Windows和Linux）对GGUF的支持，提供了新的预编译二进制文件和异常处理机制。  \n- 增强了模型配置选项（采样、温度、top logit捕获等）。  \n- 对Ubuntu 20及以上版本提供了完整的向后兼容支持，包括解析器和GGUF引擎。  \n- 增加了对Anthropic Claude 3模型的支持。  \n- 新增检索方法：document_lookup和aggregate_text。  \n- 新增模型：bling-stablelm-3b-tool——一款快速、精准的3B量化问答模型，是我们近期的最爱之一。  \n  \n**星期三，2月28日 - v0.2.4更新**  \n- 对GGUF生成式模型类进行了重大升级——支持Stable-LM-3B、CUDA构建选项，并进一步优化了采样策略的控制。  \n- 注意：从v0.2.4版本开始，将随构建打包新的GGUF llama.cpp预编译库。  \n- 改进了对HF嵌入模型的GPU支持。  \n  \n**星期五，2月16日 - v0.2.3更新**  \n- 向ModelCatalog中添加了10余种嵌入模型——nomic、jina、bge、gte、ember和uae-large。  \n- 更新了OpenAI支持至1.0及以上版本，并新增text-3嵌入模型。  \n- SLIM模型的密钥和输出值现在可在ModelCatalog中访问。  \n- 将编码方式更新为‘utf-8-sig’，以更好地处理带有BOM标记的txt\u002Fcsv文件。  \n  \n**最新更新——2024年1月19日——llmware v0.2.0**  \n- 新增了数据库集成选项——Postgres和SQLite。  \n- 改进了并行解析时的状态更新和解析事件日志记录功能。  \n- 
大幅增强了Embedding数据库与Text集合数据库之间的交互能力。  \n- 改进了动态模块加载过程中的错误异常处理机制。  \n  \n**最新更新——2024年1月15日：llmware v0.1.15**  \n- 优化了双通道检索查询。  \n- 扩展了配置对象和端点资源的选项。  \n  \n**最新更新——2023年12月30日：llmware v0.1.14**  \n- 增加了对Open Chat推理服务器的支持（兼容OpenAI API）。  \n- 提升了多种嵌入模型和向量数据库配置的能力。  \n- 新增了PGVector和Redis向量数据库的docker-compose安装脚本。  \n- 将“bling-tiny-llama”加入模型目录。  \n  \n**最新更新——2023年12月22日：llmware v0.1.13**  \n- 新增了3个向量数据库——Postgres（PG Vector）、Redis和Qdrant。  \n- 改进了将Sentence Transformers直接集成到模型目录中的支持。  \n- 优化了模型目录的属性设置。  \n- 在模型与嵌入、向量数据库等领域新增了多个示例，涵盖GGUF、向量数据库和模型目录等内容。\n\n- **2023年12月17日：llmware v0.1.12**\n  - dragon-deci-7b 被添加到目录中——基于 Deci 的高性能新 7B 模型基础进行 RAG 微调的模型\n  - 新增 GGUFGenerativeModel 类，便于集成 GGUF 模型\n  - 为 Mac M1、Mac x86、Linux x86 和 Windows 添加预构建的 llama_cpp \u002F ctransformer 共享库\n  - 3 个 DRAGON 模型被打包为 Q4_K_M GGUF 格式，适用于 CPU 笔记本电脑使用（dragon-mistral-7b、dragon-llama-7b、dragon-yi-6b）\n  - 4 个领先的开源聊天模型以 Q4_K_M 格式被添加到默认目录中\n\n- **2023年12月8日：llmware v0.1.11**\n  - 新增针对大规模文档摄取和 Milvus 嵌入的快速入门示例。\n  - 新增 LLMWare“弹出式”推理服务器模型类及示例脚本。\n  - 新增用于 RAG 的发票处理示例。\n  - 改进了 Windows 堆栈管理，以支持解析更大文档。\n  - 增强了 PDF 和 Office 解析器的调试日志输出模式选项。\n\n- **2023年11月30日：llmware v0.1.10**\n  - Windows 被添加为受支持的操作系统。\n  - 进一步优化了原生代码中的堆栈管理。\n  - 修复了一些小缺陷。\n\n- **2023年11月24日：llmware v0.1.9**\n  - Markdown (.md) 文件现在会被解析并视为文本文件。\n  - 对 PDF 和 Office 解析器堆栈进行了优化，应可避免设置 ulimit -s 的需求。\n  - 新增 llmware_models_fast_start.py 示例，允许发现和选择所有 llmware HuggingFace 模型。\n  - 现在仓库中包含了原生依赖项（共享库和依赖），以方便本地开发。\n  - 更新了 Status 类，以支持 PDF 和 Office 文档解析状态更新。\n  - 修复了一些小缺陷，包括库导出中的图像块处理问题。\n\n- **2023年11月17日：llmware v0.1.8**\n  - 通过允许每个模型指定尾随空格参数，提升了生成性能。\n  - 改进了 llama2 和 mistral 的 eos_token_id 处理。\n  - 加强了对 Hugging Face 动态加载的支持。\n  - 新增了使用 llmware DRAGON 模型的示例。\n\n- **2023年11月14日：llmware v0.1.7**\n  - 切换到 Python Wheel 包格式，以便在 PyPi 上分发，从而在所有支持平台上无缝安装原生依赖项。\n  - ModelCatalog 的增强：\n    - OpenAI 更新，加入了新发布的“turbo”4 和 3.5 模型。\n    - Cohere embedding v3 更新，加入了新的 Cohere 嵌入模型。\n    - BLING 模型作为开箱即用的注册选项出现在目录中。它们可以像其他任何模型一样实例化，即使没有“hf=True”标志。\n    - 可以使用 
ModelCatalog 中的 register 方法，在现有模型类别中注册新的模型名称。\n  - Prompt 的增强：\n    - 在 prompt_main 输出字典中添加了“evidence_metadata”，使 prompt_main 的响应无需修改即可直接接入证据和事实核查步骤。\n    - 现在可以在 prompt.load_model(model_name, api_key = \"[my-api-key]\") 中直接传递 API 密钥。\n  - LLMWareInference Server — 初始发布：\n    - 新增 LLMWareModel 类，这是一个基于自定义 HF 风格 API 的模型封装类。\n    - LLMWareInferenceServer 是一个新的类，可以在远程（GPU）服务器上实例化，创建一个可集成到任何 Prompt 工作流中的测试 API 服务器。\n\n- **2023年11月3日：llmware v0.1.6**\n  - 更新了打包方式，要求使用 mongo-c-driver 1.24.4，以暂时绕过 mongo-c-driver 1.25 的段错误问题。\n  - 为未来支持 Windows 做好准备，对 Python 代码进行了相应更新。\n\n- **2023年10月27日：llmware v0.1.5**\n  - 四个新的示例脚本专注于小型微调指令模型的 RAG 工作流，这些模型可在笔记本电脑上运行（`llmware` [BLING](https:\u002F\u002Fhuggingface.co\u002Fllmware) 模型）。\n  - 扩展了在 prompt 类中设置温度的选项。\n  - 改进了 Hugging Face 模型生成后的后处理。\n  - 简化了将 Hugging Face 生成模型加载到 prompt 中的过程。\n  - 初始发布了中央 status 类：提供一致接口供调用者读写嵌入状态。\n  - 增强了内存中字典搜索对多键查询的支持。\n  - 移除了人机交互包装中的尾随空格，以提升部分微调模型的生成质量。\n  - 修复了一些小缺陷，更新了测试脚本，并升级了 Werkzeug 版本，以解决[依赖性安全警报](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fsecurity\u002Fdependabot\u002F2)。\n\n- **2023年10月20日：llmware v0.1.4**\n  - 支持 Hugging Face 模型的 GPU 加速。\n  - 修复了一些缺陷，并增加了测试脚本。\n\n- **2023年10月13日：llmware v0.1.3**\n  - 支持 MongoDB Atlas 向量搜索。\n  - 支持使用 MongoDB 连接字符串进行身份验证。\n  - 提供文档摘要方法。\n  - 改进了自动捕捉模型上下文窗口以及传递预期输出长度变化的功能。\n  - 提供按名称查找的数据集卡片和描述。\n  - 在模型推理使用字典中添加了处理时间。\n  - 增加了测试脚本、示例和缺陷修复。\n\n- **2023年10月6日：llmware v0.1.1**\n  - 将测试脚本添加到 GitHub 仓库中，用于回归测试。\n  - 修复了一些小缺陷，并更新了 Pillow 版本，以解决[依赖性安全警报](https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fsecurity\u002Fdependabot\u002F1)。\n\n- **2023年10月2日：llmware v0.1.0** 🔥 llmware 开源首发！！🔥\n\n\n\u003C\u002Fdetails>\n\u003Cp align=\"center\">\n  \u003Ca href=\"#top\">⬆️ 返回顶部\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## 🤓 阅读我们的白皮书\n\n\n- **革新 AI 部署：借助 Intel 的 AI PC 和 LLMWare 的 Model HQ 实现 AI 加速** [AI PC Model HQ.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18024139\u002FAI.PC.Model.HQ.pdf)\n- **革新 AI 部署（Intel 摘要版）**  
[LNL White paper (Abstract Version) final.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18281644\u002FLNL.White.paper.Abstract.Version.final.pdf)\n\n- **利用 AI PC 加速 AI 驱动的生产力** [Laptop.Performance.WP.Final (10).pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18024294\u002FLaptop.Performance.WP.Final.10.pdf)\n\n## 英特尔联合解决方案\n\n- **Arrow Lake** \n[IPA.Optimization.Summary.LLMWare (1).pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18292873\u002FIPA.Optimization.Summary.LLMWare.1.pdf)\n\n## 关于 Model HQ\n  - **隐私政策** [AI.BLOKS.PRIVACY.POLICY.1.3.25.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F19289355\u002FAI.BLOKS.PRIVACY.POLICY.1.3.25.pdf)\n\n- **服务条款** [AI.Bloks.Terms.of.Service.3.3.25.pdf](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F19289545\u002FAI.Bloks.Terms.of.Service.3.3.25.pdf)\n\n- **可接受使用政策**[AI BLOKS LLC 的 Model HQ 可接受使用政策.docx](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Ffiles\u002F18291481\u002FAcceptable.Use.Policy.for.Model.HQ.by.AI.BLOKS.LLC.docx)","# llmware 快速上手指南\n\n`llmware` 是一个用于构建基于知识的本地、私有且安全的 LLM 应用程序的统一框架。它专为 AI PC、笔记本电脑及边缘设备优化，支持在 Windows、Mac 和 Linux 上运行，能够轻松集成 RAG（检索增强生成）流程与企业级模型。\n\n## 1. 环境准备\n\n### 系统要求\n- **操作系统**: Windows, macOS, 或 Linux\n- **Python 版本**: 3.10, 3.11, 3.12, 3.13 或 3.14\n- **硬件建议**: \n  - 本地推理推荐使用带有 GPU 或 NPU 的设备以获得最佳性能。\n  - 支持多种推理后端：GGUF, OpenVINO, ONNXRuntime, PyTorch 等。\n\n### 前置依赖\n确保已安装 Python 和 pip。建议在虚拟环境中进行安装以避免依赖冲突：\n\n```bash\npython -m venv llmware-env\nsource llmware-env\u002Fbin\u002Factivate  # Linux\u002FmacOS\n# 或\nllmware-env\\Scripts\\activate     # Windows\n```\n\n## 2. 安装步骤\n\n通过 PyPI 直接安装最新稳定版：\n\n```bash\npip install llmware\n```\n\n> **提示**：如果下载速度较慢，可使用国内镜像源加速安装：\n> ```bash\n> pip install llmware -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n安装完成后，验证安装：\n\n```bash\npython -c \"import llmware; print(llmware.__version__)\"\n```\n\n## 3. 
基本使用\n\n以下是最简单的“你好世界”示例，展示如何加载模型并进行推理。\n\n### 步骤 1: 加载模型并执行推理\n\n```python\nfrom llmware.models import ModelCatalog\n\n# 从模型目录加载一个量化优化的本地模型\nmy_model = ModelCatalog().load_model(\"llmware\u002Fbling-phi-3-gguf\")\n\n# 执行简单推理\noutput = my_model.inference(\"what is the future of AI?\")\nprint(output)\n```\n\n### 步骤 2: 创建知识库并添加文档\n\n```python\nfrom llmware.library import Library\n\n# 创建一个新的知识库容器\nlib = Library().create_new_library(\"my_first_library\")\n\n# 添加本地文件夹中的文档（支持 PDF, DOCX, TXT, CSV 等多种格式）\nlib.add_files(\"\u002Fpath\u002Fto\u002Fyour\u002Fdocuments\")\n\n# 安装嵌入模型以启用语义搜索\nlib.install_new_embedding(embedding_model_name=\"mini-lm-sbert\", vector_db=\"milvus\", batch_size=500)\n```\n\n### 步骤 3: 查询知识库并结合 Prompt 生成回答\n\n```python\nfrom llmware.retrieval import Query\nfrom llmware.prompts import Prompt\n\n# 加载知识库\nlib = Library().load_library(\"my_first_library\")\n\n# 创建查询对象并执行语义搜索\nresults = Query(lib).semantic_query(\"总结文档主要内容\", result_count=5)\n\n# 初始化 Prompt 并加载模型\nprompter = Prompt().load_model(\"llmware\u002Fbling-tiny-llama-v0\")\n\n# 将查询结果作为上下文加入 Prompt\nprompter.add_source_query_results(results)\n\n# 生成最终回答\nresponse = prompter.prompt_with_source(\"请根据文档内容回答问题\")\nprint(response)\n```\n\n---\n\n现在你已经完成了 `llmware` 的基础配置与首次运行。你可以进一步探索其丰富的模型目录（300+ 模型）和高级 RAG 功能，构建属于自己的本地知识应用。","某中型律所的合规团队需要在完全离线的笔记本电脑上，快速构建一个能精准检索并总结数百份内部合同与判例文档的智能问答系统。\n\n### 没有 llmware 时\n- **环境配置繁琐**：开发人员需手动适配不同操作系统（Windows\u002FMac\u002FLinux）下的 GPU、NPU 驱动及推理后端（如 GGUF、ONNX），耗时数天仍常遇兼容性问题。\n- **模型选型困难**：缺乏统一入口，难以从海量开源模型中筛选出适合法律垂直领域且经过量化优化的轻量级模型，导致尝试成本高。\n- **数据处理割裂**：文档解析（PDF\u002FWord）、文本分块、向量化嵌入需拼接多个独立库，代码冗余且容易在数据流转中丢失上下文信息。\n- **隐私安全风险**：为满足数据不出域的要求，不得不放弃效果更好的云端大模型，而本地部署方案往往因资源占用过大无法在普通办公本上运行。\n\n### 使用 llmware 后\n- **一键跨平台部署**：llmware 自动识别硬件环境并调用最优推理引擎（如 Windows 下的 DirectML 或 Mac 的 Metal），将环境搭建时间从几天缩短至几分钟。\n- **专属模型直达**：直接通过模型目录加载预置的\"Industry-Bert\"或\"Bling\"系列法律微调模型，这些模型专为小显存优化，既精准又节省资源。\n- **流水线式知识构建**：调用内置 Library 组件，一行代码即可完成多格式合同文档的解析、分块与索引，自动生成结构化知识库供 RAG 管道使用。\n- 
**纯本地安全运行**：整个检索增强生成（RAG）流程均在本地笔记本完成，无需联网即可实现企业级的数据隐私保护与低延迟响应。\n\nllmware 让开发者能在普通办公电脑上，以极低的算力成本快速落地私有化、高精度的企业级知识应用。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fllmware-ai_llmware_ab840bdb.png","llmware-ai","llmware.ai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fllmware-ai_e7d87628.png","Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.",null,"https:\u002F\u002Fwww.llmware.ai\u002F","https:\u002F\u002Fgithub.com\u002Fllmware-ai",[83,87,91],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.6,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.4,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",0,14861,2949,"2026-04-03T20:13:52","Apache-2.0","Windows, macOS, Linux","非必需。支持利用本地 GPU 和 NPU 加速，适配多种推理后端（GGUF, OpenVINO, ONNXRuntime, PyTorch 等），专为 AI PC 和笔记本电脑优化，未指定具体显卡型号或显存要求。","未说明（设计目标为最小化计算足迹，可在普通笔记本电脑上运行）",{"notes":103,"python":104,"dependencies":105},"该工具专为本地、私有和安全部署设计，支持在边缘设备和自托管环境中运行。内置 300+ 模型目录（含 50+ 针对 RAG 优化的微调模型），支持多种量化格式以适配不同硬件。无需高端服务器，普通笔记本电脑即可运行绝大多数示例和模型。支持多种向量数据库（如 Milvus, ChromaDB）和文档解析格式。","3.10, 3.11, 3.12, 3.13, 3.14",[67,106,107,108,109,110,111,112],"PyTorch","GGUF","OpenVINO","ONNXRuntime","Sentence Transformers","Milvus","ChromaDB",[54,13,15,26],[115,116,117,118,119,120,121,122,123],"parsing","retrieval-augmented-generation","agents","generative-ai-tools","llamacpp","llm","small-specialized-models","onnx","openvino",7,"2026-03-27T02:49:30.150509","2026-04-06T05:35:34.856570",[128,133,138,143,147,152],{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},14106,"在 Ubuntu 20+ 系统上找不到 PDF 解析器或遇到 GLIBC 依赖错误怎么办？","请克隆最新的仓库代码，维护者已重新构建了对应 Ubuntu 20+ 支持的解析器二进制文件和 GGUF 库。关键依赖是 GLIBC 版本需为 2.31 或更高。如果您使用 CUDA，请确保驱动程序版本为 12.1 或更高。如果更新后仍有问题，请提交新的工单。","https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fissues\u002F444",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},14107,"如何在 modelCatalog 中使用最新的 OpenAI (o1, o3-mini) 和 Claude 
模型？","维护者已将以下新模型添加到 modelCatalog 中：\n- OpenAI: o1, o1-pro, o3-mini\n- Claude: claude-3-haiku-20240307, claude-3-5-haiku-20241022, claude-3-5-sonnet-20240620, claude-3-7-sonnet-20250219\n请注意，Gemini 模型将在稍后添加。您可以直接通过模型名称调用这些新模型。","https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fissues\u002F1170",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},14108,"在 Windows 上运行 SLIM Models 时遇到 OSError [WinError -1073741795] 错误如何解决？","该问题通常与 CUDA 驱动和 GGUF 库的兼容性有关。维护者已合并了针对 CUDA 12.1 构建的新版 libllama_cuda_win.dll 二进制文件。请拉取最新代码并尝试使用 CUDA 12.1 驱动程序。如果使用的是 CUDA 12.4，也可以尝试，但若失败请回退到 12.1 版本。","https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fissues\u002F490",{"id":144,"question_zh":145,"answer_zh":146,"source_url":142},14109,"Windows 上的 GGUF 模型为什么只使用 CPU 而不使用 GPU 加速？","这通常是因为缺少适配的 CUDA 二进制文件或驱动版本不匹配。请确保您已拉取包含最新 win-cuda gguf 库的代码，并使用 CUDA 12.1 驱动程序（已验证支持良好）。虽然 12.2-12.4 版本理论上支持，但建议优先测试 12.1 以确保稳定性。",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},14110,"如何参与 llmware 项目的文档（Docstrings）贡献？","项目目前正在进行文档完善工作。如果您想贡献，建议先选择一个具体的模块（module），为该模块中的所有类添加类文档字符串（Class docstrings），然后提交 Pull Request。避免一次性提交过大的 PR，以便审查。在开始工作前，最好在议题中先说明您计划处理的模块。","https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fissues\u002F219",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},14111,"llmware 是否支持 Neo4j 作为向量数据库？","项目计划扩展向量数据库支持以包含 Neo4j。实现范围包括更新 LLMWareConfig 和 EmbeddingHandler，创建专用于 Neo4j 的 Embedding 类，并提供测试脚本和示例代码。开发者可以参考 embeddings.py 中现有的 Milvus 和 Pinecone 实现作为指南进行开发或等待官方更新。","https:\u002F\u002Fgithub.com\u002Fllmware-ai\u002Fllmware\u002Fissues\u002F22",[158,162],{"id":159,"version":160,"summary_zh":79,"released_at":161},80823,"v0.4.5","2026-02-21T21:21:03",{"id":163,"version":164,"summary_zh":79,"released_at":165},80824,"v0.4.4","2026-02-11T17:59:06"]