[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-featureform--featureform":3,"tool-featureform--featureform":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150720,2,"2026-04-11T11:33:10",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":64,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":118,"forks":119,"last_commit_at":120,"license":121,"difficulty_score":122,"env_os":123,"env_gpu":124,"env_ram":124,"env_deps":125,"category_tags":129,"github_topics":131,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":144,"updated_at":145,"faqs":146,"releases":167},6591,"featureform\u002Ffeatureform","featureform","The Virtual Feature Store. 
Turn your existing data infrastructure into a feature store.","Featureform 是一款“虚拟特征存储”工具，旨在帮助数据科学团队在不替换现有数据设施的前提下，高效管理机器学习模型所需的特征。它就像一层智能调度系统，覆盖在您已有的数据库、数据仓库或流处理平台之上，将其统一转化为功能完备的特征存储中心。\n\n在实际工作中，Featureform 主要解决了团队协作难、实验管理乱以及生产部署复杂等痛点。它通过标准化的定义方式，让特征转换、标签和训练集变得可共享、可复用，彻底告别了混乱的临时脚本。同时，它能自动协调底层异构基础设施，处理重试逻辑与分布式系统问题，确保特征从开发到上线的可靠性与一致性。此外，内置的权限控制和审计日志也能帮助团队轻松满足合规要求。\n\n这款工具特别适合拥有多样化数据架构的数据科学家、机器学习工程师以及需要规范化 ML 流程的研发团队。其最大的技术亮点在于“虚拟化”理念：无需迁移数据或重构架构，只需通过代码定义，即可让现有的数据基础设施具备专业特征存储的能力，既降低了成本，又提升了灵活性。","\u003Ch1 align=\"center\">\n\t\u003Cimg width=\"300\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffeatureform_featureform_readme_6766aa771c9c.png\" alt=\"featureform\">\n\t\u003Cbr>\n\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\t\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Factions\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ffeatureform-workflow-blue?style=for-the-badge&logo=appveyor\" alt=\"Embedding Store workflow\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin-Slack-blue?style=for-the-badge&logo=appveyor\" alt=\"Featureform Slack\">\u003C\u002Fa>\n    \u003Cbr>\n    \u003Ca href=\"https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-%203.7|3.8|3.9|3.10-brightgreen.svg\" alt=\"Python supported\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ffeatureform\u002F\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ffeatureform.svg\" alt=\"PyPi Version\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwww.featureform.com\u002F\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite?url=https%3A%2F%2Fwww.featureform.com%2F?style=for-the-badge&logo=appveyor\" alt=\"Featureform Website\">\u003C\u002Fa>  \n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002FfeatureformML\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttp\u002Fshields.io.svg?style=social\" alt=\"Twitter\">\u003C\u002Fa>\n\n\n\t\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n    \u003Ch3 align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fwww.featureform.com\u002F\">Website\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan>\n        \u003Ca href=\"https:\u002F\u002Fdocs.featureform.com\u002F\">Docs\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan>\n        \u003C!-- \u003Ca href=\"https:\u002F\u002Fapidocs.featureform.com\u002F\">API Docs\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan> -->\n        \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg\">Community forum\u003C\u002Fa>\n    \u003C\u002Fh3>\n\u003C\u002Fdiv>\n\n\n# What is Featureform?\n\n\n[Featureform](https:\u002F\u002Ffeatureform.com) is a virtual feature store. It enables data scientists to define, manage, and serve their ML model's features. Featureform sits atop your existing infrastructure and orchestrates it to work like a traditional feature store.\nBy using Featureform, a data science team can solve the following organizational problems:\n\n* **Enhance Collaboration** Featureform ensures that transformations, features, labels, and training sets are defined in a standardized form, so they can easily be shared, re-used, and understood across the team.\n* **Organize Experimentation** The days of untitled_128.ipynb are over. 
Transformations, features, and training sets can be pushed from notebooks to a centralized feature repository with metadata like name, variant, lineage, and owner.\n* **Facilitate Deployment** Once a feature is ready to be deployed, Featureform will orchestrate your data infrastructure to make it ready in production. Using the Featureform API, you won't have to worry about the idiosyncrasies of your heterogeneous infrastructure (beyond their transformation language).\n* **Increase Reliability** Featureform enforces that all features, labels, and training sets are immutable. This allows them to safely be re-used among data scientists without worrying about logic changing. Furthermore, Featureform's orchestrator will handle retry logic and attempt to resolve other common distributed system problems automatically.\n* **Preserve Compliance** With built-in role-based access control, audit logs, and dynamic serving rules, your compliance logic can be enforced directly by Featureform.\n\n### Further Reading\n* [Feature Stores Explained: The Three Common Architectures](https:\u002F\u002Fwww.featureform.com\u002Fpost\u002Ffeature-stores-explained-the-three-common-architectures)\n\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffeatureform_featureform_readme_97926ebb2137.png\" alt=\"A virtual feature store's architecture\" style=\"width:50em\"\u002F>\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n# Why is Featureform unique?\n**Use your existing data infrastructure.** Featureform does not replace your existing infrastructure. Rather, Featureform transforms your existing infrastructure into a feature store. In being infrastructure-agnostic, teams can pick the right data infrastructure to solve their processing problems, while Featureform provides a feature store abstraction above it. Featureform orchestrates and manages transformations rather than actually computing them. 
The computations are offloaded to the organization's existing data infrastructure. In this way, Featureform is more akin to a framework and workflow, than an additional piece of data infrastructure.\n\n**Designed for both single data scientists and large enterprise teams** Whether you're a single data scientist or a part of a large enterprise organization, Featureform allows you to document and push your transformations, features, and training sets definitions to a centralized repository. It works everywhere from a laptop to a large heterogeneous cloud deployment.\n* _A single data scientist working locally_: The days of untitled_128.ipynb, df_final_final_7, and hundreds of undocumented versions of datasets. A data scientist working in a notebook can push transformation, feature, and training set definitions to a centralized, local repository.\n* _A single data scientist with a production deployment_: Register your PySpark transformations and let Featureform orchestrate your data infrastructure from Spark to Redis, and monitor both the infrastructure and the data.\n* _A data science team_: Share, re-use, and learn from each other's transformations, features, and training sets. Featureform standardizes how machine learning resources are defined and provides an interface for search and discovery. It also maintains a history of changes, allows for different variants of features, and enforces immutability to resolve the most common cases of failure when sharing resources.\n* _A data science organization_: An enterprise will have a variety of different rules around access control of their data and features. The rules may be based on the data scientist’s role, the model’s category, or dynamically based on a user’s input data (i.e. they are in Europe and subject to GDPR). All of these rules can be specified, and Featureform will enforce them. 
Data scientists can be sure to comply with the organization’s governance rules without modifying their workflow.\n\n**Native embeddings support** Featureform was built from the ground up with embeddings in mind. It supports vector databases as both inference and training stores. Transformer models can be used as transformations, so that embedding tables can be versioned and reliably regenerated. We even created and open-sourced a popular vector database, Embeddinghub.\n\n**Open-source** Featureform is free to use under the [Mozilla Public License 2.0](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FLICENSE).\n\n\u003Cbr \u002F>\n\n# The Featureform Abstraction\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Ffeatureform\u002Ffeatureform\u002Fmain\u002Fassets\u002Fcomponents.svg\" alt=\"The components of a feature\" style=\"width:50em\"\u002F>\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\nIn reality, the feature’s definition is split across different pieces of infrastructure: the data source, the transformations, the inference store, the training store, and all their underlying data infrastructure. However, a data scientist will think of a feature in its logical form, something like: “a user’s average purchase price”. Featureform allows data scientists to define features in their logical form through transformations, providers, labels, and training set resources. Featureform will then orchestrate the actual underlying components to achieve the data scientists' desired state.\n\n# How to use Featureform\nFeatureform can be run locally on files or in Kubernetes with your existing infrastructure.\n## Kubernetes\n\nFeatureform on Kubernetes can be used to connect to your existing cloud infrastructure and can also be run \nlocally on Minikube. 
\n\nTo check out how to run it in the cloud,\nfollow our [Kubernetes deployment](https:\u002F\u002Fdocs.featureform.com\u002Fdeployment\u002Fkubernetes).\n\nTo try Featureform in a single docker container, follow our [docker quickstart guide](https:\u002F\u002Fdocs.featureform.com\u002Fdeployment\u002Fquickstart-docker)\n\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n# Contributing\n\n* To contribute to Featureform, please check out [Contribution docs](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FCONTRIBUTING.md).\n* Welcome to our community, join us on [Slack](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg).\n\n\u003Cbr \u002F>\n\n\n# Report Issues\n\nPlease help us by [reporting any issues](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002Fnew\u002Fchoose) you may have while using Featureform.\n\n\u003Cbr \u002F>\n\n# License\n\n* [Mozilla Public License Version 2.0](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FLICENSE)\n","\u003Ch1 align=\"center\">\n\t\u003Cimg width=\"300\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffeatureform_featureform_readme_6766aa771c9c.png\" alt=\"featureform\">\n\t\u003Cbr>\n\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\t\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Factions\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ffeatureform-workflow-blue?style=for-the-badge&logo=appveyor\" alt=\"Embedding Store workflow\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin-Slack-blue?style=for-the-badge&logo=appveyor\" alt=\"Featureform Slack\">\u003C\u002Fa>\n    \u003Cbr>\n    \u003Ca 
href=\"https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-%203.7|3.8|3.9|3.10-brightgreen.svg\" alt=\"Python supported\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Ffeatureform\u002F\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ffeatureform.svg\" alt=\"PyPi Version\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwww.featureform.com\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fwebsite?url=https%3A%2F%2Fwww.featureform.com%2F?style=for-the-badge&logo=appveyor\" alt=\"Featureform Website\">\u003C\u002Fa>  \n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002FfeatureformML\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl\u002Fhttp\u002Fshields.io.svg?style=social\" alt=\"Twitter\">\u003C\u002Fa>\n\n\n\t\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n    \u003Ch3 align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fwww.featureform.com\u002F\">Website\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan>\n        \u003Ca href=\"https:\u002F\u002Fdocs.featureform.com\u002F\">Docs\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan>\n        \u003C!-- \u003Ca href=\"https:\u002F\u002Fapidocs.featureform.com\u002F\">API Docs\u003C\u002Fa>\n        \u003Cspan> | \u003C\u002Fspan> -->\n        \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg\">Community forum\u003C\u002Fa>\n    \u003C\u002Fh3>\n\u003C\u002Fdiv>\n\n\n# 什么是 Featureform？\n\n\n[Featureform](https:\u002F\u002Ffeatureform.com) 是一个虚拟特征存储库。它使数据科学家能够定义、管理和提供其机器学习模型的特征。Featureform 构建在您现有的基础设施之上，并对其进行编排，使其像传统的特征存储库一样工作。\n通过使用 Featureform，数据科学团队可以解决以下组织问题：\n\n* **增强协作** Featureform 确保转换、特征、标签和训练集以标准化形式定义，以便在整个团队中轻松共享、重用和理解。\n* **组织实验** 无标题_128.ipynb 
的时代已经过去。转换、特征和训练集可以从笔记本推送到具有名称、变体、血统和所有者等元数据的集中式特征仓库。\n* **促进部署** 一旦特征准备好部署，Featureform 将编排您的数据基础设施，使其在生产环境中就绪。使用 Featureform API，您无需担心异构基础设施的特殊性（除了它们的转换语言之外）。\n* **提高可靠性** Featureform 强制要求所有特征、标签和训练集都是不可变的。这使得它们可以在数据科学家之间安全地重用，而无需担心逻辑发生变化。此外，Featureform 的编排器会自动处理重试逻辑并尝试解决其他常见的分布式系统问题。\n* **保持合规性** 通过内置的基于角色的访问控制、审计日志和动态服务规则，您的合规性逻辑可以直接由 Featureform 强制执行。\n\n### 更多阅读\n* [特征存储库解析：三种常见架构](https:\u002F\u002Fwww.featureform.com\u002Fpost\u002Ffeature-stores-explained-the-three-common-architectures)\n\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Ffeatureform\u002Ffeatureform\u002Fmain\u002Fassets\u002Fvirtual_arch.png\" alt=\"虚拟特征存储库的架构\" style=\"width:50em\"\u002F>\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n# 为什么 Featureform 独特？\n**利用您现有的数据基础设施。** Featureform 并不会取代您现有的基础设施。相反，Featureform 将您现有的基础设施转变为一个特征存储库。由于对基础设施具有通用性，团队可以选择合适的数据基础设施来解决其处理问题，而 Featureform 则在其之上提供了一个特征存储库抽象层。Featureform 编排和管理转换，而不是实际进行计算。计算任务被卸载到组织现有的数据基础设施上。因此，Featureform 更类似于一个框架和工作流，而不是额外的数据基础设施。\n\n**专为单个数据科学家和大型企业团队设计** 无论您是单个数据科学家还是大型企业组织的一员，Featureform 都可以让您将转换、特征和训练集的定义记录并推送到一个集中式仓库。它适用于从笔记本电脑到大型异构云部署的各种环境。\n* _本地工作的单个数据科学家_: 无标题_128.ipynb、df_final_final_7 和数百个未文档化的数据集版本的时代已经结束。在笔记本中工作的数据科学家可以将转换、特征和训练集的定义推送到一个集中式的本地仓库。\n* _拥有生产部署的单个数据科学家_: 注册您的 PySpark 转换，让 Featureform 编排从 Spark 到 Redis 的数据基础设施，并监控基础设施和数据。\n* _数据科学团队_: 共享、重用并相互学习彼此的转换、特征和训练集。Featureform 标准化了机器学习资源的定义方式，并提供了搜索和发现的接口。它还维护变更历史，允许不同版本的特征存在，并强制实施不可变性，以解决资源共享中最常见的失败情况。\n* _数据科学组织_: 企业通常对其数据和特征的访问控制有各种不同的规则。这些规则可能基于数据科学家的角色、模型的类别，或根据用户输入数据动态决定（例如，用户位于欧洲并受 GDPR 约束）。所有这些规则都可以指定，Featureform 将强制执行它们。数据科学家可以确保遵守组织的治理规则，而无需修改其工作流程。\n\n**原生嵌入支持** Featureform 从一开始就以嵌入为目标进行构建。它支持向量数据库作为推理和训练存储。Transformer 模型可以用作转换，从而实现嵌入表的版本化和可靠地重新生成。我们甚至创建并开源了一个流行的向量数据库 Embeddinghub。\n\n**开源** Featureform 可在 [Mozilla Public License 2.0](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FLICENSE) 许可下免费使用。\n\n\u003Cbr \u002F>\n\n# Featureform 抽象层\n\n\u003Cbr
\u002F>\n\u003Cbr \u002F>\n\n\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Ffeatureform\u002Ffeatureform\u002Fmain\u002Fassets\u002Fcomponents.svg\" alt=\"特征的组成部分\" style=\"width:50em\"\u002F>\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n实际上，特征的定义分散在不同的基础设施组件中：数据源、转换逻辑、推理存储、训练存储，以及它们各自的基础数据设施。然而，数据科学家通常会以逻辑形式来思考特征，例如“用户的平均购买价格”。Featureform 允许数据科学家通过转换、提供者、标签和训练集资源，以逻辑形式定义特征。随后，Featureform 会协调底层的实际组件，以实现数据科学家所期望的状态。\n\n# 如何使用 Featureform\nFeatureform 可以在本地基于文件运行，也可以与您现有的基础设施一起部署在 Kubernetes 集群中。\n## Kubernetes\n\n在 Kubernetes 上运行 Featureform 可以连接到您现有的云基础设施，同时也可以在 Minikube 上进行本地部署。\n\n如需了解如何在云端运行，请参阅我们的 [Kubernetes 部署文档](https:\u002F\u002Fdocs.featureform.com\u002Fdeployment\u002Fkubernetes)。\n\n若想在单个 Docker 容器中试用 Featureform，请参考我们的 [Docker 快速入门指南](https:\u002F\u002Fdocs.featureform.com\u002Fdeployment\u002Fquickstart-docker)。\n\n\u003Cbr \u002F>\n\u003Cbr \u002F>\n\n# 贡献\n* 如需为 Featureform 做贡献，请查看 [贡献文档](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FCONTRIBUTING.md)。\n* 欢迎加入我们的社区！请访问 [Slack](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Ffeatureform-community\u002Fshared_invite\u002Fzt-xhqp2m4i-JOCaN1vRN2NDXSVif10aQg) 加入我们。\n\n\u003Cbr \u002F>\n\n\n# 提交问题\n如果您在使用 Featureform 时遇到任何问题，请帮助我们通过 [提交问题](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002Fnew\u002Fchoose) 来反馈。\n\n\u003Cbr \u002F>\n\n# 许可证\n* [Mozilla 公共许可证 2.0 版](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fblob\u002Fmain\u002FLICENSE)","# Featureform 快速上手指南\n\nFeatureform 是一个**虚拟特征存储（Virtual Feature Store）**工具。它不替换你现有的数据基础设施，而是作为一层抽象，协调现有的数据源、转换逻辑和存储系统，帮助数据科学家统一地定义、管理和部署机器学习特征。\n\n## 环境准备\n\n在开始之前，请确保你的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows (WSL 推荐)\n*   **Python 版本**：3.7, 3.8, 3.9 或 3.10\n*   **前置依赖**：\n    *   `pip` (Python 包管理工具)\n    *   (可选) **Docker** 或 **Kubernetes**：如果你计划使用容器化部署或连接云基础设施。\n    *   (可选) **现有数据基础设施**：如 Spark, Redis, Snowflake 等（Featureform 将协调这些工具进行实际计算）。\n\n## 
安装步骤\n\n你可以直接通过 PyPI 安装 Featureform 的 Python 客户端库。\n\n```bash\npip install featureform\n```\n\n> **提示**：国内开发者若遇到下载速度慢的问题，可使用清华或阿里镜像源加速安装：\n> ```bash\n> pip install featureform -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n若你需要运行本地服务端进行测试，推荐使用 Docker 快速启动（需先安装 Docker）：\n\n```bash\ndocker run -p 8080:8080 featureform\u002Ffeatureform\n```\n\n## 基本使用\n\nFeatureform 的核心工作流是：**定义资源 -> 注册到仓库 -> 由编排器调度执行**。以下是一个最简化的本地使用示例，展示如何定义数据源、转换逻辑和训练集。\n\n### 1. 初始化客户端\n\n首先，导入库并连接到本地实例（或远程集群）。\n\n```python\nimport featureform as ff\n\n# 连接到本地运行的 Featureform 实例\nclient = ff.Client(host=\"localhost:8080\")\n```\n\n### 2. 定义数据源与转换\n\n假设我们有一个包含用户交易数据的 CSV 文件，我们需要定义一个特征：\"用户的平均购买价格\"。\n\n```python\n# 注册一个本地文件作为数据源\n@client.register_file(\n    name=\"user_transactions\",\n    version=\"1.0\",\n    path=\".\u002Fdata\u002Ftransactions.csv\",\n    description=\"Raw user transaction data\"\n)\ndef user_transactions_source():\n    pass\n\n# 定义转换逻辑：计算平均购买价格\n@client.df_transformation(\n    name=\"avg_purchase_price\",\n    version=\"1.0\",\n    inputs=[(\"user_transactions\", \"1.0\")],\n    description=\"Calculates the average purchase price per user\"\n)\ndef avg_purchase_price_df(transactions):\n    # 使用 Pandas\u002FSpark 语法进行转换\n    return transactions.groupby(\"user_id\").agg({\"price\": \"mean\"}).rename(columns={\"price\": \"avg_price\"})\n```\n\n### 3. 
定义特征与训练集\n\n将上述转换结果注册为特征，并构建用于模型训练的训练集。\n\n```python\n# 注册特征\n@client.feature(\n    name=\"user_avg_price\",\n    version=\"1.0\",\n    transformation=(\"avg_purchase_price\", \"1.0\"),\n    description=\"Average price of purchases for a user\"\n)\ndef user_avg_price():\n    pass\n\n# 定义标签 (Label) - 例如：用户是否流失\n@client.label(\n    name=\"user_churn\",\n    version=\"1.0\",\n    source=(\"user_transactions\", \"1.0\"),\n    description=\"Whether the user has churned\"\n)\ndef user_churn_label():\n    pass\n\n# 创建训练集\n@client.training_set(\n    name=\"churn_prediction_dataset\",\n    version=\"1.0\",\n    features=[(\"user_avg_price\", \"1.0\")],\n    labels=[(\"user_churn\", \"1.0\")]\n)\ndef churn_training_set():\n    pass\n```\n\n### 4. 应用变更\n\n将定义的元数据推送到 Featureform 仓库。如果是本地模式，这会更新本地元数据存储；如果是生产模式，Featureform 将开始编排底层基础设施（如触发 Spark 任务）来物化这些特征。\n\n```python\n# 提交所有定义\nclient.apply()\n```\n\n### 5. 获取特征数据\n\n在模型训练或推理阶段，可以通过 API 获取生成的训练集数据。\n\n```python\n# 获取训练集数据 (返回 Pandas DataFrame 或其他格式，取决于配置)\ntraining_data = client.get_training_set(\"churn_prediction_dataset\", \"1.0\")\n\nprint(training_data.head())\n```\n\n---\n\n**下一步建议**：\n*   对于生产环境，请参考官方文档配置 **Kubernetes** 部署以连接您的云数据仓库（如 Snowflake, BigQuery）和在线存储（如 Redis）。\n*   加入社区 Slack 频道交流最佳实践。","某电商公司的数据科学团队正在构建实时反欺诈模型，需要整合分散在 PostgreSQL、Redis 和 Kafka 中的用户行为数据。\n\n### 没有 featureform 时\n- **协作混乱**：每位数据科学家在各自的 Jupyter Notebook 中重复编写特征转换逻辑，导致“特征定义不一致”，多人使用同一特征时计算结果却不同。\n- **实验难追溯**：特征代码散落在名为 `Untitled_128.ipynb` 的文件中，缺乏版本管理和元数据记录，无法复现三个月前的模型训练环境。\n- **部署成本高**：将模型从测试推送到生产时，工程师需手动重写代码以适配不同的底层存储系统，耗时数周且容易出错。\n- **数据不可靠**：特征逻辑可被随意修改，下游模型常因上游逻辑变更而失效，且缺乏自动重试机制来处理分布式系统的临时故障。\n\n### 使用 featureform 后\n- **标准化协作**：团队通过 featureform 统一定义特征、标签和训练集，所有成员共享同一套经过验证的逻辑，彻底消除歧义。\n- **实验可管理**：特征代码从 Notebook 推送至中央仓库，自动记录名称、版本、血缘关系和负责人，随时可回溯任意历史实验。\n- **无缝部署**：featureform 直接编排现有的 PostgreSQL 和 Kafka 基础设施，自动处理异构系统的差异，使生产部署时间从数周缩短至数小时。\n- **高可靠性与合规**：强制特征不可变性防止逻辑被意外篡改，内置的重试机制自动解决分布式故障，同时通过角色控制确保数据合规。\n\nfeatureform 
让团队无需替换现有架构，即可将分散的数据基础设施瞬间升级为统一、可靠且易于协作的虚拟特征商店。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffeatureform_featureform_6766aa77.png","Featureform","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffeatureform_d6ce3a91.png","We turn features into first-class component of the ML process.",null,"FeatureformML","https:\u002F\u002Ffeatureform.com","https:\u002F\u002Fgithub.com\u002Ffeatureform",[80,84,88,92,96,100,104,108,112,115],{"name":81,"color":82,"percentage":83},"Go","#00ADD8",39.3,{"name":85,"color":86,"percentage":87},"Jupyter Notebook","#DA5B0B",37.1,{"name":89,"color":90,"percentage":91},"Python","#3572A5",14.8,{"name":93,"color":94,"percentage":95},"JavaScript","#f1e05a",6.5,{"name":97,"color":98,"percentage":99},"Gherkin","#5B2063",0.6,{"name":101,"color":102,"percentage":103},"C++","#f34b7d",0.5,{"name":105,"color":106,"percentage":107},"Dockerfile","#384d54",0.4,{"name":109,"color":110,"percentage":111},"PLpgSQL","#336790",0.2,{"name":113,"color":114,"percentage":111},"Makefile","#427819",{"name":116,"color":117,"percentage":111},"Starlark","#76d275",1968,106,"2026-04-09T13:46:55","MPL-2.0",4,"Linux, macOS, Windows","未说明",{"notes":126,"python":127,"dependencies":128},"该工具是一个虚拟特征存储（Virtual Feature Store），本身不直接计算数据，而是编排现有的数据基础设施（如 Spark, Redis 等）。支持在本地文件、Kubernetes（包括 Minikube）或 Docker 容器中运行。具体资源需求取决于所连接的后端数据基础设施。","3.7, 3.8, 3.9, 3.10",[],[130,14,16],"其他",[132,133,134,135,136,137,138,139,140,141,142,143],"machine-learning","data-science","vector-database","embeddings-similarity","embeddings","hacktoberfest","feature-store","mlops","data-quality","feature-engineering","ml","python","2026-03-27T02:49:30.150509","2026-04-11T20:50:09.742651",[147,152,157,162],{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},29777,"遇到无法进行基于时间的连接或时间列名称不是 'ts' 的错误怎么办？","该问题已在版本 1.1.12 中修复。用户需要升级到此最新版本，并清除后重新创建 `.featureform` 目录，因为架构发生了轻微变化。执行命令：`rm -rf .featureform`，然后重新运行 `featureform 
apply`。","https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002F423",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},29778,"从 BigQuery 向 Redis 同步超过 10 万行数据时，为什么只写入了 10 万条？","这是一个已知的分页\u002F分块处理缺陷。当数据量超过 10 万时，协调器作业会将数据分为多个 10 万的块（chunk），但系统似乎只成功写入了第一个块到 Redis，导致后续数据丢失。目前建议避免单次同步超过 10 万实体，或关注官方后续针对 coordinator job chunk size 的修复更新。","https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002F1365",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},29779,"如何在 Featureform 中使用 DuckDB 作为离线存储（Offline Store）？","目前社区正在推进将 DuckDB 实现为离线存储提供者。虽然尚未完全合并到主分支的标准发布中，但已有贡献者在进行相关开发。用户可以关注该项目的进展，或者参考 `provider\u002Fpostgres.go` 的实现逻辑自行尝试构建自定义提供者，需实现 `offline store interface` 接口。","https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002F1070",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},29780,"在 Spark Provider 中指定 S3 输出位置时出现 'Invalid JSON' 错误如何解决？","该错误通常发生在配置 S3 输出路径时，系统期望接收 JSON 格式的字符串但实际输入不符合规范。请检查 `register_s3` 和 `register_spark` 的配置代码，确保 `outputLocation` 等参数字符串格式正确，没有被错误地拼接或转义。如果使用的是自定义 Docker 镜像，请确认 `pyspark` 版本与 Featureform 版本兼容（如示例中的 3.4.0）。","https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fissues\u002F1576",[168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253,258,263],{"id":169,"version":170,"summary_zh":171,"released_at":172},206330,"v0.12.1","**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.12.0...v0.12.1","2024-02-13T00:44:53",{"id":174,"version":175,"summary_zh":176,"released_at":177},206331,"v0.12.0","## 变更内容\n* 功能：ClickHouse 离线存储支持 (#1224)，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1232 中实现\n* 功能（时间戳变体）默认开启，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1176 中实现\n* 将 pandas 升级至 >=1.3.5，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1175 
中实现\n* 特性可选推理存储，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1178 中实现\n* 增加通过 CLI 在 Docker 上部署 Featureform 的支持，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F962 中实现\n* 截断过长的表单错误信息，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1205 中实现\n* 批量注册特性，由 @RiddhiBagadiaa 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1195 中实现\n* 可搜索标签，由 @ihkap11 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1167 中实现\n* SQL 转换的输入参数，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1233 中实现\n* 资源位置支持，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1262 中实现\n* 从主包中移除本地模式，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1294 中实现\n\n\n## 错误修复\n* 增加 gRPC 流式超时时间，由 @sdreyer 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1190 中实现\n* 长时间运行作业的客户端 gRPC 配置，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1192 中实现\n* 将 Spark 提交参数写入文件存储，以避免 Databricks API 10KB 字节限制，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1197 中实现\n* Helm 安装\u002F升级 ETCD 修复，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1220 中实现\n* `offline_store_spark_runner.py` 的 MD5 哈希值，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1213 中实现\n* Bug：健康提供者在重新应用时不会被重新检查，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1231 中实现\n* 修复横幅重新加载问题，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1300 中实现\n* Redshift 配置修正，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1307 中实现\n* 源模态空行修复，由 
@anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1306 中实现\n* `get_dynamodb` 方法，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1309 中实现\n* 添加缺失的 HDFS switch case，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1311 中实现\n* 为 SQL 提供者添加变体到物化 ID 中，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1313 中实现\n* 在调用 .Query 之前检查 cast 的其他定义，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1321 中实现\n* 使按需特性能够作为对象传递，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1312 中实现\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.11.0...v0.12.0","2024-02-12T23:27:47",{"id":179,"version":180,"summary_zh":181,"released_at":182},206332,"v0.11.0","## 变更内容\n### 新功能\n* 提供商健康检查，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1085 中实现\n* 预览特征数据的功能，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1129 中实现\n* 批量推理（Snowflake），由 @RiddhiBagadiaa 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1158 中实现\n* 批量推理（Spark），由 @RiddhiBagadiaa 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1174 中实现\n* 仪表板中按需预览特征代码的功能，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1169 中实现\n* 通过 S3 导入到 DynamoDB 进行物化，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1161 中实现\n* 将名称变体复制到剪贴板的功能，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1163 中实现\n* 获取用于训练集的 Spark DataFrame 的功能，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1121 中实现\n* 在仪表板中查看转换血缘关系的功能，由 @anthonylasso 在 
https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1096 中实现\n\n### 生活质量改进\n* 改进了 Docker 重建时的缓存机制，由 @sdreyer 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1043 中实现\n* 动态设置仪表板表格大小，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1030 中实现\n* Databricks\u002F客户端凭据输入验证，由 @ihkap11 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1149 中实现\n\n## Bug修复\n* 确保每个转换函数参数都有输入，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1018 中实现\n* 修复资源重新定义错误的状态错误，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1032 中实现\n* 解决源数据无法检索任何数据的问题，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1034 中实现\n* 刷新和加载问题的修复，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1039 中实现\n* 在标题中添加资源类型，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1024 中实现\n* 修复了会添加重复特征的条件语句，由 @sdreyer 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1062 中实现\n* 文档解析错误，由 @joshcolts18 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1090 中实现\n* 物化复制性能提升，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1079 中实现\n* 搜索结果加载状态过慢的问题，由 @anthonylasso 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1088 中实现\n* Flask 与 Python 3.7 的兼容性，由 @sdreyer 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1111 中实现\n* 物化复制中的竞态条件，由 @epps 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1113 中实现\n* 增加长时间运行作业的超时时间，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1118 中实现\n\n## 新贡献者\n* @jerempy 在 
https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1074 中做出了首次贡献\n* @joshcolts18 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F1090 中做出了首次贡献\n* @syedzubeen 做出了首次","2023-11-30T06:22:47",{"id":184,"version":185,"summary_zh":186,"released_at":187},206333,"v0.10.1","## 变更内容\n* 由 @sdreyer 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F917 中修复 Pinecone 相关问题\n* 由 @dependabot 将 grpcio 从 1.51.1 升级至 1.53.0，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F908\n* 由 @anthonylasso 实现 SQL 格式化函数的单元测试，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F918\n* 由 @anthonylasso 修复表格行高调整问题，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F937\n* 由 @anthonylasso 更新 CLI 版本输出信息，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F934\n* 由 @epps 添加 `pytest` 覆盖率报告，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F940\n* Bug 修复：KCF 无法读取在数据源上注册的 CSV 文件，由 @aolfat 修复，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F943\n* 由 @sdreyer 更新 README.md，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F946\n* 由 @sdreyer 在主分支上运行覆盖率测试，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F950\n* 由 @anthonylasso 设置代码以支持单元级别的 metadata_server 和 provider 测试，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F944\n* Provider 配置测试 #64，由 @anthonylasso 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F948\n* 将 KCF Dockerfile 镜像升级至 Python 3.10，并统一使用 dill 的版本管理，由 @aolfat 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F955\n* Provider 的“获取”功能测试，由 @anthonylasso 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F954\n* 由 @anthonylasso 修复仪表板元数据变体相关的 Bug，详见 
https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F933\n* 将资源名称改为 ResourceVariant，由 @aolfat 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F941\n* 注册与实例化测试，由 @anthonylasso 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F961\n* Bug 修复：DataFrame 转换问题，由 @anthonylasso 修复，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F926\n* Spark 测试，由 @sdreyer 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F958\n* Tests\u002Fspark 测试，由 @sdreyer 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F965\n* 功能新增：PostgreSQL 增加对 SSL 模式的支持，由 @ahmadnazeri 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F964\n* 修复 ETCD 问题 #945，由 @anthonylasso 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F967\n* 云存储路径处理（Azure Blob Storage），由 @epps 完成，详见 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F947\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.10.0...v0.10.1","2023-08-17T23:33:20",{"id":189,"version":190,"summary_zh":191,"released_at":192},206334,"v0.10.0","## 变更内容\n\n### **V0.10 版本带来了：**\n\n**- 全新的仪表板界面及增强的功能**  \n**- 在本地和托管模式下支持 Weaviate 和 Pinecone 向量数据库**  \n**- 面向数据科学开发的 API 改进**  \n**- 更新的文档及错误修复**\n\n## 
仪表板焕新与升级\n我们很高兴为您带来一个更加美观的仪表板，并为用户和管理员提供了全新功能，包括资源标签的元数据管理、转换结果预览以及清晰的转换逻辑展示。\n\n![64af1d82d153973fccf86317_4Ar9Po7GUJLOTIq5CZbJoNQFWgWcpCvVm3exXLtr7ogMJOlp2SCqY-61Cj_akC1U1KCfmaVwTwiRE763cvhW0bvWiaIlgoOLHmFuBNNIx3jsMpNyk08_ec1YXgt5MjNhfruB4XjfjUdrHBVd5IT5jfk](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002F6a6b6b7d-caed-4a54-b800-7dfe70ca8816)\n\n**可通过仪表板界面直接为资源添加标签**\n![64af1d83823a5ead51dfa88f_z9-GtRKcrE0MIjfeLS6Q6gM4wVS2BQVWoH_Yn33b0UHxiqJuLFsp9F9ytOF2mCfh5_ikrPM3yNGkUuvmnj13dTlHnQ-j_OYY88y-s8QuDlyCi7fBWc9eYIbUxXciNqjVH4PAn3RmJXzcfP5ot9TWUVw](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002F3ac0f2e3-793b-454d-9bf1-d23a9e15455e)\n\n**[从仪表板编辑资源元数据](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F907)**\n\n![64af1d83823a5ead51dfa88f_z9-GtRKcrE0MIjfeLS6Q6gM4wVS2BQVWoH_Yn33b0UHxiqJuLFsp9F9ytOF2mCfh5_ikrPM3yNGkUuvmnj13dTlHnQ-j_OYY88y-s8QuDlyCi7fBWc9eYIbUxXciNqjVH4PAn3RmJXzcfP5ot9TWUVw-2](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002Ff2e541b7-43c9-4170-98fb-0e1112d01d1e)\n\n\n**[直接从仪表板预览数据集](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F858)**\n\n![64af1d83307770eac98fc774_aecI-9V-HWManpCyGWroExo-uYd5x9SB4LlBsevmikq0iPQEN3VXugvd1pGkdwBR8zpsYN2zyCRZExsWlEA4Uwcpv_Jt1gqwgvrcWa1yLnOvizZhKp-DZ-ne8ALUSZ_Nwe7ZlqssMs6mZG4nnq3AWgs](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002Fe32cf0ab-0ba5-4034-a777-f0647673f1ee)\n\n**[Python 和 SQL 
转换的更好格式化](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F915)**\n\n![64af1d83823a5ead51dfa88f_z9-GtRKcrE0MIjfeLS6Q6gM4wVS2BQVWoH_Yn33b0UHxiqJuLFsp9F9ytOF2mCfh5_ikrPM3yNGkUuvmnj13dTlHnQ-j_OYY88y-s8QuDlyCi7fBWc9eYIbUxXciNqjVH4PAn3RmJXzcfP5ot9TWUVw-1](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002F5fd60c11-abb9-4df9-975c-9f37bc7ced9d)\n\n## 向量数据库支持\n现在您可以将 Weaviate 和 Pinecone 注册为提供商！\n\n**[Pinecone](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F854)**\n\n\u003Cimg width=\"312\" alt=\"Screen Shot 2023-07-14 at 11 56 58 AM\" src=\"https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002F71d7473d-96c6-4a10-8554-e3e0403c16c5\">\n\n**[Weaviate](https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F888)**\n\n\u003Cimg width=\"274\" alt=\"Screen Shot 2023-07-14 at 12 46 27 PM\" src=\"https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fassets\u002F32000779\u002F19c4d133-7258-456d-9c12-bcf6eeb067b9\">\n\n## 面向数据科学开发的 API 改进\n\n**[Re","2023-07-14T19:50:41",{"id":194,"version":195,"summary_zh":196,"released_at":197},206335,"v0.9.0","# 新功能\n## 向量数据库与嵌入支持\n您可以通过 Featureform 定义并编排生成嵌入的数据管道。Featureform 可以将这些嵌入写入 Redis，以便进行最近邻查找。这还允许用户以声明式的方式对嵌入进行版本控制、复用和管理。\n\n### 注册 Redis 作为向量存储（注册方式与常规注册相同）\n```python\nff.register_redis(\n        name=\"redis\",\n        description=\"示例推理存储\",\n        team=\"Featureform\",\n        host=\"0.0.0.0\",\n        port=6379,\n)\n```\n\n### 从文本生成嵌入的管道\n\n```python\ndocs = spark.register_file(...)\n\n@spark.df_transform(\n    inputs=[docs],\n)\ndef embed_docs(docs):\n    docs[\"embedding\"] = docs[\"text\"].map(lambda txt: openai.Embedding.create(\n                    model=\"text-embedding-ada-002\",\n                    input=txt,\n                )[\"data\"][0][\"embedding\"])\n    return docs\n```\n\n### 定义并版本化嵌入\n```python\n@ff.entity\nclass Article:\n    embedding = 
ff.Embedding(embed_docs[[\"id\", \"embedding\"]], dims=1024, vector_db=redis)\n\n@ff.entity\nclass Article:\n    embedding = ff.Embedding(\n        embed_docs[[\"id\", \"embedding\"]],\n        dims=1024,\n        variant=\"test-variant\",\n        vector_db=redis,\n    )\n```\n\n### 执行最近邻查找\n```python\nclient.Nearest(Article.embedding, \"id_123\", 25)\n```\n\n## 将训练集当作 DataFrame 操作\n您已经可以将数据源当作 DataFrame 来操作，本次发布新增了对训练集的相同功能。\n\n### 将训练集当作 Pandas DataFrame 操作\n```python\nimport featureform as ff\n\nclient = ff.Client(...)\ndf = client.training_set(\"fraud\", \"simple\").dataframe()\nprint(df.head())\n```\n\n## 针对离线存储的增强调度功能\nFeatureform 支持使用 Cron 语法来调度转换任务的执行。本次发布对该功能进行了优化，使其更加稳定和高效，并增加了更详细的错误信息。\n\n### 每小时在 Snowflake 上运行的转换\n```python\n@snowflake.sql_transform(schedule=\"0 * * * *\")\ndef avg_transaction_price():\n    return \"SELECT user, AVG(price) FROM {{transaction}} GROUP BY user\"\n```\n\n## 在 Kubernetes 上使用 S3 运行 Pandas 转换\nFeatureform 会为您调度并运行转换任务。我们支持直接运行 Pandas 脚本，Featureform 会启动一个 Kubernetes 作业来执行它。这并不能替代像 Spark 这样的分布式处理框架（我们也支持 Spark），但对于已经在生产环境中使用 Pandas 的团队来说，这是一个非常好的选择。\n\n### 定义我们的 Kubernetes 提供商上的 Pandas 环境\n\n```python\naws_creds = ff.AWSCredentials(\n        aws_access_key_id=\"\u003Caws_access_key_id>\",\n        aws_secret_access_key=\"\u003Caws_secret_access_key>\",\n)\n\ns3 = ff.register_s3(\n    name=\"s3\",\n    credentials=aws_creds,\n    bucket_path=\"\u003Cs3_bucket_path>\",\n    bucket_region=\"\u003Cs3_bucket_region>\"\n)\n\npandas_k8s = ff.register_k8s(\n        ","2023-06-06T04:08:37",{"id":199,"version":200,"summary_zh":201,"released_at":202},206337,"v0.8.1","## 变更内容\n### 新功能\n* KCF\u002FS3 支持，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F786 中实现\n\n### 功能增强\n* 更新 README 示例以修复服务相关问题，并采用类 API，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F792 中完成\n* 仪表板路由及构建优化，由 @RedLeader16 在 
https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F781 中完成\n* 设置调度作业数量限制，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F794 中实现\n* 重新格式化并清理状态展示组件，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F782 中完成\n* 将 pymdown-extensions 从 9.9.2 升级至 10.0，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F804 中完成\n\n### 错误修复\n* 修复路径异常 #769，由 @RedLeader16 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F773 中完成\n* 如果输入元组类型不是 (str, str)，则抛出错误，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F780 中完成\n* 修复 Spark 文件路径相关问题，由 @ahmadnazeri 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F776 中完成\n* 修复 SparkProvider 不同字段检查中缺少执行器类型的问题，由 @zhilingc 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F789 中完成\n* 为 etcd 协调器添加默认用户名和密码，由 @aolfat 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F798 中完成\n\n## 新贡献者\n* @zhilingc 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F789 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.8.0...v0.8.1","2023-05-16T20:31:56",{"id":204,"version":205,"summary_zh":206,"released_at":207},206337,"v0.8.0","## 变更内容\n* Spark 增强：支持 Yarn\n* 将源数据和转换数据拉取到客户端\n```python\nclient = Client()  # 假设已设置 $FEATUREFORM_HOST\nclient.apply(insecure=False)  # 对于 Docker 使用 `insecure=True`（仅适用于快速入门）\n\n# 主要源数据作为 DataFrame\ntransactions_df = client.dataframe(\n    transactions, limit=2\n)  # 直接使用 ColumnSourceRegistrar 实例，并限制为 2 行\n\n# SQL 转换源数据作为 DataFrame\navg_user_transaction_df = client.dataframe(\n    \"average_user_transaction\", \"quickstart\"\n)  # 使用源名称和变体，不设置限制，因此会获取所有行\n\nprint(transactions_df.head())\n\n\"\"\"\n  \"transactionid\" \"customerid\" \"customerdob\" 
\"custlocation\"  \"custaccountbalance\"  \"transactionamount\"           \"timestamp\"  \"isfraud\"\n0              T1     C5841053       10\u002F1\u002F94     JAMSHEDPUR              17819.05                 25.0  2022-04-09T11:33:09Z      False\n1              T2     C2142763        4\u002F4\u002F57        JHAJJAR               2270.69              27999.0  2022-03-27T01:04:21Z      False\n\"\"\"\n```\n* 新增适用于 Azure、AWS 和 GCP 的电子商务笔记本\n* 文档：更新了自定义资源文档，并新增了 KCF 相关文档\n* 修复：更新并优化了错误信息\n* 修复：修复了资源搜索功能\n* 修复：修复了面包屑导航的类型和大小写问题\n* 修复：KCF 资源限制问题\n* 修复：修正了 Docker 文件和 Spark 的路径问题\n* 修复：修复了仪表板路由和页面重新加载的问题\n* 修复：Spark Databricks 错误信息问题\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.7.3...v0.8.0","2023-05-10T16:21:40",{"id":209,"version":210,"summary_zh":211,"released_at":212},206338,"v0.7.3","## 变更内容\n* 类 API 增强：注册特征\u002F标签时可选 `timestamp_column` 参数\n* 文档更新：AWS 部署文档已更新，涵盖 `eksctl` 的变更\n* 支持 Python 3.11.2\n* 修复：CLI 列表命令中的资源状态问题\n* 修复：解决 Spark 链式转换中的 Spark 问题\n* 修复：修复了不允许将 Python 对象作为 DataFrame 转换输入的问题\n* 修复：在创建训练集之前检查特征是否存在\n* 在文档中添加 Notebook 链接\n\n## 新贡献者\n* @jmeisele 在 https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fpull\u002F727 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.7.2...v0.7.3","2023-04-18T23:18:04",{"id":214,"version":215,"summary_zh":216,"released_at":217},206339,"v0.7.2","## 变更内容\n\n- 客户端的多项用户体验优化\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.7.1...v0.7.2","2023-04-06T03:01:09",{"id":219,"version":220,"summary_zh":221,"released_at":222},206340,"v0.7.1","### What's Changed\r\n- Bugfix for On demand feature status\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.7.0...v0.7.1","2023-04-03T17:31:40",{"id":224,"version":225,"summary_zh":226,"released_at":227},206341,"v0.7.0","# Release 0.7\r\n\r\n## Define Feature and 
Labels with an ORM-style Syntax\r\nFeatureform has added a new way to define entities, features, and labels. This new API, which takes inspiration from Python ORMs, makes it easier for data scientists to define and manage their features and labels in code.\r\n\r\n**Example**\r\n```python\r\ntransactions = postgres.register_table(\r\n  name=\"transactions\",\r\n  table=\"Transactions\", # This is the table's name in Postgres\r\n)\r\n\r\n@postgres.sql_transformation()\r\ndef average_user_transaction():\r\n  return \"SELECT CustomerID as user_id, avg(TransactionAmount) \" \\\r\n  \"as avg_transaction_amt from {{transactions.default}} GROUP BY user_id\"\r\n\r\n@ff.entity\r\nclass User:\r\n  avg_transactions = ff.Feature(\r\n    average_user_transaction[[\"user_id\", \"avg_transaction_amt\"]],\r\n    type=ff.Float32,\r\n    inference_store=redis,\r\n  )\r\n  fraudulent = ff.Label(\r\n    transactions[[\"customerid\", \"isfraud\"]], variant=\"quickstart\", type=ff.Bool\r\n  )\r\n\r\nff.register_training_set(\r\n  \"fraud_training\",\r\n  label=\"fraudulent\",\r\n  features=[\"avg_transactions\"],\r\n)\r\n```\r\n\r\nYou can read more in the [docs](https:\u002F\u002Fdocs.featureform.com\u002Fquickstart-docker).\r\n\r\n## Compute features at serving time with on-demand features\r\n\r\nA highly requested feature was to feature-ize incoming data at serving time. 
For example, you may have an on-demand feature that turns a user comment into an embedding, or one that processes an incoming image.\r\n\r\n**On-demand feature that turns a comment to an embedding at serving time**\r\n\r\n```python\r\n@ff.ondemand_feature\r\ndef text_to_embedding(serving_client, params, entities):\r\n    return bert_transform(params[\"comment\"])\r\n```\r\n\r\nYou can learn more in the [docs](https:\u002F\u002Fdocs.featureform.com\u002Fgetting-started\u002Fserving-for-inference-and-training#on-demand-features)\r\n\r\n## Attach tags & user-defined values to Featureform resources like transformations, features, and labels.\r\n\r\nAll features, labels, transformations, and training sets now have a `tags` and `properties` argument. `properties` is a dict and `tags` is a list.\r\n\r\n```python\r\nclient.register_training_set(\r\n    \"CustomerLTV_Training\",\r\n    \"default\",\r\n    label=\"ltv\",\r\n    features=[\"f1\", \"f2\"],\r\n    tags=[\"revenue\"],\r\n    properties={\"visibility\": \"internal\"},\r\n)\r\n```\r\n\r\nYou can read more in the [docs](https:\u002F\u002Fdocs.featureform.com\u002Fgetting-started\u002Fmetadata-tags).\r\n\r\n## Transformation and training set caching in local mode.\r\n\r\nFeatureform has a [local mode](https:\u002F\u002Fdocs.featureform.com\u002Fquickstart-local) that allows users to define, manage, and serve their features when working locally off their laptop. It doesn’t require anything to be deployed. 
It would historically re-generate training sets and features on each run, but with 0.7, we cache results by default to decrease iteration time.\r\n\r\n## A cleaner (and more colorful) CLI flow!\r\n![New CLI running featureform apply with colors](https:\u002F\u002Fuploads-ssl.webflow.com\u002F60cb5ca3bdbec96efaf3889d\u002F642a54e30700e355ee95cee5_runmode_short.gif)\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.6.4...v0.7.0","2023-04-03T05:44:25",{"id":229,"version":230,"summary_zh":231,"released_at":232},206342,"v0.6.4","### What's Changed\r\n- Bugfix for headers not being fetched in Spark Dataframe transformations\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.6.3...v0.6.4","2023-03-21T01:46:04",{"id":234,"version":235,"summary_zh":236,"released_at":237},206343,"v0.6.3","### What's Changed\r\n- Added Search to the standalone docker container\r\n- GCP Filestore bug fixes\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.6.2...v0.6.3","2023-03-20T23:24:14",{"id":239,"version":240,"summary_zh":241,"released_at":242},206344,"v0.6.2","### What's Changed\r\n\r\n- Bugfix for typeguard python package version\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.6.1...v0.6.2","2023-03-15T22:44:44",{"id":244,"version":245,"summary_zh":246,"released_at":247},206345,"v0.6.1","\r\n### What's Changed\r\n\r\n- Search Bugfix for Standalone Container\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.6.0...v0.6.1","2023-03-15T22:42:03",{"id":249,"version":250,"summary_zh":251,"released_at":252},206346,"v0.6.0","# Release 0.6\r\n\r\n## Generic Spark support as a Provider\r\nFeatureform has had support for Spark on EMR and Spark on Databricks for a while. 
We’ve generalized our Spark implementation to handle all versions of Spark using any of S3, GCS, Azure Blob Store, or HDFS as a backing store!\r\n\r\n### Here are some examples:\r\n\r\n#### Spark with GCS backend\r\n```python\r\nspark_creds = ff.SparkCredentials(\r\n\tmaster=master_ip_or_local,\r\n\tdeploy_mode=\"client\",\r\n\tpython_version=cluster_py_version,\r\n)\r\n\r\ngcp_creds = ff.GCPCredentials(\r\n\tproject_id=project_id,\r\n\tcredentials_path=path_to_gcp_creds,\r\n)\r\n\r\ngcs = ff.register_gcs(\r\n\tname=gcs_provider_name,\r\n\tcredentials=gcp_creds,\r\n\tbucket_name=\"bucket_name\",\r\n\tbucket_path=\"directory\u002F\",\r\n)\r\n\r\nspark = ff.register_spark(\r\n\tname=spark_provider_name,\r\n\tdescription=\"A Spark deployment we created for the Featureform quickstart\",\r\n\tteam=\"featureform-team\",\r\n\texecutor=spark_creds,\r\n\tfilestore=gcs,\r\n)\r\n```\r\n\r\n#### Databricks with Azure\r\n```python\r\ndatabricks = ff.DatabricksCredentials(\r\n\thost=host,\r\n\ttoken=token,\r\n\tcluster_id=cluster,\r\n)\r\n\r\nazure_blob = ff.register_blob_store(\r\n\tname=\"blob\",\r\n\taccount_name=os.getenv(\"AZURE_ACCOUNT_NAME\", None),\r\n\taccount_key=os.getenv(\"AZURE_ACCOUNT_KEY\", None),\r\n\tcontainer_name=os.getenv(\"AZURE_CONTAINER_NAME\", None),\r\n\troot_path=\"testing\u002Fff\",\r\n)\r\n\r\nspark = ff.register_spark(\r\n\tname=\"spark-databricks-azure\",\r\n\tdescription=\"A Spark deployment we created for the Featureform quickstart\",\r\n\tteam=\"featureform-team\",\r\n\texecutor=databricks,\r\n\tfilestore=azure_blob,\r\n)\r\n```\r\n\r\n#### EMR with S3\r\n```python\r\nspark_creds = ff.SparkCredentials(\r\n\tmaster=master_ip_or_local,\r\n\tdeploy_mode=\"client\",\r\n\tpython_version=cluster_py_version,\r\n)\r\n\r\naws_creds = ff.AWSCredentials(\r\n\taws_access_key_id=os.getenv(\"AWS_ACCESS_KEY_ID\", None),\r\n\taws_secret_access_key=os.getenv(\"AWS_SECRET_KEY\", None),\r\n)\r\n\r\ns3 = 
ff.register_s3(\r\n\tname=\"s3-quickstart\",\r\n\tcredentials=aws_creds,\r\n\tbucket_path=os.getenv(\"S3_BUCKET_PATH\", None),\r\n\tbucket_region=os.getenv(\"S3_BUCKET_REGION\", None),\r\n)\r\n\r\nspark = ff.register_spark(\r\n\tname=\"spark-generic-s3\",\r\n\tdescription=\"A Spark deployment we created for the Featureform quickstart\",\r\n\tteam=\"featureform-team\",\r\n\texecutor=spark_creds,\r\n\tfilestore=s3,\r\n)\r\n```\r\n\r\n#### Spark with HDFS\r\n```python\r\nspark_creds = ff.SparkCredentials(\r\n\tmaster=os.getenv(\"SPARK_MASTER\", \"local\"),\r\n\tdeploy_mode=\"client\",\r\n\tpython_version=\"3.7.16\",\r\n)\r\n\r\nhdfs = ff.register_hdfs(\r\n\tname=\"hdfs_provider\",\r\n\thost=host,\r\n\tport=\"9000\",\r\n\tusername=\"hduser\"\r\n)\r\n\r\nspark = ff.register_spark(\r\n\tname=\"spark-hdfs\",\r\n\tdescription=\"A Spark deployment we created for the Featureform quickstart\",\r\n\tteam=\"featureform-team\",\r\n\texecutor=spark_creds,\r\n\tfilestore=hdfs,\r\n)\r\n```\r\n\r\nYou can read more in the docs.\r\n\r\n## Track which models are using features \u002F training sets at serving time\r\n\r\nA highly requested feature was to add a lineage link between models and their feature & training set. 
Now when you serve a feature and training set you can include an *optional* model argument.\r\n\r\n```python\r\nclient.features(\"review_text\", entities={\"order\": \"df8e5e994bcc820fcf403f9a875201e6\"}, model=\"sentiment_analysis\")\r\n```\r\n\r\n```python\r\nclient.training_set(\"CustomerLTV_Training\", \"default\", model=\"linear_ltv_model\")\r\n```\r\n\r\nIt can then be viewed via the CLI & the Dashboard:\r\n\r\n**Dashboard**\r\n\r\n![](https:\u002F\u002Flh5.googleusercontent.com\u002F0HpJ6SLKN8F0NYqzCUFmHlhF7iNrSvfHcyG4WLBjfe0QWuJApOk4luC1C9P6KMag4ja-DPQx--SiaYGpPtlSTQI3aEAGMXqMCwsPoQ0-GooYuBKseJuJw8sv8pGNUlSgrxknvMJ4e09XfahTIwHZiRg)\r\n\r\n**CLI**\r\n![](https:\u002F\u002Flh4.googleusercontent.com\u002FmSM7wx1pGV8fUrOl0CnQHL3cN5LuxsL1lH7DcVlRi90aWMwcItoXin3IJ0HyHXs-FwAVZYS-WHtMhQWAgpPtjX32nFCGm4ZV52nJerFaRRYHrvw-PaF99iwhOD_6GEOgGX8a9cWjk1bampTVpDBYj84)\r\n\r\n  \r\n\r\nYou can learn more in the [docs](https:\u002F\u002Fdocs.featureform.com\u002Fgetting-started\u002Fserving-for-inference-and-training#model-registration)\r\n\r\n## Backup & Recovery now available in open-source Featureform\r\n\r\nBackup and recovery was originally exclusive to our enterprise offering. It is our goal to open-source everything in the product that isn’t related to governance, though we often first pilot new features with clients as we nail down the API.\r\n\r\n### Enable Backups\r\n1.  Create a k8s secret with information on where to store backups.\r\n```shell\r\n> python backup\u002Fcreate_secret.py --help\r\nUsage: create_secret.py [OPTIONS] COMMAND [ARGS]...\r\nGenerates a Kubernetes secret to store Featureform backup data.\r\nUse this script to generate the Kubernetes secret, then apply it with:\r\n`kubectl apply -f backup_secret.yaml`\r\n\r\nOptions:\r\n-h, --help Show this message and exit.\r\n\r\nCommands:\r\nazure Create secret for azure storage containers\r\ngcs Create secret for GCS buckets\r\ns3 Create secret for S3 buckets\r\n```\r\n  \r\n\r\n2.  
Upgrade your Helm cluster (if it was created without backups enabled)\r\n```shell\r\nhelm upgrade featureform featureform\u002Ffeatureform [FLAGS] --set backup.enable=true --set backup.schedule=\u003Cschedule>\r\n```\r\nWhere schedule is in cron syntax, for example a","2023-03-06T10:22:03",{"id":254,"version":255,"summary_zh":256,"released_at":257},206347,"v0.5.1","## What's Changed\r\n* Additional Snowflake Parameters (Role & Warehouse)\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.5.0...v0.5.1","2023-02-08T02:48:09",{"id":259,"version":260,"summary_zh":261,"released_at":262},206348,"v0.5.0","## What's Changed\r\n* Status Functions For Resources\r\n* Custom KCF Images\r\n* Azure Quickstart\r\n* Support For Legacy Snowflake Credentials \r\n* ETCD Backup and Recovery \r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.4.0...v0.5.0","2023-02-07T02:06:57",{"id":264,"version":265,"summary_zh":266,"released_at":267},206349,"v0.4.6","## What's Changed\r\n* Fix for Provider Image in Local Mode \r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ffeatureform\u002Ffeatureform\u002Fcompare\u002Fv0.4.5...v0.4.6","2023-02-01T00:27:43"]