[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-tensorforce--tensorforce":3,"tool-tensorforce--tensorforce":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",156804,2,"2026-04-15T11:34:33",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":64,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":76,"owner_twitter":75,"owner_website":75,"owner_url":77,"languages":78,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":10,"env_os":91,"env_gpu":92,"env_ram":93,"env_deps":94,"category_tags":101,"github_topics":102,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":140},7751,"tensorforce\u002Ftensorforce","tensorforce","Tensorforce: a TensorFlow library for applied reinforcement learning","Tensorforce 是一个基于 Google TensorFlow 构建的开源深度强化学习框架，旨在为科研探索与实际应用提供灵活且易用的解决方案。它主要解决了强化学习算法在落地过程中面临的复杂性难题，通过将核心算法与具体应用场景彻底解耦，让开发者无需重复造轮子即可快速适配不同的状态输入与动作输出。\n\n该工具特别适合人工智能研究人员、算法工程师以及希望将强化学习技术应用于机器人控制、游戏策略或自动化决策系统的开发者使用。其独特的技术亮点在于高度模块化的组件设计，允许用户像搭积木一样自由配置功能；同时，Tensorforce 将整个强化学习逻辑（包括控制流）完全用 TensorFlow 
实现，这不仅确保了计算图的可移植性，还极大地简化了模型在不同编程语言环境中的部署流程。\n\n需要特别提醒的是，该项目目前已停止维护。尽管如此，其清晰的设计理念和完整的代码实现对于理解强化学习系统架构仍具有重要的参考价值，适合用于学习研究或在现有基础上进行二次开发。","# Tensorforce: a TensorFlow library for applied reinforcement learning\n\n[![Docs](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorforce_tensorforce_readme_6bf48b3e9a6d.png)](http:\u002F\u002Ftensorforce.readthedocs.io\u002Fen\u002Flatest\u002F)\n[![Gitter](https:\u002F\u002Fbadges.gitter.im\u002Ftensorforce\u002Fcommunity.svg)](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity)\n[![Build Status](https:\u002F\u002Ftravis-ci.com\u002Ftensorforce\u002Ftensorforce.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Ftensorforce\u002Ftensorforce)\n[![pypi version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ftensorforce)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTensorforce\u002F)\n[![python version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Ftensorforce)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTensorforce\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue.svg)](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Donate](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdonate-GitHub_Sponsors-yellow)](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle)\n[![Donate](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdonate-Liberapay-yellow)](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate)\n\n\n**This project is not maintained any longer!**\n\n\n#### Introduction\n\nTensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. 
Tensorforce is built on top of [Google's TensorFlow framework](https:\u002F\u002Fwww.tensorflow.org\u002F) and requires Python 3.\n\nTensorforce follows a set of high-level design choices which differentiate it from other similar libraries:\n\n- **Modular component-based design**: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.\n- **Separation of RL algorithm and application**: Algorithms are agnostic to the type and structure of inputs (states\u002Fobservations) and outputs (actions\u002Fdecisions), as well as the interaction with the application environment.\n- **Full-on TensorFlow models**: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.\n\n\n\n#### Quicklinks\n\n- [Documentation](http:\u002F\u002Ftensorforce.readthedocs.io) and [update notes](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FUPDATE_NOTES.md)\n- [Contact](mailto:tensorforce.team@gmail.com) and [Gitter channel](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity)\n- [Benchmarks](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002Fbenchmarks) and [projects using Tensorforce](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FPROJECTS.md)\n- [Roadmap](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FROADMAP.md) and [contribution guidelines](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FCONTRIBUTING.md)\n- [GitHub Sponsors](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle) and [Liberapay](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate)\n\n\n\n#### Table 
of content\n\n- [Installation](#installation)\n- [Quickstart example code](#quickstart-example-code)\n- [Command line usage](#command-line-usage)\n- [Features](#features)\n- [Environment adapters](#environment-adapters)\n- [Support, feedback and donating](#support-feedback-and-donating)\n- [Core team and contributors](#core-team-and-contributors)\n- [Cite Tensorforce](#cite-tensorforce)\n\n\n\n## Installation\n\nA stable version of Tensorforce is periodically updated on PyPI and installed as follows:\n\n```bash\npip3 install tensorforce\n```\n\nTo always use the latest version of Tensorforce, install the GitHub version instead:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce.git\npip3 install -e tensorforce\n```\n\n**Note on installation on M1 Macs:** At the moment Tensorflow, which is a core dependency of Tensorforce, cannot be installed on M1 Macs directly. Follow the [\"M1 Macs\" section](https:\u002F\u002Ftensorforce.readthedocs.io\u002Fen\u002Flatest\u002Fbasics\u002Finstallation.html) in the documentation for a workaround.\n\nEnvironments require additional packages for which there are setup options available (`ale`, `gym`, `retro`, `vizdoom`, `carla`; or `envs` for all environments), however, some require additional tools to be installed separately (see [environments documentation](http:\u002F\u002Ftensorforce.readthedocs.io)). Other setup options include `tfa` for [TensorFlow Addons](https:\u002F\u002Fwww.tensorflow.org\u002Faddons) and `tune` for [HpBandSter](https:\u002F\u002Fgithub.com\u002Fautoml\u002FHpBandSter) required for the `tune.py` script.\n\n**Note on GPU usage:** Different from (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on environment and agent configuration. 
In particular for environments with low-dimensional state spaces (i.e., no images), it is hence worth trying to run on CPU only.\n\n\n\n## Quickstart example code\n\n```python\nfrom tensorforce import Agent, Environment\n\n# Pre-defined or custom environment\nenvironment = Environment.create(\n    environment='gym', level='CartPole', max_episode_timesteps=500\n)\n\n# Instantiate a Tensorforce agent\nagent = Agent.create(\n    agent='tensorforce',\n    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)\n    memory=10000,\n    update=dict(unit='timesteps', batch_size=64),\n    optimizer=dict(type='adam', learning_rate=3e-4),\n    policy=dict(network='auto'),\n    objective='policy_gradient',\n    reward_estimation=dict(horizon=20)\n)\n\n# Train for 300 episodes\nfor _ in range(300):\n\n    # Initialize episode\n    states = environment.reset()\n    terminal = False\n\n    while not terminal:\n        # Episode timestep\n        actions = agent.act(states=states)\n        states, terminal, reward = environment.execute(actions=actions)\n        agent.observe(terminal=terminal, reward=reward)\n\nagent.close()\nenvironment.close()\n```\n\n\n\n## Command line usage\n\nTensorforce comes with a range of [example configurations](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Ftree\u002Fmaster\u002Fbenchmarks\u002Fconfigs) for different popular reinforcement learning environments. 
For instance, to run Tensorforce's implementation of the popular [Proximal Policy Optimization (PPO) algorithm](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347) on the [OpenAI Gym CartPole environment](https:\u002F\u002Fgym.openai.com\u002Fenvs\u002FCartPole-v1\u002F), execute the following line:\n\n```bash\npython3 run.py --agent benchmarks\u002Fconfigs\u002Fppo.json --environment gym \\\n    --level CartPole-v1 --episodes 100\n```\n\nFor more information check out the [documentation](http:\u002F\u002Ftensorforce.readthedocs.io).\n\n\n\n## Features\n\n- **Network layers**: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; *plus* support of Keras layers.\n- **Network architecture**: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register\u002Fretrieve layer, plus support for arbitrary architectures.\n- **Memory types**: Simple batch buffer memory, random replay memory.\n- **Policy distributions**: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.\n- **Reward estimation**: Configuration options for estimation horizon, future reward discount, state\u002Fstate-action\u002Fadvantage estimation, and for whether to consider terminal and horizon states.\n- **Training objectives**: (Deterministic) policy gradient, state-(action-)value approximation.\n- **Optimization algorithms**: Various gradient-based optimizers provided by TensorFlow like Adam\u002FAdaDelta\u002FRMSProp\u002Fetc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.\n- **Exploration**: Randomized actions, sampling temperature, variable noise.\n- **Preprocessing**: Clipping, deltafier, sequence, image processing.\n- **Regularization**: L2 and entropy 
regularization.\n- **Execution modes**: Parallelized execution of multiple environments based on Python's `multiprocessing` and `socket`.\n- **Optimized act-only SavedModel extraction**.\n- **TensorBoard support**.\n\nBy combining these modular components in different ways, a variety of popular deep reinforcement learning models\u002Ffeatures can be replicated:\n\n- Q-learning: [Deep Q-learning](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature14236), [Double-DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.06461), [Dueling DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06581), [n-step DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783), [Normalised Advantage Function (NAF)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.00748)\n- Policy gradient: [vanilla policy-gradient \u002F REINFORCE](http:\u002F\u002Fwww-anw.cs.umass.edu\u002F~barto\u002Fcourses\u002Fcs687\u002Fwilliams92simple.pdf), [Actor-critic and A3C](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783), [Proximal Policy Optimization](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347), [Trust Region Policy Optimization](https:\u002F\u002Farxiv.org\u002Fabs\u002F1502.05477), [Deterministic Policy Gradient](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)\n\nNote that in general the replication is not 100% faithful, since the models as described in the corresponding paper often involve additional minor tweaks and modifications which are hard to support with a modular design (and, arguably, also questionable whether it is important\u002Fdesirable to support them). 
On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.\n\n\n\n## Environment adapters\n\n- [Arcade Learning Environment](https:\u002F\u002Fgithub.com\u002Fmgbellemare\u002FArcade-Learning-Environment), a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.\n- [CARLA](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla), is an open-source simulator for autonomous driving research.\n- [OpenAI Gym](https:\u002F\u002Fgym.openai.com\u002F), a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.\n- [OpenAI Retro](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fretro), lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.\n- [OpenSim](http:\u002F\u002Fosim-rl.stanford.edu\u002F), reinforcement learning with musculoskeletal models.\n- [PyGame Learning Environment](https:\u002F\u002Fgithub.com\u002Fntasfi\u002FPyGame-Learning-Environment\u002F), learning environment which allows a quick start to Reinforcement Learning in Python.\n- [ViZDoom](https:\u002F\u002Fgithub.com\u002Fmwydmuch\u002FViZDoom), allows developing AI bots that play Doom using only the visual information.\n\n\n## Support, feedback and donating\n\nPlease get in touch via [mail](mailto:tensorforce.team@gmail.com) or on [Gitter](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity) if you have questions, feedback, ideas for features\u002Fcollaboration, or if you seek support for applying Tensorforce to your problem.\n\nIf you want to support the Tensorforce core team (see below), please also consider donating: [GitHub Sponsors](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle) or [Liberapay](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate).\n\n\n\n## Core team 
and contributors\n\nTensorforce is currently developed and maintained by [Alexander Kuhnle](https:\u002F\u002Fgithub.com\u002FAlexKuhnle).\n\nEarlier versions of Tensorforce (\u003C= 0.4.2) were developed by [Michael Schaarschmidt](https:\u002F\u002Fgithub.com\u002Fmichaelschaarschmidt), [Alexander Kuhnle](https:\u002F\u002Fgithub.com\u002FAlexKuhnle) and [Kai Fricke](https:\u002F\u002Fgithub.com\u002Fkrfricke).\n\nThe advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).\n\nThe CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).\n\nWe are very grateful for our open-source contributors (listed according to Github, updated periodically):\n\nIslandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers,\nmryellow, ImpulseAdventure,\njanislavjankov, andrewekhalel,\nHassamSheikh, skervim,\nbeflix, coord-e,\nbenelot, tms1337, vwxyzjn, erniejunior,\nDeathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko,\nAdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.\n\n\n\n## Cite Tensorforce\n\nPlease cite the framework as follows:\n\n```\n@misc{tensorforce,\n  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},\n  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},\n  howpublished = {Web page},\n  url          = {https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce},\n  year         = {2017}\n}\n```\n\nIf you use the [parallel execution functionality](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Ftree\u002Fmaster\u002Ftensorforce\u002Fcontrib), please additionally cite it as 
follows:\n\n```\n@article{rabault2019accelerating,\n  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},\n  author       = {Rabault, Jean and Kuhnle, Alexander},\n  journal      = {Physics of Fluids},\n  volume       = {31},\n  number       = {9},\n  pages        = {094105},\n  year         = {2019},\n  publisher    = {AIP Publishing}\n}\n```\n\nIf you use Tensorforce in your research, you may additionally consider citing the following paper:\n\n```\n@article{lift-tensorforce,\n  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},\n  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},\n  journal      = {CoRR},\n  volume       = {abs\u002F1808.07903},\n  year         = {2018},\n  url          = {http:\u002F\u002Farxiv.org\u002Fabs\u002F1808.07903},\n  archivePrefix = {arXiv},\n  eprint       = {1808.07903}\n}\n```\n","# Tensorforce：用于应用强化学习的 TensorFlow 库\n\n[![文档](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorforce_tensorforce_readme_6bf48b3e9a6d.png)](http:\u002F\u002Ftensorforce.readthedocs.io\u002Fen\u002Flatest\u002F)\n[![Gitter](https:\u002F\u002Fbadges.gitter.im\u002Ftensorforce\u002Fcommunity.svg)](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity)\n[![构建状态](https:\u002F\u002Ftravis-ci.com\u002Ftensorforce\u002Ftensorforce.svg?branch=master)](https:\u002F\u002Ftravis-ci.com\u002Ftensorforce\u002Ftensorforce)\n[![PyPI 版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Ftensorforce)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTensorforce\u002F)\n[![Python 
版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Ftensorforce)](https:\u002F\u002Fpypi.org\u002Fproject\u002FTensorforce\u002F)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue.svg)](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FLICENSE)\n[![捐赠](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdonate-GitHub_Sponsors-yellow)](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle)\n[![捐赠](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdonate-Liberapay-yellow)](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate)\n\n\n**该项目已不再维护！**\n\n\n#### 简介\n\nTensorforce 是一个开源的深度强化学习框架，强调模块化、灵活的库设计以及在研究和实践中简单易用的特点。Tensorforce 构建于 [Google 的 TensorFlow 框架](https:\u002F\u002Fwww.tensorflow.org\u002F) 之上，并且需要 Python 3。\n\nTensorforce 遵循一系列高层次的设计理念，使其与其他类似库有所区别：\n\n- **模块化的组件式设计**：功能实现力求尽可能通用和可配置，即使这可能会牺牲对原始论文细节的忠实复现。\n- **强化学习算法与应用分离**：算法对输入（状态\u002F观测）和输出（动作\u002F决策）的类型和结构，以及与应用环境的交互方式均不敏感。\n- **完全基于 TensorFlow 的模型**：整个强化学习逻辑，包括控制流，都由 TensorFlow 实现，从而能够生成独立于应用程序语言的可移植计算图，并便于模型的部署。\n\n\n\n#### 快速链接\n\n- [文档](http:\u002F\u002Ftensorforce.readthedocs.io) 和 [更新说明](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FUPDATE_NOTES.md)\n- [联系方式](mailto:tensorforce.team@gmail.com) 和 [Gitter 论坛](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity)\n- [基准测试](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002Fbenchmarks) 和 [使用 Tensorforce 的项目](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FPROJECTS.md)\n- [路线图](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FROADMAP.md) 和 [贡献指南](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fblob\u002Fmaster\u002FCONTRIBUTING.md)\n- [GitHub 赞助](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle) 和 
[Liberapay](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate)\n\n\n\n#### 目录\n\n- [安装](#installation)\n- [快速入门示例代码](#quickstart-example-code)\n- [命令行使用](#command-line-usage)\n- [特性](#features)\n- [环境适配器](#environment-adapters)\n- [支持、反馈与捐赠](#support-feedback-and-donating)\n- [核心团队与贡献者](#core-team-and-contributors)\n- [引用 Tensorforce](#cite-tensorforce)\n\n\n\n## 安装\n\nTensorforce 的稳定版本会定期发布到 PyPI，安装方法如下：\n\n```bash\npip3 install tensorforce\n```\n\n若希望始终使用最新版本的 Tensorforce，可以安装 GitHub 上的版本：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce.git\npip3 install -e tensorforce\n```\n\n**关于 M1 Mac 的安装说明：** 目前，作为 Tensorforce 核心依赖项的 TensorFlow 尚无法直接在 M1 Mac 上安装。请参考文档中的 [“M1 Macs”章节](https:\u002F\u002Ftensorforce.readthedocs.io\u002Fen\u002Flatest\u002Fbasics\u002Finstallation.html) 以获取解决方法。\n\n环境需要额外的软件包，这些软件包有多种安装选项（`ale`、`gym`、`retro`、`vizdoom`、`carla`；或使用 `envs` 安装所有环境），但某些环境还需要单独安装其他工具（详见 [环境文档](http:\u002F\u002Ftensorforce.readthedocs.io)）。此外，还有其他安装选项，例如使用 `tfa` 安装 [TensorFlow Addons](https:\u002F\u002Fwww.tensorflow.org\u002Faddons)，以及使用 `tune` 安装 [HpBandSter](https:\u002F\u002Fgithub.com\u002Fautoml\u002FHpBandSter)，后者是运行 `tune.py` 脚本所必需的。\n\n**关于 GPU 使用的说明：** 与（无）监督深度学习不同，强化学习并不总是受益于 GPU 加速，具体取决于环境和智能体的配置。尤其是对于状态空间维度较低的环境（即没有图像输入的情况），建议尝试仅使用 CPU 运行。\n\n\n\n## 快速入门示例代码\n\n```python\nfrom tensorforce import Agent, Environment\n\n# 预定义或自定义环境\nenvironment = Environment.create(\n    environment='gym', level='CartPole', max_episode_timesteps=500\n)\n\n# 实例化一个 Tensorforce 智能体\nagent = Agent.create(\n    agent='tensorforce',\n    environment=environment,  # 或者使用 states、actions、(max_episode_timesteps)\n    memory=10000,\n    update=dict(unit='timesteps', batch_size=64),\n    optimizer=dict(type='adam', learning_rate=3e-4),\n    policy=dict(network='auto'),\n    objective='policy_gradient',\n    reward_estimation=dict(horizon=20)\n)\n\n# 训练 300 个回合\nfor _ in range(300):\n\n    # 初始化回合\n    states = environment.reset()\n    terminal = False\n\n    while not terminal:\n        # 回合中的一步\n        actions = 
agent.act(states=states)\n        states, terminal, reward = environment.execute(actions=actions)\n        agent.observe(terminal=terminal, reward=reward)\n\nagent.close()\nenvironment.close()\n```\n\n\n\n## 命令行使用\n\nTensorforce 提供了一系列针对不同热门强化学习环境的 [示例配置](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Ftree\u002Fmaster\u002Fbenchmarks\u002Fconfigs)。例如，要在 [OpenAI Gym CartPole 环境](https:\u002F\u002Fgym.openai.com\u002Fenvs\u002FCartPole-v1\u002F) 上运行 Tensorforce 实现的流行 [近端策略优化 (PPO) 算法](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)，可以执行以下命令：\n\n```bash\npython3 run.py --agent benchmarks\u002Fconfigs\u002Fppo.json --environment gym \\\n    --level CartPole-v1 --episodes 100\n```\n\n更多信息请参阅 [文档](http:\u002F\u002Ftensorforce.readthedocs.io)。\n\n## 功能特性\n\n- **网络层**：全连接层、一维和二维卷积、嵌入层、池化层、RNN、Dropout、归一化等；*此外*还支持Keras的层。\n- **网络架构**：支持多状态输入和层（块）复用，通过注册\u002F检索层方式简单定义有向无环图结构，并支持任意架构。\n- **记忆类型**：简单的批量缓冲记忆、随机回放缓冲记忆。\n- **策略分布**：布尔型动作采用伯努利分布，有限整数型动作采用分类分布，连续型动作采用高斯分布，范围受限的连续型动作采用贝塔分布，支持多动作场景。\n- **奖励估计**：可配置估计 horizon、未来奖励折扣、状态\u002F状态-动作\u002F优势估计，以及是否考虑终止状态和 horizon 状态。\n- **训练目标**：（确定性）策略梯度、状态-（动作-）值函数近似。\n- **优化算法**：TensorFlow 提供的多种基于梯度的优化器，如 Adam\u002FAdaDelta\u002FRMSProp 等，进化优化器、自然梯度优化器，以及一系列元优化器。\n- **探索机制**：随机动作、采样温度、可变噪声。\n- **预处理**：裁剪、差分器、序列处理、图像处理。\n- **正则化**：L2 正则化和熵正则化。\n- **执行模式**：基于 Python 的 `multiprocessing` 和 `socket` 并行执行多个环境。\n- **优化的仅执行 SavedModel 提取**。\n- **TensorBoard 支持**。\n\n通过以不同方式组合这些模块化组件，可以复现多种流行的深度强化学习模型和功能：\n\n- Q 学习：[Deep Q-learning](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature14236)、[Double-DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.06461)、[Dueling DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06581)、[n-step DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)、[Normalised Advantage Function (NAF)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.00748)\n- 策略梯度：[vanilla policy-gradient \u002F 
REINFORCE](http:\u002F\u002Fwww-anw.cs.umass.edu\u002F~barto\u002Fcourses\u002Fcs687\u002Fwilliams92simple.pdf)、[Actor-critic 和 A3C](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)、[Proximal Policy Optimization](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)、[Trust Region Policy Optimization](https:\u002F\u002Farxiv.org\u002Fabs\u002F1502.05477)、[Deterministic Policy Gradient](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)\n\n需要注意的是，通常情况下，复现并不完全忠实于原文，因为论文中描述的模型往往包含一些难以通过模块化设计支持的小调整和修改（而且，是否有必要或值得支持这些细节也存在争议）。不过，从积极的一面来看，这些模型只是 Tensorforce 支持的众多模块组合中的几个例子而已。\n\n\n\n## 环境适配器\n\n- [Arcade Learning Environment](https:\u002F\u002Fgithub.com\u002Fmgbellemare\u002FArcade-Learning-Environment)，一个简单的面向对象框架，允许研究人员和爱好者为 Atari 2600 游戏开发 AI 智能体。\n- [CARLA](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla)，一款用于自动驾驶研究的开源模拟器。\n- [OpenAI Gym](https:\u002F\u002Fgym.openai.com\u002F)，一个用于开发和比较强化学习算法的工具包，支持训练智能体完成从行走到玩 Pong 或弹球等各种任务。\n- [OpenAI Retro](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fretro)，可将经典电子游戏转换为 Gym 环境用于强化学习，并已集成约 1000 款游戏。\n- [OpenSim](http:\u002F\u002Fosim-rl.stanford.edu\u002F)，基于肌肉骨骼模型的强化学习。\n- [PyGame Learning Environment](https:\u002F\u002Fgithub.com\u002Fntasfi\u002FPyGame-Learning-Environment\u002F)，一个便于在 Python 中快速入门强化学习的学习环境。\n- [ViZDoom](https:\u002F\u002Fgithub.com\u002Fmwydmuch\u002FViZDoom)，允许仅使用视觉信息开发能够玩 Doom 的 AI 机器人。\n\n\n## 支持、反馈与捐赠\n\n如有任何问题、反馈、功能建议或合作意向，或需要帮助将 Tensorforce 应用于您的实际问题，请通过 [邮件](mailto:tensorforce.team@gmail.com) 或 [Gitter](https:\u002F\u002Fgitter.im\u002Ftensorforce\u002Fcommunity) 联系我们。\n\n如果您希望支持 Tensorforce 核心团队（见下文），也欢迎通过以下方式捐赠：[GitHub Sponsors](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FAlexKuhnle) 或 [Liberapay](https:\u002F\u002Fliberapay.com\u002FTensorforceTeam\u002Fdonate)。\n\n\n\n## 核心团队与贡献者\n\n目前，Tensorforce 由 [Alexander Kuhnle](https:\u002F\u002Fgithub.com\u002FAlexKuhnle) 开发并维护。\n\nTensorforce 较早版本（\u003C= 0.4.2）由 [Michael 
Schaarschmidt](https:\u002F\u002Fgithub.com\u002Fmichaelschaarschmidt)、[Alexander Kuhnle](https:\u002F\u002Fgithub.com\u002FAlexKuhnle) 和 [Kai Fricke](https:\u002F\u002Fgithub.com\u002Fkrfricke) 共同开发。\n\n高级并行执行功能最初由 Jean Rabault (@jerabaul29) 和 Vincent Belus (@vbelus) 贡献。此外，预训练功能主要是在 Hongwei Tang (@thw1021) 和 Jean Rabault (@jerabaul29) 的协作下开发的。\n\nCARLA 环境封装目前由 Luca Anzalone (@luca96) 开发。\n\n我们非常感谢所有开源贡献者（按 GitHub 排序，定期更新）：\n\nIslandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers,\nmryellow, ImpulseAdventure,\njanislavjankov, andrewekhalel,\nHassamSheikh, skervim,\nbeflix, coord-e,\nbenelot, tms1337, vwxyzjn, erniejunior,\nDeathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko,\nAdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan。\n\n## 引用 Tensorforce\n\n请按以下方式引用该框架：\n\n```\n@misc{tensorforce,\n  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},\n  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},\n  howpublished = {Web page},\n  url          = {https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce},\n  year         = {2017}\n}\n```\n\n如果您使用了[并行执行功能](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Ftree\u002Fmaster\u002Ftensorforce\u002Fcontrib)，请同时按以下方式引用相关文献：\n\n```\n@article{rabault2019accelerating,\n  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},\n  author       = {Rabault, Jean and Kuhnle, Alexander},\n  journal      = {Physics of Fluids},\n  volume       = {31},\n  number       = {9},\n  pages        = {094105},\n  year         = {2019},\n  publisher    = {AIP Publishing}\n}\n```\n\n如果您在研究中使用了 Tensorforce，也可以考虑引用以下论文：\n\n```\n@article{lift-tensorforce,\n  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},\n  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},\n  journal      = {CoRR},\n  volume 
      = {abs\u002F1808.07903},\n  year         = {2018},\n  url          = {http:\u002F\u002Farxiv.org\u002Fabs\u002F1808.07903},\n  archivePrefix = {arXiv},\n  eprint       = {1808.07903}\n}\n```","# Tensorforce 快速上手指南\n\n> **重要提示**：根据官方说明，本项目目前已**不再维护**。建议在新项目中谨慎评估使用，或考虑其他活跃的强化学习框架（如 RLlib、Stable-Baselines3 等）。以下内容基于项目最后稳定版本整理。\n\nTensorforce 是一个基于 TensorFlow 的开源深度强化学习库，强调模块化设计和易用性，适用于科研与实际应用。\n\n## 环境准备\n\n- **操作系统**：Linux, macOS, Windows\n- **Python 版本**：Python 3.6+\n- **核心依赖**：TensorFlow (CPU 或 GPU 版本)\n- **可选依赖**：\n  - 游戏\u002F仿真环境：`gym`, `ale` (Atari), `vizdoom`, `carla`, `retro` 等\n  - 优化工具：`tfa` (TensorFlow Addons), `tune` (HpBandSter)\n\n**M1 Mac 用户注意**：\nTensorFlow 在 M1 芯片上无法直接通过常规方式安装。请参考官方文档中的 [\"M1 Macs\" 章节](https:\u002F\u002Ftensorforce.readthedocs.io\u002Fen\u002Flatest\u002Fbasics\u002Finstallation.html) 获取变通方案（通常涉及使用 `tensorflow-macos` 和 `tensorflow-metal`）。\n\n**GPU 使用建议**：\n与监督学习不同，强化学习并不总是能从 GPU 加速中受益。对于状态空间维度较低（非图像输入）的环境，仅使用 CPU 运行往往效率更高且足够。\n\n## 安装步骤\n\n### 方法一：安装稳定版（推荐）\n通过 PyPI 安装最新稳定版本：\n\n```bash\npip3 install tensorforce\n```\n\n*国内用户加速建议*：可使用清华或阿里镜像源加速安装：\n```bash\npip3 install tensorforce -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方法二：安装开发版\n如需使用 GitHub 上的最新代码（包含未发布的功能或修复）：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce.git\npip3 install -e tensorforce\n```\n\n### 安装特定环境支持\n如果需要连接特定的仿真环境（如 Gym, Atari 等），需额外安装对应包。例如安装所有常见环境支持：\n\n```bash\npip3 install tensorforce[envs]\n```\n或者单独安装：\n```bash\npip3 install tensorforce[gym]\npip3 install tensorforce[ale]\n```\n\n## 基本使用\n\n以下是一个最简单的示例，展示如何使用 Tensorforce 在 OpenAI Gym 的 `CartPole` 环境中训练一个代理（Agent）。\n\n### Python 代码示例\n\n```python\nfrom tensorforce import Agent, Environment\n\n# 1. 创建环境 (这里使用预定义的 Gym 环境)\nenvironment = Environment.create(\n    environment='gym', level='CartPole', max_episode_timesteps=500\n)\n\n# 2. 
实例化 Agent\n# 配置包括：记忆容量、更新频率、优化器、网络结构、目标函数等\nagent = Agent.create(\n    agent='tensorforce',\n    environment=environment,  # 也可以直接传入 states\u002Factions 定义\n    memory=10000,\n    update=dict(unit='timesteps', batch_size=64),\n    optimizer=dict(type='adam', learning_rate=3e-4),\n    policy=dict(network='auto'), # 自动根据输入输出推断网络结构\n    objective='policy_gradient',\n    reward_estimation=dict(horizon=20)\n)\n\n# 3. 开始训练循环\nfor _ in range(300):  # 训练 300 个回合\n\n    # 初始化回合\n    states = environment.reset()\n    terminal = False\n\n    while not terminal:\n        # 单步交互\n        actions = agent.act(states=states)\n        states, terminal, reward = environment.execute(actions=actions)\n        \n        # 观察结果并更新内部状态\n        agent.observe(terminal=terminal, reward=reward)\n\n# 4. 关闭资源\nagent.close()\nenvironment.close()\n```\n\n### 命令行使用示例\n\nTensorforce 还支持通过命令行直接运行预设配置。例如，使用 PPO 算法运行 CartPole 环境：\n\n```bash\npython3 run.py --agent benchmarks\u002Fconfigs\u002Fppo.json --environment gym \\\n    --level CartPole-v1 --episodes 100\n```\n\n更多配置文件示例可在项目的 `benchmarks\u002Fconfigs` 目录中找到。","某自动驾驶初创公司的算法团队正在开发一套智能交通信号控制系统，旨在通过强化学习动态优化路口红绿灯时长以缓解拥堵。\n\n### 没有 tensorforce 时\n- 研究人员需手动编写大量底层 TensorFlow 代码来构建强化学习逻辑，导致控制流复杂且难以调试，开发周期长达数周。\n- 算法模型与具体的交通仿真环境强耦合，一旦更换仿真器或调整输入状态格式，就必须重构核心算法代码。\n- 由于逻辑分散在 Python 和 TensorFlow 之间，模型难以直接部署到边缘计算设备，跨语言移植成本极高。\n- 尝试复现论文中的新算法时，往往因框架缺乏模块化组件而不得不重新造轮子，实验迭代效率低下。\n\n### 使用 tensorforce 后\n- 利用 tensorforce 的全 TensorFlow 实现特性，团队通过配置即可生成完整的计算图，将原型开发时间从数周缩短至几天。\n- 得益于算法与应用的分离设计，同一套 PPO 或 DQN 算法可无缝适配不同的交通仿真环境，无需修改核心逻辑。\n- 完整的强化学习逻辑被封装为便携式计算图，轻松导出并部署到路侧单元等生产环境中，解决了落地难题。\n- 借助其模块化组件库，研究人员能快速组合出各种变体算法进行对比实验，大幅提升了科研探索的灵活性。\n\ntensorforce 
通过高度模块化与全图执行架构，让团队从繁琐的底层实现中解放出来，专注于解决真实的交通优化问题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftensorforce_tensorforce_5b6aab8a.png","Tensorforce","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftensorforce_dfda5d5c.png","",null,"tensorforce.team@gmail.com","https:\u002F\u002Fgithub.com\u002Ftensorforce",[79,83],{"name":80,"color":81,"percentage":82},"Python","#3572A5",99.9,{"name":84,"color":85,"percentage":86},"Shell","#89e051",0.1,3309,525,"2026-04-10T06:14:07","Apache-2.0","Linux, macOS, Windows","非必需。文档指出强化学习并不总是能从 GPU 受益，特别是对于低维状态空间（无图像）的环境，建议仅使用 CPU。未指定具体显卡型号、显存大小或 CUDA 版本。","未说明",{"notes":95,"python":96,"dependencies":97},"1. 项目状态：该项目已不再维护。\n2. M1 Mac 特别说明：核心依赖 TensorFlow 无法直接在 M1 Mac 上安装，需参考官方文档中的变通方案。\n3. 环境依赖：使用特定环境（如 ale, gym, retro, vizdoom, carla）需要额外安装对应的软件包或工具。\n4. 架构优势：整个强化学习逻辑（包括控制流）均在 TensorFlow 中实现，以支持独立于应用语言的便携式计算图。","3.x (要求 Python 3)",[98,99,100],"tensorflow","TensorFlow Addons (可选)","HpBandSter (可选，用于调优)",[14],[103,98,104,105,64,106,107],"reinforcement-learning","deep-reinforcement-learning","tensorflow-library","control","system-control","2026-03-27T02:49:30.150509","2026-04-16T01:43:10.985410",[111,116,121,126,131,136],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},34696,"为什么加载代理（Agent）后内存使用量会翻倍？","这通常是因为调用方式错误导致的。`load()` 是一个静态方法，应该通过类直接调用 `agent = Agent.load(...)`，而不是通过实例调用 `agent.load(...)`。此外，如果在调用 `load()` 之前已经使用了 `Agent.create(...)` 创建了一个实例，会导致内存分配两次：一次由 `create` 分配，另一次由错误的 `load` 调用分配，且新加载的代理指针可能丢失。正确的做法是直接使用 `Agent.load(...)`，无需先调用 `Agent.create(...)`。","https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fissues\u002F658",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},34697,"如何在 TensorForce 中添加注意力机制（Attention Mechanism）？","目前 TensorForce 
尚未将注意力机制作为内置选项提供。要实现该功能，需要深入代码库进行根本性的修改。虽然该功能在路线图上，但短期内不会实现。如果急需此功能，可以考虑通过赞助开发者来推动其优先开发。","https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fissues\u002F790",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},34698,"如何定义自定义网络架构（如 U-Net）或冻结网络层？","TensorForce 支持通过字典定义自定义网络架构，只要动作空间和网络输出形状匹配即可。对于冻结层或使用预训练网络的功能，目前也已实现。如果需要更复杂的操作（如特定的张量重塑而不展平），可以使用 Keras 层包装器（`tensorforce\u002Fcore\u002Flayers\u002Fkeras.py`）。对于简单的重塑操作，可能需要改进现有的池化层接口或直接使用 TensorFlow\u002FKeras 构建网络并集成。","https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fissues\u002F627",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},34699,"在 GPU 上运行 Quickstart 示例时训练卡住不动怎么办？","这个问题通常与 `memory_model` 中的 `tf.cond` 操作在 GPU 上的兼容性有关。解决方案是将 `tf.cond` 替换为对 GPU 更友好的 `tf.while_loop` 结构。具体代码修改如下：\n```python\ndef body(optimize):\n    optimize = self.fn_optimization(**batch)\n    with tf.control_dependencies(control_inputs=(optimize,)):\n        return tf.logical_and(x=False, y=False)\n\ndef cond(optimize):\n    return optimize\n\nreturn tf.while_loop(cond=cond, body=body, loop_vars=(optimize,))\n```\n建议升级到包含此修复的最新版本。","https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fissues\u002F391",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},34700,"在 TF2 分支中使用 saved_model 导出时遇到数据类型或签名错误如何解决？","在 TF2 分支导出 saved_model 时，可能需要修改 `tensorforce\u002Fcore\u002Futils\u002Fdicts.py` 第 121 行，以接受所有数据类型，因为 TensorFlow 在过程中会尝试重建字典。修改后的 `value_type` 应包含 `(tf.IndexedSlices, tf.Tensor, tf.Variable, object)`。此外，如果在 `tensorforce\u002Fcore\u002Fmodels\u002Fmodel.py` 中遇到签名相关的 ValueError，通常需要检查输出张量的扁平化处理，确保模型签名符合 SavedModel 的要求。","https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce\u002Fissues\u002F704",{"id":137,"question_zh":138,"answer_zh":139,"source_url":115},34701,"训练过程中 RAM 占用过高导致崩溃，如何优化配置？","RAM 占用过高可能与 `batch_size` 和网络架构有关。在 Issue #658 中，用户发现每经过 `batch_size` 个 episode，内存会增加数 GB。优化建议包括：减小 `batch_size`（例如设为 4 或更小），调整 `update_frequency`，或者简化网络结构（如减少 
`internal_rnn` 的大小或深度）。此外，确保没有重复创建代理实例（参见加载代理的内存问题），并使用最新的 TensorFlow 和 TensorForce 版本，因为新版本对内存管理和检查点功能进行了优化。",[141,146,151,156,161,166,171,176,181,186,191,196,201,205],{"id":142,"version":143,"summary_zh":144,"released_at":145},272036,"0.6.5","##### 代理：\n- 将代理参数 `reward_preprocessing` 重命名为 `reward_processing`；对于 Tensorforce 代理，该参数已移至 `reward_estimation[reward_processing]`。\n\n##### 分布：\n- 新增 `categorical` 分布参数 `skip_linear`，用于不添加隐式的线性 logits 层。\n\n##### 环境：\n- 通过新函数 `Environment.num_actors()` 支持多智能体并行环境。\n  - 如果环境为多智能体环境，`Runner` 默认使用多智能体并行机制。\n- 新增可选的 `Environment` 函数 `episode_return()`，用于返回上一集的真实回报，当环境奖励的累计和不适合作为运行器显示指标时使用。\n\n##### 示例：\n- 新增 `vectorized_environment.py` 和 `multiactor_environment.py` 脚本，演示如何设置向量化或多智能体环境。","2021-08-30T20:20:58",{"id":147,"version":148,"summary_zh":149,"released_at":150},272037,"0.6.4","##### 代理：\n- 代理参数 `update_frequency` \u002F `update[frequency]` 现在支持大于 0.0 的浮点数值，用于指定相对于批次大小的更新频率。\n- 将 DQN、DoubleDQN 和 DuelingDQN 代理的参数 `update_frequency` 的默认值由 `1.0` 更改为 `0.25`。\n- 为所有代理子类型新增了 `return_processing` 和（适用时）`advantage_processing` 参数。\n- 新增函数 `Agent.get_specification()`，用于以字典形式返回代理的配置信息。\n- 新增函数 `Agent.get_architecture()`，用于返回网络层架构的字符串表示。\n\n##### 模块：\n- 改进了模块规范并使其更简洁。例如，可以使用 `network=my_module` 而不是 `network=my_module.TestNetwork`；或者使用 `environment=envs.custom_env` 而不是 `environment=envs.custom_env.CustomEnvironment`（模块文件需位于同一目录或其子目录中）。\n\n##### 网络：\n- 为部分策略类型新增了 `single_output=True` 参数。若将其设置为 `False`，则可通过已注册的张量为部分或全部动作指定额外的网络输出。\n- `KerasNetwork` 的参数 `model` 现在支持任意函数，只要这些函数返回 `tf.keras.Model` 即可。\n\n##### 层：\n- 新增了一种层类型 `SelfAttention`（规格化键为：`self_attention`）。\n\n##### 参数：\n- 支持跟踪非常量参数值。\n\n##### 运行器：\n- 将属性 `episode_rewards` 重命名为 `episode_returns`，并将 TQDM 状态中的 `reward` 改为 `return`。\n- 扩展了参数 `agent`，使其支持 `Agent.load()` 的关键字参数，以便加载现有代理，而非创建新代理。\n\n##### 示例：\n- 添加了 `action_masking.py` 示例脚本，用于演示带有内置动作掩码的环境实现。\n\n##### 修复：\n- 
自定义设备放置未应用于大多数张量。","2021-06-05T10:34:19",{"id":152,"version":153,"summary_zh":154,"released_at":155},272038,"0.6.3","##### 代理：\n- 新增代理参数 `tracking` 及其对应函数 `tracked_tensors()`，用于跟踪并检索预定义张量的当前值，类似于 TensorBoard 摘要中的 `summarizer`。\n- 新增实验性参数 `trace_decay` 和 `gae_decay`，用于 Tensorforce 代理的 `reward_estimation` 参数，未来也将应用于其他类型的代理。\n- Tensorforce 代理参数 `reward_estimation` 的值 `estimate_advantage` 新增选项 `\"early\"` 和 `\"late\"`。\n- 将 `Agent.act()` 参数 `deterministic` 的默认值由 `False` 改为 `True`。\n\n##### 网络：\n- 新增网络类型 `KerasNetwork`（规格键：`keras`），作为以 Keras 模型形式指定的网络的封装器。\n- 当将 Keras 模型类\u002F对象作为策略或网络参数传递时，会自动解析为 `KerasNetwork`。\n\n##### 分布：\n- 将 `Gaussian` 分布参数 `global_stddev=False` 改为 `stddev_mode='predicted'`。\n- 新增 `Categorical` 分布参数 `temperature_mode=None`。\n\n##### 层：\n- 为 `Function` 层参数 `function` 新增选项，允许传入带有参数 `x` 的字符串函数表达式，例如 `(x+1.0)\u002F2.0`。\n\n##### 摘要记录器：\n- 新增摘要 `episode-length`，作为摘要标签 “reward” 的一部分进行记录。\n\n##### 环境：\n- 通过新函数 `Environment.is_vectorizable()` 和 `Environment.reset()` 的新参数 `num_parallel`，支持向量化并行环境。\n    - 可参阅 `tensorforce\u002Fenvironments\u002Fcartpole.py` 获取可向量化环境示例。\n    - 如果 `num_parallel > 1`、`remote=None` 且环境支持向量化，则 `Runner` 默认使用向量化并行机制。\n    - 更多关于行动-观察交互的详细信息，请参阅 `examples\u002Fact_observe_vectorized.py`。\n- 新增扩展且可向量化的自定义 CartPole 环境，键为 `custom_cartpole`（开发中）。\n- 新增环境参数 `reward_shaping`，用于以简单方式修改或塑造环境奖励，可指定为可调用对象或字符串函数表达式。\n\n##### run.py 脚本：\n- 新增命令行参数 `--checkpoints` 和 `--summaries` 选项，允许在目录之外添加以逗号分隔的检查点\u002F摘要文件名。\n- 日志绘图中除每集回报外，还增加了每集长度信息。\n\n##### 修复：\n- RNN 层的时间步长处理问题。\n- 修复了与基线 RNN 结合使用时，晚期时间步长价值预测（包括 DQN 变体和 DPG 代理）的关键性错误。\n- 解决了 GPU 在散射操作中出现的问题。","2021-03-22T21:45:44",{"id":157,"version":158,"summary_zh":159,"released_at":160},272039,"0.6.2","##### 修复内容：\n- 修复DQN变体和DPG智能体的严重 bug","2020-10-03T16:00:39",{"id":162,"version":163,"summary_zh":164,"released_at":165},272040,"0.6.1","##### 代理：\n- 移除了 Tensorforce 代理参数 `optimizer` 的默认值 `\"adam\"`（由于默认优化器参数 `learning_rate` 已被移除，详见下文）\n- 移除了 Tensorforce 代理参数 `memory` 的选项 `\"minimum\"`，改用 `None`\n- 将 
`dqn`\u002F`double_dqn`\u002F`dueling_dqn` 代理参数 `huber_loss` 的默认值由 `0.0` 改为 `None`\n\n##### 层：\n- 移除了 `exponential_normalization` 层参数 `decay` 的默认值 `0.999`\n- 新增了 `batch_normalization` 层（通常仅应用于代理的 `reward_processing[return_processing]` 和 `reward_processing[advantage_processing]` 参数）\n- 为 `exponential\u002Finstance_normalization` 层新增了参数 `only_mean`，默认值为 `False`\n- 为 `exponential\u002Finstance_normalization` 层新增了参数 `min_variance`，默认值为 `1e-4`\n\n##### 优化器：\n- 移除了优化器参数 `learning_rate` 的默认值 `1e-3`\n- 将优化器参数 `gradient_norm_clipping` 的默认值由 `1.0` 改为 `None`（即不进行梯度裁剪）\n- 新增了优化器 `doublecheck_step` 及其对应的包装器参数 `doublecheck_update`\n- 移除了 `linesearch_step` 优化器参数 `accept_ratio`\n- 移除了 `natural_gradient` 优化器参数 `return_improvement_estimate`\n\n##### 保存器：\n- 增加了将代理参数 `saver` 指定为字符串的选项，该字符串会被解释为 `saver[directory]`，其余参数使用默认值\n- 为代理参数 `saver[frequency]` 添加了默认值 `10`（即默认每 10 次更新保存一次模型）\n- 将代理参数 `saver[max_checkpoints]` 的默认值由 `5` 调整为 `10`\n\n##### 总结器：\n- 增加了将代理参数 `summarizer` 指定为字符串的选项，该字符串会被解释为 `summarizer[directory]`，其余参数使用默认值\n- 将代理参数 `summarizer` 的选项名称从 `summarizer[labels]` 更改为 `summarizer[summaries]`（“label”一词源于早期版本，现已过时且容易引起混淆）\n- 将代理参数 `summarizer[summaries] = \"all\"` 的含义调整为仅包含数值型摘要，即除“graph”之外的所有摘要\n- 将代理参数 `summarizer[summaries]` 的默认值由 `[\"graph\"]` 改为 `\"all\"`\n- 将代理参数 `summarizer[max_summaries]` 的默认值由 `5` 调整为 `7`（与 TensorBoard 中不同颜色的数量一致）\n- 为代理参数 `summarizer` 新增了 `summarizer[filename]` 选项\n\n##### 录制器：\n- 增加了将代理参数 `recorder` 指定为字符串的选项，该字符串会被解释为 `recorder[directory]`，其余参数使用默认值\n\n##### run.py 脚本：\n- 新增了 `--checkpoints`、`--summaries` 和 `--recordings` 命令行参数，以便在核心代理配置之外单独指定保存器、总结器和录制器的相关参数\n\n##### 示例：\n- 新增了 `save_load_agent.py` 示例脚本，用于演示常规的代理保存与加载操作\n\n##### 修复：\n- 修复了优化器参数 `gradient_norm_clipping` 不生效的问题","2020-09-19T11:16:10",{"id":167,"version":168,"summary_zh":169,"released_at":170},272041,"0.6.0","- 移除了智能体参数 `execution`、`buffer_observe` 和 `seed`\n- 将智能体参数 `baseline_policy`、`baseline_network` 和 `critic_network` 分别重命名为 `baseline` 和 `critic`\n- 将智能体 `reward_estimation` 参数中的 
`estimate_horizon` 重命名为 `predict_horizon_values`，`estimate_actions` 重命名为 `predict_action_values`，`estimate_terminal` 重命名为 `predict_terminal_values`\n- 将智能体参数 `preprocessing` 重命名为 `state_preprocessing`\n- 智能体预处理的默认设置为 `linear_normalization`\n- 将用于奖励\u002F回报\u002F优势处理的智能体参数从 `preprocessing` 移至 `reward_preprocessing` 和 `reward_estimation[return_\u002Fadvantage_processing]`\n- 新增智能体参数 `config`，其取值包括 `buffer_observe`、`enable_int_action_masking` 和 `seed`\n- 将 PPO\u002FTRPO\u002FDPG 的参数 `critic_network` 和 `_optimizer` 分别重命名为 `baseline` 和 `baseline_optimizer`\n- 将 PPO 的参数 `optimization_steps` 重命名为 `multi_step`\n- 新增 TRPO 参数 `subsampling_fraction`\n- 将智能体参数 `use_beta_distribution` 的默认值改为 `false`\n- 增加了双 DQN 智能体 (`double_dqn`)\n- 移除了 `Agent.act()` 参数 `evaluation`\n- 移除了智能体函数参数 `query`（该功能已被移除）\n- 智能体保存功能发生变更：由 Saver\u002FProtobuf 改为 Checkpoint\u002FSavedModel，相应地，`save`\u002F`load` 函数及 `saver` 参数均有所调整。\n- 当指定 `saver` 时，默认行为是不加载智能体，除非智能体是通过 `Agent.load` 创建的。\n- 智能体摘要功能也发生了变化：`summarizer` 参数被修改，部分摘要标签及其他选项被移除。\n- 将 RNN 层的 `internal_{rnn\u002Flstm\u002Fgru}` 重命名为 `rnn\u002Flstm\u002Fgru`，并将 `rnn\u002Flstm\u002Fgru` 重命名为 `input_{rnn\u002Flstm\u002Fgru}`\n- 将 `auto` 网络参数 `internal_rnn` 重命名为 `rnn`\n- 将 `(internal_)rnn\u002Flstm\u002Fgru` 层参数 `length` 重命名为 `horizon`\n- 将 `update_modifier_wrapper` 重命名为 `optimizer_wrapper`\n- 将 `optimizing_step` 重命名为 `linesearch_step`，并将 `UpdateModifierWrapper` 参数 `optimizing_iterations` 重命名为 `linesearch_iterations`\n- 优化器的 `subsampling_step` 现在接受绝对值（整数）和相对值（浮点数）两种形式的分数。\n- 目标函数 `policy_gradient` 中的 `ratio_based` 参数被重命名为 `importance_sampling`\n- 新增目标函数 `state_value` 和 `action_value`\n- 新增 `Gaussian` 分布参数 `global_stddev` 和 `bounded_transform`（用于更好地处理有界动作空间）。\n- 将内存设备的默认参数 `device` 更改为 `CPU:0`\n- 重命名了奖励摘要\n- `Agent.create()` 现在可以接受 `act-function` 作为 `agent` 参数，用于记录。\n- 单例状态和动作现在始终以单例形式处理。\n- 对策略处理及默认设置进行了重大调整，尤其是关于 `parametrized_distributions`，并引入了新的默认策略 `parametrized_state\u002Faction_value`。\n- 合并了 `long` 和 `int` 类型。\n- 始终将环境包装在 `EnvironmentWrapper` 
类中。\n- 修改了 `tune.py` 的参数。","2020-08-30T16:48:30",{"id":172,"version":173,"summary_zh":174,"released_at":175},272042,"0.5.5","- 将 `agent.act` 的独立模式改为使用动态超参数的最终值，并避免使用 TensorFlow 条件语句。\n- 扩展了 `agent.save` 的 `\"tensorflow\"` 格式，新增一个仅包含执行图的优化 Protobuf 模型文件（`.pb`），同时新增 `Agent.load` 的 `\"pb-actonly\"` 格式，用于加载基于 Protobuf 模型的仅执行代理。\n- 通过新的 `summarizer` 参数值 `custom` 支持自定义摘要，以指定摘要类型；并提供 `Agent.summarize(...)` 方法来记录摘要值。\n- 为动态超参数添加了最小\u002F最大边界检查，以确保取值范围有效，并自动推断其他相关参数。\n- 现在所有代理类都必须指定 `batch_size` 参数。\n- 移除了 `Estimator` 中的 `capacity` 参数，该参数现在将始终自动推断。\n- 对代理的 `memory`、`update` 和 `reward_estimation` 等内部参数进行了调整。\n- 更改了部分层的默认 `bias` 和 `activation` 参数。\n- 修复了 `sequence` 预处理程序的相关问题。\n- DQN 和双 DQN 代理现已被正确约束为仅支持整数动作。\n- 向许多代理以及 `ParametrizedDistributions` 策略中添加了 `use_beta_distribution` 参数，默认值设为 `True`，以便用户可以根据需要修改默认设置。","2020-06-16T20:08:35",{"id":177,"version":178,"summary_zh":179,"released_at":180},272043,"0.5.4","- DQN\u002FDuelingDQN\u002FDPG 的 `memory` 参数现在必须显式指定，并且 `update_frequency` 的默认值已更改。\n- 由于 TensorFlow 的梯度问题，已暂时移除 `conv1d\u002Fconv2d_transpose` 层。\n- 现在可以通过 `from tensorforce import ...` 导入 `Agent`、`Environment` 和 `Runner`。\n- 新增了一个通用的重塑层，名为 `reshape`。\n- 支持批量版本的 `Agent.act` 和 `Agent.observe`。\n- 基于 Python 的 `multiprocessing` 和 `socket` 模块，支持并行化的远程环境（取代了 `tensorforce\u002Fcontrib\u002Fsocket_remote_env\u002F` 和 `tensorforce\u002Fenvironments\u002Fenvironment_process_wrapper.py`），可通过 `Environment.create(...)`、`Runner(...)` 和 `run.py` 使用。\n- 移除了 `ParallelRunner`，并将相关功能合并到 `Runner` 中。\n- 修改了 `run.py` 的命令行参数。\n- 更改了 `Agent.act` 的独立模式：新增了 `internals` 参数及对应的返回值；初始内部状态可通过 `Agent.initial_internals()` 获取，不再需要调用 `Agent.reset()`。\n- 移除了 `Agent.act` 的 `deterministic` 参数，除非处于独立模式。\n- 为 `save`\u002F`load`\u002F`restore` 添加了 `format` 参数，支持的格式包括 `tensorflow`、`numpy` 和 `hdf5`。\n- 将 `save` 参数 `append_timestep` 改为 `append`，默认值设为 `None`（之前为 `'timesteps'`）。\n- 新增了 `get_variable` 和 `assign_variable` 
代理函数。","2020-02-15T22:13:14",{"id":182,"version":183,"summary_zh":184,"released_at":185},272044,"0.5.3","- 为各类智能体添加了可选的 `memory` 参数\n- 改进了摘要标签，尤其是 `\"entropy\"` 和 `\"kl-divergence\"`\n- `linear` 层现在接受秩为 1 到 3 的张量\n- 网络输出或分布输入不再必须是向量\n- 转置卷积层（`conv1d\u002F2d_transpose`）\n- 由 @jerabaul29 贡献的并行执行功能，目前位于 `tensorforce\u002Fcontrib\u002F` 目录下\n- 允许为运行器的 `save_best_agent` 参数传入字符串，以指定与 `saver` 配置不同的最佳模型保存目录\n- 移除了 `saver` 参数中的 `steps`，并将 `seconds` 重命名为 `frequency`\n- 将 `Parallel\u002FRunner` 的 `max_episode_timesteps` 参数从 `run(...)` 方法移至构造函数\n- 新增 `Environment.create(...)` 的 `max_episode_timesteps` 参数\n- 支持 TensorFlow 2.0\n- 改进了 TensorBoard 摘要的记录功能\n- 摘要标签 `graph`、`variables` 和 `variables-histogram` 暂时无法使用\n- TF 优化器已更新为 TensorFlow 2.0 的 Keras 优化器\n- 添加了 TensorFlow Addons 依赖，并支持 TFA 优化器\n- 将 `dqn` 和 `dueling_dqn` 智能体的 `target_sync_frequency` 单位由时间步改为更新次数","2019-12-26T18:31:25",{"id":187,"version":188,"summary_zh":189,"released_at":190},272045,"0.5.2","- 提升了单元测试性能\n- 新增了 `updates`，并重命名了智能体和运行器中的 `timesteps` 和 `episodes` 计数器\n- 将 `critic_{network,optimizer}` 参数更名为 `baseline_{network,optimizer}`\n- 新增了策略梯度（`ac`）、优势 Actor-Critic（`a2c`）以及 Dueling DQN 智能体\n- 改进了“相同”基线优化器模式，并增加了可选的权重指定功能\n- 重用层现为全局设置，以实现跨模块的参数共享\n- 新增了块层类型（`block`），便于更轻松地共享层块\n- 将 `PolicyAgent\u002F-Model` 重命名为 `TensorforceAgent\u002F-Model`\n- 新增了 `Agent.load(...)` 函数，保存内容包含智能体配置\n- 移除了 `PolicyAgent` 的 `(baseline-)network` 参数\n- 新增了策略参数 `temperature`\n- 移除了 `baseline_*` 参数中的 `\"same\"` 和 `\"equal\"` 选项，并调整了内部基线处理方式\n- 将 `state\u002Faction_value` 合并为 `value` 目标，新增参数 `value`，取值为 `\"state\"` 或 `\"action\"`","2019-10-14T22:49:32",{"id":192,"version":193,"summary_zh":194,"released_at":195},272046,"0.5.1","- Fixed setup.py packages value","2019-09-10T18:23:52",{"id":197,"version":198,"summary_zh":199,"released_at":200},272047,"0.5.0","### Major Revision\r\n\r\n##### Agent:\r\n\r\n- DQFDAgent removed (temporarily)\r\n- DQNNstepAgent and NAFAgent part of DQNAgent\r\n- Agents need to be initialized via `agent.initialize()` 
before application\r\n- States\u002Factions of type `int` require an entry `num_values` (instead of `num_actions`)\r\n- `Agent.from_spec()` changed and renamed to `Agent.create()`\r\n- `Agent.act()` argument `fetch_tensors` changed and renamed to `query`, `index` renamed to `parallel`, `buffered` removed\r\n- `Agent.observe()` argument `index` renamed to `parallel`\r\n- `Agent.atomic_observe()` removed\r\n- `Agent.save\u002Frestore_model()` renamed to `Agent.save\u002Frestore()`\r\n\r\n##### Agent arguments:\r\n\r\n- `update_mode` renamed to `update`\r\n- `states_preprocessing` and `reward_preprocessing` changed and combined to `preprocessing`\r\n- `actions_exploration` changed and renamed to `exploration`\r\n- `execution` entry `num_parallel` replaced by a separate argument `parallel_interactions`\r\n- `batched_observe` and `batching_capacity` replaced by argument `buffer_observe`\r\n- `scope` renamed to `name`\r\n\r\n##### DQNAgent arguments:\r\n\r\n- `update_mode` replaced by `batch_size`, `update_frequency` and `start_updating`\r\n- `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added\r\n- `memory` defines capacity of implicitly defined memory `'replay'`\r\n- `double_q_model` removed (temporarily)\r\n\r\n##### Policy gradient agent arguments:\r\n\r\n- New mandatory argument `max_episode_timesteps`\r\n- `update_mode` replaced by `batch_size` and `update_frequency`\r\n- `memory` removed\r\n- `baseline_mode` removed\r\n- `baseline` argument changed and renamed to `critic_network`\r\n- `baseline_optimizer` renamed to `critic_optimizer`\r\n- `gae_lambda` removed (temporarily)\r\n\r\n##### PPOAgent arguments:\r\n\r\n- `step_optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added\r\n\r\n##### TRPOAgent arguments:\r\n\r\n- `cg_*` and `ls_*` arguments removed\r\n\r\n##### VPGAgent arguments:\r\n\r\n- `optimizer` removed, implicitly defined as `'adam'`, `learning_rate` added\r\n\r\n##### Environment:\r\n\r\n- Environment properties 
`states` and `actions` are now functions `states()` and `actions()`\r\n- States\u002Factions of type `int` require an entry `num_values` (instead of `num_actions`)\r\n- New function `Environment.max_episode_timesteps()`\r\n\r\n##### Contrib environments:\r\n\r\n- ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to `tensorforce.environments`\r\n- Other environment implementations removed (may be upgraded in the future)\r\n\r\n##### Runners:\r\n\r\n- Improved `run()` API for `Runner` and `ParallelRunner`\r\n- `ThreadedRunner` removed\r\n\r\n##### Other:\r\n\r\n- `examples` folder (including `configs`) removed, apart from `quickstart.py`\r\n- New `benchmarks` folder to replace parts of old `examples` folder\r\n","2019-09-08T14:48:21",{"id":202,"version":203,"summary_zh":75,"released_at":204},272048,"0.4.4","2019-09-07T13:53:13",{"id":206,"version":207,"summary_zh":75,"released_at":208},272049,"0.4.3","2018-08-16T10:01:49"]