[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lnmangione--Halite-III":3,"tool-lnmangione--Halite-III":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,44,45,15,46,26,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 
研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[26,14,13,46],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":80,"owner_website":82,"owner_url":83,"languages":84,"stars":105,"forks":106,"last_commit_at":107,"license":80,"difficulty_score":108,"env_os":109,"env_gpu":110,"env_ram":110,"env_deps":111,"category_tags":116,"github_topics":80,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":117,"updated_at":118,"faqs":119,"releases":120},4058,"lnmangione\u002FHalite-III","Halite-III","In this paper, we apply machine learning to create bots for Halite III, @twosigma's annual A.I. competition. We develop one classifier using Support Vector Machine with Supervised Learning, and one using a Deep Neural Network with Reinforcement Learning","Halite-III 是一个基于机器学习技术构建的智能机器人项目，专为 Two Sigma 举办的年度人工智能挑战赛——资源管理游戏 Halite III 设计。该项目的核心目标是解决游戏中单船模式下的资源采集优化问题，通过算法自动决策船只的移动、采集与部署策略，以在限定回合内最大化收集名为“卤素”的能源资源。\n\n与传统依靠人工编写固定规则的策略不同，Halite-III 探索了两种前沿的技术路径：一是结合监督学习的支持向量机（SVM）分类器，二是利用强化学习的深度神经网络（DNN）。实验表明，这两种智能体均能成功习得游戏策略，其表现水平介于基础规则机器与经过遗传算法调优的高级规则机器之间，验证了机器学习在复杂博弈决策中的有效性。\n\n该项目非常适合对人工智能、强化学习及博弈论感兴趣的研究人员和开发者使用。对于希望深入理解如何将深度学习算法应用于动态环境决策、或想要复现经典论文实验结果的极客而言，Halite-III 提供了一个结构清晰、逻辑严谨的开源范例。它不仅展示了从理论到实践的完整流程，也为后续探索多玩家对抗或更大规模地图的通用化策略奠定了坚实基础。","\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_8f67a1c0d8f1.gif\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ctext>A 2 player game of Halite III\u003C\u002Ftext>\n\u003C\u002Fp>\n\n# Mastering Halite with Reinforcement Learning\n\n## Abstract\n\nIn this paper, we apply machine learning to create computerized bots for Halite III, a resource-management game. We focus on a single-player, single-ship game, and develop two classifiers: one using Support Vector Machine with Supervised Learning, and one using a Deep Neural Network with Reinforcement Learning. We evaluate the performance of each of these bots against two benchmark bots, one simple rule-based bot and one genetically-tuned rule-based bot. We find that the machine learning bots successfully learn gameplay, and performing at a level between the two benchmarks.\n\n## 1 Introduction\n\nHalite is an annual open source artificial intelligence challenge, created by Two Sigma. 
This year’s competition, Halite III, is a resource management game in which players must program a bot that builds and commands ships to explore the game map and collect halite, an energy resource that can be found scattered across the map. Ships use halite as an energy source, and the player with the most halite at the end of the game is the winner.\n\n### 1.1 Game Rules\n\nEach player begins a game of Halite III with a shipyard, where ships spawn and halite is stored. Players spend halite to build a ship, move a ship, and convert a ship to a dropoff. Players can interact by seeking inspiring competition, or by colliding ships to send both to the bottom of the sea. \n\nPlayers each start the game with 5,000 stored halite, a shipyard, and knowledge of the game map. Each game is played in groups of two or four on a two-dimensional map with a unique symmetric pattern of halite.\n\nShips can make one action per turn: they can move one unit in any cardinal direction, collect halite from the sea in their current position, or convert into dropoffs. When a ship is over a friendly shipyard or dropoff, it automatically deposits its halite cargo, increasing the player's collected halite.\n\nEach turn, the game engine sends the players the positions of all ships and dropoffs, and an updated game map. Players have up to two seconds to issue their commands for the turn. The game engine will parse and execute the commands, calculating each player’s resulting halite score and resolving all movement. The game continues for 400 to 500 turns, depending on the game map size. The winner of the game is the player who has the most collected halite at the end.\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_a45fc433b47b.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> Figure 1: The Halite III Game Board \u003C\u002Ftext>\n\u003C\u002Fp>\n\nAn example of a particular Halite game state is shown in Figure 1. For more information about the rules, and for recordings of Halite III games, visit the website, halite.io.\n\n\n### 1.2 Importance of this topic\n\nThe rapid growth in how machines learn to play games has been in no small part due to the discovery and use of new methods and algorithms along the way. One such algorithm, minimax, works to minimize the maximum potential loss. This algorithm is used in chess engines and can be applied to general decision making when there is an element of uncertainty. The development of algorithms like this one has enhanced our ability to solve problems (Ontanon, 2012). This is why learning games is an important topic. \n\n## 2 Problem and Approach\n\n### 2.1 Problem Statement\n\nOur objective is to create a machine learning bot for Halite III that optimizes single-player resource gathering on a fixed map size. The decisions this bot makes each turn concern ship navigation. After tackling this problem, we can see how the bot generalizes to a multiplayer setting or to differently sized maps; however, this is not the focus of this project. Even so, it is worth noting that the approach we take can potentially be generalized to the multiplayer setting by simply having multiple single-player classifiers, one for each player. 
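\n\nAs a point of reference for the decision space involved, the sketch below enumerates the per-turn ship actions described in Section 1.1; the action labels are illustrative only and are not taken from this repository's code. The learned bots restrict themselves to the navigation subset.\n\n```python\n# Illustrative sketch of the per-turn action space; labels are hypothetical.\nfrom enum import Enum\n\nclass ShipAction(Enum):\n    NORTH = 'n'\n    SOUTH = 's'\n    EAST = 'e'\n    WEST = 'w'\n    STAY = 'o'      # remain in place and collect halite from the current cell\n    CONVERT = 'c'   # convert the ship into a dropoff (kept rule-based in this project)\n\n# The learned classifiers only decide navigation; spawning and conversion stay rule-based.\nNAVIGATION_ACTIONS = [a for a in ShipAction if a is not ShipAction.CONVERT]\n```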
\n\nOur simplification of the problem was achieved by creating a new engine locally and setting up an environment such that only one player plays at a time, while maintaining a 32x32 map size. \n\nAnother simplification made was that the decision of when to create a ship is left as a rule-based decision. In order to learn this decision correctly, a lot of data would have to be generated, since for the majority of moves in a game the player doesn't have enough halite. Also, this problem would have to be learned separately from the problem of ship navigation, meaning that another classifier would have to be used. The use of another classifier would confound the results, making it hard to draw conclusions. For this reason, the decision was made to make ship generation rule-based. \n\n\nThe final simplification made was that training and learning will be done with a single ship. This makes training easier because if there were multiple ships, the environment from turn to turn would be non-deterministic. This would mean that we could not model our ship as a complete agent, the \"random\" aspect of the map state would need to be simulated, and, in general, a very different and more complex model would be required. \n\n\n### 2.2 SVM Bot\n\nThe first machine learning approach to solving this problem that we considered was a supervised approach, creating data from replays. On the website for Halite, there exists a repository of all of the replays of each game that each player has played. Thus, the data we used to train our bot came from 50 of the top-ranked player's games.\n\n\nThe idea here is that the behavior of the ships of the best player in a competitive game will generalize to maximizing resource collection in the single-player setting. This is admittedly a bit of a jump, since we are trying to learn one problem from a different, more complex one. As a result, we decided to take data only from the first 150 moves of each of the 50 games, as this is the period of the game in which the fewest interactions with other players occur. This gives in total 750 moves to learn from. However, for each move, data from multiple ships is generated. \n\nFor each ship on each move, data is parsed. Information regarding the amount of halite the ship is carrying, the location of the shipyard, and the location and amount of nearby available halite is encoded. This represents our feature space. The label is the navigational decision that the \"good\" player's bot made. Training becomes the simple process of showing the model this labeled data.\n\nThe model we used is a weighted SVM classifier with an RBF kernel. This kernel was chosen because it is well understood, widely used in many practical settings, and maps the data into a space where it is more likely to be linearly separable. It is available in many standard libraries.
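\n\nA minimal sketch of this training setup, assuming scikit-learn and hypothetical pre-parsed replay arrays; the feature extraction, file names, and label encoding here are illustrative, not this repository's actual code:\n\n```python\nimport numpy as np\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.svm import SVC\n\n# features: one row per (ship, move) parsed from the top player's replays,\n# encoding halite carried, shipyard offset, and nearby halite amounts.\n# labels: the navigation decision that player made ('n', 's', 'e', 'w', 'o').\nfeatures = np.load('replay_features.npy')   # hypothetical preprocessed arrays\nlabels = np.load('replay_labels.npy')\n\nX_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=0)\n\n# Weighted SVM classifier with an RBF kernel, as described above; class_weight\n# compensates for moves (such as staying put) that dominate the replay data.\nmodel = make_pipeline(StandardScaler(), SVC(kernel='rbf', class_weight='balanced'))\nmodel.fit(X_train, y_train)\nprint('held-out accuracy:', model.score(X_test, y_test))\n```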
\n\nOne potential problem of this supervised method is that it only takes data from one player. As a result, it performs best when it behaves exactly like that player, which means that it will never surpass the performance of the bot it is learning from. For this reason, it may be helpful to train on multiple different bots; however, this runs into another problem: if those bots have different strategies, then there is no guarantee that the SVM bot we train will learn something substantive.\n\nAnother potential problem with this supervised approach, one that may limit the SVM bot's ability to continuously improve, is that the SVM bot makes moves that mimic the data it trained on without taking into account the strategy behind those moves.\n\n### 2.3 Deep-Q Learning Bot\n\nA second goal involves attempting to build a more complex, robust algorithm to improve upon the performance of the bot. To do this, we consider training a neural network that takes the current game state into consideration and predicts which move to take.\n\n\nWhile this is a reasonable model to consider, one key challenge that arises is how to determine whether a particular move in a particular game state is a \"good move\", since the only metric we have for evaluating performance is how well the bot did once the game has ended. Therefore, a natural approach to this issue involves using reinforcement learning. At a high level, we want to make the move that gives the highest expected reward in the long term. To train, we retroactively give each move in a game a reward based on the final outcome of the game.\n\nThe data for this approach, from which our features are drawn, is taken from games the bot itself plays. The features form the input to the neural network as three layers, each a 33x33 grid centered around the ship. The first layer contains all of the available halite in each position around the ship. The second layer contains information about nearby ships. It is encoded by placing, at each ship's position on the grid, the amount of halite that ship is holding. In this way, all positions without ships are encoded as zero, and the central position of the grid, the position of the ship that we train, always contains the amount of halite that ship is currently holding. Note that due to the simplifications made, this central position will be the only potentially non-zero position in this layer. The final layer, another grid, contains a single one where the shipyard is located and zeros elsewhere. This ensures that the ship will have knowledge about where to go in order to deposit its collected halite.
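\n\nAs a concrete illustration of this encoding, the sketch below assembles the three 33x33 layers. The inputs (`map_halite`, `ship_pos`, `ship_cargo`, `shipyard_pos`) are assumed helpers and the wrap-around handling assumes Halite's toroidal map, so this is illustrative rather than this repository's actual code:\n\n```python\nimport numpy as np\n\ndef encode_state(map_halite, ship_pos, ship_cargo, shipyard_pos, size=33):\n    '''Build the 3 x 33 x 33 input described above (illustrative sketch).'''\n    half = size // 2\n    h, w = map_halite.shape\n    state = np.zeros((3, size, size), dtype=np.float32)\n    sy, sx = ship_pos\n    for dy in range(-half, half + 1):\n        for dx in range(-half, half + 1):\n            # Layer 0: available halite around the ship (the map wraps around).\n            state[0, dy + half, dx + half] = map_halite[(sy + dy) % h, (sx + dx) % w]\n    # Layer 1: halite carried by ships; with a single ship, only the centre cell is non-zero.\n    state[1, half, half] = ship_cargo\n    # Layer 2: a single 1 marking the shipyard, relative to the ship.\n    oy = (shipyard_pos[0] - sy + h // 2) % h - h // 2\n    ox = (shipyard_pos[1] - sx + w // 2) % w - w // 2\n    if -half <= oy <= half and -half <= ox <= half:\n        state[2, oy + half, ox + half] = 1.0\n    return state\n```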
\n\nTo be more specific, we consider Deep-Q learning as the model used to train our classifier. Q-learning is a specific implementation of reinforcement learning, and is useful for our model because it handles a discrete action space and a continuous state space. While in theory our state space is discrete, there are so many possible states that treating the space as discrete would be impractical for a learning algorithm to train on. Deep-Q learning has been applied successfully in settings similar to ours, notably to the Atari games, where agents face similar issues, including high-dimensional visual input (Mnih, 2013).\n\nTo create our Reinforcement Learning bot, we utilized the DQN baseline developed by OpenAI. Given the process through which halite games are instantiated by the halite engine, simultaneously playing games and applying Q-learning would require significant modification of the game engine. Thus, we opted to build a simulation of the single-ship scenario through an OpenAI gym environment, and then learn using this entirely new environment. Note that while we trained this bot in a simulated environment, the learning problem remained identical. In terms of selecting adequate parameters, we primarily based our selections on the models used to train DQN on Atari games.\n\nWe created two reward functions for our DQN: immediate and delayed reward. Immediate reward rewards the ship for depositing halite as it does so, while delayed reward sets the reward after a game to the amount of halite collected during that game. We want to determine which reward function is viable and, in the case that both are viable, which one performs better.\n\n\nTwo key components of the way this model trains in Q-learning are the exploration and exploitation phases. At a particular iteration, the model either guesses (\"explores\") or leverages its learned Q-values (\"exploits\") in order to determine the move to make next. The model chooses which of these two to perform based on the epsilon-greedy strategy, which dictates exploring with probability *epsilon* and exploiting with probability 1 - *epsilon*. Initially *epsilon* starts off large, and as the number of training iterations increases, this value decays. Based on the reward observed for the chosen action at a particular iteration, the Q-values are updated accordingly (Mnih, 2013).\n\n\nThe algorithm used also leverages the dueling enhancement, an architecture that splits the network into separate state-value and action-advantage streams whose estimates are combined, which has been shown to improve learning (Wang, 2016).\n\n### 2.4 Expectations\n\nWe expect our second approach using reinforcement learning to perform better than our supervised learning approach for a number of reasons. As explained above, the supervised learning approach takes data from a more complicated problem and tends to learn to mimic, while the reinforcement learning approach is tailored for this problem, where we have a discrete action space to choose from. We anticipate that it may be difficult to tune the parameters correctly for our DQN approach, but regardless we predict that it will outperform the SVM.\nOverall, we predict that the genetic benchmark and reinforcement learning bots will achieve the highest performance, as these bots are specifically tuned for the single-ship, 300-turn scenario.\n\n\n## 3 Results and Evaluation\n\nTo evaluate the performance of our DQN and SVM bots, we compared them against two benchmarks: a simple rule-based bot and a genetically-tuned rule-based bot.\n\nSpecifically, we compared performance by running each bot over 500 games on a 32x32 map with 300 turns per game. We tested each of the four bots on the same 500 map seeds.\n\n### 3.1 Rule-Based Bot\n\nFor our first benchmark, we created a simple rule-based bot that operates using a greedy choice. The ship harvests from its current cell until the cell is below 100 halite, at which point it moves to the adjacent cell with the most halite. When its cargo reaches the maximum capacity of 1000, it travels to the shipyard and deposits.
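\n\nA compact sketch of this greedy policy, written against plain data structures purely for illustration (this is not the benchmark's actual source):\n\n```python\n# Sketch of the greedy rule-based benchmark described in Section 3.1.\ndef rule_based_action(halite_map, ship_pos, ship_cargo, shipyard_pos):\n    '''Return 'n'/'s'/'e'/'w' to move, or 'o' to stay and harvest.'''\n    height, width = len(halite_map), len(halite_map[0])\n    y, x = ship_pos\n    moves = {'n': ((y - 1) % height, x), 's': ((y + 1) % height, x),\n             'e': (y, (x + 1) % width), 'w': (y, (x - 1) % width)}\n    if ship_cargo >= 1000:\n        # Cargo full: take the move that gets closer to the shipyard (wrap-aware\n        # distance omitted for brevity; a real bot would use the game API).\n        return min(moves, key=lambda m: abs(moves[m][0] - shipyard_pos[0]) + abs(moves[m][1] - shipyard_pos[1]))\n    if halite_map[y][x] >= 100:\n        return 'o'  # keep harvesting the current cell\n    # Current cell is depleted: move to the adjacent cell with the most halite.\n    return max(moves, key=lambda m: halite_map[moves[m][0]][moves[m][1]])\n```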
\n\n### 3.2 Genetic Bot\n\nFor our second benchmark, we developed a high-performance rule-based bot using a custom genetic algorithm to tune various parameters. The ship scans surrounding cells for the closest cell above a certain threshold. The scan distance and threshold amount are both tuned genetically. When harvesting, the ship harvests its current cell until it is considered \"scarce.\" This notion of scarcity is dependent on a multiplier of the average surrounding halite and an absolute minimum, both of which are tuned. Finally, the cargo threshold at which the ship returns to the shipyard is also tuned.\n\nOur genetic algorithm is initialized with 20 individuals, each with random \"genes\" for each parameter. Each generation, each individual plays five games. Using a fitness function proportional to the cumulative percentage of halite harvested over the five games, the fittest individuals are selected using tournament selection. The parameters of the offspring are determined by crossover and mutation. \n\nWe ran this custom algorithm for 200 generations with 20 individuals per generation, at which point performance had stabilized and we had developed a formidable benchmark.
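\n\nFor concreteness, one generation of such a tuning loop might look like the sketch below; the parameter names and the `play_five_games` evaluation helper are assumptions made for illustration, not this repository's code:\n\n```python\n# Sketch of one generation of the tuning loop described in Section 3.2:\n# tournament selection, crossover, and mutation over per-bot parameters.\nimport random\n\nPARAM_NAMES = ['scan_distance', 'scan_threshold', 'scarcity_multiplier', 'scarcity_minimum', 'return_cargo']\n\ndef tournament_select(population, fitness, k=3):\n    contenders = random.sample(range(len(population)), k)\n    return population[max(contenders, key=lambda i: fitness[i])]\n\ndef crossover(parent_a, parent_b):\n    return {name: random.choice((parent_a[name], parent_b[name])) for name in PARAM_NAMES}\n\ndef mutate(genes, rate=0.1, scale=0.2):\n    return {name: value * (1 + random.uniform(-scale, scale)) if random.random() < rate else value\n            for name, value in genes.items()}\n\ndef next_generation(population, play_five_games):\n    # play_five_games(genes) is assumed to return the cumulative fraction of map halite harvested.\n    fitness = [play_five_games(genes) for genes in population]\n    return [mutate(crossover(tournament_select(population, fitness),\n                             tournament_select(population, fitness)))\n            for _ in range(len(population))]\n```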
\n\n### 3.3 Results\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_eaecefb8e7a2.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> Figure 2\u003C\u002Ftext>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_bd4ea9bd213c.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> Figure 3\u003C\u002Ftext>\n\u003C\u002Fp>\n\nFor each of the 500 games played by the four bots, we record the total halite deposited and the average halite per map cell. Figure 2 depicts a scatter plot of each of the 2000 data points, with average cell halite plotted on the X-axis and total halite collected on the Y-axis. Figure 3 depicts the trailing average with period 16 of this same data. We chose this metric because dense halite maps naturally result in more collected halite, so it is insufficient to simply compare average collection amounts over the 500 games.\n\nOur results indicate that both machine learning bots adequately learn halite collection behavior, performing on a similar level to the two benchmarks. \nWe find that the genetically-tuned bot outperforms all other bots across all map densities. Our DQN bot outperforms the rule-based bot for low-halite maps, but achieves similar performance for medium and high density maps. Finally, our SVM bot performs on a comparable level to the rule-based bot for low-halite games, while it under-performs all bots in medium and high density scenarios.\n\n\n### 3.4 Discussion\n\nAs predicted, we find that the Reinforcement Learning bot and the genetically-tuned bot achieve the highest performance. These results were expected, as the DQN and the genetically-tuned bot are specifically trained and evolved for the single-ship, 300-turn scenario. Meanwhile, the SVM is trained on a slightly different dataset and the rule-based bot is not trained at all.\n\nWe observe that the DQN was able to learn gameplay simply given map data and rewards issued immediately upon depositing halite. In our experiments, however, we were unable to achieve the same performance using the delayed reward function. Using this function, the DQN bot only receives a reward proportional to the total amount of halite deposited at the end of the 300-turn game. Given that the DQN receives no reward for 99.66 percent of turns using this reward function, it is understandable that its performance was inadequate.\nRegardless, we find that using the immediate reward function, the DQN is able to learn the difficult problem of harvesting halite and returning to the shipyard. We note that this is a difficult problem, because the DQN cannot simply learn to return to the shipyard once its halite cargo reaches a certain threshold, as moving between cells costs halite and will bring the ship's cargo back below this threshold, resulting in a never-ending cycle. Instead, a more complex function for behavior must be learned.\n\nWe find that the SVM, while achieving slightly worse performance than all other bots, achieves a relatively comparable level of play. Given that this SVM was trained on the more complex, multi-player, multi-ship scenario, we find that it still generalizes well to the modified single-ship game.\n\n## 4 Future Work\n\nThrough the experiments conducted and the results obtained, we have shown that applying machine learning to the creation of game bots is an interesting and useful application. However, there is much more exploration to be done in this context.\n\n\nNotably, the experiments conducted here used simplified versions of the Halite game: we reduced the number of ships to one, fixed the map size, fixed the number of turns, and considered only the single-player optimization goal. In the real game, the parameters are more complex and there are more variables and states to consider at each turn (most interestingly, the other players' states). \n\nAdapting the trained bots to multiplayer mode in particular raises some more interesting questions and explorations, including possible game-theoretic considerations in which players might try to optimize their moves by predicting what their opponents will do in order to maximize their chance of winning.\n\nIn addition, there are many more types of machine learning bots whose performance would be interesting to test, along with variations in the parameters and in the map size used for the game. One final interesting direction is to apply these approaches to other games and see how well they translate.\n\n## 5 Related Work\n\nIncorporating machine learning into the creation of game-playing algorithms has been heavily experimented with. Different types of classifiers, commonly neural nets and reinforcement learning, have been used to create computer bots for several games. One such application, by DeepMind, applies reinforcement learning to create a high-performing AI for many games, including Chess, Shogi, Go, and others (Silver, 2017). In particular, using *tabula rasa* reinforcement learning, the authors were able to train an algorithm that, given no information about the game other than its rules, outperformed the best previously known algorithms in each game, and this even translated well to computationally harder games such as Shogi.\n\nWhile many machine learning bots have been created for multiple games, we note that the game-specific problem of applying this tactic to Halite is somewhat innovative. 
There are many possible ways to successfully apply machine learning principles to create a bot, and the distinct feature representations used in our algorithms along with the tactics applied to our specific simplified version of Halite have, to the best of our knowledge, not been specifically experimented with previously.\n\n## 6 Conclusion\n\nIn summary, we trained two bots with machine learning; one supervised bot via an SVM classifier, and one based off of a neural network with reinforcement learning. After evaluating the performance of the classifiers to our benchmark bots, we see that the neural network bot performed considerably better than the SVM bot, yet overall still performed significantly worse than the genetically-trained rule-based bot. Nevertheless, we know that both machine learning bots performed at a level above complete randomness because they had a similar level of performance to the rule-based benchmark. Hence, we can say that both machine learning bots managed to make meaningful decisions.\n\nAlthough our neural net bot performed fairly well as compared the benchmarks, it did not perform as well as we would have initially hoped. One way to potentially improve this could be to introduce a reward function that gives a reward after a constant number of moves equal to the amount of halite that was collected during those moves. In other words, rewards would be spaced out among intervals instead of either being delivered instantly when a deposit is made or at the end of the entire game. This would allow the bot to learn that a pattern of behavior is good as opposed to specific actions being good. This would potentially allow the bot to learn strategies more easily.\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_8f67a1c0d8f1.gif\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ctext>双人版《Halite III》游戏\u003C\u002Ftext>\n\u003C\u002Fp>\n\n# 使用强化学习掌握《Halite》\n\n## 摘要\n\n本文运用机器学习技术，为资源管理类游戏《Halite III》开发计算机智能机器人。我们聚焦于单人单船模式，分别构建了两种分类器：一种基于监督学习的支持向量机模型，另一种则采用强化学习的深度神经网络模型。我们将这两种机器学习机器人与两个基准机器人进行对比评估，其中一个为简单的规则驱动型机器人，另一个则是经过遗传算法优化的规则驱动型机器人。实验结果表明，机器学习机器人能够成功学习游戏策略，并在性能上介于这两个基准之间。\n\n## 1 引言\n\n《Halite》是由Two Sigma公司每年举办的开源人工智能挑战赛。今年的比赛版本《Halite III》是一款资源管理类游戏，玩家需要编写机器人程序，控制船只探索游戏地图并收集“halite”——一种散布在地图各处的能量资源。船只以“halite”作为能量来源，比赛结束时拥有最多“halite”的玩家将获胜。\n\n### 1.1 游戏规则\n\n每位玩家在《Halite III》开始时都拥有一座船厂，船只在此出生并储存“halite”。玩家消耗“halite”来建造新船、移动现有船只或将船只转换为卸货点。此外，玩家还可以通过良性竞争相互激励，或故意让船只相撞，使双方一同沉入海底。\n\n每位玩家初始拥有5,000单位的“halite”储备、一座船厂以及对游戏地图的了解。每局游戏由两至四名玩家参与，在一个具有独特对称图案的二维地图上进行。船只每回合只能执行一项操作：向任意一个基本方向移动一格、在当前位置采集“halite”，或转换为卸货点。当船只位于己方船厂或卸货点上方时，会自动卸下所携带的“halite”，从而增加玩家的总储量。\n\n每回合开始时，游戏引擎会向所有玩家发送当前所有船只和卸货点的位置信息，以及更新后的游戏地图。玩家有最多两秒钟的时间下达本轮指令。游戏引擎会解析并执行这些指令，计算每位玩家的“halite”得分，并处理所有移动动作。游戏通常持续400至500回合，具体时长取决于地图大小。最终，“halite”储量最多的玩家即为胜者。\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_a45fc433b47b.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> 图1：《Halite III》游戏棋盘 \u003C\u002Ftext>\n\u003C\u002Fp>\n\n图1展示了一个具体的《Halite》游戏状态示例。更多关于《Halite III》规则及比赛录像的信息，请访问官网halite.io。\n\n\n### 1.2 本课题的重要性\n\n近年来，机器学习在游戏领域的快速发展，很大程度上得益于一系列新方法和算法的发现与应用。其中一种名为“极小化极大算法”的技术，旨在最小化可能的最大损失风险。该算法广泛应用于国际象棋引擎中，也可用于存在不确定性因素的一般决策问题。这类算法的发展极大地提升了我们解决问题的能力（Ontanon, 2012）。因此，研究机器学习如何玩好游戏具有重要意义。\n\n## 2 问题与方法\n\n### 2.1 问题陈述\n\n我们的目标是为《Halite 
III》开发一款机器学习机器人，使其能够在固定大小的地图上实现单人模式下的资源收集最优化。该机器人每回合的主要决策将围绕船只的导航展开。在解决这一问题之后，我们可以进一步探讨该机器人在多人模式或不同尺寸地图上的泛化能力；然而，这并非本项目的重点。尽管如此，值得注意的是，我们所采用的方法其实可以简单地扩展到多人模式，只需为每个玩家单独训练一个单人分类器即可。\n\n为了简化问题，我们在本地搭建了一个新的游戏引擎，并设置了一个环境，使得每次仅允许一名玩家参与游戏，同时保持32×32的地图尺寸不变。此外，我们还将船只的建造时机交由规则决定。若要准确判断何时建造船只，需要生成大量数据，因为在游戏中大多数情况下，玩家并不具备足够的资源。而且，船只建造决策本身也需要单独建模，这意味着必须使用另一套分类器。然而，引入额外的分类器会导致实验结果混淆，难以得出明确结论。因此，我们决定将船只建造决策设定为规则驱动型。\n\n最后，我们还简化了训练过程，只使用一艘船只进行训练。这样做可以使训练更加容易，因为如果有多艘船只参与，那么每回合的游戏环境都会变得非确定性，导致我们无法将船只视为一个完整的智能体。此时，我们需要对地图状态中的随机因素进行模拟，整体模型也会变得更加复杂。\n\n### 2.2 SVM 机器人\n\n我们考虑的首个用于解决该问题的机器学习方法是监督学习，即通过回放录像生成训练数据。Halite 官网上有一个存储库，包含了每位玩家所有对局的完整回放记录。因此，我们用来训练机器人的数据便取自排名前50位的玩家的比赛。\n\n其核心思想是：在竞技游戏中表现最出色的玩家的舰船行为，应当能够泛化到单人模式下最大化资源收集的目标。不过，这一假设确实存在一定的跳跃性，因为我们试图从一个更为复杂的问题中学习解决另一个问题的方法。为此，我们决定仅选取每场比赛前150步的数据，因为这一阶段与其他玩家的交互最少。这样一来，总共可获得750步可供学习的数据。然而，对于每一步，都会产生来自多艘舰船的数据。\n\n在每一步、针对每一艘舰船，我们都会解析并提取相关信息：该舰船当前携带的卤石数量、船坞的位置，以及附近可用卤石的位置与数量。这些信息共同构成了我们的特征空间。而标签则是“优秀”玩家机器人所作出的导航决策。训练过程则简单地将这些带标签的数据输入模型即可。\n\n我们选用的模型是带有径向基函数（RBF）核的加权支持向量机分类器。之所以选择它，是因为这是一种广为人知且在众多实际场景中广泛应用的核函数，能够确保数据在高维空间中线性可分；同时，它也易于在许多标准库中实现。\n\n这种监督学习方法的一个潜在问题在于，它仅基于单一玩家的数据进行训练。因此，当机器人完全模仿该玩家的行为时，它的表现会达到最佳状态；但这也意味着它永远无法超越所学习的那个机器人。基于此，或许可以通过训练多个不同风格的机器人来提升性能，然而这又会带来另一个问题：如果这些机器人采用不同的策略，我们就无法保证最终训练出的 SVM 机器人能够真正学到有价值的知识。\n\n此外，这种监督学习方法还可能限制 SVM 机器人持续改进的能力，原因在于：SVM 机器人只会机械地模仿训练数据中的动作，而不会深入理解这些动作背后的策略意图。\n\n### 2.3 深度Q学习机器人\n\n第二个目标是尝试构建一个更复杂、更鲁棒的算法，以提升机器人的性能。为此，我们考虑训练一个神经网络，该网络能够综合当前的游戏状态，并预测应采取的最佳行动。\n\n尽管这是一个合理的模型选择，但其中面临的一个关键挑战是如何判断在特定游戏状态下，某一具体行动是否为“好棋”。因为我们唯一的评价指标是在游戏结束后才能知道机器人最终的表现如何。因此，解决这一问题的一种自然方法是采用强化学习。从高层次来看，我们希望做出能够最大化长期预期奖励的决策。为了进行训练，我们会根据每局游戏的最终结果，为游戏中每一步行动事后赋予相应的奖励。\n\n用于此方法的数据将来自机器人自身对弈的游戏记录，这些数据将作为特征来源。我们的特征将构成神经网络的各层：共三层，每一层都是以飞船为中心的33×33网格。第一层包含飞船周围每个位置上的可用哈立特资源；第二层则编码附近其他飞船的信息——在网格中对应于每艘飞船位置上填入其持有的哈立特数量。这样，没有飞船的位置将被编码为零，而网格中心（即我们所训练的飞船所在位置）始终会显示该飞船当前持有的哈立特总量。需要注意的是，由于简化处理，这一中心位置将是该层中唯一可能非零的单元格。最后一层同样是网格形式，仅在船坞所在位置填入数字1，其余均为0，从而确保飞船能够明确前往何处卸载收集到的哈立特。\n\n具体而言，我们选择深度Q学习作为训练分类器的模型。Q学习是强化学习的一种实现方式，特别适用于我们的场景，因为它使用离散的动作空间和连续的状态空间。虽然理论上我们的状态空间是离散的，但可能的状态组合极其庞大，若将其视为离散空间，则对于学习算法来说将难以有效训练。深度Q学习已在类似的应用中得到验证并取得成功，例如应用于Atari游戏——这些计算机化游戏机器人同样面临与我们相似的问题，包括高维视觉输入（Mnih, 2013）。\n\n为了构建我们的强化学习机器人，我们采用了OpenAI开发的DQN基准框架。然而，鉴于哈立特游戏由引擎实例化的方式，同时进行对局并应用Q学习需要对游戏引擎进行大量修改。因此，我们选择通过OpenAI Gym环境构建单舰场景的模拟环境，并在此全新环境中进行学习。值得注意的是，尽管我们在模拟环境中训练了机器人，但学习问题的本质并未改变。在参数选择方面，我们主要参考了在Atari游戏中训练DQN时所使用的配置。\n\n我们为DQN设计了两种奖励函数：即时奖励和延迟奖励。即时奖励会在飞船完成哈立特投放时立即给予奖励；而延迟奖励则根据整场比赛中累计收集的哈立特总量，在游戏结束后统一发放。我们将评估哪种奖励函数更为有效，若两者均可行，则进一步比较其表现优劣。\n\n该模型在Q学习中的训练过程主要依赖于探索与利用两个阶段。在每一轮迭代中，模型会根据当前情况随机猜测（探索）或基于Q表中的已有信息作出决策（利用），以确定下一步行动。模型依据ε-贪心策略来决定执行哪一种操作：以1-ε的概率进行探索，以ε的概率进行利用。初始时，ε值较小，随着迭代次数的增加，ε值逐渐增大。根据每轮迭代中选定的参数值，Q表中的数值会相应更新（Mnih, 2013）。\n\n此外，该算法还采用了对决式增强机制，这是一种多组神经网络相互竞争的学习方式，类似于人类的想象力（Wang, 2016）。\n\n### 2.4 预期\n\n我们预计，采用强化学习的第二种方法将在多个方面优于监督学习方法。如前所述，监督学习是从较为复杂的问题中提取数据，并倾向于模仿既定模式；而强化学习则专门针对我们的问题进行了优化，因为我们的动作空间是离散的，可供选择的选项有限。尽管我们可能在调整DQN参数时遇到一定困难，但我们仍预测它将超越支持向量机的表现。\n\n总体而言，我们预计基于遗传算法的基准机器人和强化学习机器人将取得最佳性能，因为它们是专门为单舰、300回合的场景量身定制的。\n\n## 3 结果与评估\n\n为评估DQN和SVM机器人的性能，我们将其与两个基准进行了对比：一个简单的规则驱动机器人，以及一个经过遗传优化的规则驱动机器人。\n\n具体而言，我们让每种机器人在32×32的地图上分别进行500局对战，每局设定300个回合。所有四款机器人均在相同的500组地图种子上进行了测试。\n\n### 3.1 规则驱动机器人\n\n作为第一个基准，我们创建了一个简单的规则驱动机器人，其行为基于贪婪策略。该机器人会持续从当前位置采集哈立特，直到该格子内的哈立特低于100点，随后移动到相邻哈立特最多的格子。当货物达到最大容量1000点时，它便会返回船坞进行投放。\n\n### 3.2 
遗传机器人\n\n对于我们的第二个基准测试，我们开发了一款高性能的基于规则的机器人，并使用自定义的遗传算法来优化各项参数。该机器人会扫描周围单元格，寻找距离最近且高于某一阈值的单元格。扫描距离和阈值大小均通过遗传算法进行调优。在采集资源时，机器人会持续采集当前单元格中的资源，直到该单元格被认为“稀缺”。所谓“稀缺”的判断标准取决于周围平均卤石含量的一个倍数因子以及一个绝对最小值，这两者同样经过遗传优化。最后，机器人返回船厂的载货量阈值也进行了调优。\n\n我们的遗传算法初始种群包含20个个体，每个个体的各项参数都由随机生成的“基因”决定。每一代中，每个个体都会参与五场比赛。根据适应度函数——即五场比赛中累计采集的卤石占比——采用锦标赛选择法选出适应度最高的个体。子代的参数则通过交叉和变异产生。\n\n我们运行了这一自定义算法200代，每代保持20个个体，直到性能趋于稳定，最终构建了一个强大的基准模型。\n\n\n### 3.3 结果\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_eaecefb8e7a2.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> 图2\u003C\u002Ftext>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ckbd>\n    \u003Cimg width=\"460\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_readme_bd4ea9bd213c.png\">\n  \u003C\u002Fkbd>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Ctext> 图3\u003C\u002Ftext>\n\u003C\u002Fp>\n\n对于四款机器人各自进行的500场比赛，我们记录了每场比赛中总沉积的卤石数量以及地图上每个单元格的平均卤石含量。图2展示了这2000个数据点的散点图，其中X轴为每个单元格的平均卤石含量，Y轴为总采集的卤石数量。图3则呈现了同一组数据的16期移动平均线。我们选择这一指标，是因为卤石分布密集的地图自然会导致更高的总采集量，因此仅比较500场比赛的平均采集量并不足以全面评估表现。\n\n我们的结果表明，两款机器学习机器人能够较好地学习卤石采集行为，其表现与两个基准机器人相当。此外，经过遗传优化的机器人在所有地图密度下均优于其他机器人。DQN机器人在低卤石密度的地图上表现优于基于规则的机器人，但在中高密度地图上则与之持平。而SVM机器人在低卤石密度的游戏场景中表现与基于规则的机器人相近，但在中高密度场景中则整体落后于其他机器人。\n\n\n### 3.4 讨论\n\n正如预期，强化学习机器人和遗传优化机器人取得了最高性能。这一结果在意料之中，因为DQN和遗传优化机器人是专门为单艘飞船、300回合的比赛场景训练和进化的。相比之下，SVM是在稍有不同的数据集上训练的，而基于规则的机器人则未接受任何专门训练。\n\n我们观察到，DQN仅凭地图数据和每次沉积卤石时获得的即时奖励，便能迅速学会游戏策略。然而，在实验中，我们未能使用延迟奖励函数达到同样的效果。在这种奖励机制下，DQN只有在300回合比赛结束时才会根据总沉积的卤石量获得奖励。由于在此奖励机制下，DQN在99.66%的回合中都不会收到任何奖励，因此表现不佳也就不足为奇了。\n\n尽管如此，我们发现当采用即时奖励机制时，DQN确实能够掌握复杂的卤石采集与返回船厂的任务。值得注意的是，这一任务颇具挑战性：DQN不能简单地在载货量达到某个阈值时就返回船厂，因为船只在移动过程中会消耗卤石，导致载货量低于该阈值，从而陷入无休止的循环。因此，DQN必须学习更为复杂的行为模式。\n\n至于SVM，虽然其性能略逊于其他机器人，但仍然表现出相对接近的水平。考虑到SVM是在更复杂的多人多船场景下训练的，它仍能很好地泛化到我们简化后的单人单船场景中。\n\n\n## 4 未来工作\n\n通过本次实验及所得结果，我们证明了将机器学习应用于机器人算法的开发是一项有趣且实用的研究方向。然而，在这一领域仍有大量值得深入探索的空间。\n\n\n值得注意的是，本研究中的实验是对Halite游戏的简化版本：我们将船只数量限制为一艘，固定了地图尺寸和迭代次数，并且仅针对单人优化目标进行测试。而在真实的游戏中，参数更加复杂，每一轮都需要考虑更多的变量和状态（尤其是其他玩家的状态）。\n\n特别是将训练好的机器人适配到多人模式时，还会引发一些更有趣的问题和探索方向，例如可能需要引入博弈论的考量，让玩家通过预测对手的行为来优化自己的策略，以最大化获胜机会。\n\n此外，还有许多不同类型的机器学习机器人值得进一步测试和评估其性能，同时也可以尝试调整各种参数，或改变游戏所用的地图尺寸。另一个有趣的探索方向是将这些方法推广到其他类型的游戏中，考察它们在不同环境下的适用性和迁移能力。\n\n## 5 相关工作\n\n将机器学习方法融入游戏算法的开发中，一直以来都得到了广泛的研究与实践。各类分类器及研究方向，尤其是神经网络和强化学习，已被应用于多种游戏的人工智能程序开发。其中，DeepMind的一项重要工作便是利用强化学习为国际象棋、将棋、围棋等多款游戏打造高性能AI（Silver, 2017）。特别地，作者采用了一种称为“白板式”强化学习的方法，训练出一种完全不依赖先验知识、仅基于游戏规则的算法，在每款游戏中均超越了当时已知的最佳算法，并且这种能力还能很好地迁移到计算复杂度更高的将棋等游戏中。\n\n尽管已有许多基于机器学习的游戏AI被开发出来，但我们注意到，将这一方法应用于Halite这一特定游戏的问题仍具有一定创新性。实际上，运用机器学习原理来构建AI有多种可行的方式；而我们在算法中所采用的独特特征表示以及针对Halite简化版的具体策略，在现有文献中尚未见专门探讨。\n\n## 6 结论\n\n综上所述，我们通过机器学习训练了两款AI：一款基于支持向量机分类器的监督学习模型，另一款则基于神经网络的强化学习模型。经过与基准AI的性能对比，我们发现神经网络模型的表现显著优于SVM模型，但整体而言仍远逊于基于遗传算法训练的规则驱动型AI。不过，由于两款机器学习模型的性能均高于完全随机的行为，且与规则驱动型基准相当，因此可以认为它们确实能够做出具有一定意义的决策。\n\n尽管我们的神经网络模型在与基准对比时表现尚可，但其效果仍未达到最初的预期。为进一步提升性能，一种可能的改进方案是引入一种奖励函数：每当完成固定数量的步数后，便根据该阶段内收集到的Halite总量给予相应奖励。换言之，奖励不再仅在采集资源或游戏结束时发放，而是以间隔方式分布在整个对局过程中。这样可以使AI学会识别某种行为模式的价值，而非单纯关注单个动作的好坏，从而更有效地学习并掌握策略。","# Halite-III 快速上手指南\n\nHalite-III 是一个由 Two Sigma 举办的开源人工智能挑战赛，玩家需编写机器人进行资源管理（收集卤素\u002F能量）。本项目展示了如何利用机器学习（支持向量机 SVM 和深度强化学习 DQN）构建单船智能体来优化资源采集。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (推荐 Linux 以获得最佳兼容性)\n*   **Python 版本**: Python 3.6 或更高版本\n*   **前置依赖**:\n    *   `pip` (Python 包管理工具)\n    *   `git` (版本控制工具)\n    *   主要 Python 库：`numpy`, `scikit-learn`, `tensorflow` 或 `pytorch` 
(取决于具体实现), `gym` (用于强化学习环境)\n\n> **提示**：国内开发者建议使用清华源或阿里源加速 Python 包的安装。\n\n## 安装步骤\n\n1.  **克隆项目仓库**\n    将代码下载到本地：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Flnmangione\u002FHalite-III.git\n    cd Halite-III\n    ```\n\n2.  **创建虚拟环境（推荐）**\n    为了避免依赖冲突，建议创建独立的虚拟环境：\n    ```bash\n    python3 -m venv venv\n    source venv\u002Fbin\u002Factivate  # Windows 用户请使用: venv\\Scripts\\activate\n    ```\n\n3.  **安装依赖包**\n    使用国内镜像源安装所需依赖（以清华源为例）：\n    ```bash\n    pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n    *注：如果项目中没有 `requirements.txt`，请根据代码导入情况手动安装核心库，例如：*\n    ```bash\n    pip install numpy scikit-learn tensorflow gym -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n4.  **配置本地游戏引擎**\n    该项目涉及对单船场景的简化模拟。确保您已按照项目说明初始化了本地的 OpenAI Gym 环境模拟接口（通常在首次运行脚本时会自动处理，或需运行特定的 setup 脚本）。\n\n## 基本使用\n\n本项目主要包含两种机器人实现：基于监督学习的 SVM 机器人和基于深度强化学习的 DQN 机器人。\n\n### 1. 训练 SVM 机器人 (监督学习)\n该模型通过分析顶级玩家的重放录像（Replays）来学习航行决策。\n\n```bash\npython train_svm_bot.py\n```\n*执行后，脚本将解析前 50 名玩家的游戏数据（前 150 步），提取特征（卤素携带量、船坞位置、附近资源等），并训练一个带有 RBF 核的支持向量机分类器。*\n\n### 2. 训练 DQN 机器人 (强化学习)\n该模型通过自我博弈在模拟环境中学习，旨在最大化长期奖励（收集的卤素总量）。\n\n```bash\npython train_dqn_bot.py\n```\n*执行后，脚本将在自定义的 32x32 单船 Gym 环境中启动训练。模型会利用 $\\epsilon$-greedy 策略平衡探索与利用，并通过即时奖励（沉积卤素）或延迟奖励（游戏结束总分）来更新网络权重。*\n\n### 3. 运行评估\n训练完成后，您可以让机器人在本地模拟环境中运行，观察其表现：\n\n```bash\npython evaluate_bot.py --model_type svm\n# 或者\npython evaluate_bot.py --model_type dqn\n```\n\n*系统将渲染游戏画面或输出日志，展示机器人在单船模式下的资源采集效率。*","某 AI 算法团队正在备战 Two Sigma 举办的年度 Halite III 资源管理竞赛，急需在有限的开发周期内构建出能高效采集“海石”能源的智能机器人。\n\n### 没有 Halite-III 时\n- **策略固化难调优**：开发人员只能手动编写基于规则（Rule-based）的硬代码逻辑，面对地图上随机分布的资源点，机器人无法灵活应对复杂局势，容易陷入死板的路径规划。\n- **决策上限低**：传统启发式算法难以处理多船协同与敌方碰撞风险之间的平衡，导致机器人在高压对局中频繁失误，得分往往卡在中等水平，无法突破瓶颈。\n- **研发效率低下**：每次调整策略都需要人工重新定义大量判断条件，缺乏自我进化能力，团队需耗费数周时间反复试错才能微调出稍好的表现。\n\n### 使用 Halite-III 后\n- **智能自适应决策**：利用 Halite-III 集成的深度神经网络与强化学习技术，机器人能通过自我对弈自动习得最优航行策略，动态规避风险并精准锁定高价值资源区。\n- **性能显著跃升**：经过训练的分类器在单船资源采集任务中表现优异，实测水平超越了简单的规则机器人，甚至逼近经过基因调优的高级基准对手。\n- **泛化能力强**：模型不仅掌握了特定地图的打法，还能将学习到的博弈逻辑迁移到不同尺寸的地图或多玩家场景中，大幅减少了针对新环境重新编码的工作量。\n\nHalite-III 通过将监督学习与强化学习引入博弈场景，成功将原本依赖人工经验的策略开发转变为数据驱动的自我进化过程，显著提升了 AI 在复杂资源管理任务中的决策智慧。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flnmangione_Halite-III_9e8e9f0f.png","lnmangione","Luigi Mangione","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flnmangione_b79f4c0b.jpg","M.S.E. and B.S.E. in Computer Science @ University of Pennsylvania\r\n","AppRoar Studios",null,"lnmangione@gmail.com","https:\u002F\u002Fwww.facebook.com\u002Fapproarstudios?fref=ts&ref=br_tf","https:\u002F\u002Fgithub.com\u002Flnmangione",[85,89,93,97,101],{"name":86,"color":87,"percentage":88},"Jupyter Notebook","#DA5B0B",51.2,{"name":90,"color":91,"percentage":92},"Java","#b07219",31.3,{"name":94,"color":95,"percentage":96},"Python","#3572A5",17,{"name":98,"color":99,"percentage":100},"Shell","#89e051",0.5,{"name":102,"color":103,"percentage":104},"Batchfile","#C1F12E",0,502,82,"2026-03-28T11:42:57",4,"","未说明",{"notes":112,"python":110,"dependencies":113},"README 主要描述了算法原理（SVM 和深度 Q 学习）及自定义的单船模拟环境构建，未提供具体的安装指南、操作系统兼容性、硬件配置要求或 Python 依赖列表。项目涉及修改游戏引擎以适配强化学习训练，并使用了基于 OpenAI 的 DQN 基线模型。",[114,115],"OpenAI Gym","DQN baseline (OpenAI)",[15,46],"2026-03-27T02:49:30.150509","2026-04-06T09:25:38.119985",[],[]]