多智能体搜索算法Python实现 CS188 Proj2 学习笔记

发布时间：2026/5/23 3:16:21

更好的阅读体验Q1.Reflex AgentProj2的开头是一个完成评估函数的问题评估函数的思想非常类似与启发式的思想。相较与Proj1不同的是该过程添加了对抗智能体还有ScaredTime的设定。完成评估函数的时候首先考虑就是评估函数的组成要素其中可以大致表示为总分游戏自带分数食物相关奖励 scared鬼奖励 - stop惩罚 - 危险鬼惩罚原评估函数在Reflex Agent类中的scoreEvaluationFunction中实现其只是返回了当前状态的分数缺少很多必要的组成元素可以用下列指令在shell中运行原评估函数的gui版本python pacman.py-pReflexAgent-ltestClassic很容易发现的是pacman在许多情况下采取了action集中的stop动作。这种行为及其拉低效率所以把当采取stop这一步的影响也要算入评估函数的表达式中代码实现defevaluationFunction(self,currentGameState:GameState,action): Design a better evaluation function here. The evaluation function takes in the current and proposed successor GameStates (pacman.py) and returns a number, where higher numbers are better. The code below extracts some useful information from the state, like the remaining food (newFood) and Pacman position after moving (newPos). newScaredTimes holds the number of moves that each ghost will remain scared because of Pacman having eaten a power pellet. Print out these variables to see what youre getting, then combine them to create a masterful evaluation function. # Useful information you can extract from a GameState (pacman.py)successorGameStatecurrentGameState.generatePacmanSuccessor(action)newPossuccessorGameState.getPacmanPosition()newFoodsuccessorGameState.getFood()newGhostStatessuccessorGameState.getGhostStates()newScaredTimes[ghostState.scaredTimerforghostStateinnewGhostStates]*** YOUR CODE HERE ***# print(Action: , action)# print(Pacmans position: , newPos)# print(Food count: , newFood.count())# print(Ghost positions: , [g.getPosition() for g in newGhostStates])# print(Scared times: , newScaredTimes)# print(- * 30)scoresuccessorGameState.getScore()ifactionDirections.STOP:score-10foodListnewFood.asList()iffoodList:minFoodDistmin(manhattanDistance(newPos,food)forfoodinfoodList)score10.0/minFoodDistforghostStateinnewGhostStates:ghostPosghostState.getPosition()distmanhattanDistance(ghostPos,newPos)ifghostState.scaredTimer0:ifdist0:score5.0/distelse:ifdist1:score-200else:score-5.0/distreturnscore在Q1原文的提示中提及到了尝试数值的倒数而不是数值本身在本阶段的代码实现也有体现Note: As features, try the reciprocal of important values (such as distance to food) rather than just the values themselves.最终结果Record: Win, Win, Win, Win, Win, Win, Win, Win, Win, Win *** PASS: test_cases/q1/grade-agent.test (4 of 4 points) *** 1278.3 average score (2 of 2 points) *** Grading scheme: *** 500: 0 points *** 500: 1 points *** 1000: 2 points *** 10 games not timed out (0 of 0 points) *** Grading scheme: *** 10: fail *** 10: 0 points *** 10 wins (2 of 2 points) *** Grading scheme: *** 1: fail *** 1: 0 points *** 5: 1 points *** 10: 2 points ### Question q1: 4/4 ### Finished at 20:20:52 Provisional grades Question q1: 4/4 ------------------ Total: 4/4在Q1测试中是可以完美拿到满分的但是我运行了shell指令更换其他的大型复杂layout,发现智能体在后期很容易出现反复踱步的问题。经过AI的详细解答后总结原因得知决策规则里有一行关键代码影响了智能体的行动决策过程chosenIndexrandom.choice(bestIndices)到后期之后食物减少的同时鬼的距离也变远两个方向的1/distance几乎一样而且ghost的距离也差不多总体来说多个action的分数会出现一样的情况。上述代码面对这种情况的时候会出现A → B → A → B的决策过程掉入Note4中提及到的局部最优震荡陷阱,每一步都是最优但整体状况上不前进。从根源上讲这和ReflexAgent天生没有长期记忆和规划能力有关系缓解这个问题的解决方式可以是当检测到新位置和当前位置重复连续出现太多次进行扣分操作Q2.Minimax首先可以回忆一下Note5中Minimax的伪代码在实现文件中代码已经提供了可能用到的函数非常全面。需要注意的是minimax树中的叶节点用self.evaluationFunction来评判状态分数代码实现classMinimaxAgent(MultiAgentSearchAgent): Your minimax agent (question 2) defgetAction(self,gameState:GameState): Returns the minimax action from the current gameState using self.depth and self.evaluationFunction. Here are some method calls that might be useful when implementing minimax. gameState.getLegalActions(agentIndex): Returns a list of legal actions for an agent agentIndex0 means Pacman, ghosts are 1 gameState.generateSuccessor(agentIndex, action): Returns the successor game state after an agent takes an action gameState.getNumAgents(): Returns the total number of agents in the game gameState.isWin(): Returns whether or not the game state is a winning state gameState.isLose(): Returns whether or not the game state is a losing state *** YOUR CODE HERE ***defminimax(state,agentIndex,depth):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)numAgentsstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,minimax(successor,1,depth))returnvalueelse:valuefloat(inf)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)ifnextAgentIndexnumAgents:valuemin(value,minimax(successor,0,depth1))else:valuemin(value,minimax(successor,nextAgentIndex,depth))returnvalue bestScore-float(inf)bestActionNoneforactioningameState.getLegalActions():successorgameState.generateSuccessor(0,action)scoreminimax(successor,1,0)ifscorebestScore:bestScorescore bestActionactionreturnbestAction这里一定要注意depth的逻辑一次搜索是一次pacman和所有ghost的反应即depth为2时涉及pacman和每个ghost移动两次这里我想了一下代码具体运行过程才能想明白假设self.depth是1,也就是pacman和ghost的移动次数都为1。最开始函数调用传入的minimax(successor, 1, 0),到运行到最后一个ghost智能体时则会执行value min(value, minimax(successor, 0, depth 1))再运行则会触发终止条件depth self.depth,返回状态的分数return self.evaluationFunction()结束递归嵌套,所以说最终depth还是为1的Q3.Alpha-Beta Pruning依旧可以回忆一下Note5中Alpha-Beta Pruning的伪代码有个小细节需要注意的是在Proj2的文档中函数的判断条件稍微改了一下如果不按下面的伪代码实现的话最后autograder没法通过6-tied-root.test*** FAIL: test_cases/q3/6-tied-root.test *** Incorrect generated nodes for depth3 *** Student generated nodes: A B max min1 min2 *** Correct generated nodes: A B C max min1 min2 *** Tree: *** max *** / \ *** min1 min2 *** | / \ *** A B C *** 10 10 0代码实现classAlphaBetaAgent(MultiAgentSearchAgent): Your minimax agent with alpha-beta pruning (question 3) defgetAction(self,gameState:GameState): Returns the minimax action using self.depth and self.evaluationFunction *** YOUR CODE HERE ***defalphabeta(state,alpha,beta,depth,agentIndex):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)numAgentsstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,alphabeta(successor,alpha,beta,depth,1))ifvaluebeta:returnvalue alphamax(alpha,value)returnvalueelse:valuefloat(inf)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)ifnextAgentIndexnumAgents:valuemin(value,alphabeta(successor,alpha,beta,depth1,0))else:valuemin(value,alphabeta(successor,alpha,beta,depth,nextAgentIndex))ifvaluealpha:returnvalue betamin(beta,value)returnvalue bestScore-float(inf)bestActionNonealpha-float(inf)betafloat(inf)foractioningameState.getLegalActions(0):successorgameState.generateSuccessor(0,action)scorealphabeta(successor,alpha,beta,0,1)ifscorebestScore:bestScorescore bestActionaction alphamax(alpha,bestScore)returnbestAction代码实现的整体框架和Minimax基本没有差异唯一需要注意的点就是根节点作为最大化节点也是需要剪枝的所以说一定不能忘了倒数第二行的alpha max(alpha, bestScore)这里还需加强理解一下alpha和beta在各个阶段的具体数值才能深刻理解alpha-beta pruning的运行过程拿下图来举个例子!最开始根节点值为None,alpha -float(inf)且beta float(inf)走最左子树min1开始遍历自己的一个子节点(3)遍历过后更新min1的beta 3之后遍历后两个子节点(12和6)得到min1的值已知min1的值为3,根节点的alpha开始更新执行alpha max(alpha, bestScore)语句此刻根节点的alpha 3遍历中间子树根节点alpha的值传了下来当遍历到中间子树的第一个子节点(2)触发判断语句value alpha就直接退出此次遍历了Q4.Expectimax依旧依旧是可以回顾一下Note6中的伪代码代码实现classExpectimaxAgent(MultiAgentSearchAgent): Your expectimax agent (question 4) defgetAction(self,gameState:GameState): Returns the expectimax action using self.depth and self.evaluationFunction All ghosts should be modeled as choosing uniformly at random from their legal moves. *** YOUR CODE HERE ***defexpectimax(state,agentIndex,depth):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)agentNumstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,expectimax(successor,1,depth))else:value0actionsstate.getLegalActions(agentIndex)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)probability1/len(actions)ifnextAgentIndexagentNum:valueprobability*expectimax(successor,0,depth1)else:valueprobability*expectimax(successor,nextAgentIndex,depth)returnvalue bestActionNonebestScore-float(inf)foractioningameState.getLegalActions(0):successorgameState.generateSuccessor(0,action)scoreexpectimax(successor,1,0)ifscorebestScore:bestScorescore bestActionactionreturnbestAction util.raiseNotDefined()Q4这个问题和前面几个问题代码基本上都是一个模板直接套就行唯一需要注意的地方是有关probability的代码对手将在getLegalActions中纯随机选择,所以probability的实现是probability 1 / len(actions),原文档如下To simplify your code, assume you will only be running against an adversary which chooses amongst their getLegalActions uniformly at random.原文档还提到了AlphaBetaAgent(以Minimax为基础)和ExpectimaxAgent的运行时候的区别python pacman.py-pAlphaBetaAgent-ltrappedClassic-adepth3-q-n10python pacman.py-pExpectimaxAgent-ltrappedClassic-adepth3-q-n10比较容易观察发现的现象是ExpectimaxAgent可以在多局比赛中赢得几乎一半的数量AlphaBetaAgent总是输。导致这个现象的根本原因就是在Note6开头中提及到的Minimax太过悲观,AlphaBetaAgent总是假设对手会采取最优行动所以pacman会直接放弃争取机会。而Expectimax中认定ghost是随机行动在计算期望值后会在关键时刻决定赌一把Q5.Evaluation Function和Q1问题相同都是完成Evaluation Function函数传入的参数里面没有action,所以没有stop惩罚了感觉没有了灵魂。我的整体思路是总分游戏自带分数最近食物距离 - 食物数量(长期目标) - 普通鬼惩罚可吃鬼奖励 - 胶囊距离惩罚代码实现defbetterEvaluationFunction(currentGameState:GameState): Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable evaluation function (question 5). DESCRIPTION: write something here so we know what you did *** YOUR CODE HERE ***poscurrentGameState.getPacmanPosition()foodcurrentGameState.getFood()ghostStatescurrentGameState.getGhostStates()scaredTimes[ghostState.scaredTimerforghostStateinghostStates]capsulescurrentGameState.getCapsules()scorecurrentGameState.getScore()foodListfood.asList()iffoodList:distToClosestFoodmin(manhattanDistance(pos,food)forfoodinfoodList)score20.0/distToClosestFoodelse:score1000score-5*len(foodList)ifcapsules:distToClosestCapsulemin(manhattanDistance(pos,capsule)forcapsuleincapsules)score5.0/distToClosestCapsule score-20*len(capsules)forghostinghostStates:ghostPosghost.getPosition()ghostDistmanhattanDistance(pos,ghostPos)ifghost.scaredTimer0:score100.0/ghostDistelse:ifghostDist1:score-500elifghostDist3:score-200elifghostDist5:score-100returnscore基本思路和Q1是一样的没有了Stop的惩罚让pacman在执行动作过程中会出现特别特别多的无厘头停止动作启发式内容仍需要优化但是可以pass autograder

解决Keil MDK中Arm Compiler V6.6.1许可错误

1. 问题现象解析当你在Keil MDK-Plus或Essential版本中尝试使用Arm Compiler V6.6.1 Long Term Maintenance（长期维护版）编译项目时，会遇到以下错误提示：ARMClang.exe: error: CT.CompilerEM66 is not available with the current…

2026/5/23 3:15:00 阅读更多

NHSE存档编辑器深度解析：解锁动物森友会游戏数据修改的终极指南

NHSE存档编辑器深度解析：解锁动物森友会游戏数据修改的终极指南【免费下载链接】NHSE Animal Crossing: New Horizons save editor 项目地址: https://gitcode.com/gh_mirrors/nh/NHSE NHSE（New Horizons Save Editor）是一款专业的《…

2026/5/23 3:15:00 阅读更多

【NotebookLM显著性判断实战指南】：20年AI架构师亲授5大误判陷阱与3步精准验证法

更多请点击： https://intelliparadigm.com 第一章：NotebookLM显著性判断的核心概念与本质认知 NotebookLM 是 Google 推出的基于用户上传文档进行语义理解与对话生成的实验性 AI 工具，其“显著性判断”并非传统统计学中的 p 值检验&#xff…

2026/5/23 3:15:00 阅读更多

安全雷达性能及参数选型对比

在工业自动化和移动机器人领域，安全激光雷达正从可选配件转变为核心安全标准件。目前市场主要分为进口与国产两大阵营。本文梳理了八大主流品牌的核心技术参数与适用特点，并总结出一套包含五个步骤的选型方法，供实际项目参考。一、品牌阵营核心看点市场主流安全雷达分为…

2026/5/23 4:18:21 阅读更多

小程序数据采集（11）- IDA Pro逆向SO层与ARM汇编寻址详解

更多内容请见：《爬虫和逆向教程》 - 专栏介绍和目录本节内容：使用IDA Pro分析SO库，识别ARM汇编指令，定位JNI_OnLoad和核心加签函数的注册表。（SO层静态逆向分析）引言：当 Java 层的防线被击穿，SO 层才是真正的地心在前面的系列文章中，我们横推了 JS 反混淆、掌握了…

2026/5/23 4:15:38 阅读更多

快速复习C语言

快速复习C语言Windows基础入门了解Windows文件系统：C语言基础入门第一天安装与运行编写与运行代码第二天 C基础语法不要怕失败，失败并不可怕，可怕的是失去勇气。 Windows基础入门了解Windows文件系统： 绝对路径：…

2026/5/23 4:15:18 阅读更多

如何快速安装TrollStore：iOS 14-16.6.1设备一键安装的终极指南

如何快速安装TrollStore：iOS 14-16.6.1设备一键安装的终极指南【免费下载链接】TrollInstallerX A TrollStore installer for iOS 14.0 - 16.6.1 项目地址: https://gitcode.com/gh_mirrors/tr/TrollInstallerX 你是否曾经为了在iOS设备上安装TrollStore而感…

2026/5/23 4:13:35 阅读更多

Day03 Web应用OSS存储负载均衡CDN加速反向代理WAF防护部署影响

我的博客园笔记一、WebWAF WAF（Web应用防火墙）：是一种专门设计用于保护 Web 应用程序免受恶意攻击的安全设备，它能够实时监控、过滤和拦截可能对网站造成危害的网络流量，从而避免网站服务器被恶意入侵，导…

2026/5/23 4:10:12 阅读更多

深入了解指针（3）

文章目录数组名的理解对arr[i]的理解一维数组传参的本质二级指针指针数组指针数组的用处总结这里是think的博客希望可以一起交流知识，一起think 今天我们来学习指针(3)吧一起来think吧数组名的理解 //测试环境:X86 #include <stdio.h> int main() { int a…

2026/5/23 4:10:12 阅读更多

红黑树完全指南：从五条性质到完整插入删除实现

引言在前面的树系列中，我们学习了二叉搜索树（BST）和 AVL 树。AVL 树通过严格的平衡条件（|BF| ≤ 1）保证 O(log n) 的性能，但代价是删除操作可能触发 O(log n) 次旋转。红黑树（Red-Black Tree&am…

2026/5/23 0:01:37 阅读更多

黎曼猜想：哲学 × 数学思维范式全链条

黎曼猜想：哲学数学思维范式全链条华夏之光永存｜七大数学猜想思维范式全链条第二篇开篇黎曼猜想被公认为数学史上最伟大的未解难题。希尔伯特曾说：“如果我沉睡百年后醒来，第一个问题就是：黎曼猜想证明了吗&…

2026/5/23 0:02:38 阅读更多

在Nodejs后端服务中集成稳定可靠的大模型能力

🚀 告别海外账号与网络限制！稳定直连全球优质大模型，限时半价接入中。 👉 点击领取海量免费额度在Nodejs后端服务中集成稳定可靠的大模型能力应用场景类，针对需要构建智能对话或内容生成功能的后端工程师&#xff0…

2026/5/23 0:03:18 阅读更多

【实用小程序】超轻量级文件上传下载中心 (File Download Server)

站内源码及jar包下载一、项目概述文件下载中心一个基于 Java 内置 HTTP 服务器（com.sun.net.httpserver）构建的轻量级文件管理服务。它零第三方依赖，单 JAR 包即可运行，适合在内网环境或临时场景中快速搭建文件共享站点。你的团队需要临时共享一批日志文件或交付物，…

2026/5/22 17:05:13 阅读更多

py每日spider案例之某website之xin东方选课搜索接口(难度一般扣取代码即可)

加密位置: 逆向接口参数: 逆向接口: const g = globalThis; g.window = g; g.self = g; g.location = {<

2026/5/22 16:54:23 阅读更多

终极轻量级Android文本编辑器Markor：多格式笔记应用完全指南

终极轻量级Android文本编辑器Markor：多格式笔记应用完全指南【免费下载链接】markor Text editor - Notes & ToDo (for Android) - Markdown, todo.txt, plaintext, math, .. 项目地址: https://gitcode.com/gh_mirrors/ma/markor 在移动设备上寻找一款…

2026/5/21 2:29:29 阅读更多

MPC-BE：基于DirectShow架构的专业级开源媒体播放解决方案

MPC-BE：基于DirectShow架构的专业级开源媒体播放解决方案【免费下载链接】MPC-BE MPC-BE – универсальный проигрыватель аудио и видеофайлов для операционной системы Windows. 项目地址:…

2026/5/22 14:41:35 阅读更多

如何快速计算3D模型体积和重量：STL-Volume-Model-Calculator终极指南

如何快速计算3D模型体积和重量：STL-Volume-Model-Calculator终极指南【免费下载链接】STL-Volume-Model-Calculator STL Volume Model Calculator Python 项目地址: https://gitcode.com/gh_mirrors/st/STL-Volume-Model-Calculator 你是否曾经为3D打印项目…

2026/5/22 11:03:47 阅读更多

通过Taotoken CLI工具一键配置团队开发环境与模型密钥

通过Taotoken CLI工具一键配置团队开发环境与模型密钥 1. CLI工具安装与基本使用 Taotoken提供的CLI工具可通过npm全局安装或直接使用npx运行。对于需要频繁使用CLI的团队，推荐全局安装： npm install -g taotoken/taotoken对于临时使用或项目级配置&a…

2026/5/22 3:58:33 阅读更多

相关文章

解决Keil MDK中Arm Compiler V6.6.1许可错误

NHSE存档编辑器深度解析：解锁动物森友会游戏数据修改的终极指南

【NotebookLM显著性判断实战指南】：20年AI架构师亲授5大误判陷阱与3步精准验证法

安全雷达性能及参数选型对比

小程序数据采集（11）- IDA Pro逆向SO层与ARM汇编寻址详解

快速复习C语言

如何快速安装TrollStore：iOS 14-16.6.1设备一键安装的终极指南

Day03 Web应用OSS存储负载均衡CDN加速反向代理WAF防护部署影响

深入了解指针（3）

红黑树完全指南：从五条性质到完整插入删除实现

黎曼猜想：哲学 × 数学 思维范式全链条

在Nodejs后端服务中集成稳定可靠的大模型能力

【实用小程序】超轻量级文件上传下载中心 (File Download Server)

py每日spider案例之某website之xin东方选课搜索接口(难度一般 扣取代码即可)

终极轻量级Android文本编辑器Markor：多格式笔记应用完全指南

MPC-BE：基于DirectShow架构的专业级开源媒体播放解决方案

如何快速计算3D模型体积和重量：STL-Volume-Model-Calculator终极指南

通过Taotoken CLI工具一键配置团队开发环境与模型密钥

黎曼猜想：哲学 × 数学思维范式全链条

py每日spider案例之某website之xin东方选课搜索接口(难度一般扣取代码即可)