《Agentic Design Patterns》目录及前言
Agentic Design Patterns | 智能体设计模式
A Hands-On Guide to Building Intelligent Systems | 构建智能系统的实践指南
Table of Contents | 目录
总页数:424 页
Dedication | 献辞
Acknowledgment | 致谢
Foreword | 序言
A Thought Leader's Perspective: Power and Responsibility | 思想领袖的洞见:权力与责任
Introduction | 介绍
What makes an AI system an "agent"? | 是什么让 AI 系统成为「智能体」?
Part One | 第一部分
总计:103 页
Chapter 1: Prompt Chaining | 第一章:提示链
Chapter 2: Routing | 第二章:路由
Chapter 3: Parallelization | 第三章:并行化
Chapter 4: Reflection | 第四章:反思
Chapter 5: Tool Use | 第五章:工具使用
Chapter 6: Planning | 第六章:规划
Chapter 7: Multi-Agent | 第七章:多智能体
Part Two | 第二部分
总计:61 页
Chapter 8: Memory Management | 第八章:记忆管理
Chapter 9: Learning and Adaptation | 第九章:学习与适应
Chapter 10: Model Context Protocol (MCP) | 第十章:模型上下文协议(MCP)
Chapter 11: Goal Setting and Monitoring | 第十一章:目标设定与监控
Part Three | 第三部分
总计:34 页
Chapter 12: Exception Handling and Recovery | 第十二章:异常处理与恢复
Chapter 13: Human-in-the-Loop | 第十三章:人机协作
Chapter 14: Knowledge Retrieval (RAG) | 第十四章:知识检索(RAG)
Part Four | 第四部分
总计:114 页
Chapter 15: Inter-Agent Communication (A2A) | 第十五章:智能体间通信(A2A)
Chapter 16: Resource-Aware Optimization | 第十六章:资源感知型优化
Chapter 17: Reasoning Techniques | 第十七章:推理技术
Chapter 18: Guardrails/Safety Patterns | 第十八章:护栏 / 安全模式
Chapter 19: Evaluation and Monitoring | 第十九章:评估与监控
Chapter 20: Prioritization | 第二十章:优先级排序
Chapter 21: Exploration and Discovery | 第二十一章:探索与发现
Appendix | 附录
总计:74 页
Appendix A - Advanced Prompting Techniques | 附录 A - 高级提示技术
Appendix B - AI Agentic: From GUI to Real world environment | 附录 B - AI 智能体:从图形界面到现实世界环境
Appendix C - Quick overview of Agentic Frameworks | 附录 C - 智能体框架速览
Appendix D - Building an Agent with AgentSpace (online only) | 附录 D - 使用 AgentSpace 构建智能体(仅在线)
Appendix E - AI Agents on the CLI (online) | 附录 E - 命令行中的 AI 智能体(在线)
Appendix F - Under the Hood: An Inside Look at the Agents' Reasoning Engines | 附录 F - 深入了解:智能体推理引擎内部机制
Appendix G - Coding agents | 附录 G - 编程智能体
Conclusion | 总结
Glossary | 术语表
Index of Terms | 术语索引(由 Gemini 生成,含推理步骤示例)
Online Contribution - Frequently Asked Questions: Agentic Design Patterns | 在线贡献 - 常见问题:智能体设计模式
Pre Print | 预印本
https://www.amazon.com/Agentic-Design-Patterns-Hands-Intelligent/dp/3032014018/
All my royalties will be donated to Save the Children.
本书的所有版税将捐赠给救助儿童会(Save the Children)。
Foreword | 序言
The field of artificial intelligence is at a fascinating inflection point. We are moving beyond building models that can simply process information to creating intelligent systems that can reason, plan, and act to achieve complex goals with ambiguous tasks. These "agentic" systems, as this book so aptly describes them, represent the next frontier in AI, and their development is a challenge that excites and inspires us at Google.
人工智能领域正处在一个激动人心的转折点。我们正在从构建仅能处理信息的模型,迈向创造能够推理、规划和行动,以便在任务模糊的情况下达成复杂目标的智能系统。正如本书所精准描述的,这些「智能体」系统代表了 AI 的下一个前沿,其研发工作是一项挑战,也正是这份挑战在激励和鼓舞着我们谷歌的每一个人。
"Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems" arrives at the perfect moment to guide us on this journey. The book rightly points out that the power of large language models, the cognitive engines of these agents, must be harnessed with structure and thoughtful design. Just as design patterns revolutionized software engineering by providing a common language and reusable solutions to common problems, the agentic patterns in this book will be foundational for building robust, scalable, and reliable intelligent systems.
《智能体设计模式:构建智能系统的实践指南》恰逢其时,为我们的旅程指明方向。本书明确指出,作为这些智能体认知引擎的大语言模型,其强大的力量必须通过结构和精心的设计来驾驭。正如设计模式通过为常见问题提供通用语言和可复用的解决方案,为软件工程带来了革命性的变革一样,本书中的智能体模式也将成为构建稳健、可扩展、可靠智能系统的基石。
The metaphor of a "canvas" for building agentic systems is one that resonates deeply with our work on Google's Vertex AI platform. We strive to provide developers with the most powerful and flexible canvas on which to build the next generation of AI applications. This book provides the practical, hands-on guidance that will empower developers to use that canvas to its full potential. By exploring patterns from prompt chaining and tool use to agent-to-agent collaboration, self-correction, safety and guardrails, this book offers a comprehensive toolkit for any developer looking to build sophisticated AI agents.
将构建智能体系统比作一块「画布」,这个比喻与我们在谷歌 Vertex AI 平台上的工作产生了深刻的共鸣。我们致力于为开发者提供最强大、最灵活的画布,让他们能够在其上构建下一代 AI 应用。本书则提供了翔实的实战指导,赋能开发者充分发挥这块画布的全部潜力。通过探索从提示链、工具使用到智能体间协作、自我修正、安全性与护栏等一系列模式,本书为所有期望构建复杂 AI 智能体的开发者提供了一个全面的工具包。
The future of AI will be defined by the creativity and ingenuity of developers who can build these intelligent systems. "Agentic Design Patterns" is an indispensable resource that will help to unlock that creativity. It provides the essential knowledge and practical examples to not only understand the "what" and "why" of agentic systems, but also the "how."
AI 的未来将由那些能够构建智能系统的开发者的创造力和独创性来定义。《智能体设计模式》是释放这种创造力不可或缺的资源。它提供了必要的知识和实践示例,不仅帮助我们理解智能体系统「是什么」「为什么」,更能掌握「如何做」。
I am thrilled to see this book in the hands of the developer community. The patterns and principles within these pages will undoubtedly accelerate the development of innovative and impactful AI applications that will shape our world for years to come.
看到这本书能交到广大开发者社区的手中,我倍感激动。毫无疑问,书中所蕴含的模式与原则,将加速那些创新且影响深远的 AI 应用的开发进程,而这些应用将在未来数年里塑造我们的世界。
Saurabh Tiwary
VP & General Manager, CloudAI @ Google
Saurabh Tiwary
Google CloudAI 副总裁兼总经理
A Thought Leader's Perspective: Power and Responsibility | 思想领袖的洞见:权力与责任
Of all the technology cycles I've witnessed over the past four decades, from the birth of the personal computer and the web to the revolutions in mobile and cloud, none has felt quite like this one. For years, the discourse around Artificial Intelligence was a familiar rhythm of hype and disillusionment, the so-called "AI summers" followed by long, cold winters. But this time, something is different. The conversation has palpably shifted. If the last eighteen months were about the engine (the breathtaking, almost vertical ascent of Large Language Models, or LLMs), the next era will be about the car we build around it. It will be about the frameworks that harness this raw power, transforming it from a generator of plausible text into a true agent of action.
在过去四十年我所见证的所有技术浪潮中——从个人电脑和互联网的诞生,到移动和云计算的革命——没有一次像今天这样。多年以来,围绕人工智能的讨论始终遵循着一种熟悉的节奏:始于大肆宣传,终于幻想破灭,所谓「AI 之夏」之后,总是伴随着漫长而寒冷的冬天。但这一次,情况有所不同,风向发生了切实的转变。如果说过去的十八个月是关于「引擎」的故事——即大语言模型那惊人的、近乎垂直的飞跃——那么下一个时代将是关于我们如何围绕它造出一辆「汽车」。这个时代,将关乎我们如何构建框架来驾驭这股原始的力量,把它从能生成看似合理文本的工具,打造成真正能付诸行动的智能体。
I admit, I began as a skeptic. Plausibility, I've found, is often inversely proportional to one's own knowledge of a subject. Early models, for all their fluency, felt like they were operating with a kind of impostor syndrome, optimized for credibility over correctness. But then came the inflection point, a step-change brought about by a new class of "reasoning" models. Suddenly, we weren't just conversing with a statistical machine that predicted the next word in a sequence; we were getting a peek into a nascent form of cognition.
坦白说,我起初是怀疑的。我发现,一件事物的「貌似可信度」,往往与我们对该主题的了解程度成反比。早期的模型,尽管语言流畅,却仿佛患上了「冒名顶替综合征」,它们被优化的目标是追求可信度,而非正确性。然而,转折点随之而来——推理模型的出现,实现了一次质的飞跃。那一刻,我们对话的对象不再仅仅是那个预测词语的统计机器;我们所窥见的,是一种正在萌芽的全新认知。
The first time I experimented with one of the new agentic coding tools, I felt that familiar spark of magic. I tasked it with a personal project I'd never found the time for: migrating a charity website from a simple web builder to a proper, modern CI/CD environment. For the next twenty minutes, it went to work, asking clarifying questions, requesting credentials, and providing status updates. It felt less like using a tool and more like collaborating with a junior developer. When it presented me with a fully deployable package, complete with impeccable documentation and unit tests, I was floored.
当我第一次试用一款新型的智能体编程工具时,我感受到了那种久违的、如魔法般的火花。我让它去做一个一直无暇推进的个人项目:把一个慈善网站从简易的网页构建器,迁移到一个规范、现代的 CI/CD 环境中。在接下来的二十分钟里,它开始工作,不断提出澄清问题,请求授权凭证,并提供进度更新。这感觉不像是在使用一个工具,更像是在与一位初级开发人员协作。当它最终向我提交一个带有无可挑剔的文档和单元测试、可完全部署的软件包时,我被彻底震撼了。
Of course, it wasn't perfect. It made mistakes. It got stuck. It required my supervision and, crucially, my judgment to steer it back on course. The experience drove home a lesson I've learned the hard way over a long career: you cannot afford to trust blindly. Yet, the process was fascinating. Peeking into its "chain of thought" was like watching a mind at work—messy, non-linear, full of starts, stops, and self-corrections, not unlike our own human reasoning. It wasn't a straight line; it was a random walk toward a solution. Here was the kernel of something new: not just an intelligence that could generate content, but one that could generate a plan.
当然,它并非完美。它会犯错,会卡住。它需要我的监督,以及至关重要的——我的判断力来引导它重回正轨。这次经历让我深刻地体会到了我在漫长的职业生涯中历经坎坷才学到的一个教训:你永远不能盲目信任。然而,这个过程本身却极其迷人。窥视它的「思维链」,宛若观看一颗大脑的运作——杂乱、非线性,充满开始、停顿与自我修正,这与我们人类的推理别无二致。那不是一条直线,而是一场通往解决方案的随机游走。在这里,我看到了新事物的雏形:一种不仅能生成内容,更能制定计划的智能。
This is the promise of agentic frameworks. It's the difference between a static subway map and a dynamic GPS that reroutes you in real-time. A classic rules-based automaton follows a fixed path; when it encounters an unexpected obstacle, it breaks. An AI agent, powered by a reasoning model, has the potential to observe, adapt, and find another way. It possesses a form of digital common sense that allows it to navigate the countless edge cases of reality. It represents a shift from simply telling a computer what to do, to explaining why we need something done and trusting it to figure out the how.
这便是智能体框架所带来的希望。它就像一张静态的地铁线路图与一个能为你实时重新规划路线的动态 GPS 之间的区别。一个经典的、基于规则的自动程序遵循固定的路径,当遇到意外障碍时,它就会崩溃。而一个由推理模型驱动的 AI 智能体,则有潜力去观察、适应并找到另一条路。它拥有一种数字化的常识,使其能够应对现实世界中无数的边缘案例。这代表着一种转变:我们不再是简单地告诉计算机「做什么」,而是向它解释「为什么需要做某件事」,并相信它能自己找出「如何做」。
As exhilarating as this new frontier is, it brings a profound sense of responsibility, particularly from my vantage point as the CIO of a global financial institution. The stakes are immeasurably high. An agent that makes a mistake while creating a recipe for a "Chicken Salmon Fusion Pie" is a fun anecdote. An agent that makes a mistake while executing a trade, managing risk, or handling client data is a real problem. I've read the disclaimers and the cautionary tales: the web automation agent that, after failing a login, decided to email a member of parliament to complain about login walls. It's a darkly humorous reminder that we are dealing with a technology we don't fully understand.
尽管这个新领域令人振奋,但它也带来了一种深远的责任感,尤其从我作为一家全球金融机构首席信息官的视角来看更是如此。这里的风险之高,不可估量。一个智能体在为「鸡肉三文鱼融合派」创建菜谱时犯了错,不过是个有趣的轶事。但如果一个智能体在执行交易、管理风险或处理客户数据时犯了错,那就是一个实实在在的大问题。我读过那些免责声明和警示故事:一个网络自动化智能体在登录失败后,竟然决定给一位国会议员发邮件抱怨登录墙。这是一个黑色幽默般的提醒:我们正在打交道的,是一项我们尚未完全理解的技术。
This is where craft, culture, and a relentless focus on our principles become our essential guide. Our Engineering Tenets are not just words on a page; they are our compass. We must Build with Purpose, ensuring that every agent we design starts from a clear understanding of the client problem we are solving. We must Look Around Corners, anticipating failure modes and designing systems that are resilient by design. And above all, we must Inspire Trust, by being transparent about our methods and accountable for our outcomes.
正是在这里,专业精神、企业文化以及对原则的执着追求,成为了我们至关重要的指南。我们的工程信条不是纸上的口号,而是我们的指南针。我们必须为使命而构建:确保我们设计的每一个智能体都始于对我们正在解决的客户问题的清晰理解。我们必须洞见未来,防患未然:预见各种失败模式,并设计出具有内在韧性的系统。最重要的是,我们必须启迪信任,不负所托:对我们的方法保持透明,对我们的结果负责。
In an agentic world, these tenets take on new urgency. The hard truth is that you cannot simply overlay these powerful new tools onto messy, inconsistent systems and expect good results. Messy systems plus agents are a recipe for disaster. An AI trained on "garbage" data doesn't just produce garbage-out; it produces plausible, confident garbage that can poison an entire process. Therefore, our first and most critical task is to prepare the ground. We must invest in clean data, consistent metadata, and well-defined APIs. We have to build the modern "interstate system" that allows these agents to operate safely and at high velocity. It is the hard, foundational work of building a programmable enterprise, an "enterprise as software," where our processes are as well-architected as our code.
在一个智能体化的世界里,这些信条被赋予了新的紧迫性。一个残酷的现实是,你不可能简单地将这些强大的新工具叠加在混乱、不一致的系统之上,并期望得到好的结果。混乱的系统加上智能体,只会酿成灾难。一个用垃圾数据训练出来的 AI,不仅会产生垃圾结果,它还会产生貌似可信、充满自信的垃圾,足以毒化整个流程。因此,我们首要且最关键的任务,是打好基础。我们必须投资于干净的数据、一致的元数据和定义良好的 API。我们必须建立起现代化的「州际高速公路系统」,让这些智能体能够安全、高速地运行。这是一项艰巨的基础性工作,其目的就是构建一个可编程的企业,即实现「企业即软件」的理念,从而让我们的业务流程也能像代码一样,拥有精良的架构。
Ultimately, this journey is not about replacing human ingenuity, but about augmenting it. It demands a new set of skills from all of us: the ability to explain a task with clarity, the wisdom to delegate, and the diligence to verify the quality of the output. It requires us to be humble, to acknowledge what we don't know, and to never stop learning. The pages that follow in this book offer a technical map for building these new frameworks. My hope is that you will use them not just to build what is possible, but to build what is right, what is robust, and what is responsible.
归根结底,这段旅程并非要取代人类的智慧,而是要增强它。它要求我们所有人掌握一套新的技能:清晰解释任务的能力,明智授权的智慧,以及验证产出质量的勤勉。它要求我们保持谦逊,承认自己的未知,并永不停止学习。本书接下来的篇章给出搭建这些新框架的技术地图。愿各位不止于把可能之事做成,更能把正确、可靠且负责任的事做好。
The world is asking every engineer to step up. I am confident we are ready for the challenge.
这个世界正在召唤每一位工程师挺身而出。我相信,我们已经准备好迎接挑战。
Enjoy the journey.
愿你享受这段旅程。
Marco Argenti, CIO, Goldman Sachs
Marco Argenti,高盛首席信息官(CIO)
四种 AI 代理产品的比较
下面比较以下几款 AI 代理/自动化工具的特点、优劣和适用场景:
- ChatGPT Operator(由 OpenAI)
- Project Mariner(由 Google DeepMind)
- Computer Use(由 Anthropic)
- Browser Use(一种较开放/工具化的浏览器自动化框架)
下文先逐一简介各产品,最后用对比表格汇总它们的差异。
1. ChatGPT Operator
简介
- Operator 是 OpenAI 提出的一个“代理”研究预览(research preview),能让 ChatGPT 使用浏览器执行任务(例如填写表单、预订、下单等)。 (OpenAI)
- 它所用的模型称为 “Computer-Using Agent (CUA)”:结合了 GPT-4o 的视觉能力 + 强化学习,以理解图形用户界面(GUI)中的按钮、菜单、文本框。 (OpenAI)
- 操作方式类似:用户下达自然语言任务 → Agent 在网页浏览器中“像人”点击、滚动、输入。 (OpenAI Help Center)
- 目前为研究预览,重点在“自动化常见网页任务”而不是高度复杂的系统集成。
优点
- 对普通网页任务(例如网上订票、下单、查信息)设计得比较直接,门槛较低。
- 与 ChatGPT 集成,用户体验上延续聊天+任务自动化的模式。
- OpenAI 在安全与拒绝高风险操作上也有明确机制(如:当需提供敏感信息、付款等操作时,Agent 会请求用户介入)(The Verge)
限制 /注意事项
- 仍处于“实验预览”阶段,稳定性或适用范围尚有限。用户反馈有“复杂界面出错率高”问题。 (The Verge)
- 很多场景下仍需要用户监督或介入,尤其涉及登录、付款、敏感数据等。
- 功能主要在网页浏览器环境,对桌面其它应用(如本地软件)支持较少。
适用场景
- 想在浏览器里自动完成重复、结构化的任务(如表单填写、订票、购物)
- 用户已有 ChatGPT 环境且愿意尝试新功能
- 任务相对“标准网页流程”而非高度定制化复杂系统
2. Project Mariner
简介
- Project Mariner 是 Google DeepMind 提出的浏览器内代理原型,目标是在用户的浏览器中执行任务:理解网页元素(文本、图像、代码、表单等)并代表用户操作。 (blog.google)
- 它通过 Chrome 扩展/浏览器环境运作,重点在“从浏览器视角”进行。 (labs.google.com)
- 官方宣称其能同时处理多个任务(如最多十个任务并行)– 展现其在多任务、并行处理方面的野心。 (blog.google)
优点
- 多任务能力较强(Google 强调可同时执行多个任务)
- 基于 Google 强大的浏览器、搜索和网页理解技术,理论上有优势在“网页导航+理解”上
- 有更广泛网页元素类型理解(图片、代码、表单)而不仅是纯文本或按钮。 (datacamp.com)
限制 /注意事项
- 同样是研究原型,尚未大规模成熟部署。 (TechCrunch)
- 浏览器扩展或环境依赖较强,可能在不同地域、平台上可用性有限。
- 虽强调多任务,但也可能在任务管理、上下文追踪、错误恢复方面尚未完善。
适用场景
- 用户频繁使用浏览器进行复杂网页流程(如研究、比较、购物、表单组合)
- 想要浏览器内部高级代理功能(如理解网页结构、并行任务)
- 开发者希望构建集成 Google 生态内网页代理应用
3. Computer Use (Anthropic)
简介
- “Computer Use”是由 Anthropic 提供的代理/工具集,允许其模型(如 Claude)通过 API 与计算机环境交互:包括截图、鼠标点击、键盘输入、拖拽操作等。 (docs.anthropic.com)
- 不仅限于网页浏览器,也能操作桌面应用/任意 GUI。 (docs.anthropic.com)
- 是一个偏开发者/工具化接口的解决方案,适合更自由/定制化的自动化环境。
优点
- 灵活性高:可以用于网页浏览器之外的各种桌面应用、 GUI 自动化。
- 对开发者友好:可通过 API 调用、容器化环境、迭代集成。
- 理论上适用范围广,不局限于一个浏览器、一个界面。
限制 /注意事项
- 较高的技术门槛:用户或开发者需要配置环境、理解 API、部署。
- 虽然能力强,但“稳定性”“易用性”“失败恢复”可能不如专为最终用户设计的产品。反馈称“当前阶段成功率有时低”。 (Medium)
- 安全、权限、隐私风险较高,因为完全控制桌面/鼠标键盘意味着潜在风险。 (docs.anthropic.com)
适用场景
- 高度定制的自动化需求—比如:桌面软件操作、批量数据处理、公司内部流程自动化。
- 开发者或团队希望把代理集成到工具/系统里,而不仅仅是“用户在浏览器里点几个按钮”。
- 若你在研究、学习“GUI自动化+LLM融合”技术。
4. Browser Use
简介
- “Browser Use” 是一种开源或社区驱动的浏览器自动化框架/工具,用于支持 AI 代理控制浏览器:通过截图、HTML、鼠标键盘事件等。 (Helicone.ai)
- 更像一个“工具箱”而不是成品产品:开发者可选不同大模型、定制浏览器操作流程、集成自动化脚本。
优点
- 开源或较为自由:可以选模型、修改策略、适合学习或自建。
- 成本可能较低、可调性高。
- 对于研究/教学/原型开发场景尤其适合。
限制 /注意事项
- 不是面向最终普通用户的“你说一句话它自动完成”的产品,而需要一定配置与技术能力。
- 功能、稳定性可能受限于社区支持、资源、模型选型。
- 没有像 OpenAI / Google 那样的大规模生态背书,可靠性可能较低,任务失败率可能较高。
适用场景
- 自己想做“浏览器自动化+LLM”项目、实验或学习用。
- 有技术基础、愿意搭建/调试工具。
- 想要控制或定制代理行为,而不是使用黑箱服务。
对比表格
| 产品 | 核心功能 | 优势 | 限制 | 适合用户 |
|---|---|---|---|---|
| ChatGPT Operator | 在浏览器中自动执行任务(点击、滚动、输入) | 集成聊天+任务自动化,用户体验优;安全机制有保障 | 仍为研究预览,复杂任务可能出错;局限于网页操作 | 想“语音或聊天下令网页操作”的普通用户 |
| Project Mariner | 浏览器内代理,能处理多个任务、理解多种网页元素 | 多任务能力强,Google生态支持;较强理解能力 | 处于原型阶段;依赖浏览器扩展;可用性可能受地域/平台限制 | 浏览器密集使用者+研究人员/早期采用者 |
| Computer Use | 桌面/浏览器交互工具集(GUI鼠标键盘动作等) | 灵活性高,可操作任意应用;开发者友好 | 技术门槛较高;用户体验不如专用产品;失败率可能高 | 开发者/技术团队/自动化研究者 |
| Browser Use | 开源浏览器自动化+代理框架 | 成本低、自由度高、适合学习/原型开发 | 不是即用产品;需要技术配置;稳定性视具体实现 | 自己搭建系统/实验/教学用途 |
我的建议(基于你的背景)
你说你是一名高中生,对 Python、数据科学、自动化有兴趣,那么我建议你从 Computer Use 或 Browser Use 入手,然后再观察 Operator 或 Project Mariner:
- 如果你愿意做“技术实现+脚本+自动化”项目 → 试试 Browser Use 或 Anthropic 的 Computer Use。这样你能自己理解底层机制。
- 如果你更想“看效果”“试用体验”而不是自己搞太多配置 → 等待 Operator 或 Mariner 更成熟/开放。
- 在学习过程中,你可以做一个小项目,例如:用 Browser Use 让一个模型自动“浏览网页、提取信息、生成报告”,然后你对照 Operator 的演示,看它哪里做得好/有什么差。
Introduction | 介绍
Preface | 前言
Welcome to "Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems." As we look across the landscape of modern artificial intelligence, we see a clear evolution from simple, reactive programs to sophisticated, autonomous entities capable of understanding context, making decisions, and interacting dynamically with their environment and other systems. These are the intelligent agents and the agentic systems they comprise.
欢迎阅读《智能体设计模式:构建智能系统的实践指南》。纵观当今人工智能领域,我们能清晰地看到一条演进路线:从简单的响应式程序,到复杂的自主实体,后者能够理解上下文、做出决策,并与所处环境及其他系统进行动态交互。这些,便是智能体以及由它们构成的智能体系统。
The advent of powerful large language models (LLMs) has provided unprecedented capabilities for understanding and generating human-like content such as text and media, serving as the cognitive engine for many of these agents. However, orchestrating these capabilities into systems that can reliably achieve complex goals requires more than just a powerful model. It requires structure, design, and a thoughtful approach to how the agent perceives, plans, acts, and interacts.
强大的大语言模型的问世,为理解和生成类人内容(如文本和媒体)提供了前所未有的能力,并担当了许多这类智能体的认知引擎。然而,要将这些能力编排成能够可靠达成复杂目标的系统,仅仅拥有一个强大的模型是远远不够的。它还需要结构、设计,以及一套经过深思熟虑的方法,来指导智能体如何感知、规划、行动和交互。
Think of building intelligent systems as creating a complex work of art or engineering on a canvas. This canvas isn't a blank visual space, but rather the underlying infrastructure and frameworks that provide the environment and tools for your agents to exist and operate. It's the foundation upon which you'll build your intelligent application, managing state, communication, tool access, and the flow of logic.
不妨将构建智能系统想象成在一块画布上创作复杂的艺术品或工程作品。这块画布并非一块空白的视觉空间,而是指那些为智能体提供生存和操作环境的底层技术设施和框架。它是您构建智能应用所依赖的基石,负责管理状态、通信、工具访问和逻辑流。
Building effectively on this agentic canvas demands more than just throwing components together. It requires understanding proven techniques – patterns – that address common challenges in designing and implementing agent behavior. Just as architectural patterns guide the construction of a building, or design patterns structure software, agentic design patterns provide reusable solutions for the recurring problems you'll face when bringing intelligent agents to life on your chosen canvas.
想在这块智能体的画布上高效构建,简单地堆砌组件是远远不够的。我们需要掌握一套行之有效的技术,也就是模式。这些模式,是专门为了解决智能体设计与实现过程中的常见挑战而存在的。这就像建筑有建筑模式,软件有设计模式一样。最终,智能体设计模式的作用,就是为那些反复出现的老问题提供一套经过验证、可复用的解决方案,帮助你将智能体成功地构建出来。
What are Agentic Systems? | 什么是智能体系统?
At its core, an agentic system is a computational entity designed to perceive its environment (both digital and potentially physical), make informed decisions based on those perceptions and a set of predefined or learned goals, and execute actions to achieve those goals autonomously. Unlike traditional software, which follows rigid, step-by-step instructions, agents exhibit a degree of flexibility and initiative.
从本质上讲,智能体系统是一种计算实体,它能够感知环境(包括数字环境和可能的物理环境),根据感知结果以及预设或学习到的目标做出决策,并自主执行行动以实现目标。与遵循严格逐步指令的传统软件不同,智能体展现出一定的灵活性和主动性。
Imagine you need a system to manage customer inquiries. A traditional system might follow a fixed script. An agentic system, however, could perceive the nuances of a customer's query, access knowledge bases, interact with other internal systems (like order management), potentially ask clarifying questions, and proactively resolve the issue, perhaps even anticipating future needs. These agents operate on the canvas of your application's infrastructure, utilizing the services and data available to them.
想象一下,你需要一个系统来管理客户咨询。传统系统可能会遵循固定的脚本。而一个智能体系统则能够感知客户提问的细微差别,访问知识库,与公司其他内部系统(如订单管理系统)交互,还可能提出澄清性问题,并主动解决问题,甚至可能预测客户未来的需求。这些智能体就在您应用程序基础设施这块画布上运行,利用提供给它们的服务和数据。
Agentic systems are often characterized by features like autonomy, allowing them to act without constant human oversight; proactiveness, initiating actions towards their goals; and reactiveness, responding effectively to changes in their environment. They are fundamentally goal-oriented, constantly working towards objectives. A critical capability is tool use, enabling them to interact with external APIs, databases, or services – effectively reaching out beyond their immediate canvas. They possess memory, retain information across interactions, and can engage in communication with users, other systems, or even other agents operating on the same or connected canvases.
智能体系统通常具备以下特征:自主性(Autonomy),使其无需持续的人工监督即可行动;主动性(Proactiveness),能主动发起行动以实现其目标;反应性(Reactiveness),能有效应对环境变化。它们以目标为导向,持续推进任务。关键能力还包括工具使用(Tool Use),使之能够与外部 API、数据库或服务交互,将触角伸出自身运行环境;它们拥有记忆(Memory),能在多次交互中保留信息,并能与用户、其他系统、乃至在相同或互联的画布上运行的其他智能体进行通信(Communication)。
Effectively realizing these characteristics introduces significant complexity. How does the agent maintain state across multiple steps on its canvas? How does it decide when and how to use a tool? How is communication between different agents managed? How do you build resilience into the system to handle unexpected outcomes or errors?
要有效地实现这些特性,会带来巨大的复杂性。智能体如何在它的画布上跨越多个步骤来维持状态?它如何决定何时以及如何使用某个工具?不同智能体之间的通信如何管理?你又该如何在系统中确保可靠性,以处理意外结果或错误?
Why Patterns Matter in Agent Development | 为什么模式在智能体开发中很重要
This complexity is precisely why agentic design patterns are indispensable. They are not rigid rules, but rather battle-tested templates or blueprints that offer proven approaches to standard design and implementation challenges in the agentic domain. By recognizing and applying these design patterns, you gain access to solutions that enhance the structure, maintainability, reliability, and efficiency of the agents you build on your canvas.
这种复杂性,恰恰凸显了智能体设计模式的不可或缺。它们并非僵化的规则,而是久经沙场的模板或蓝图,为智能体领域中标准的设计和实现挑战提供了经过验证的方法。通过识别并应用这些设计模式,你将获得一套成熟的解决方案,从而提升你在画布上所构建智能体的结构性、可维护性、可靠性和效率。
Using design patterns helps you avoid reinventing fundamental solutions for tasks like managing conversational flow, integrating external capabilities, or coordinating multiple agent actions. They provide a common language and structure that makes your agent's logic clearer and easier for others (and yourself in the future) to understand and maintain. Implementing patterns designed for error handling or state management directly contributes to building more robust and reliable systems. Leveraging these established approaches accelerates your development process, allowing you to focus on the unique aspects of your application rather than the foundational mechanics of agent behavior.
使用设计模式可以帮助你避免为管理对话流、集成外部能力或协调多智能体行动等任务「重新发明轮子」。它们提供了一种通用语言和结构,使你的智能体逻辑更清晰,也便于他人(以及未来的你自己)理解和维护。应用专为错误处理或状态管理而设计的模式,能直接帮助你构建更具鲁棒性、更可靠的系统。利用这些成熟的方法可以加速你的开发进程,让你能专注于应用程序的独有之处,而不是智能体行为的基础机制。
This book extracts 21 key design patterns that represent fundamental building blocks and techniques for constructing sophisticated agents on various technical canvases. Understanding and applying these patterns will significantly elevate your ability to design and implement intelligent systems effectively.
本书提炼出 21 个关键设计模式,它们代表了在各种技术画布上构建复杂智能体的基本构建模块和技术。理解并应用这些模式,将极大提升你有效设计和实现智能系统的能力。
Overview of the Book and How to Use It | 本书概览与使用指南
This book, "Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems," is crafted to be a practical and accessible resource. Its primary focus is on clearly explaining each agentic pattern and providing concrete, runnable code examples to demonstrate its implementation. Across 21 dedicated chapters, we will explore a diverse range of design patterns, from foundational concepts like structuring sequential operations (Prompt Chaining) and external interaction (Tool Use) to more advanced topics like collaborative work (Multi-Agent Collaboration) and self-improvement (Self-Correction).
本书《智能体设计模式:构建智能系统的实践指南》旨在成为一本实用且易于上手的资源。其核心重点在于清晰地解释每一种智能体模式,并提供具体、可运行的代码示例来演示其实现。全书用 21 个专章覆盖多种设计模式:从结构化顺序操作(提示链)、外部交互(工具使用)等基础概念,到协同工作(多智能体协作)、自我改进(自我修正)等进阶主题。
The book is organized chapter by chapter, with each chapter delving into a single agentic pattern. Within each chapter, you will find:
本书按章节组织,每章聚焦一种智能体模式。在每一章中,你都会看到:
A detailed Pattern Overview providing a clear explanation of the pattern and its role in agentic design.
详细的模式概述,清晰解释模式及其在智能体设计中的作用。
A section on Practical Applications & Use Cases illustrating real-world scenarios where the pattern is invaluable and the benefits it brings.
实际应用和用例部分,说明模式发挥重要作用的实际场景及其带来的好处。
A Hands-On Code Example offering practical, runnable code that demonstrates the pattern's implementation using prominent agent development frameworks. This is where you'll see how to apply the pattern within the context of a technical canvas.
实践代码示例:提供实用的、可运行的代码,演示如何使用主流的智能体开发框架来实现该模式。在这里,你将看到如何在一个技术框架下应用该模式。
Key Takeaways summarizing the most crucial points for quick review.
核心要点,总结最关键的内容以便快速回顾。
References for further exploration, providing resources for deeper learning on the pattern and related concepts.
参考资料,提供用于进一步探索的资源,帮助你更深入地学习该模式及相关概念。
While the chapters are ordered to build concepts progressively, feel free to use the book as a reference, jumping to chapters that address specific challenges you face in your own agent development projects. The appendices provide a comprehensive look at advanced prompting techniques, principles for applying AI agents in real-world environments, and an overview of essential agentic frameworks. To complement this, practical online-only tutorials are included, offering step-by-step guidance on building agents with specific platforms like AgentSpace and for the command-line interface. The emphasis throughout is on practical application; we strongly encourage you to run the code examples, experiment with them, and adapt them to build your own intelligent systems on your chosen canvas.
虽然各章节的排序是为了循序渐进地构建概念,但你完全可以将本书作为参考手册,直接跳转到那些能解决你在智能体开发项目中遇到的特定挑战的章节。附录部分全面介绍了高级提示技术、在真实环境中应用 AI 智能体的原则,以及主流智能体框架的概览。作为补充,我们还提供了仅在线上发布的实战教程,逐步指导你如何使用 AgentSpace 等特定平台以及在命令行界面中构建智能体。全书自始至终都强调实际应用;我们强烈鼓励你运行代码示例,动手实验,并将其改造应用于在你选择的画布上构建你自己的智能系统。
A great question I hear is, 'With AI changing so fast, why write a book that could be quickly outdated?' My motivation was actually the opposite. It's precisely because things are moving so quickly that we need to step back and identify the underlying principles that are solidifying. Patterns like RAG, Reflection, Routing, Memory, and the others I discuss are becoming fundamental building blocks. This book is an invitation to reflect on these core ideas, which provide the foundation we need to build upon. We all need such moments of reflection on foundational patterns.
我常被问到:「AI 日新月异,为何还要写一本可能很快过时的书?」我的动机恰恰相反:正是因为一切变化太快,我们才更需要退后一步,去识别那些正在固化成型的底层原则。诸如 RAG、反思、路由、记忆等模式,正在成为基本的构建模块。本书正是一份邀请,旨在引导大家一同审视这些核心思想,它们为我们未来的构建工作提供了必要的基础。我们正需要这样的时刻,来深入思考这些奠基性的模式。
Introduction to the Frameworks Used | 本书使用的框架介绍
To provide a tangible "canvas" for our code examples (see also Appendix), we will primarily utilize three prominent agent development frameworks. LangChain, along with its stateful extension LangGraph, provides a flexible way to chain together language models and other components, offering a robust canvas for building complex sequences and graphs of operations. CrewAI provides a structured framework specifically designed for orchestrating multiple AI agents, roles, and tasks, acting as a canvas particularly well-suited for collaborative agent systems. The Google Agent Development Kit (Google ADK) offers tools and components for building, evaluating, and deploying agents, providing another valuable canvas, often integrated with Google's AI infrastructure.
为了给代码示例提供具体的「画布」(亦可参阅附录),本书主要采用三个主流的智能体开发框架。LangChain 及其有状态扩展 LangGraph,提供了一种灵活的方式来将语言模型和其他组件链接在一起,为构建复杂的操作序列和图谱提供了一个鲁棒的画布;CrewAI 提供了一个专为编排多个 AI 智能体、角色和任务而设计的结构化框架,它扮演的画布角色尤其适合协作型智能体系统;谷歌智能体开发套件(Google ADK)则提供了用于构建、评估和部署智能体的工具与组件,这是另一块极具价值的画布,通常与谷歌的 AI 基础设施集成。
These frameworks represent different facets of the agent development canvas, each with its strengths. By showing examples across these tools, you will gain a broader understanding of how the patterns can be applied regardless of the specific technical environment you choose for your agentic systems. The examples are designed to clearly illustrate the pattern's core logic and its implementation on the framework's canvas, focusing on clarity and practicality.
这些框架代表了智能体开发画布的不同侧面,各有其长处。通过展示跨越这些工具的示例,你将更广泛地理解,无论你为自己的智能体系统选择哪种具体的技术环境,这些模式都可以被应用。这些示例旨在清晰地阐明模式的核心逻辑及其在相应框架画布上的实现,重点突出清晰性和实用性。
By the end of this book, you will not only understand the fundamental concepts behind 21 essential agentic patterns but also possess the practical knowledge and code examples to apply them effectively, enabling you to build more intelligent, capable, and autonomous systems on your chosen development canvas. Let's begin this hands-on journey!
在读完本书时,你不仅将理解 21 种关键智能体设计模式的基本理念,还将收获足以落地的实践知识与代码示例,助你在所选「画布」上高效应用这些模式,构建更智能、更强大、更具自主性的系统。让我们开始这段动手实践之旅吧!
What makes an AI system an Agent? | 是什么让 AI 系统成为「智能体」?
In simple terms, an AI agent is a system designed to perceive its environment and take actions to achieve a specific goal. It's an evolution from a standard Large Language Model (LLM), enhanced with the abilities to plan, use tools, and interact with its surroundings. Think of an Agentic AI as a smart assistant that learns on the job. It follows a simple, five-step loop to get things done (see Fig.1):
简单来说,AI 智能体是一个能够感知环境并采取行动以实现特定目标的系统。它从标准大语言模型演进而来,被赋予了规划、使用工具以及与周围环境交互的能力。可以把智能体 AI 想象成一个能在工作中不断学习的智能助手。它遵循一个简单的五步循环来完成任务(见图 1)。
Get the Mission: You give it a goal, like "organize my schedule."
获取任务:你给它一个目标,比如「帮我安排日程」。
Scan the Scene: It gathers all the necessary information—reading emails, checking calendars, and accessing contacts—to understand what's happening.
分析环境:收集所有必要信息——阅读邮件、查看日历、访问联系人——以了解当前状况。
Think It Through: It devises a plan of action by considering the optimal approach to achieve the goal.
思考对策:它通过考量达成目标的最佳方法来制定一个行动计划。
Take Action: It executes the plan by sending invitations, scheduling meetings, and updating your calendar.
采取行动:通过发送邀请、安排会议、更新日历来执行计划。
Learn and Get Better: It observes successful outcomes and adapts accordingly. For example, if a meeting is rescheduled, the system learns from this event to enhance its future performance.
学习并改进:它观察成功的产出并相应地调整自身。例如,如果一个会议被重新安排,系统会从这一事件中学习,以提升其未来的表现。
Fig. 1: Agentic AI functions as an intelligent assistant, continuously learning through experience. It operates via a straightforward five-step loop to accomplish tasks.
图 1:AI 智能体如同一位智能助手,通过经验持续学习。它通过一个简单的五步循环来完成任务。
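The five steps above can be sketched as a small Python class. This is only an illustrative toy under stated assumptions: the `SchedulingAgent` class, its method names, and the sample data are hypothetical and do not come from the book, and the "thinking" step is a hard-coded slot search rather than a call to a language model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SchedulingAgent:
    """Toy agent (hypothetical) that walks the five-step loop without any LLM."""

    # What the agent has "learned" so far: request -> preferred time slot.
    preferences: Dict[str, str] = field(default_factory=dict)

    def get_mission(self, goal: str) -> str:
        # Step 1, Get the Mission: accept the goal from the user.
        return goal.strip()

    def scan_scene(self, calendar: List[str], requests: List[str]) -> dict:
        # Step 2, Scan the Scene: gather the context needed to act.
        return {"busy": set(calendar), "requests": list(requests)}

    def think(self, context: dict, slots: List[str]) -> List[Tuple[str, str]]:
        # Step 3, Think It Through: give each request the first free slot,
        # trying a previously learned preferred slot first.
        plan = []
        taken = set(context["busy"])
        for request in context["requests"]:
            preferred = self.preferences.get(request)
            candidates = ([preferred] if preferred else []) + slots
            for slot in candidates:
                if slot not in taken:
                    plan.append((request, slot))
                    taken.add(slot)
                    break
        return plan

    def act(self, plan: List[Tuple[str, str]], calendar: List[str]) -> List[str]:
        # Step 4, Take Action: "book" the meetings by returning an updated calendar.
        return calendar + [slot for _, slot in plan]

    def learn(self, request: str, rescheduled_to: str) -> None:
        # Step 5, Learn and Get Better: remember where a rescheduled meeting ended up.
        self.preferences[request] = rescheduled_to

    def run(self, goal, calendar, requests, slots):
        self.get_mission(goal)
        context = self.scan_scene(calendar, requests)
        plan = self.think(context, slots)
        return plan, self.act(plan, calendar)


agent = SchedulingAgent()
slots = ["09:00", "10:00", "11:00"]
first_plan, first_calendar = agent.run("organize my schedule", ["09:00"], ["1:1 with Ana"], slots)

# The user later moves the meeting; the agent records that and plans differently next time.
agent.learn("1:1 with Ana", "11:00")
second_plan, _ = agent.run("organize my schedule", ["09:00"], ["1:1 with Ana"], slots)
```

In this sketch the only thing that changes between the two runs is the agent's stored preference, which is the point of the fifth step: feedback from one pass through the loop shapes the plan produced by the next.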
Agents are becoming increasingly popular at a stunning pace. According to recent studies, a majority of large IT companies are actively using these agents, and a fifth of them just started within the past year. The financial markets are also taking notice. By the end of 2024, AI agent startups had raised more than $2 billion, and the market was valued at $5.2 billion. It's expected to explode to nearly $200 billion in value by 2034. In short, all signs point to AI agents playing a massive role in our future economy.
智能体的普及速度惊人。根据最近的研究,大多数大型 IT 公司正在积极使用这些智能体,其中五分之一的公司是在过去一年内才开始使用的。金融市场也注意到了这一点。到 2024 年底,AI 智能体初创公司已筹集了超过 20 亿美元,市场估值达到 52 亿美元。预计到 2034 年,其市场价值将爆炸式增长至近 2000 亿美元。简而言之,所有迹象都表明 AI 智能体将在我们未来的经济中扮演极为重要的角色。
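To put the quoted market figures in perspective, the growth rate they imply can be worked out with the standard compound annual growth rate formula. This is a back-of-the-envelope sketch, not a figure from the book: it assumes the $5.2 billion valuation applies to 2024, rounds "nearly $200 billion" up to 200, and the function name `implied_cagr` is made up for illustration.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions of USD (end value rounded, see lead-in).
cagr = implied_cagr(start_value=5.2, end_value=200.0, years=2034 - 2024)
print(f"Implied compound annual growth rate: {cagr:.1%}")
```

Under these assumptions the projection works out to roughly 44% growth per year, sustained for a decade.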
In just two years, the AI paradigm has shifted dramatically, moving from simple automation to sophisticated, autonomous systems (see Fig. 2). Initially, workflows relied on basic prompts and triggers to process data with LLMs. This evolved with Retrieval-Augmented Generation (RAG), which enhanced reliability by grounding models on factual information. We then saw the development of individual AI Agents capable of using various tools. Today, we are entering the era of Agentic AI, where a team of specialized agents works in concert to achieve complex goals, marking a significant leap in AI's collaborative power.
仅仅两年时间,AI 的范式就发生了巨大转变,从简单的自动化演进为复杂的自主系统(见图 2)。最初,工作流依赖于基本的提示和触发器来通过大语言模型处理数据。随后,检索增强生成(RAG)的出现提升了系统的可靠性,因为它将模型建立在事实信息之上。接着,我们看到了能够使用各种工具的独立智能体的发展。如今,我们正在进入 AI 智能体的时代,在这个时代里,一个由专业化智能体组成的团队协同工作以实现复杂目标,这标志着AI协作能力的一次重大飞跃。
Fig. 2: Transitioning from LLMs to RAG, then to Agentic RAG, and finally to Agentic AI.
图 2:从 LLM 到 RAG,再到智能体 RAG,最终走向 AI 智能体的演进。
The intent of this book is to discuss the design patterns of how specialized agents can work in concert and collaborate to achieve complex goals, and you will see one paradigm of collaboration and interaction in each chapter.
本书旨在讨论专业化智能体如何协同工作以实现复杂目标的设计模式,你将在每一章中看到一种协作与交互的范式。
Before doing that, let's examine examples that span the range of agent complexity (see Fig. 3).
在此之前,让我们先来看几个贯穿智能体复杂度范围的例子(见图 3)。
Level 0: The Core Reasoning Engine | 0 级:核心推理引擎
While an LLM is not an agent in itself, it can serve as the reasoning core of a basic agentic system. In a 'Level 0' configuration, the LLM operates without tools, memory, or environment interaction, responding solely based on its pre-trained knowledge. Its strength lies in leveraging its extensive training data to explain established concepts. The trade-off for this powerful internal reasoning is a complete lack of current-event awareness. For instance, it would be unable to name the 2025 Oscar winner for "Best Picture" if that information is outside its pre-trained knowledge.
虽然大语言模型本身不是智能体,但它可以作为基础智能体系统的推理核心。在一个「0 级」配置中,大语言模型在没有工具、记忆或环境交互的情况下运行,仅仅基于其预训练的知识进行响应。它的优势在于利用其海量的训练数据来解释已有的概念,代价是完全缺乏对当前事件的感知。例如,如果关于“2025年奥斯卡最佳影片奖”得主的信息超出了它的预训练知识范围,它将无法给出答案。
Level 1: The Connected Problem-Solver | 1 级:连接外部的问题解决者
At this level, the LLM becomes a functional agent by connecting to and utilizing external tools. Its problem-solving is no longer limited to its pre-trained knowledge. Instead, it can execute a sequence of actions to gather and process information from sources like the internet (via search) or databases (via Retrieval Augmented Generation, or RAG). For detailed information, refer to Chapter 14.
在这个级别,大语言模型通过连接并使用外部工具,摇身成为功能性智能体。它解决问题的能力不再局限于其预训练的知识。相反,它能够执行一系列动作,从互联网(通过搜索)或数据库(通过检索增强生成,即 RAG)等来源收集和处理信息。更多详细信息,请参阅第 14 章。
For instance, to find new TV shows, the agent recognizes the need for current information, uses a search tool to find it, and then synthesizes the results. Crucially, it can also use specialized tools for higher accuracy, such as calling a financial API to get the live stock price for AAPL. This ability to interact with the outside world across multiple steps is the core capability of a Level 1 agent.
例如,为了寻找新的电视节目,智能体识别出需要最新信息,于是使用搜索工具来查找,然后综合处理结果。至关重要的一点是,它还可以使用专业工具以获得更高精度,例如调用金融 API 来获取苹果公司的实时股价。这种跨多个步骤与外部世界交互的能力,正是 1 级智能体的核心。
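The Level 1 behavior can be sketched in a few lines of Python. This is a toy under stated assumptions: both tools are hypothetical stubs that return placeholder strings rather than real data, and `choose_tool` is a keyword heuristic standing in for the decision an LLM would make about which tool to call.

```python
from typing import Callable, Dict, Tuple


def search_web(query: str) -> str:
    # Hypothetical stand-in for a real search tool.
    return f"[search results for: {query}]"


def get_stock_price(ticker: str) -> str:
    # Hypothetical stand-in for a financial API; not a real quote.
    return f"[live price for {ticker}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search_web,
    "stock_price": get_stock_price,
}


def choose_tool(question: str) -> Tuple[str, str]:
    """Toy router standing in for the LLM's tool-selection step."""
    words = question.replace("?", "").split()
    tickers = [w for w in words if w.isalpha() and w.isupper() and len(w) <= 5]
    if "price" in question.lower() and tickers:
        return "stock_price", tickers[0]
    return "search", question


def level1_agent(question: str) -> str:
    # Step 1: decide which tool fits. Step 2: call it. Step 3: synthesize.
    tool_name, tool_input = choose_tool(question)
    observation = TOOLS[tool_name](tool_input)
    return f"Answer based on {tool_name}: {observation}"


stock_answer = level1_agent("What is the live stock price for AAPL?")
tv_answer = level1_agent("What new TV shows came out this week?")
print(stock_answer)
print(tv_answer)
```

The point of the sketch is the shape of the control flow: recognize the need for outside information, pick a tool, and fold the observation into the answer.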
Level 2: The Strategic Problem-Solver | 2 级:战略性问题解决者
At this level, an agent's capabilities expand significantly, encompassing strategic planning, proactive assistance, and self-improvement, with prompt engineering and context engineering as core enabling skills.
在这个级别,智能体的能力显著扩展,涵盖战略规划、主动协助和自我提升,而提示工程和上下文工程是其核心赋能技能。
First, the agent moves beyond single-tool use to tackle complex, multi-part problems through strategic problem-solving. As it executes a sequence of actions, it actively performs context engineering: the strategic process of selecting, packaging, and managing the most relevant information for each step. For example, to find a coffee shop between two locations, it first uses a mapping tool. It then engineers this output, curating a short, focused context (perhaps just a list of street names) to feed into a local search tool, which prevents cognitive overload and keeps the second step efficient and accurate. The underlying principle is that a model is most accurate when given a short, focused, and powerful context. Context engineering curates the model's limited attention, drawing only the most critical information from all available sources, to prevent overload and ensure high-quality, efficient performance on any given task. For detailed information, refer to Appendix A.
首先,智能体超越了单一工具的使用,通过战略性问题解决来处理复杂、多部分的问题。在执行一系列动作时,它会主动进行上下文工程(Context Engineering):即为每一步战略性地选择、打包和管理最相关信息的过程。例如,要在两个地点之间找一家咖啡店,它首先会使用地图工具。然后,它会对输出结果进行工程化处理,筛选出一个简短、集中的上下文——也许只是一串街道名称列表——再输入给本地搜索工具,以避免认知过载,确保第二步既高效又准确。要从 AI 获得最高精度,就必须给它一个简短、专注且有力的上下文。上下文工程正是实现这一目标的学科,它通过战略性地从所有可用来源中选择、打包和管理最关键的信息来做到这一点。它有效地管理模型的有限注意力以防止过载,确保在任何给定任务上都能实现高质量、高效率的表现。更多详细信息,请参阅附录A。
This level leads to proactive and continuous operation. A travel assistant linked to your email demonstrates this by engineering the context from a verbose flight confirmation email; it selects only the key details (flight numbers, dates, locations) to package for subsequent tool calls to your calendar and a weather API.
这个级别带来主动且持续的运行方式。一个与你的邮箱关联的旅行助手就展示了这一点:它会从一封冗长的航班确认邮件中进行上下文工程,只选择关键细节(航班号、日期、地点),然后打包这些信息用于后续调用你的日历和天气 API。
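The travel-assistant example can be approximated with a small Python sketch. It assumes the agent reduces the email with simple regular expressions; a production agent would more likely ask the model itself to extract these fields. The email text, the flight number `XY123`, and the payload shapes for the calendar and weather tools are all invented for illustration.

```python
import re
from typing import Dict, List, Optional, Union

# Invented sample email; only three facts in it matter for later tool calls.
EMAIL = (
    "Dear traveler, thank you for choosing us! Your flight XY123 departs on "
    "2025-03-14 from San Francisco (SFO) to Lisbon (LIS). Please review the "
    "baggage policy, loyalty offers, and the legal notices below."
)

Context = Dict[str, Union[Optional[str], List[str]]]


def engineer_context(email: str) -> Context:
    """Keep only the details that the next tool calls need."""
    flight = re.search(r"\b[A-Z]{2}\d{2,4}\b", email)
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", email)
    airports = re.findall(r"\(([A-Z]{3})\)", email)
    return {
        "flight": flight.group(0) if flight else None,
        "date": date.group(0) if date else None,
        "airports": airports,
    }


context = engineer_context(EMAIL)

# Hypothetical payloads for the downstream calendar and weather tools.
calendar_call = {"title": f"Flight {context['flight']}", "date": context["date"]}
weather_call = {"airport": context["airports"][-1], "date": context["date"]}
print(context)
```

Each downstream tool receives a handful of fields instead of the full email, which is the "short, focused context" the passage describes.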
In specialized fields like software engineering, the agent manages an entire workflow by applying this discipline. When assigned a bug report, it reads the report and accesses the codebase, then strategically engineers these large sources of information into a potent, focused context that allows it to efficiently write, test, and submit the correct code patch.
在软件工程等专业领域,智能体通过应用这门学科来管理整个工作流。当分配给它一个错误报告时,它会阅读报告并访问代码库,然后战略性地将这些海量信息源工程化处理成一个强有力、高度集中的上下文,使其能够高效地编写、测试并提交正确的代码补丁。
Finally, the agent achieves self-improvement by refining its own context engineering processes. When it asks for feedback on how a prompt could have been improved, it is learning how to better curate its initial inputs. This allows it to automatically improve how it packages information for future tasks, creating a powerful, automated feedback loop that increases its accuracy and efficiency over time. For detailed information, refer to Chapter 17.
最后,智能体通过优化自身的上下文工程流程来实现自我提升。当它就“某个提示本可以如何改进”而征求反馈时,它实际上是在学习如何更好地筛选其初始输入。这使其能够自动改进为未来任务打包信息的方式,从而创建一个强大的自动化反馈循环,随着时间的推移不断提高其准确性和效率。更多详细信息,请参阅第 17 章。
Fig. 3: Various instances demonstrating the spectrum of agent complexity.
图 3:展示不同复杂度智能体的实例。
Level 3: The Rise of Collaborative Multi-Agent Systems | 3 级:协作型多智能体系统的兴起
At Level 3, we see a significant paradigm shift in AI development, moving away from the pursuit of a single, all-powerful super-agent and towards the rise of sophisticated, collaborative multi-agent systems. In essence, this approach recognizes that complex challenges are often best solved not by a single generalist, but by a team of specialists working in concert. This model directly mirrors the structure of a human organization, where different departments are assigned specific roles and collaborate to tackle multi-faceted objectives. The collective strength of such a system lies in this division of labor and the synergy created through coordinated effort. For detailed information, refer to Chapter 7.
在 3 级,我们看到了 AI 发展的一次重大范式转变:不再追求单一、全能的超级智能体,而是转向发展复杂的、协作式的多智能体系统。本质上,这种方法认识到,复杂的挑战通常不是由一个通才,而是由一个协同工作的专家团队来解决的。这个模型直接映射了人类组织的结构,其中不同部门被赋予特定角色,并协作处理多方面的目标。这种系统的集体力量在于劳动分工以及通过协调努力产生的协同效应。更多详细信息,请参阅第 7 章。
To bring this concept to life, consider the intricate workflow of launching a new product. Rather than one agent attempting to handle every aspect, a "Project Manager" agent could serve as the central coordinator. This manager would orchestrate the entire process by delegating tasks to other specialized agents: a "Market Research" agent to gather consumer data, a "Product Design" agent to develop concepts, and a "Marketing" agent to craft promotional materials. The key to their success would be the seamless communication and information sharing between them, ensuring all individual efforts align to achieve the collective goal.
为了将这个概念具体化,可以想象一下发布一款新产品的复杂工作流。并非由一个智能体尝试处理所有方面,而是一个「项目经理」智能体可以作为中心协调者。这个经理会通过将任务委派给其他专业化智能体来统筹整个过程:一个「市场研究」智能体负责收集消费者数据,一个「产品设计」智能体负责开发概念,以及一个「市场营销」智能体负责制作宣传材料。它们成功的关键在于彼此之间无缝的沟通和信息共享,确保所有个体努力都统一指向集体目标。
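A minimal Python sketch of this coordinator pattern follows. It assumes each specialist is a plain function and that information sharing happens through one shared dictionary; real systems would wrap LLM calls and use a richer messaging layer. All names and the sample product are hypothetical.

```python
from typing import Callable, Dict

# A specialist takes the product name plus the shared results so far.
Specialist = Callable[[str, Dict[str, str]], str]


def market_research_agent(product: str, shared: Dict[str, str]) -> str:
    return f"consumer data for {product}"


def product_design_agent(product: str, shared: Dict[str, str]) -> str:
    # Builds on the research output left in the shared workspace.
    return f"concept for {product} informed by [{shared['market_research']}]"


def marketing_agent(product: str, shared: Dict[str, str]) -> str:
    return f"promo copy for [{shared['product_design']}]"


class ProjectManagerAgent:
    """Central coordinator that delegates work and collects the results."""

    def __init__(self, specialists: Dict[str, Specialist]) -> None:
        self.specialists = specialists

    def launch(self, product: str) -> Dict[str, str]:
        shared: Dict[str, str] = {}
        # Delegate in order; each specialist can read earlier outputs.
        for role, agent in self.specialists.items():
            shared[role] = agent(product, shared)
        return shared


manager = ProjectManagerAgent(
    {
        "market_research": market_research_agent,
        "product_design": product_design_agent,
        "marketing": marketing_agent,
    }
)
results = manager.launch("smart mug")
for role, output in results.items():
    print(f"{role}: {output}")
```

Passing the shared dictionary to every specialist is one simple way to model the communication the paragraph calls essential: the marketing output depends on the design output, which depends on the research.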
While this vision of autonomous, team-based automation is already being developed, it's important to acknowledge the current hurdles. The effectiveness of such multi-agent systems is presently constrained by the reasoning limitations of the LLMs they use. Furthermore, their ability to genuinely learn from one another and improve as a cohesive unit is still in its early stages. Overcoming these technological bottlenecks is the critical next step, and doing so will unlock the profound promise of this level: the ability to automate entire business workflows from start to finish.
虽然这种基于团队的自主自动化愿景已在开发中,但认识到当前的障碍也很重要。这类多智能体系统的有效性目前受限于它们所使用模型的推理能力。此外,它们真正相互学习并作为一个有凝聚力的整体来改进的能力仍处于早期阶段。克服这些技术瓶颈是关键的一步,而一旦做到这一点,将释放这一级别的深远潜力:实现从头到尾自动化整个业务工作流的能力。
The Future of Agents: Top 5 Hypotheses | 智能体的未来:五大假设
AI agent development is progressing at an unprecedented pace across domains such as software automation, scientific research, and customer service, among others. While current systems are impressive, they are just the beginning. The next wave of innovation will likely focus on making agents more reliable, collaborative, and deeply integrated into our lives. Here are five leading hypotheses for what's next (see Fig. 4).
AI 智能体开发正在软件自动化、科学研究和客户服务等领域以前所未有的速度推进。虽然当前的系统令人印象深刻,但它们仅仅是开始。下一波创新浪潮可能会聚焦于让智能体更可靠、更具协作性,并更深度融入我们的生活。以下是关于未来的五个主要假说(见图 4)。
Hypothesis 1: The Emergence of the Generalist Agent | 假设 1:通用智能体的崛起
The first hypothesis is that AI agents will evolve from narrow specialists into true generalists capable of managing complex, ambiguous, and long-term goals with high reliability. For instance, you could give an agent a simple prompt like, "Plan my company's offsite retreat for 30 people in Lisbon next quarter." The agent would then manage the entire project for weeks, handling everything from budget approvals and flight negotiations to venue selection and creating a detailed itinerary from employee feedback, all while providing regular updates. Achieving this level of autonomy will require fundamental breakthroughs in AI reasoning, memory, and near-perfect reliability. An alternative, yet not mutually exclusive, approach is the rise of Small Language Models (SLMs). This "Lego-like" concept involves composing systems from small, specialized expert agents rather than scaling up a single monolithic model. This method promises systems that are cheaper, faster to debug, and easier to deploy. Ultimately, the development of large generalist models and the composition of smaller specialized ones are both plausible paths forward, and they could even complement each other.
第一个假设是,AI 智能体将从狭隘的专家演变为真正的通用型选手,能够高可靠性地管理复杂、模糊和长期的目标。例如,你可以给智能体一个简单的提示,如「为我们公司 30 名员工筹划下个季度在里斯本的异地团建」。随后,这个智能体将管理整个项目长达数周,处理从预算审批、航班谈判到场地选择,再到根据员工反馈创建详细行程的所有事宜,并同时提供定期更新。实现这种级别的自主性将需要在 AI 推理、记忆与近乎完美可靠性方面取得根本性突破。一种替代性但并非相互排斥的方法是小型语言模型(SLM)的兴起。这种「乐高式」的概念涉及用小型的、专业化的专家智能体来组合成系统,而不是扩展单一的巨型模型。这种方法有望使系统更便宜、调试更快、部署更容易。最终,大型通用模型的发展和小型专业模型的组合都是未来可行的路径,它们甚至可能相得益彰。
Hypothesis 2: Deep Personalization and Proactive Goal Discovery | 假设 2:深度个性化与主动发现目标
The second hypothesis posits that agents will become deeply personalized and proactive partners. We are witnessing the emergence of a new class of agent: the proactive partner. By learning from your unique patterns and goals, these systems are beginning to shift from just following orders to anticipating your needs. AI systems operate as agents when they move beyond simply responding to chats or instructions. They initiate and execute tasks on behalf of the user, actively collaborating in the process. This moves beyond simple task execution into the realm of proactive goal discovery.
第二个假设认为智能体将成为深度个性化且主动的合作伙伴。我们正在见证一类新型智能体的诞生:主动合作伙伴。通过学习你独特的模式与目标,这些系统开始从仅仅遵循命令,转向预测你的需求。当 AI 系统超越简单地响应聊天或指令时,它们便作为智能体在运作。它们代表用户发起并执行任务,在过程中积极协作。这超越了简单的任务执行,进入主动目标发现的领域。
For instance, if you're exploring sustainable energy, the agent might identify your latent goal and proactively support it by suggesting courses or summarizing research. While these systems are still developing, their trajectory is clear. They will become increasingly proactive, learning to take initiative on your behalf when highly confident that the action will be helpful. Ultimately, the agent becomes an indispensable ally, helping you discover and achieve ambitions you have yet to fully articulate.
例如,如果你正在探索可持续能源,智能体可能会识别你的潜在目标,并主动支持它,比如推荐相关课程或总结研究报告。虽然这些系统仍在发展中,但它们的轨迹很清楚。它们将变得越来越主动,并在高度确信该行动会有帮助时,学会代表你采取行动。最终,智能体将成为不可或缺的盟友,帮助你发现并实现那些你尚未完全清晰表达的抱负。
Fig. 4: Five hypotheses about the future of agents
图 4:关于智能体未来的五个假设
Hypothesis 3: Embodiment and Physical World Interaction | 假设 3:具身化与物理世界交互
This hypothesis foresees agents breaking free from their purely digital confines to operate in the physical world. By integrating agentic AI with robotics, we will see the rise of "embodied agents." Instead of just booking a handyman, you might ask your home agent to fix a leaky tap. The agent would use its vision sensors to perceive the problem, access a library of plumbing knowledge to formulate a plan, and then control its robotic manipulators with precision to perform the repair. This would represent a monumental step, bridging the gap between digital intelligence and physical action, and transforming everything from manufacturing and logistics to elder care and home maintenance.
这个假说预见智能体将挣脱纯粹的数字束缚,在物理世界中运作。通过将 AI 智能体与机器人技术相结合,我们将看到具身智能体(Embodied Agents)的兴起。你或许不再是仅仅预订一个水电工,而是直接让你的家庭智能体修理一个漏水的水龙头。智能体将使用其视觉传感器来感知问题,访问一个管道知识库来制定计划,然后精确地控制其机械臂来执行修复。这将是里程碑式的一步,弥合了数字智能与物理行动之间的鸿沟,并将彻底改变从制造业、物流到老年护理和家庭维护的方方面面。
Hypothesis 4: The Agent-Driven Economy | 假设 4:智能体驱动的经济
The fourth hypothesis is that highly autonomous agents will become active participants in the economy, creating new markets and business models. We could see agents acting as independent economic entities, tasked with maximizing a specific outcome, such as profit. An entrepreneur could launch an agent to run an entire e-commerce business. The agent would identify trending products by analyzing social media, generate marketing copy and visuals, manage supply chain logistics by interacting with other automated systems, and dynamically adjust pricing based on real-time demand. This shift would create a new, hyper-efficient "agent economy" operating at a speed and scale impossible for humans to manage directly.
第四个假设是,高度自主的智能体将成为经济中的积极参与者,创造新的市场和商业模式。我们可能会看到智能体作为独立的经济实体,其任务是最大化一个特定结果,例如利润。企业家可以启动一个智能体来运营整个电子商务业务。该智能体将通过分析社交媒体来识别热门产品,生成营销文案和视觉材料,通过与其他自动化系统交互来管理供应链物流,并根据实时需求动态调整定价。这一转变将创造一个全新的、超高效率的「智能体经济」,其运行速度和规模是人类无法直接管理的。
Hypothesis 5: The Goal-Driven, Metamorphic Multi-Agent System | 假设 5:目标驱动的、可演化的多智能体系统
This hypothesis posits the emergence of intelligent systems that operate not from explicit programming, but from a declared goal. The user simply states the desired outcome, and the system autonomously figures out how to achieve it. This marks a fundamental shift towards metamorphic multi-agent systems capable of true self-improvement at both the individual and collective levels.
该假说断言,将会出现一种并非基于显式编程,而是基于一个声明性目标来运作的智能系统。用户只需陈述期望的结果,系统便能自主地找出如何实现它。这标志着向可演化多智能体系统的根本性转变,这种系统能够在个体和集体层面实现真正的自我提升。
This system would be a dynamic entity, not a single agent. It would have the ability to analyze its own performance and modify the topology of its multi-agent workforce, creating, duplicating, or removing agents as needed to form the most effective team for the task at hand. This evolution happens at multiple levels:
这个系统将是一个动态实体,而非单个智能体。它将有能力分析自身表现并修改其多智能体工作团队的拓扑结构,根据需要创建、复制或移除智能体,以组成最适合当前任务的团队。这种演化发生在多个层面:
Architectural Modification: At the deepest level, individual agents can rewrite their own source code and re-architect their internal structures for higher efficiency, as in the original hypothesis.
架构层面的修改:在最深层次,单个智能体可以重写自身的源代码并重构其内部结构以提高效率,正如最初的假说所设想的那样。
Instructional Modification: At a higher level, the system continuously performs automatic prompt engineering and context engineering. It refines the instructions and information given to each agent, ensuring they are operating with optimal guidance without any human intervention.
指令层面的修改:在更高层次,系统持续进行自动化的提示工程和上下文工程。它不断优化给予每个智能体的指令和信息,确保它们在没有任何人工干预的情况下以最佳指导进行运作。
For instance, an entrepreneur would simply declare the intent: "Launch a successful e-commerce business selling artisanal coffee." The system, without further programming, would spring into action. It might initially spawn a "Market Research" agent and a "Branding" agent. Based on the initial findings, it could decide to remove the branding agent and spawn three new specialized agents: a "Logo Design" agent, a "Webstore Platform" agent, and a "Supply Chain" agent. It would constantly tune their internal prompts for better performance. If the webstore agent becomes a bottleneck, the system might duplicate it into three parallel agents to work on different parts of the site, effectively re-architecting its own structure on the fly to best achieve the declared goal.
例如,企业家只需声明一个意图:「启动一个成功的手工咖啡电商业务」。系统无需进一步编程即刻行动:它可能先生成「市场研究」与「品牌」两个智能体;随后基于初步结论,移除品牌智能体,并衍生出三个更细分的角色:「Logo 设计」「网店平台」「供应链」。系统会持续调校它们的内部提示以优化表现。如果网店智能体成为瓶颈,系统可能会将其复制成三个并行的智能体来处理网站的不同部分,从而动态地重构自身结构,以更好地实现声明的目标。
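The topology changes in this scenario can be made concrete with a small Python sketch. It is only a bookkeeping model under strong assumptions: the decisions to spawn, remove, tune, or duplicate agents are scripted by hand here, whereas the hypothesis is that the system would make them itself by analyzing its own performance. Class and method names are hypothetical.

```python
from typing import Dict


class MetamorphicTeam:
    """Tracks a team whose membership and prompts change over time."""

    def __init__(self, goal: str) -> None:
        self.goal = goal
        self.agents: Dict[str, str] = {}  # agent name -> current prompt

    def spawn(self, name: str) -> None:
        self.agents[name] = f"You are the {name} agent. Goal: {self.goal}."

    def remove(self, name: str) -> None:
        del self.agents[name]

    def tune(self, name: str, hint: str) -> None:
        # Instructional modification: refine one agent's prompt.
        self.agents[name] += f" Hint: {hint}."

    def duplicate(self, name: str, copies: int) -> None:
        # Replace a bottleneck agent with several parallel copies.
        prompt = self.agents.pop(name)
        for i in range(1, copies + 1):
            self.agents[f"{name} #{i}"] = f"{prompt} You handle part {i} of {copies}."


team = MetamorphicTeam("Launch a successful e-commerce business selling artisanal coffee")
team.spawn("Market Research")
team.spawn("Branding")

# After the initial findings, swap the broad role for narrower ones.
team.remove("Branding")
for name in ("Logo Design", "Webstore Platform", "Supply Chain"):
    team.spawn(name)

team.tune("Supply Chain", "prefer suppliers with short lead times")
team.duplicate("Webstore Platform", 3)  # the webstore agent became a bottleneck
print(sorted(team.agents))
```

After these steps the team holds six agents: market research, logo design, supply chain, and three parallel webstore agents.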
Conclusion | 总结
In essence, an AI agent represents a significant leap from traditional models, functioning as an autonomous system that perceives, plans, and acts to achieve specific goals. The evolution of this technology is advancing from single, tool-using agents to complex, collaborative multi-agent systems that tackle multifaceted objectives. Future hypotheses predict the emergence of generalist, personalized, and even physically embodied agents that will become active participants in the economy. This ongoing development signals a major paradigm shift towards self-improving, goal-driven systems poised to automate entire workflows and fundamentally redefine our relationship with technology.
本质上,AI 智能体代表了从传统模型的一次重大飞跃,它作为一个自主系统,能够感知、规划和行动以达成特定目标。这项技术正从使用单一工具的智能体,演进为处理多方面目标的复杂、协作式多智能体系统。未来的假说预测了通用型、个性化、乃至物理具身化的智能体的出现,它们将成为经济活动的积极参与者。这一持续的发展标志着一次重大的范式转变,即向能够自动化整个工作流并从根本上重新定义我们与技术关系的、自我提升的、目标驱动的系统迈进。
References | 参考文献
Cloudera, Inc. (April 2025), 96% of enterprises are increasing their use of AI agents. https://www.cloudera.com/about/news-and-blogs/press-releases/2025-04-16-96-percent-of-enterprises-are-expanding-use-of-ai-agents-according-to-latest-data-from-cloudera.html
Cloudera, Inc.(2025 年 4 月),96% 的企业正在增加对 AI 智能体的使用。
Deloitte. Autonomous generative AI agents. https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2025/autonomous-generative-ai-agents-still-under-development.html
Deloitte. 自主生成式 AI 智能体。
Market.us. Global Agentic AI Market Size, Trends and Forecast 2025–2034. https://market.us/report/agentic-ai-market/
Market.us. 全球智能体 AI 市场规模、趋势和 2025-2034 年预测。