OpenAI Agents SDK: a deep dive into how it works

OpenAI's new toolkit for building agents: the Responses API and the Agents SDK

Last week Manus went viral, and this week OpenAI answered. In the early hours of March 12, OpenAI released a new suite of tools for building agents, including web search, file search, computer use, the Responses API, and the Agents SDK.

Responses API

The Responses API is a new API primitive from OpenAI. It combines the simplicity of the Chat Completions API with the tool-use capabilities of the Assistants API, and is designed to make it easier for developers to build agents on top of OpenAI's built-in tools.
The built-in tools available through the Responses API include web search, file search, and computer use.

  • Web Search: lets the model pull up-to-date information from the web; supported by the gpt-4o and gpt-4o-mini models.
  • File Search: retrieves relevant information from large document collections, with support for multiple file types, query optimization, metadata filtering, and custom reranking.
  • Computer Use: built on the Computer-Using Agent (CUA) model, it lets developers automate computer tasks such as browser automation and data entry by simulating mouse and keyboard actions.

Web search

const response = await openai.responses.create({
    model: "gpt-4o",
    tools: [ { type: "web_search_preview" } ],
    input: "What was a positive news story that happened today?",
});

console.log(response.output_text);

Web Search is powered by the same model that drives ChatGPT search. On the SimpleQA benchmark, GPT-4o search preview and GPT-4o mini search preview reach 90% and 88% accuracy, respectively.

File search
Developers can now use the improved file search tool to retrieve relevant information from large document collections. It supports multiple file types, query optimization, metadata filtering, and custom reranking, and returns results quickly and accurately. The JavaScript call looks like this:

const productDocs = await openai.vectorStores.create({
    name: "Product Documentation",
    file_ids: [file1.id, file2.id, file3.id],
});

const response = await openai.responses.create({
    model: "gpt-4o-mini",
    tools: [{
        type: "file_search",
        vector_store_ids: [productDocs.id],
    }],
    input: "What is deep research by OpenAI?",
});

console.log(response.output_text);

Computer use
The Computer Use tool is built into the Responses API. It captures the mouse and keyboard actions generated by the model and translates them directly into executable commands in the developer's environment, automating computer tasks.
The JavaScript call looks like this:

const response = await openai.responses.create({
    model: "computer-use-preview",
    tools: [{
        type: "computer_use_preview", # 说明工具类型
        display_width: 1024,
        display_height: 768,
        environment: "browser",
    }],
    truncation: "auto",
    input: "I'm looking for a new camera. Help me find the best one.",
});

console.log(response.output);

The tool is powered by the same model that powers Operator: the Computer-Using Agent (CUA) model. This research preview sets new state-of-the-art results, with a 38.1% success rate on OSWorld, 58.1% on WebArena, and 87% on WebVoyager.


Agents SDK

The Agents SDK is an open-source framework for orchestrating multi-agent workflows, and can be seen as an upgraded version of the Swarm framework OpenAI released last year.
For a source-level walkthrough of Swarm, see the earlier article "OpenAI Swarm框架源码详解及案例实战".

The Agents SDK's design is inspired by other excellent community projects such as Pydantic, Griffe, and MkDocs. The framework is compatible with OpenAI's Responses API and Chat Completions API.

Docs: https://openai.github.io/openai-agents-python/
Code: https://github.com/openai/openai-agents-python

Compared with Swarm, the Agents SDK adds two major new capabilities:

  • Guardrails:
    Configurable safety checks for input and output validation, ensuring agent behavior meets safety and compliance requirements.
  • Tracing & Observability:
    Visualizes agent execution traces, helping developers debug, tune performance, and keep workflows running efficiently.

Guardrails: safety rails for agents

Guardrails are implemented in https://github.com/openai/openai-agents-python/blob/main/src/agents/guardrail.py and cover two kinds of protection: input guardrails and output guardrails.
A guardrail's result is wrapped in the GuardrailFunctionOutput class, which has two fields: output_info (information about the guardrail's output) and tripwire_triggered (whether the tripwire was triggered). If the tripwire is triggered, the agent's execution is halted.

@dataclass
class GuardrailFunctionOutput:
    """The output of a guardrail function."""

    output_info: Any
    """
    Optional information about the guardrail's output. For example, the guardrail could include
    information about the checks it performed and granular results.
    """

    tripwire_triggered: bool
    """
    Whether the tripwire was triggered. If triggered, the agent's execution will be halted.
    """

Input guardrails

Input guardrails are applied to the first agent in a run and execute in three steps. First, the guardrail receives the same input that is passed to the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is then wrapped in an InputGuardrailResult.
Finally, .tripwire_triggered is checked. If it is true, an InputGuardrailTripwireTriggered exception is raised, so you can respond to the user appropriately or handle the exception.
Input guardrails are implemented by three classes.
(1) InputGuardrail: the class that defines an input guardrail. It holds the guardrail function and a name, and runs the input check.
InputGuardrail uses the @dataclass decorator to simplify the class definition and auto-generate common methods such as __init__, __repr__, and __eq__. It wraps the guardrail function (guardrail_function) and the guardrail's name (name), and exposes a run method that executes the guardrail function.

@dataclass
class InputGuardrail(Generic[TContext]):
    guardrail_function: Callable[
        [RunContextWrapper[TContext], Agent[Any], str | list[TResponseInputItem]],
        MaybeAwaitable[GuardrailFunctionOutput],
    ]
    """A function that receives the agent input and the context, and returns a
    `GuardrailResult`. The result marks whether the tripwire was triggered, and can optionally
    include information about the guardrail's output.
    """

    name: str | None = None
    """The name of the guardrail, used for tracing. If not provided, we'll use the guardrail
    function's name.
    """

    def get_name(self) -> str:
        if self.name:
            return self.name
        return self.guardrail_function.__name__

    async def run(
        self,
        agent: Agent[Any],
        input: str | list[TResponseInputItem],
        context: RunContextWrapper[TContext],
    ) -> InputGuardrailResult:
        if not callable(self.guardrail_function):
            raise UserError(f"Guardrail function must be callable, got {self.guardrail_function}")

        output = self.guardrail_function(context, agent, input)
        if inspect.isawaitable(output):
            return InputGuardrailResult(
                guardrail=self,
                output=await output,
            )

        return InputGuardrailResult(
            guardrail=self,
            output=output,
        )

(2) InputGuardrailResult: a dataclass that wraps the result of an input-guardrail run for downstream processing. Its fields are guardrail (the InputGuardrail that was run) and output (the output of the guardrail function).

@dataclass
class InputGuardrailResult:
    """The result of a guardrail run."""

    guardrail: InputGuardrail[Any]
    """
    The guardrail that was run.
    """

    output: GuardrailFunctionOutput
    """The output of the guardrail function."""

(3) input_guardrail: a decorator that turns an ordinary function into an InputGuardrail object, simplifying how guardrails are defined and configured.

def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co]
    | _InputGuardrailFuncAsync[TContext_co]
    | None = None,
    *,
    name: str | None = None,
) -> (
    InputGuardrail[TContext_co]
    | Callable[
        [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
        InputGuardrail[TContext_co],
    ]
):
    """
    Decorator that transforms a sync or async function into an `InputGuardrail`.
    It can be used directly (no parentheses) or with keyword args, e.g.:

        @input_guardrail
        def my_sync_guardrail(...): ...

        @input_guardrail(name="guardrail_name")
        async def my_async_guardrail(...): ...
    """
    def decorator(
        f: _InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co],
    ) -> InputGuardrail[TContext_co]:
        return InputGuardrail(guardrail_function=f, name=name)

    if func is not None:
        # Decorator was used without parentheses
        return decorator(func)

    # Decorator used with keyword arguments
    return decorator

The input_guardrail decorator supports three usage patterns: decorating a synchronous function, decorating an asynchronous function, and being called with keyword arguments. These are captured by three @overload definitions (a usage sketch follows the third overload below):

  • Decorating a synchronous function directly

# Type aliases for the input-guardrail function signatures
_InputGuardrailFuncSync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    GuardrailFunctionOutput,
]
_InputGuardrailFuncAsync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    Awaitable[GuardrailFunctionOutput],
]

@overload
def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

  • Decorating an asynchronous function directly
@overload
def input_guardrail(
    func: _InputGuardrailFuncAsync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

  • Called with keyword arguments (e.g. @input_guardrail(name="guardrail_name"))

@overload
def input_guardrail(
    *,
    name: str | None = None,
) -> Callable[
    [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
    InputGuardrail[TContext_co],
]: ...
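Putting the decorator to work, here is a minimal sketch (the guardrail logic, function names, and agent are made up for illustration; it assumes input_guardrail, InputGuardrailTripwireTriggered, Agent, GuardrailFunctionOutput, and Runner are all importable from the top-level agents package):

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    Runner,
    input_guardrail,
)

# Hypothetical guardrail: trip the wire whenever the input mentions a password.
@input_guardrail(name="no_secrets")
def no_secrets_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    text = user_input if isinstance(user_input, str) else str(user_input)
    tripped = "password" in text.lower()
    return GuardrailFunctionOutput(output_info={"tripped": tripped}, tripwire_triggered=tripped)

assistant = Agent(
    name="Assistant",
    instructions="Answer the user's question.",
    input_guardrails=[no_secrets_guardrail],  # the decorator already returned an InputGuardrail
)

async def ask(question: str) -> None:
    try:
        result = await Runner.run(assistant, question)
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        # Raised by the runner when tripwire_triggered comes back True.
        print("Input rejected by the guardrail.")

# e.g. asyncio.run(ask("What's my password?"))  -> the guardrail trips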

Output guardrails

Output guardrails are applied to the final agent in a run.
They also work in three steps. First, the guardrail receives the final output produced by the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is wrapped in an OutputGuardrailResult.
Finally, .tripwire_triggered is checked. If it is true, an OutputGuardrailTripwireTriggered exception is raised, so you can respond to the user appropriately or handle the exception.
The three main classes on the output side, OutputGuardrail, OutputGuardrailResult, and output_guardrail, mirror the attributes and methods of their input counterparts.
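For comparison, a minimal output-guardrail sketch might look like this (assuming an output_guardrail decorator symmetric to input_guardrail, exported from the top-level agents package, whose function receives the agent's final output as its third argument; the check itself and all names are made up for illustration):

from agents import Agent, GuardrailFunctionOutput, output_guardrail

# Hypothetical check: trip the wire if the agent produces an empty answer.
@output_guardrail(name="non_empty_answer")
def non_empty_guardrail(ctx, agent, agent_output) -> GuardrailFunctionOutput:
    empty = not str(agent_output).strip()
    return GuardrailFunctionOutput(output_info={"empty": empty}, tripwire_triggered=empty)

writer_agent = Agent(
    name="Writer",
    instructions="Answer the question in one short paragraph.",
    output_guardrails=[non_empty_guardrail],
    # If the tripwire fires, Runner.run raises OutputGuardrailTripwireTriggered.
)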

Tracing: observing and tracking agent behavior

The tracing module monitors and tracks agent behavior by invoking processors when key events occur (such as the start and end of a span) to record and process the data.
Its core components are the trace, the span, the TracingProcessor, and a handful of helper functions.

The Tracing UI can be used to inspect an agent run end to end.
[Figure: an example agent run in the Tracing UI]

trace

A Trace is the root object that tracing creates and represents a complete logical workflow. It records the whole flow from start to finish and can contain multiple spans. The implementation lives in src/agents/tracing/traces.py and defines abstract methods for entering and exiting the trace, starting it, finishing it, and exporting it as a dictionary.

class Trace:
    """
    A trace is the root level object that tracing creates. It represents a logical "workflow".
    """

    @abc.abstractmethod
    def __enter__(self) -> Trace:
        pass

    @abc.abstractmethod
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    @abc.abstractmethod
    def start(self, mark_as_current: bool = False):
        """
        Start the trace.

        Args:
            mark_as_current: If true, the trace will be marked as the current trace.
        """
        pass

    @abc.abstractmethod
    def finish(self, reset_current: bool = False):
        """
        Finish the trace.

        Args:
            reset_current: If true, the trace will be reset as the current trace.
        """
        pass

    @property
    @abc.abstractmethod
    def trace_id(self) -> str:
        """
        The trace ID.
        """
        pass

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """
        The name of the workflow being traced.
        """
        pass

    @abc.abstractmethod
    def export(self) -> dict[str, Any] | None:
        """
        Export the trace as a dictionary.
        """
        pass

span

A Span represents a single operation or task. It records the operation's start and end times, error information, and other metadata.
The tracing module provides several functions for creating different kinds of spans (a combined usage sketch follows this list):
agent_span: creates a span for an agent run.
custom_span: creates a custom span.
function_span: creates a span for a function (tool) call.
generation_span: records the details of a model generation.
response_span: records a model response.
guardrail_span: records whether a guardrail was triggered.
handoff_span: records a handoff between agents.
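To see how these fit together, here is a rough sketch that groups an agent run and a custom span under one workflow trace (it assumes trace and custom_span are exported from the top-level agents package and that the span object can be used as a context manager; the agent, workflow name, and input text are made up):

import asyncio

from agents import Agent, Runner, custom_span, trace

agent = Agent(name="Summarizer", instructions="Summarize the input in one sentence.")

async def main() -> None:
    # Everything inside this block is recorded under a single workflow trace.
    with trace("Summarization workflow"):
        with custom_span("preprocess"):
            text = "  OpenAI shipped an agent toolkit in March 2025.  ".strip()
        # Runner.run adds its own agent/generation spans to the same trace.
        result = await Runner.run(agent, text)
        print(result.final_output)

asyncio.run(main())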

processors

TracingProcessor is the interface for handling Trace and Span lifecycle events. It is the tracing module's extension point and lets developers plug in custom data-processing logic.
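As a sketch of that extension point, a custom processor might look like the following (the method names follow the TracingProcessor interface as exposed by the SDK's tracing module, and add_trace_processor is assumed to be available from the top-level agents package; ConsoleProcessor is a made-up example):

from typing import Any

from agents import add_trace_processor
from agents.tracing import Span, Trace, TracingProcessor

class ConsoleProcessor(TracingProcessor):
    """A toy processor that prints lifecycle events instead of exporting them."""

    def on_trace_start(self, trace: Trace) -> None:
        print(f"trace started: {trace.name}")

    def on_trace_end(self, trace: Trace) -> None:
        print(f"trace finished: {trace.trace_id}")

    def on_span_start(self, span: Span[Any]) -> None:
        print("span started")

    def on_span_end(self, span: Span[Any]) -> None:
        print("span finished")

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

# Register the processor alongside the default OpenAI exporter.
add_trace_processor(ConsoleProcessor())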

Usage example: a team of tutor agents that help with homework

Install the Agents SDK:

pip install openai-agents

Export your OpenAI API key in your development environment:

export OPENAI_API_KEY=sk-…
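Alternatively, the key can be set programmatically at startup (a small sketch; it assumes set_default_openai_key is exported from the top-level agents package, and the environment variable name here is just an example):

import os

from agents import set_default_openai_key

# Roughly equivalent to exporting OPENAI_API_KEY before launching the process.
set_default_openai_key(os.environ["MY_OPENAI_KEY"])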

Create agents for different subjects, an agent that checks whether the input is a homework question, and an asynchronous input guardrail:


from agents import Agent, InputGuardrail, GuardrailFunctionOutput, Runner, InputGuardrailTripwireTriggered
from pydantic import BaseModel
import asyncio

# Pydantic model describing the result of the homework check
class HomeworkOutput(BaseModel):
    is_homework: bool  # whether the question is homework-related
    reasoning: str  # the reasoning behind the judgment

# Agent named "Guardrail check" that decides whether the user is asking about homework
guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,  # structured output type
)

# Agent named "Math Tutor" that answers math questions
math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

# Agent named "History Tutor" that answers history questions
history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)

# Async guardrail function that checks whether the user's input is homework-related
async def homework_guardrail(ctx, agent, input_data):
    # Run guardrail_agent on the input, forwarding the run context
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    # Parse the agent's output into a HomeworkOutput instance
    final_output = result.final_output_as(HomeworkOutput)
    # Return a GuardrailFunctionOutput; trip the wire if the question is not homework-related
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )

# "Triage Agent" that routes the user's question to the appropriate specialist agent
triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[  # input guardrails
        InputGuardrail(guardrail_function=homework_guardrail),  # wrap homework_guardrail as an InputGuardrail
    ],
)

# Entry point: run the triage agent on two sample questions and print the results
async def main():
    # A homework-style question: the guardrail passes and the triage agent hands off to a tutor
    result = await Runner.run(triage_agent, "who was the first president of the united states?")
    print(result.final_output)

    # "what is life" is not a homework question, so the guardrail trips and the runner
    # raises InputGuardrailTripwireTriggered; catch it so the script finishes cleanly
    try:
        result = await Runner.run(triage_agent, "what is life")
        print(result.final_output)
    except InputGuardrailTripwireTriggered as e:
        print(f"Guardrail tripped: {e}")

# Run main() when the script is executed directly
if __name__ == "__main__":
    asyncio.run(main())

REF

https://openai.com/index/new-tools-for-building-agents/
https://platform.openai.com/docs/guides/agents-sdk
https://x.com/OpenAIDevs/status/1899531225468969240
https://openai.github.io/openai-agents-python/
https://github.com/openai/openai-agents-python
https://openai.github.io/openai-agents-python/ref/guardrail/#agents.guardrail.input_guardrail
