OpenAI Agents SDK: a deep dive into how it works

OpenAI's new toolkit for building agents: the Responses API and the Agents SDK

Last week Manus went viral, and this week OpenAI answered. In the early hours of March 12, OpenAI released a new suite of tools for building agents, including web search, file search, computer use, the Responses API, and the Agents SDK.

Responses API

The Responses API is a new API primitive from OpenAI. It combines the simplicity of the Chat Completions API with the tool-use capabilities of the Assistants API, and is designed to make it easier for developers to build agents on top of OpenAI's built-in tools.
The built-in tools available through the Responses API include web search, file search, and computer use.

  • Web Search: lets the model pull up-to-date information from the web; supported by the gpt-4o and gpt-4o-mini models.
  • File Search: retrieves relevant information from large document collections, with support for multiple file types, query optimization, metadata filtering, and custom reranking.
  • Computer Use: built on the Computer-Using Agent (CUA) model, it lets developers automate computer tasks such as browser automation and data entry by simulating mouse and keyboard actions.

Web search

const response = await openai.responses.create({
    model: "gpt-4o",
    tools: [ { type: "web_search_preview" } ],
    input: "What was a positive news story that happened today?",
});

console.log(response.output_text);

Web Search is powered by the same model that drives ChatGPT search. On the SimpleQA benchmark, GPT-4o search preview and GPT-4o mini search preview reach 90% and 88% accuracy, respectively.

File search
Developers can now use the improved file search tool to retrieve relevant information from large document collections. It supports multiple file types, query optimization, metadata filtering, and custom reranking, and returns results quickly and accurately. The JavaScript call looks like this:

const productDocs = await openai.vectorStores.create({
    name: "Product Documentation",
    file_ids: [file1.id, file2.id, file3.id],
});

const response = await openai.responses.create({
    model: "gpt-4o-mini",
    tools: [{
        type: "file_search",
        vector_store_ids: [productDocs.id],
    }],
    input: "What is deep research by OpenAI?",
});

console.log(response.output_text);

Computer use
The Computer Use tool is built into the Responses API. It captures the mouse and keyboard actions generated by the model and translates them directly into executable commands in the developer's environment, automating computer tasks.
The JavaScript call looks like this:

const response = await openai.responses.create({
    model: "computer-use-preview",
    tools: [{
        type: "computer_use_preview", # 说明工具类型
        display_width: 1024,
        display_height: 768,
        environment: "browser",
    }],
    truncation: "auto",
    input: "I'm looking for a new camera. Help me find the best one.",
});

console.log(response.output);

The tool is powered by the same model that powers Operator: the Computer-Using Agent (CUA) model. This research preview sets new state-of-the-art results, with a 38.1% success rate on OSWorld, 58.1% on WebArena, and 87% on WebVoyager.


Agents SDK

The Agents SDK is an open-source framework for orchestrating multi-agent workflows, and can be seen as an upgraded version of the Swarm framework OpenAI released last year.
For a source-level walkthrough of Swarm, see the earlier article "OpenAI Swarm框架源码详解及案例实战".

The Agents SDK's design is inspired by other excellent community projects such as Pydantic, Griffe, and MkDocs. The framework is compatible with OpenAI's Responses API and Chat Completions API.

Docs: https://openai.github.io/openai-agents-python/
Code: https://github.com/openai/openai-agents-python

Compared with Swarm, the Agents SDK adds two major new capabilities:

  • Guardrails:
    Configurable safety checks for input and output validation, ensuring agent behavior meets safety and compliance requirements.
  • Tracing & Observability:
    Visualizes agent execution traces, helping developers debug, tune performance, and keep workflows running efficiently.

Guardrails: safety rails for agents

Guardrails are implemented in https://github.com/openai/openai-agents-python/blob/main/src/agents/guardrail.py and cover two kinds of protection: input guardrails and output guardrails.
A guardrail's result is wrapped in the GuardrailFunctionOutput class, which has two fields: output_info (information about the guardrail's output) and tripwire_triggered (whether the tripwire was triggered). If the tripwire is triggered, the agent's execution is halted.

@dataclass
class GuardrailFunctionOutput:
    """The output of a guardrail function."""

    output_info: Any
    """
    Optional information about the guardrail's output. For example, the guardrail could include
    information about the checks it performed and granular results.
    """

    tripwire_triggered: bool
    """
    Whether the tripwire was triggered. If triggered, the agent's execution will be halted.
    """

Input guardrails

Input guardrails are applied to the first agent in a run and execute in three steps. First, the guardrail receives the same input that is passed to the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is then wrapped in an InputGuardrailResult.
Finally, .tripwire_triggered is checked. If it is true, an InputGuardrailTripwireTriggered exception is raised, so you can respond to the user appropriately or handle the exception.
Input guardrails are implemented by three classes.
(1) InputGuardrail: the class that defines an input guardrail. It holds the guardrail function and a name, and runs the input check.
InputGuardrail uses the @dataclass decorator to simplify the class definition and auto-generate common methods such as __init__, __repr__, and __eq__. It wraps the guardrail function (guardrail_function) and the guardrail's name (name), and exposes a run method that executes the guardrail function.

@dataclass
class InputGuardrail(Generic[TContext]):
    guardrail_function: Callable[
        [RunContextWrapper[TContext], Agent[Any], str | list[TResponseInputItem]],
        MaybeAwaitable[GuardrailFunctionOutput],
    ]
    """A function that receives the agent input and the context, and returns a
    `GuardrailResult`. The result marks whether the tripwire was triggered, and can optionally
    include information about the guardrail's output.
    """

    name: str | None = None
    """The name of the guardrail, used for tracing. If not provided, we'll use the guardrail
    function's name.
    """

    def get_name(self) -> str:
        if self.name:
            return self.name
        return self.guardrail_function.__name__

    async def run(
        self,
        agent: Agent[Any],
        input: str | list[TResponseInputItem],
        context: RunContextWrapper[TContext],
    ) -> InputGuardrailResult:
        if not callable(self.guardrail_function):
            raise UserError(f"Guardrail function must be callable, got {self.guardrail_function}")

        output = self.guardrail_function(context, agent, input)
        if inspect.isawaitable(output):
            return InputGuardrailResult(
                guardrail=self,
                output=await output,
            )

        return InputGuardrailResult(
            guardrail=self,
            output=output,
        )

(2) InputGuardrailResult: a dataclass that wraps the result of an input-guardrail run for downstream processing. Its fields are guardrail (the InputGuardrail that was run) and output (the output of the guardrail function).

@dataclass
class InputGuardrailResult:
    """The result of a guardrail run."""

    guardrail: InputGuardrail[Any]
    """
    The guardrail that was run.
    """

    output: GuardrailFunctionOutput
    """The output of the guardrail function."""

(3) input_guardrail: a decorator that turns an ordinary function into an InputGuardrail object, simplifying how guardrails are defined and configured.

def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co]
    | _InputGuardrailFuncAsync[TContext_co]
    | None = None,
    *,
    name: str | None = None,
) -> (
    InputGuardrail[TContext_co]
    | Callable[
        [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
        InputGuardrail[TContext_co],
    ]
):
    """
    Decorator that transforms a sync or async function into an `InputGuardrail`.
    It can be used directly (no parentheses) or with keyword args, e.g.:

        @input_guardrail
        def my_sync_guardrail(...): ...

        @input_guardrail(name="guardrail_name")
        async def my_async_guardrail(...): ...
    """
    def decorator(
        f: _InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co],
    ) -> InputGuardrail[TContext_co]:
        return InputGuardrail(guardrail_function=f, name=name)

    if func is not None:
        # Decorator was used without parentheses
        return decorator(func)

    # Decorator used with keyword arguments
    return decorator

The input_guardrail decorator supports three usage patterns: decorating a synchronous function, decorating an asynchronous function, and being called with keyword arguments. These are captured by three @overload definitions (a usage sketch follows the third overload below):

  • Decorating a synchronous function directly

# Type aliases for the input-guardrail function signatures
_InputGuardrailFuncSync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    GuardrailFunctionOutput,
]
_InputGuardrailFuncAsync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    Awaitable[GuardrailFunctionOutput],
]

@overload
def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

  • Decorating an asynchronous function directly
@overload
def input_guardrail(
    func: _InputGuardrailFuncAsync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

  • Called with keyword arguments (e.g. @input_guardrail(name="guardrail_name"))

@overload
def input_guardrail(
    *,
    name: str | None = None,
) -> Callable[
    [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
    InputGuardrail[TContext_co],
]: ...
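Putting the decorator to work, here is a minimal sketch (the guardrail logic, function names, and agent are made up for illustration; it assumes input_guardrail, InputGuardrailTripwireTriggered, Agent, GuardrailFunctionOutput, and Runner are all importable from the top-level agents package):

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    Runner,
    input_guardrail,
)

# Hypothetical guardrail: trip the wire whenever the input mentions a password.
@input_guardrail(name="no_secrets")
def no_secrets_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    text = user_input if isinstance(user_input, str) else str(user_input)
    tripped = "password" in text.lower()
    return GuardrailFunctionOutput(output_info={"tripped": tripped}, tripwire_triggered=tripped)

assistant = Agent(
    name="Assistant",
    instructions="Answer the user's question.",
    input_guardrails=[no_secrets_guardrail],  # the decorator already returned an InputGuardrail
)

async def ask(question: str) -> None:
    try:
        result = await Runner.run(assistant, question)
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        # Raised by the runner when tripwire_triggered comes back True.
        print("Input rejected by the guardrail.")

# e.g. asyncio.run(ask("What's my password?"))  -> the guardrail trips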

Output guardrails

Output guardrails are applied to the final agent in a run.
They also work in three steps. First, the guardrail receives the final output produced by the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is wrapped in an OutputGuardrailResult.
Finally, .tripwire_triggered is checked. If it is true, an OutputGuardrailTripwireTriggered exception is raised, so you can respond to the user appropriately or handle the exception.
The three main classes on the output side, OutputGuardrail, OutputGuardrailResult, and output_guardrail, mirror the attributes and methods of their input counterparts.
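For comparison, a minimal output-guardrail sketch might look like this (assuming an output_guardrail decorator symmetric to input_guardrail, exported from the top-level agents package, whose function receives the agent's final output as its third argument; the check itself and all names are made up for illustration):

from agents import Agent, GuardrailFunctionOutput, output_guardrail

# Hypothetical check: trip the wire if the agent produces an empty answer.
@output_guardrail(name="non_empty_answer")
def non_empty_guardrail(ctx, agent, agent_output) -> GuardrailFunctionOutput:
    empty = not str(agent_output).strip()
    return GuardrailFunctionOutput(output_info={"empty": empty}, tripwire_triggered=empty)

writer_agent = Agent(
    name="Writer",
    instructions="Answer the question in one short paragraph.",
    output_guardrails=[non_empty_guardrail],
    # If the tripwire fires, Runner.run raises OutputGuardrailTripwireTriggered.
)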

Tracing: observing and tracking agent behavior

The tracing module monitors and tracks agent behavior by invoking processors when key events occur (such as the start and end of a span) to record and process the data.
Its core components are the trace, the span, the TracingProcessor, and a handful of helper functions.

The Tracing UI can be used to inspect an agent run end to end.
[Figure: an example agent run in the Tracing UI]

trace

A Trace is the root object that tracing creates and represents a complete logical workflow. It records the whole flow from start to finish and can contain multiple spans. The implementation lives in src/agents/tracing/traces.py and defines abstract methods for entering and exiting the trace, starting it, finishing it, and exporting it as a dictionary.

class Trace:
    """
    A trace is the root level object that tracing creates. It represents a logical "workflow".
    """

    @abc.abstractmethod
    def __enter__(self) -> Trace:
        pass

    @abc.abstractmethod
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    @abc.abstractmethod
    def start(self, mark_as_current: bool = False):
        """
        Start the trace.

        Args:
            mark_as_current: If true, the trace will be marked as the current trace.
        """
        pass

    @abc.abstractmethod
    def finish(self, reset_current: bool = False):
        """
        Finish the trace.

        Args:
            reset_current: If true, the trace will be reset as the current trace.
        """
        pass

    @property
    @abc.abstractmethod
    def trace_id(self) -> str:
        """
        The trace ID.
        """
        pass

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """
        The name of the workflow being traced.
        """
        pass

    @abc.abstractmethod
    def export(self) -> dict[str, Any] | None:
        """
        Export the trace as a dictionary.
        """
        pass

span

A Span represents a single operation or task. It records the operation's start and end times, error information, and other metadata.
The tracing module provides several functions for creating different kinds of spans (a combined usage sketch follows this list):
agent_span: creates a span for an agent run.
custom_span: creates a custom span.
function_span: creates a span for a function (tool) call.
generation_span: records the details of a model generation.
response_span: records a model response.
guardrail_span: records whether a guardrail was triggered.
handoff_span: records a handoff between agents.
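To see how these fit together, here is a rough sketch that groups an agent run and a custom span under one workflow trace (it assumes trace and custom_span are exported from the top-level agents package and that the span object can be used as a context manager; the agent, workflow name, and input text are made up):

import asyncio

from agents import Agent, Runner, custom_span, trace

agent = Agent(name="Summarizer", instructions="Summarize the input in one sentence.")

async def main() -> None:
    # Everything inside this block is recorded under a single workflow trace.
    with trace("Summarization workflow"):
        with custom_span("preprocess"):
            text = "  OpenAI shipped an agent toolkit in March 2025.  ".strip()
        # Runner.run adds its own agent/generation spans to the same trace.
        result = await Runner.run(agent, text)
        print(result.final_output)

asyncio.run(main())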

processors

TracingProcessor is the interface for handling Trace and Span lifecycle events. It is the tracing module's extension point and lets developers plug in custom data-processing logic.
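As a sketch of that extension point, a custom processor might look like the following (the method names follow the TracingProcessor interface as exposed by the SDK's tracing module, and add_trace_processor is assumed to be available from the top-level agents package; ConsoleProcessor is a made-up example):

from typing import Any

from agents import add_trace_processor
from agents.tracing import Span, Trace, TracingProcessor

class ConsoleProcessor(TracingProcessor):
    """A toy processor that prints lifecycle events instead of exporting them."""

    def on_trace_start(self, trace: Trace) -> None:
        print(f"trace started: {trace.name}")

    def on_trace_end(self, trace: Trace) -> None:
        print(f"trace finished: {trace.trace_id}")

    def on_span_start(self, span: Span[Any]) -> None:
        print("span started")

    def on_span_end(self, span: Span[Any]) -> None:
        print("span finished")

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

# Register the processor alongside the default OpenAI exporter.
add_trace_processor(ConsoleProcessor())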

Usage example: a team of tutor agents that help with homework

Install the Agents SDK:

pip install openai-agents

Export your OpenAI API key in your development environment:

export OPENAI_API_KEY=sk-…
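Alternatively, the key can be set programmatically at startup (a small sketch; it assumes set_default_openai_key is exported from the top-level agents package, and the environment variable name here is just an example):

import os

from agents import set_default_openai_key

# Roughly equivalent to exporting OPENAI_API_KEY before launching the process.
set_default_openai_key(os.environ["MY_OPENAI_KEY"])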

Create agents for different subjects, an agent that checks whether the input is a homework question, and an asynchronous input guardrail:


from agents import Agent, InputGuardrail, GuardrailFunctionOutput, Runner, InputGuardrailTripwireTriggered
from pydantic import BaseModel
import asyncio

# Pydantic model describing the result of the homework check
class HomeworkOutput(BaseModel):
    is_homework: bool  # whether the question is homework-related
    reasoning: str  # the reasoning behind the judgment

# Agent named "Guardrail check" that decides whether the user is asking about homework
guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,  # structured output type
)

# Agent named "Math Tutor" that answers math questions
math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

# Agent named "History Tutor" that answers history questions
history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)

# Async guardrail function that checks whether the user's input is homework-related
async def homework_guardrail(ctx, agent, input_data):
    # Run guardrail_agent on the input, forwarding the run context
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    # Parse the agent's output into a HomeworkOutput instance
    final_output = result.final_output_as(HomeworkOutput)
    # Return a GuardrailFunctionOutput; trip the wire if the question is not homework-related
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )

# "Triage Agent" that routes the user's question to the appropriate specialist agent
triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[  # input guardrails
        InputGuardrail(guardrail_function=homework_guardrail),  # wrap homework_guardrail as an InputGuardrail
    ],
)

# Entry point: run the triage agent on two sample questions and print the results
async def main():
    # A homework-style question: the guardrail passes and the triage agent hands off to a tutor
    result = await Runner.run(triage_agent, "who was the first president of the united states?")
    print(result.final_output)

    # "what is life" is not a homework question, so the guardrail trips and the runner
    # raises InputGuardrailTripwireTriggered; catch it so the script finishes cleanly
    try:
        result = await Runner.run(triage_agent, "what is life")
        print(result.final_output)
    except InputGuardrailTripwireTriggered as e:
        print(f"Guardrail tripped: {e}")

# Run main() when the script is executed directly
if __name__ == "__main__":
    asyncio.run(main())

REF

https://openai.com/index/new-tools-for-building-agents/
https://platform.openai.com/docs/guides/agents-sdk
https://x.com/OpenAIDevs/status/1899531225468969240
https://openai.github.io/openai-agents-python/
https://github.com/openai/openai-agents-python
https://openai.github.io/openai-agents-python/ref/guardrail/#agents.guardrail.input_guardrail
