
Fix OpenAI cua agent #1899

Merged: tkattkat merged 3 commits into main from fix--Openai-cua-agent on Mar 27, 2026


@tkattkat (Collaborator) commented Mar 27, 2026

why

  • OpenAI now requires a screenshot in the initial message for computer-use agents

what changed

  • added a screenshot to the initial message
  • imported types from the openai SDK for stronger typing

test plan

  • tested locally using the operator-example script

Summary by cubic

Send an initial screenshot with the first message in OpenAICUAClient to meet OpenAI’s computer-use requirement and prevent startup errors. Also adopt openai SDK response types for stricter typing.

  • Bug Fixes

    • Capture and attach a high-detail screenshot on the first request when available; gracefully skip on failure.
    • Build the first message using EasyInputMessage, ResponseInputText, and ResponseInputImage; include an optional system message from userProvidedInstructions; make createInitialInputItems async and update method signatures to accept a unified OpenAIRequestInputItem type.
  • Dependencies

    • Add changeset for a patch release of @browserbasehq/stagehand.
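The first-message construction described in the summary can be sketched as follows. This is not the code from the PR: the local types only approximate the `openai` SDK's `EasyInputMessage`, `ResponseInputText`, and `ResponseInputImage` shapes so the snippet runs without the package, `OpenAIRequestInputItem` is narrowed to messages only, and the `takeScreenshot` callback is a hypothetical stand-in for the page driver's screenshot call.

```typescript
// Local approximations of the openai SDK's Responses input types, so this
// sketch runs without the package installed. Verify against the SDK in use.
type ResponseInputText = { type: "input_text"; text: string };
type ResponseInputImage = {
  type: "input_image";
  detail: "low" | "high" | "auto";
  image_url?: string | null;
};
type EasyInputMessage = {
  type?: "message";
  role: "user" | "assistant" | "system" | "developer";
  content: string | Array<ResponseInputText | ResponseInputImage>;
};

// Hypothetical: the PR's OpenAIRequestInputItem is a wider union; only
// messages are needed for the first turn.
type OpenAIRequestInputItem = EasyInputMessage;

// Hypothetical screenshot callback returning base64-encoded PNG data.
type ScreenshotProvider = () => Promise<string>;

async function createInitialInputItems(
  instruction: string,
  userProvidedInstructions?: string,
  takeScreenshot?: ScreenshotProvider,
): Promise<OpenAIRequestInputItem[]> {
  const items: OpenAIRequestInputItem[] = [];

  // Optional system message built from the caller's instructions.
  if (userProvidedInstructions) {
    items.push({ type: "message", role: "system", content: userProvidedInstructions });
  }

  // The user's task always leads the user message.
  const content: Array<ResponseInputText | ResponseInputImage> = [
    { type: "input_text", text: instruction },
  ];

  // Attach a high-detail screenshot when capture succeeds; on failure,
  // skip the image instead of aborting the request.
  if (takeScreenshot) {
    try {
      const base64 = await takeScreenshot();
      content.push({
        type: "input_image",
        detail: "high",
        image_url: `data:image/png;base64,${base64}`,
      });
    } catch {
      // Capture failed: continue with a text-only first message.
    }
  }

  items.push({ type: "message", role: "user", content });
  return items;
}
```

Swallowing the capture error keeps the agent startable when no page is available, but a first request sent without the image may still be rejected on OpenAI's side, so the skip is a fallback rather than a fix.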

Written for commit f01d8a8.

@changeset-bot (bot) commented Mar 27, 2026

🦋 Changeset detected

Latest commit: f01d8a8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages:

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |
| @browserbasehq/stagehand-server-v4 | Patch |


@cubic-dev-ai (bot, Contributor) left a comment


No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
```mermaid
sequenceDiagram
    participant App as Client Application
    participant CUA as OpenAICUAClient
    participant Driver as Browser / Page Driver
    participant OpenAI as OpenAI API

    Note over App, OpenAI: Initial Message Flow for Computer Use Agent

    App->>CUA: step(input)

    rect rgb(240, 240, 240)
    Note right of CUA: NEW: Requirement for initial screenshot
    CUA->>CUA: CHANGED: createInitialInputItems() (async)

    opt Screenshot capture
        CUA->>Driver: screenshot()
        alt Success
            Driver-->>CUA: base64 image data
            CUA->>CUA: NEW: Create ResponseInputImage object
        else Failure
            Driver-->>CUA: error
            Note right of CUA: Gracefully skip image attachment
        end
    end

    CUA->>CUA: NEW: Construct OpenAIRequestInputItem list
    Note right of CUA: Includes both User Text and Screenshot
    end

    CUA->>OpenAI: responses.create(input)
    Note over CUA, OpenAI: Using NEW: Stronger SDK Types (ResponseInputText, etc.)

    OpenAI-->>CUA: Assistant Response (Tool Calls)
    CUA-->>App: Agent Action / Observation
```
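The final arrow in the diagram is the request to OpenAI. OpenAI serves its computer-use tool through the Responses API, so a first-turn request body might be assembled roughly as below. This is a sketch under assumptions: `buildFirstRequest` and the local types are hypothetical, and the field names (`computer_use_preview`, `display_width`, `display_height`, `environment`, `truncation: "auto"`) follow OpenAI's public computer-use guide rather than Stagehand's source, so check them against the SDK version in use.

```typescript
// Minimal stand-ins for the Responses input item shapes (assumed).
type InputContent =
  | { type: "input_text"; text: string }
  | { type: "input_image"; detail: "low" | "high" | "auto"; image_url: string };
type InputMessage = {
  type: "message";
  role: "system" | "user";
  content: string | InputContent[];
};

// Assumed shape of a first-turn computer-use request body.
interface ComputerUseRequest {
  model: string;
  tools: Array<{
    type: "computer_use_preview";
    display_width: number;
    display_height: number;
    environment: "browser";
  }>;
  input: InputMessage[];
  truncation: "auto";
}

// Hypothetical helper: wraps the initial input items with the tool definition.
function buildFirstRequest(
  input: InputMessage[],
  viewport: { width: number; height: number },
  model = "computer-use-preview",
): ComputerUseRequest {
  return {
    model,
    tools: [
      {
        type: "computer_use_preview",
        display_width: viewport.width,
        display_height: viewport.height,
        environment: "browser",
      },
    ],
    input,
    // OpenAI's guide calls for automatic truncation with the computer-use tool.
    truncation: "auto",
  };
}
```

The resulting object would then be passed to `client.responses.create(...)` from the `openai` package; later turns typically return fresh screenshots as `computer_call_output` items rather than as new user messages.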

tkattkat merged commit 6dc2276 into main on Mar 27, 2026
379 of 380 checks passed
miguelg719 pushed a commit that referenced this pull request Apr 8, 2026
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/stagehand@3.2.1

### Patch Changes

- [#1843](#1843)
[`144e18e`](144e18e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - apply the
user-defined toolTimeout to all agent tools (other than the wait & think tools)

- [#1872](#1872)
[`d3c3736`](d3c3736)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for LLM
provider middleware

- [#1953](#1953)
[`5c889df`](5c889df)
Thanks [@github-actions](https://github.com/apps/github-actions)! -
(NEW) Model Gateway: make model api key optional on API

- [#1924](#1924)
[`a1ab39e`](a1ab39e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix an issue
where Stagehand could not attach to new tabs that were created manually.

- [#1874](#1874)
[`f3fe7ce`](f3fe7ce)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add headers (LLM)
to ModelConfig

- [#1964](#1964)
[`5fb9785`](5fb9785)
Thanks [@github-actions](https://github.com/apps/github-actions)! -
chore: update examples

- [#1901](#1901)
[`f5d1f1f`](f5d1f1f)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - pass
timeout as timeoutMs in goto()

- [#1858](#1858)
[`8bf5db8`](8bf5db8)
Thanks [@monadoid](https://github.com/monadoid)! - Add explicit SSE
event names for local v3 streaming and update the generated SDK contract
to match.

- [#1899](#1899)
[`6dc2276`](6dc2276)
Thanks [@tkattkat](https://github.com/tkattkat)! - fix: include a
screenshot in the OpenAI CUA agent's first message
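The #1843 entry applies the user-defined `toolTimeout` to every agent tool except wait and think. One way to express that rule is a wrapper that races each tool's promise against a timer and leaves the exempt tools untouched. The `Tool` shape, `applyToolTimeout`, and `withTimeout` are hypothetical names for illustration, not Stagehand's API.

```typescript
// Hypothetical tool shape; Stagehand's real agent tool types may differ.
type Tool = { name: string; execute: (args: unknown) => Promise<unknown> };

// Tools excluded from the timeout, per the changeset note.
const TIMEOUT_EXEMPT = new Set(["wait", "think"]);

// Reject if `promise` has not settled within `ms`; clear the timer either way.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Wrap every non-exempt tool; with no timeout configured, return tools as-is.
function applyToolTimeout(tools: Tool[], toolTimeout?: number): Tool[] {
  if (toolTimeout === undefined) return tools;
  const ms = toolTimeout;
  return tools.map((tool): Tool =>
    TIMEOUT_EXEMPT.has(tool.name)
      ? tool
      : { ...tool, execute: (args: unknown) => withTimeout(tool.execute(args), ms, tool.name) },
  );
}
```

A timed-out tool's underlying work is not cancelled here; the wrapper only stops waiting for it. Real cancellation would need something like an `AbortSignal` threaded into each tool.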

## @browserbasehq/stagehand-evals@1.1.10

### Patch Changes

- Updated dependencies
\[[`144e18e`](144e18e),
[`d3c3736`](d3c3736),
[`5c889df`](5c889df),
[`a1ab39e`](a1ab39e),
[`f3fe7ce`](f3fe7ce),
[`5fb9785`](5fb9785),
[`f5d1f1f`](f5d1f1f),
[`8bf5db8`](8bf5db8),
[`6dc2276`](6dc2276)]:
    -   @browserbasehq/stagehand@3.2.1

## @browserbasehq/stagehand-server-v3@3.6.2

### Patch Changes

- [#1901](#1901)
[`f5d1f1f`](f5d1f1f)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - pass
timeout as timeoutMs in goto()

- [#1873](#1873)
[`a98801a`](a98801a)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix schema
parsing bug for Pydantic `.model_json_schema()` on missing nested
references

- [#1858](#1858)
[`8bf5db8`](8bf5db8)
Thanks [@monadoid](https://github.com/monadoid)! - Add explicit SSE
event names for local v3 streaming and update the generated SDK contract
to match.

- [#1937](#1937)
[`249f5ed`](249f5ed)
Thanks [@monadoid](https://github.com/monadoid)! - Improve server-v3
error passthrough for local operation failures

- Updated dependencies
\[[`144e18e`](144e18e),
[`d3c3736`](d3c3736),
[`5c889df`](5c889df),
[`a1ab39e`](a1ab39e),
[`f3fe7ce`](f3fe7ce),
[`5fb9785`](5fb9785),
[`f5d1f1f`](f5d1f1f),
[`8bf5db8`](8bf5db8),
[`6dc2276`](6dc2276)]:
    -   @browserbasehq/stagehand@3.2.1
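The #1858 entry adds explicit SSE event names for local v3 streaming. In the Server-Sent Events format, a frame without an `event:` field is dispatched as the default `message` type, so naming events lets a client route them without inspecting the payload. A minimal sketch of both sides; the event names and both helpers are hypothetical, not taken from Stagehand's streaming contract.

```typescript
// Hypothetical event names for illustration; the changelog does not list
// the names Stagehand actually emits.
type StreamEventName = "log" | "result" | "error";

// Serialize one SSE frame: a named "event:" line, one "data:" line, and a
// blank line that terminates the event. JSON.stringify escapes newlines, so
// the payload always fits on a single data line.
function formatSseEvent(event: StreamEventName, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload ?? null)}\n\n`;
}

// Minimal parser for a single frame, following the SSE rule that a frame
// without an "event:" field is dispatched as type "message".
function parseSseEvent(frame: string): { event: string; data: string } {
  let event = "message";
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) {
      event = line.slice("event:".length).replace(/^ /, "");
    } else if (line.startsWith("data:")) {
      dataLines.push(line.slice("data:".length).replace(/^ /, ""));
    }
  }
  return { event, data: dataLines.join("\n") };
}
```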

## @browserbasehq/stagehand-server-v4@3.6.2

### Patch Changes

- Updated dependencies
\[[`144e18e`](144e18e),
[`d3c3736`](d3c3736),
[`5c889df`](5c889df),
[`a1ab39e`](a1ab39e),
[`f3fe7ce`](f3fe7ce),
[`5fb9785`](5fb9785),
[`f5d1f1f`](f5d1f1f),
[`8bf5db8`](8bf5db8),
[`6dc2276`](6dc2276)]:
    -   @browserbasehq/stagehand@3.2.1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
