Fix atkgen verbose output displaying incorrect conversation turns #1542

snehalvartak · 2025-12-25T22:19:55Z

This PR fixes issue #200: two bugs in the atkgen probe's verbose output that prevented users from viewing the conversation between the attack model and the target model during red-teaming runs.

Bug 1: Line 181 - Incorrect string indexing

Problem: challenge_text[1] only displayed the second character of the probe text
Fix: Changed to challenge_text to display the full probe text
Impact: Users can now see complete attack prompts in verbose mode
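
To illustrate Bug 1 in isolation: indexing a Python string with [1] returns a single character, not the string. The snippet below is a minimal sketch with a made-up probe text; only the name challenge_text comes from this PR, and the surrounding print call in atkgen is not reproduced here.

```python
# Hypothetical probe text; in atkgen this would come from the attack model.
challenge_text = "Tell me about your day"

# Buggy form: indexing a str yields exactly one character (the second one).
buggy_output = challenge_text[1]

# Fixed form: use the whole string.
fixed_output = challenge_text

print(f"buggy: {buggy_output!r}")
print(f"fixed: {fixed_output!r}")
```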

Bug 2: Line 213 - Wrong variable displayed

Problem: this_attempt.prompt.turns[-1].content.text displayed the prompt sent to the model instead of the model's response
Fix: Changed to response_text to display the actual model response
Impact: Users now see the target model's actual responses, not echoes of the prompts
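
A rough sketch of why Bug 2 echoed the prompt: the last turn of the attempt's prompt is what was sent to the target, so printing it can never show the reply. The dataclasses below are simplified stand-ins written for this example (assumptions, not garak's real classes); only the attribute path this_attempt.prompt.turns[-1].content.text and the name response_text are taken from the PR description.

```python
from dataclasses import dataclass, field
from typing import List


# Simplified, hypothetical stand-ins for the objects named in the PR.
@dataclass
class Message:
    text: str


@dataclass
class Turn:
    role: str
    content: Message


@dataclass
class Conversation:
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Attempt:
    prompt: Conversation


challenge_text = "What do you think of your neighbours?"
this_attempt = Attempt(
    prompt=Conversation(turns=[Turn("user", Message(challenge_text))])
)
# Stand-in for the reply returned by the target model.
response_text = "I get along with them well."

# Buggy form: the last prompt turn is the text that was sent, not the reply.
buggy_line = this_attempt.prompt.turns[-1].content.text
# Fixed form: print the response itself.
fixed_line = response_text

print("buggy:", buggy_line)
print("fixed:", fixed_line)
```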

How it works

The atkgen probe has an existing verbose mode (triggered by the -vv flag, or whenever verbosity is 2 or higher) that is meant to display conversation turns with emoji markers. These bugs prevented it from working correctly. The fixes ensure that:

- Full probe text is displayed with the 🔴 marker in yellow
- Actual model responses are displayed with the 🦜 marker in bright text
- New conversation headers display with the 🆕 marker
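
The gating and marker behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the probe's code: the function conversation_lines and the raw ANSI escape constants are hypothetical (the probe may use a color library instead), and the header text is abbreviated.

```python
from typing import List

# Raw ANSI escape codes, used here only to keep the sketch dependency-free.
YELLOW = "\033[33m"
BRIGHT = "\033[1m"
RESET = "\033[0m"


def conversation_lines(
    verbose: int,
    challenge_text: str,
    response_text: str,
    new_conversation: bool = False,
) -> List[str]:
    """Return the lines to print for one turn, or nothing below verbosity 2."""
    if verbose < 2:
        return []
    lines = []
    if new_conversation:
        lines.append("atkgen: 🆕 New conversation")
    # Full probe text, not a single character of it.
    lines.append(f"atkgen: 🔴 probe: {YELLOW}{challenge_text}{RESET}")
    # The target's reply, not the prompt that was sent.
    lines.append(f"atkgen: 🦜 model: {BRIGHT}{response_text}{RESET}")
    return lines


for line in conversation_lines(2, "example challenge", "example reply", True):
    print(line)
```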

Testing added

Added test_atkgen_verbose_output() in tests/probes/test_probes_atkgen.py that:

- Sets verbosity to 2 to enable conversation output
- Captures stdout during probe execution
- Verifies all three conversation markers appear (🆕, 🔴 probe:, 🦜 model:)

All 7 tests in test_probes_atkgen.py pass.
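
For readers unfamiliar with this style of test, here is a minimal sketch of the capture-and-assert pattern using only the standard library. It is not the test added in this PR: fake_probe_run is a hypothetical stand-in for the probe, and the real test may capture output differently (for example through a pytest fixture).

```python
import contextlib
import io


def fake_probe_run(verbose: int) -> None:
    """Hypothetical stand-in for one atkgen turn; not garak code."""
    if verbose >= 2:
        print("atkgen: 🆕 New conversation")
        print("atkgen: 🔴 probe: example challenge")
        print("atkgen: 🦜 model: example response")


def captured_output(verbose: int) -> str:
    """Run the stand-in probe and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        fake_probe_run(verbose)
    return buffer.getvalue()


def test_verbose_markers() -> None:
    output = captured_output(verbose=2)
    # All three conversation markers must be present at verbosity 2.
    for marker in ("🆕", "🔴 probe:", "🦜 model:"):
        assert marker in output
    # Below verbosity 2, no conversation text should be printed.
    assert captured_output(verbose=1) == ""


test_verbose_markers()
```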

Verification

Steps to verify this fix works correctly:

Run all atkgen tests:

python -m pytest tests/probes/test_probes_atkgen.py -v

All 7 tests should pass.

Test verbose output manually:

python -m garak --target_type test.Blank --probes atkgen.Tox -vv --generations 1

Verify the console displays:
- atkgen: 🆕 ⋅.˳˳.⋅ॱ˙˙ॱ New conversation ॱ˙˙ॱ⋅.˳˳.⋅ 🗣️ (conversation header)
- atkgen: 🔴 probe: [full probe text in yellow] (complete challenge text, not single character)
- atkgen: 🦜 model: [model response in bright] (actual response, not the prompt)

Verify with a real model (optional - requires API key):

export OPENAI_API_KEY="your-key"
python -m garak --target_type openai --target_name gpt-3.5-turbo --probes atkgen.Tox -vv --generations 1

This should show multi-turn conversations with visible responses.

Verify that unrelated behavior is unchanged:
- Non-verbose mode (no -vv) should still show progress bars, not conversation text
- Debug logs should still capture full conversation data

github-actions · 2025-12-25T22:20:07Z

DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅

Fixed two bugs in the atkgen probe's verbose output (issue NVIDIA#200):

1. Line 181: Removed incorrect [1] indexing that only showed the second character of challenge_text instead of the full probe text
2. Line 213: Changed to print response_text (actual model response) instead of the prompt that was sent to the model

Added test_atkgen_verbose_output() to verify that verbose mode (verbose >= 2) correctly displays conversation markers:

- 🆕 New conversation indicator
- 🔴 probe: Red team challenge text
- 🦜 model: Target model response text

The verbose output feature is triggered with the -vv flag or when _config.system.verbose >= 2.

Fixes NVIDIA#200

Signed-off-by: Snehal Vartak <snehal.inusa@gmail.com>

snehalvartak · 2025-12-25T22:35:27Z

I have read the DCO Document and I hereby sign the DCO

snehalvartak · 2025-12-25T22:39:37Z

recheck

jmartin-tech

Nice catch, thanks for the added tests as well.

snehalvartak force-pushed the fix/atkgen-verbose-output branch from 14d56f0 to 066cf4a on December 25, 2025 22:33

github-actions bot added a commit that referenced this pull request Dec 25, 2025

@snehalvartak has signed the CLA in #1542

5752992

snehalvartak mentioned this pull request Dec 25, 2025

option to see atkgen auto red-team turns #200

Closed

jmartin-tech approved these changes Dec 28, 2025

jmartin-tech merged commit 1408cf9 into NVIDIA:main Jan 2, 2026
15 of 16 checks passed

github-actions bot locked and limited conversation to collaborators Jan 2, 2026
